Home All Groups Group Topic Archive Search About

Fastest way to translate data for normalization in MS SQL

Author
3 Aug 2006 8:18 PM
sdwebguy
I am running SQL 2005.

I am importing data from a 3rd party that contains categories and
subcategories. The data looks like this (the :: are just for visual
purposes, representing the different columns in my table):

categoryPathID :: categoryLevel :: categoryName :: categoryPath
1 :: Cat1 :: California :: California
1 :: Cat2 :: San Diego :: California / San Diego
1 :: Cat3 :: Downtown :: California / San Diego / Downtown
1 :: Cat4 :: Gaslamp :: California / San Diego / Downtown / Gaslamp
2 :: Cat1 :: California :: California
2 :: Cat2 :: Inland Empire :: California / Inland Empire
2 :: Cat3 :: San Bernardino :: California / Inland Empire / San
Bernardino
2 :: Cat4 :: University Heights :: California / Inland Empire / San
Bernardino / University Heights

I want to transform this data into the following format:
categoryID :: categoryName :: parentCategoryID
1 :: California :: NULL
2 :: San Diego :: 1
3 :: Downtown :: 2
4 :: Gaslamp :: 3
5 :: Inland Empire :: 1
6 :: San Bernardino :: 5
7 :: University Heights :: 6

The solution I have in place now works by looping over all the
categoryPathIDs, a subloop over each category level with that pathID,
then inserting the data in the new table. It works, but because I have
70,000 records in the original data, it can take up to 45 minutes to
run.

Is there a more elegant way to translate this data that might speed up
the process a bit?

Author
3 Aug 2006 8:32 PM
Stu
How are you importing/translating it?  Are you using SSIS, or are just
dumping it into a table and then running SQL against it?

I don't have a lot of experience with SSIS, but in DTS you could do
this sort of thing using an inline VBScript transformatin, and it would
be relatively quick.

Stu

sdwebguy wrote:
Show quote
> I am running SQL 2005.
>
> I am importing data from a 3rd party that contains categories and
> subcategories. The data looks like this (the :: are just for visual
> purposes, representing the different columns in my table):
>
> categoryPathID :: categoryLevel :: categoryName :: categoryPath
> 1 :: Cat1 :: California :: California
> 1 :: Cat2 :: San Diego :: California / San Diego
> 1 :: Cat3 :: Downtown :: California / San Diego / Downtown
> 1 :: Cat4 :: Gaslamp :: California / San Diego / Downtown / Gaslamp
> 2 :: Cat1 :: California :: California
> 2 :: Cat2 :: Inland Empire :: California / Inland Empire
> 2 :: Cat3 :: San Bernardino :: California / Inland Empire / San
> Bernardino
> 2 :: Cat4 :: University Heights :: California / Inland Empire / San
> Bernardino / University Heights
>
> I want to transform this data into the following format:
> categoryID :: categoryName :: parentCategoryID
> 1 :: California :: NULL
> 2 :: San Diego :: 1
> 3 :: Downtown :: 2
> 4 :: Gaslamp :: 3
> 5 :: Inland Empire :: 1
> 6 :: San Bernardino :: 5
> 7 :: University Heights :: 6
>
> The solution I have in place now works by looping over all the
> categoryPathIDs, a subloop over each category level with that pathID,
> then inserting the data in the new table. It works, but because I have
> 70,000 records in the original data, it can take up to 45 minutes to
> run.
>
> Is there a more elegant way to translate this data that might speed up
> the process a bit?
Author
3 Aug 2006 8:44 PM
sdwebguy
The original data comes to me in a flat file. I use bcp to import it
into my SQL table and that process is very quick (a few seconds). I
didnt use SSIS because it was easier for me to customize the bcp
command on the fly as needed (if the filename changes, or the location
of the file, or the format of the file, etc).

How would the VCScript transformation work using DTS? I might be able
to get the idea of the transformation and recode it somehow in T-SQL?

Tim


Stu wrote:
Show quote
> How are you importing/translating it?  Are you using SSIS, or are just
> dumping it into a table and then running SQL against it?
>
> I don't have a lot of experience with SSIS, but in DTS you could do
> this sort of thing using an inline VBScript transformatin, and it would
> be relatively quick.
>
> Stu
>

AddThis Social Bookmark Button