Home All Groups Group Topic Archive Search About
Author
2 Mar 2006 6:21 PM
brenenger via SQLMonster.com
I am importing data from a .csv. I have noticed that it copies this
information over and over when I import the data. I only want it to update
anything that is different and add any new lines in the .csv file. I have
looked at setting up a primary key but this won't work.

ProjectID, Phase, Unit,Tract, Release, UnitPlan, UnitOpt

As long as one of these fields is different I want it to allow it to be
entered. My problem is some of them may be null. Which is ok.

KB5IR,10,405,,,4516,,
KB5IR,10,406,,4516,REV,

I am not sure how to go about doing this!


Author
2 Mar 2006 6:01 PM
David Portas
brenenger via SQLMonster.com wrote:
Show quote
> I am importing data from a .csv. I have noticed that it copies this
> information over and over when I import the data. I only want it to update
> anything that is different and add any new lines in the .csv file. I have
> looked at setting up a primary key but this won't work.
>
> ProjectID, Phase, Unit,Tract, Release, UnitPlan, UnitOpt
>
> As long as one of these fields is different I want it to allow it to be
> entered. My problem is some of them may be null. Which is ok.
>
> KB5IR,10,405,,,4516,,
> KB5IR,10,406,,4516,REV,
>
> I am not sure how to go about doing this!
>
> --
> Message posted via SQLMonster.com
> http://www.sqlmonster.com/Uwe/Forums.aspx/sql-server-programming/200603/1

Do you have an existing data model or are you designing one? Don't make
the mistake of trying to mimic a file in a table. Relational design
principles and file storage are not the same!

Assuming you have or create a suitably normalized data model there are
basically 2 possible approaches:

1. Transform the file as you load it to the normalized schema (using
DTS or Integration Services or some other tool for example)

2. Load the file as-is to an intermediate "staging" table and then
transform the data to your real data model using SQL. The staging table
isn't used for anything other than the load process and the data is
usually deleted some time afterwards.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Author
2 Mar 2006 6:35 PM
tonysell@nospam.nospam
Here are a couple of options to look at-

use of a unique index with the ignore dupe option (it will just not enter
the data for duplicate insert attempts) .  This would not work with the
updates you mention however.   Another option would be to import into a load
table then use a procedure containing the merge logic to update/insert as
appropriate.

HTH
--Tony

Show quote
"brenenger via SQLMonster.com" wrote:

> I am importing data from a .csv. I have noticed that it copies this
> information over and over when I import the data. I only want it to update
> anything that is different and add any new lines in the .csv file. I have
> looked at setting up a primary key but this won't work.
>
> ProjectID, Phase, Unit,Tract, Release, UnitPlan, UnitOpt
>
> As long as one of these fields is different I want it to allow it to be
> entered. My problem is some of them may be null. Which is ok.
>
> KB5IR,10,405,,,4516,,
> KB5IR,10,406,,4516,REV,
>
> I am not sure how to go about doing this!
>
> --
> Message posted via SQLMonster.com
> http://www.sqlmonster.com/Uwe/Forums.aspx/sql-server-programming/200603/1
>
Author
3 Mar 2006 6:51 PM
brenenger via SQLMonster.com
Hi everyone. I am sorry about this. I am VERY new to SQL and I am having a
hard time trying to figure out what to do. I got Access configured to link to
SQL Server and I think I'll be able to get the rest of this figured out. My
only problem now is the import. I need to UPDATE/INSERT the .csv information.
Here is the transformation that I got help creating.

Function Main()

IF DTSSource("Col001") <>0 Then
       ' add a 0 to the front, and keep only the 6 right characters.
'21706  ->  021706  ->  021706
      '100106  -> 0100106  -> 100106

        str = DTSSource("Col001")
                 iMo = CInt( Mid( str, 1, 1) )
                 iDay = CInt( Mid(  str, 2, 2) )
                 iYear = CInt( Mid(  str, 4, 2) )
                DTSDestination("DelDate") = DateSerial( iYear, iMo, iDay)
Else
'This will not return a value for the date
'which means it will be NULL if the col001 = "0"

'If you want to specify  a date then uncomment the next line
'DTSDestination("DelDate") = "19000101"

End If


        DTSDestination("ProjectID") = DTSSource("Col002")
        DTSDestination("Phase") = DTSSource("Col003")
        DTSDestination("Unit") = DTSSource("Col004")

If DTSSource("Col005")  = Null Then
DTSDestination ("Tract") = " "
Else
End If

If DTSSource("Col006")  = Null Then
DTSDestination ("Release") = " "
Else
End IF
        DTSDestination("UnitPlan") = DTSSource("Col007")
        DTSDestination("UnitOpt") = DTSSource("Col008")
        DTSDestination("POComp") = DTSSource("Col009")
        DTSDestination("PrjFrm") = DTSSource("Col010")
        DTSDestination("OrderNo") = DTSSource("Col011")
        DTSDestination("OrderStat") = DTSSource("Col012")
        DTSDestination("Boxes") = DTSSource("Col013")
        Main = DTSTransformStat_OK
End Function

I cannot create a PK because I can have null values. As long as ProjectID,
Phase, Unit, Tract, Release, UnitPlan, UnitOpt are unique all together it
should be ok. Here is an example of what my .csv looks like. I need to be
able to import this periodically. I would like for it to update anything that
has changed and insert anything that is missing. The way it is now, it just
adds the import to the table. So I could have this stuff listed over and over.
Which I don't want.


21706    KBMP   NEW  200  2031                    Y  Y    64149    SCHED    17
21706    KBMP   NEW  201  2031                    Y  Y    64150    SCHED    8
21706    KBMP   NEW  201  2031  OPT04    Y  Y    64151    SCHED    13
21706    KBMP   NEW  201 2031   OPT136    Y  Y    64151    SCHED    0
21706    KBMP   NEW  201 2031   OPT142    Y  Y    64151    SCHED    0
21706    KBMP   NEW  201 2031   OPT143    Y  Y    64151    SCHED    0
21706    KBMP   NEW  201 2031   OPT144    Y  Y    64151    SCHED    0

Author
3 Mar 2006 7:36 PM
David Portas
brenenger via SQLMonster.com wrote:
Show quote
> Hi everyone. I am sorry about this. I am VERY new to SQL and I am having a
> hard time trying to figure out what to do. I got Access configured to link to
> SQL Server and I think I'll be able to get the rest of this figured out. My
> only problem now is the import. I need to UPDATE/INSERT the .csv information.
> Here is the transformation that I got help creating.
>
> Function Main()
>
> IF DTSSource("Col001") <>0 Then
>        ' add a 0 to the front, and keep only the 6 right characters.
> '21706  ->  021706  ->  021706
>       '100106  -> 0100106  -> 100106
>
>         str = DTSSource("Col001")
>                  iMo = CInt( Mid( str, 1, 1) )
>                  iDay = CInt( Mid(  str, 2, 2) )
>                  iYear = CInt( Mid(  str, 4, 2) )
>                 DTSDestination("DelDate") = DateSerial( iYear, iMo, iDay)
> Else
> 'This will not return a value for the date
> 'which means it will be NULL if the col001 = "0"
>
> 'If you want to specify  a date then uncomment the next line
> 'DTSDestination("DelDate") = "19000101"
>
> End If
>
>
>         DTSDestination("ProjectID") = DTSSource("Col002")
>         DTSDestination("Phase") = DTSSource("Col003")
>         DTSDestination("Unit") = DTSSource("Col004")
>
> If DTSSource("Col005")  = Null Then
> DTSDestination ("Tract") = " "
> Else
> End If
>
> If DTSSource("Col006")  = Null Then
> DTSDestination ("Release") = " "
> Else
> End IF
>         DTSDestination("UnitPlan") = DTSSource("Col007")
>         DTSDestination("UnitOpt") = DTSSource("Col008")
>         DTSDestination("POComp") = DTSSource("Col009")
>         DTSDestination("PrjFrm") = DTSSource("Col010")
>         DTSDestination("OrderNo") = DTSSource("Col011")
>         DTSDestination("OrderStat") = DTSSource("Col012")
>         DTSDestination("Boxes") = DTSSource("Col013")
>         Main = DTSTransformStat_OK
> End Function
>
> I cannot create a PK because I can have null values. As long as ProjectID,
> Phase, Unit, Tract, Release, UnitPlan, UnitOpt are unique all together it
> should be ok. Here is an example of what my .csv looks like. I need to be
> able to import this periodically. I would like for it to update anything that
> has changed and insert anything that is missing. The way it is now, it just
> adds the import to the table. So I could have this stuff listed over and over.
> Which I don't want.
>
>
> 21706    KBMP   NEW  200  2031                    Y  Y    64149    SCHED    17
> 21706    KBMP   NEW  201  2031                    Y  Y    64150    SCHED    8
> 21706    KBMP   NEW  201  2031  OPT04    Y  Y    64151    SCHED    13
> 21706    KBMP   NEW  201 2031   OPT136    Y  Y    64151    SCHED    0
> 21706    KBMP   NEW  201 2031   OPT142    Y  Y    64151    SCHED    0
> 21706    KBMP   NEW  201 2031   OPT143    Y  Y    64151    SCHED    0
> 21706    KBMP   NEW  201 2031   OPT144    Y  Y    64151    SCHED    0
>
> --
> Message posted via SQLMonster.com
> http://www.sqlmonster.com/Uwe/Forums.aspx/sql-server-programming/200603/1

I think the easiest approach will be to load this data to a temporary
staging table that matches the file structure. Then use UPDATE and
INSERT statements to load the data into actual table(s) in your
database. In the staging table you can use a file name, date and/or row
number as the key.

You will find it VERY hard and maybe even impossible to maintain the
integrity of the changing data unless you first implement the correct
data model with keys in each table. I can't help you to do that just
based on a list of column names and a snapshot of your data file. The
data model should be based on your business rules and knowledge of your
business environment. There is little point in basing your data model
on the format that happens to have been used in this file.

If you don't already know about relational design principles like
Normalization and the normal forms then you should master those
concepts before you attempt a final design.

On the other hand if you don't care about design right now and just
want to see some data in a table then you could create a unique index
on all columns using the IGNORE_DUP_KEY option. That will eliminate any
duplicates but won't help you any further than that:

CREATE UNIQUE NONCLUSTERED INDEX idx_tbl
ON tbl (col1, col2, col3)
WITH IGNORE_DUP_KEY ;

IMPORTANT: I do NOT recommend this for a live production environment.

Hope this helps.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Author
8 Mar 2006 7:11 PM
brenenger via SQLMonster.com
Ok. I am importing a .csv file in it's own table. I want this table to be
overwritten each time it is updated. I am using DTS to import. Once this
table is populated, I need to run a Stored Procedure that will look at that
table and tblUnitImport. This is the code I have so far, I am not sure if I
am doing this right. If the row is not listed in tblUnitImport then I need to
INSERT it, otherwise I need to update the DELDATE and/or ORDERSTAT field if
it is different in m#ds001o1. Am I on the right track?

BEGIN INSERT INTO tblUnitImport
                                      (Deldate, ProjectID, Phase, Unit, Tract,
Release, UnitPlan, UnitOpt, POComp, PrjFrm, OrderNo, OrderStat, Boxes)
               SELECT      Col001, Col002, Col003,Col004, Col005, Col006,
Col007, Col008, Col009, Col010, Col011, Col012, Col013
               FROM          m#ds001o1 AS M
               WHERE      NOT EXISTS
                                          (SELECT     *
                                            FROM        tblUnitImport
                                            WHERE      tblUnitImport.
ProjectId = m#ds001o1.Col002
                                                              tblUnitImport.
Phase = m#ds001o1.Col003
                                                              tblUnitImport.
Unit = m#ds001o1.Col004
                                                              tblUnitImport.
Tract = m#ds001o1.Col005
                                                              tblUnitImport.
Release = m#ds001o1.Col006
                                                              tblUnitImport.
UnitPlan = m#ds001o1.Col007
                                                              tblUnitImport.
UnitOpt = m#ds001o1.Col008);
                                          UPDATE    tblUnitImport
                                           SET              col1 =
                                                                      (SELECT
col001, col012
                                                                        FROM
m#ds001o1
                                                                        WHERE
tblUnitImport.ProjectId = m#ds001o1.Col002

tblUnitImport.Phase = m#ds001o1.Col003

tblUnitImport.Unit = m#ds001o1.Col004

tblUnitImport.Tract = m#ds001o1.Col005

tblUnitImport.Release = m#ds001o1.Col006

tblUnitImport.UnitPlan = m#ds001o1.Col007

tblUnitImport.UnitOpt = m#ds001o1.Col008)


                                           WHERE     EXISTS
                                                                      (SELECT
*
                                                                        FROM
Foobar
                                                                        WHERE
Foobar.keycol = Merge_table.keycol); END;

--
Message posted via http://www.sqlmonster.com

AddThis Social Bookmark Button