Home All Groups Group Topic Archive Search About

Inserting Non-duplicate rows?

Author
18 Aug 2005 5:11 PM
Rich
say tbl1 is a data pickup table and tbl2 is the archive table.  tbl1 and tbl2
have almost the exact same structure (except tbl2 has an identiy column and
tbl1 does not) - these are denormalized tables - to ensure that all the data
is retrieved - errors/duplicates and everything (cleanup after the fact). 

I appologize in advance if this scenario sounds too amateur - and I am open
to suggestions.

tbl1 picks up about 10,000 records a day and archives them to tbl2.  But
only 5000 of the 10,000 records are new - just all mixed in.  The key can be
comprised of IDcol, NameCol, DateCol.  These 3 fields would make a record
unique.   Note: there are duplicate IDcol values in this data and duplicate
everything else.  But these 3 fields can uniquely identify a record (this is
not actual - just conceptual - thus no DDL).  So I want to insert only the
5000 new records into tbl2 from the 10,000 records in tbl1 (the old records
of the 10,000 are already in tbl2).   What would the insert statement look
like?  Here is my pseudo code

Insert Into tbl2 Select IDcol, NameCol, DateCol, col1, col2, col3 from tbl1
Where tbl1.IDcol <> tbl2.IDcol And tbl1.NameCol <> tbl2.NameCol And
tlb1.DateCol <> tbl2.DateCol

My other option is to insert all the records from tbl1 into tbl2 and then
search for duplicates and remove the duplicates.

Thanks,
Rich

Author
18 Aug 2005 5:16 PM
Itzik Ben-Gan
Rich, try,

INSERT INTO Target(col_list)
  SELECT col_list FROM Source AS S
  WHERE filter_expression
    AND NOT EXISTS
      (SELECT * FROM Target AS T WHERE T.key1 = S.key1 AND T.key2 = S.key2
AND etc.);

--
BG, SQL Server MVP
www.SolidQualityLearning.com


Show quote
"Rich" <R***@discussions.microsoft.com> wrote in message
news:0BEDAF58-6713-43AD-A8A4-93D28922754E@microsoft.com...
> say tbl1 is a data pickup table and tbl2 is the archive table.  tbl1 and
> tbl2
> have almost the exact same structure (except tbl2 has an identiy column
> and
> tbl1 does not) - these are denormalized tables - to ensure that all the
> data
> is retrieved - errors/duplicates and everything (cleanup after the fact).
>
> I appologize in advance if this scenario sounds too amateur - and I am
> open
> to suggestions.
>
> tbl1 picks up about 10,000 records a day and archives them to tbl2.  But
> only 5000 of the 10,000 records are new - just all mixed in.  The key can
> be
> comprised of IDcol, NameCol, DateCol.  These 3 fields would make a record
> unique.   Note: there are duplicate IDcol values in this data and
> duplicate
> everything else.  But these 3 fields can uniquely identify a record (this
> is
> not actual - just conceptual - thus no DDL).  So I want to insert only the
> 5000 new records into tbl2 from the 10,000 records in tbl1 (the old
> records
> of the 10,000 are already in tbl2).   What would the insert statement look
> like?  Here is my pseudo code
>
> Insert Into tbl2 Select IDcol, NameCol, DateCol, col1, col2, col3 from
> tbl1
> Where tbl1.IDcol <> tbl2.IDcol And tbl1.NameCol <> tbl2.NameCol And
> tlb1.DateCol <> tbl2.DateCol
>
> My other option is to insert all the records from tbl1 into tbl2 and then
> search for duplicates and remove the duplicates.
>
> Thanks,
> Rich
Author
18 Aug 2005 6:02 PM
AK
also look up IGNORE_DUP_KEY option of CREATE INDEX statement
Author
18 Aug 2005 6:11 PM
Rich
Thank you all for your replies.  I wil try each suggestion.

Show quote
"Rich" wrote:

> say tbl1 is a data pickup table and tbl2 is the archive table.  tbl1 and tbl2
> have almost the exact same structure (except tbl2 has an identiy column and
> tbl1 does not) - these are denormalized tables - to ensure that all the data
> is retrieved - errors/duplicates and everything (cleanup after the fact). 
>
> I appologize in advance if this scenario sounds too amateur - and I am open
> to suggestions.
>
> tbl1 picks up about 10,000 records a day and archives them to tbl2.  But
> only 5000 of the 10,000 records are new - just all mixed in.  The key can be
> comprised of IDcol, NameCol, DateCol.  These 3 fields would make a record
> unique.   Note: there are duplicate IDcol values in this data and duplicate
> everything else.  But these 3 fields can uniquely identify a record (this is
> not actual - just conceptual - thus no DDL).  So I want to insert only the
> 5000 new records into tbl2 from the 10,000 records in tbl1 (the old records
> of the 10,000 are already in tbl2).   What would the insert statement look
> like?  Here is my pseudo code
>
> Insert Into tbl2 Select IDcol, NameCol, DateCol, col1, col2, col3 from tbl1
> Where tbl1.IDcol <> tbl2.IDcol And tbl1.NameCol <> tbl2.NameCol And
> tlb1.DateCol <> tbl2.DateCol
>
> My other option is to insert all the records from tbl1 into tbl2 and then
> search for duplicates and remove the duplicates.
>
> Thanks,
> Rich

AddThis Social Bookmark Button