Home All Groups Group Topic Archive Search About

Preventing the loading of duplicate data to a table – Best Option

Author
1 Sep 2005 9:29 PM
SJM
Hello.

I have a situation where the potential exists for duplicate data to be
loaded to a table and I need to prevent this from occurring.

The process starts with a file being created on a mainframe system and
downloaded to SQL Server. If everything was going correctly, the data
would only have new data with a unique primary key of two columns. The
primary key is presently enforcing the uniqueness business requirement.
This could have also been done with a unique constraint, but creating
the key was chosen instead.

The desired result is that if any duplicate data were to be discovered
during the load, that the whole load fail and be rolled back.

I am interested in opinions on the best way to achieve this desired
result and comments on the existing method.


*** Sent via Developersdex http://www.developersdex.com ***

Author
1 Sep 2005 9:41 PM
--CELKO--
Start at the file level; sort it, and scrub it with a 3GL program
before you load it.   I would also look into Sunopsis.  This is an ELT
tool -- it uses native SQL tools to move data rather than adding yet
another ETL language on top of everything.
Author
2 Sep 2005 4:37 PM
SJM
Thanks, I appreciate the tip; however, the constraints of the project
require that this be done in SQL Server. I am interested to know if this
(Primary Key or Unique Constraint) is the best way to do this within the
restrictions that are in place.


*** Sent via Developersdex http://www.developersdex.com ***
Author
2 Sep 2005 5:02 PM
David Portas
A PK or UNIQUE constraint is the obvious way to generate the exception
condition. It's probably useful also to report the invalid data so you
may want to load to a staging table, without the constraint, and then
write a query using HAVING COUNT(*)>1 to find the duplicate rows.

--
David Portas
SQL Server MVP
--

AddThis Social Bookmark Button