|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Preventing the loading of duplicate data to a table – Best OptionI have a situation where the potential exists for duplicate data to be loaded to a table and I need to prevent this from occurring. The process starts with a file being created on a mainframe system and downloaded to SQL Server. If everything was going correctly, the data would only have new data with a unique primary key of two columns. The primary key is presently enforcing the uniqueness business requirement. This could have also been done with a unique constraint, but creating the key was chosen instead. The desired result is that if any duplicate data were to be discovered during the load, that the whole load fail and be rolled back. I am interested in opinions on the best way to achieve this desired result and comments on the existing method. *** Sent via Developersdex http://www.developersdex.com *** Start at the file level; sort it, and scrub it with a 3GL program
before you load it. I would also look into Sunopsis. This is an ELT tool -- it uses native SQL tools to move data rather than adding yet another ETL language on top of everything. Thanks, I appreciate the tip; however, the constraints of the project
require that this be done in SQL Server. I am interested to know if this (Primary Key or Unique Constraint) is the best way to do this within the restrictions that are in place. *** Sent via Developersdex http://www.developersdex.com *** A PK or UNIQUE constraint is the obvious way to generate the exception
condition. It's probably useful also to report the invalid data so you may want to load to a staging table, without the constraint, and then write a query using HAVING COUNT(*)>1 to find the duplicate rows. -- David Portas SQL Server MVP -- |
|||||||||||||||||||||||