Home All Groups Group Topic Archive Search About

Locating Data Discrepencies

Author
5 Sep 2006 9:47 PM
jmancuso via SQLMonster.com
I think this will be my first time posting on SQL Monster, I've heard only
good things though and was hoping that before i dive into a project I may
gain some advice on the task at hand.

What I'm trying to do is develop some sort of exception report which isolates
inconsistent data between two or more similar records

i.e.
I have two records, let's assume the following fields for each record.

id, ImportDate, TargetValue, value1, value2, value3, value4, and value5

Sample data is as such:
ABC, 9/5/2006, 100, A, B, C, D, E
ABC, 9/1/2006, 101, A, C, C, D, E

the fields value1:Value5 impact the difference between the field TargetValue

Understanding this, looking at field Value2 one can quickly determine the
difference here is what caused the change between the TargetValue values
between 9/1/2006 and 9/5/2006.

With this example isolating the differences is not that difficult and I could
probably handle in a fairly straight forward procedure, most likely joining
on each of the values until there is a variance.

However, it isn't this easy as I'm going to have to isolate differences
between records on nearly a 150 fields on up to thousands of records. 

Ideally I would like to create (don't know that i can do this exclusively in
SQL or not, thus the question) a report that will identify any area where the
TargetValue is different between two similar records as well as only the
fields where the value(s) responsible for the change are reported. 

Any ideas, is this feasable?

Any help or ideas are greatly appreciated.

Thanks.


Author
9 Sep 2006 9:45 PM
Hugo Kornelis
On Tue, 05 Sep 2006 21:47:09 GMT, jmancuso via SQLMonster.com wrote:

(snip)
>With this example isolating the differences is not that difficult and I could
>probably handle in a fairly straight forward procedure, most likely joining
>on each of the values until there is a variance.
>
>However, it isn't this easy as I'm going to have to isolate differences
>between records on nearly a 150 fields on up to thousands of records. 

Hi jmancuso,

If you have 150 columns, you'll have to consider seriously if your table
design is normalized. Note that I'm not saying it's not: there are
examples of normalized tables with lots of columns. For instance a
complete physical might inculde measurements of length, weight, heart
rate, blood pressure, lung capacity, hemoglobin, etc, etc. All different
attributes that can all be combined into a single table.

But even though such situations do exist, I'm afraid that the majority
of 50+-column tables are just bad design. If, for example, the 150
columns are 5 different measurements taken on each day of the month,
then you should add day number as an extra column, include it in the
primary key, and remove all but 1 column for each of the 5 measurements.

>Ideally I would like to create (don't know that i can do this exclusively in
>SQL or not, thus the question) a report that will identify any area where the
>TargetValue is different between two similar records as well as only the
>fields where the value(s) responsible for the change are reported. 

You pprobably can do it in SQL, but remember that SQL was designed as a
language for manipulating large quantities of data, not as a language
for creating reports. Almost every language will be better suited for
this task than SQL. Just fetch the rough data from SQL Server into the
front end, compare the data there and create the report based on the
comparison results.

And remember to handle NULL properly in your comparison. If the data can
allow NULL, then you won't find all differences with
  WHERE a.Column1 <> b.Column1
You'll have to cater for changes from NULL to non-NULL and vice versa as
well:
  WHERE a.Column1 <> b.Column1
  OR   (a.Column1 IS NULL AND b.Column1 IS NOT NULL)
  OR   (a.Column1 IS NOT NULL AND b.Column1 IS NULL)
How you do that in your reporting language depends on how NULL is
represented in that language and how the language deals with comparisons
involving NULL.

--
Hugo Kornelis, SQL Server MVP
Author
14 Sep 2006 4:13 PM
knightconsulting
try firefly sql compare. i wrote it for this purpose, it's free.

http://www.getfirefly.net/
http://www.getfirefly.net/FF_Screen_1_Small_4.png

AddThis Social Bookmark Button