Home All Groups Group Topic Archive Search About
Author
15 Sep 2005 5:26 PM
tshad
I have a query that I used that was based on the following query that Hugo
gave me yesterday (I added CheckNumber in for the problem).

I had a problem with finding duplicate check numbers in the table
(voids,re-issued checks etc).  It does show in my results, but out of
4300+checks it is hard to find all the duplicates for auditing purposes
while setting up my script.

How do I go about changing my query to show ONLY duplicates (or vice versa)?

Here is the data:

drop table EmployeePay
drop table PayDetail
CREATE TABLE [dbo].[EmployeePay] (
[EmployeePayID] [int]  IDENTITY (1, 1),
[ClientID] [int] Not NULL ,
[CheckNumber] [int] Not Null,
[PayDate] [smalldatetime] Not NULL,
PRIMARY KEY (EmployeePayID)
) ON [PRIMARY]
CREATE TABLE [dbo].[PayDetail] (
[PayDetailID] [int]  IDENTITY (1, 1),
[ClientID] [int] Not NULL ,
[PayDate] [smalldatetime] Not NULL ,
[Code] [varchar] (5) Not NULL ,
[Hours] [int] Not NULL ,
[Amount] [money] Not NULL,
PRIMARY KEY (PayDetailID)
) ON [PRIMARY]
insert EmployeePay values (1,1234,'01-01-05')
insert EmployeePay values (1,2255,'02-03-05')
insert EmployeePay values (2,1331,'01-01-05')
insert EmployeePay values (2,3011,'03-06-05')
insert EmployeePay values (2,1331,'05-06-05')
insert PayDetail values (1,'01-01-05','5',20,200.50)
insert PayDetail values (1,'01-01-05','6',25,10.50)
insert PayDetail values (1,'01-01-05','13',50,120.25)
insert PayDetail values (1,'02-03-05','5',5,110)
insert PayDetail values (1,'02-03-05','9',18,250.50)
insert PayDetail values (2,'01-01-05','5',50,120.25)
insert PayDetail values (2,'01-01-05','44',10,320.32)
insert PayDetail values (2,'01-01-05','32',50,120.25)
insert PayDetail values (2,'03-06-05','46',18,235.75)
insert PayDetail values (2,'05-06-05','5',-50,-120.25)
insert PayDetail values (2,'05-06-05','44',-10,-320.32)
insert PayDetail values (2,'05-06-05','32',-50,-120.25)


and the script I am using:

CREATE VIEW PayDetailWithRank
AS
SELECT pd1.ClientID, pd1.PayDate, pd1.Code, pd1.Hours, pd1.Amount,
      (SELECT COUNT(*)
       FROM   PayDetail AS pd2
       WHERE  pd2.ClientID = pd1.ClientID
       AND    pd2.PayDate = pd1.PayDate
       AND    pd2.Code < pd1.Code) AS Rank
FROM   PayDetail AS pd1
go
SELECT     e.ClientID, e.PayDate, e.CheckNumber,
           d1.Code, d1.Hours, d1.Amount,
           d2.Code, d2.Hours, d2.Amount,
           d3.Code, d3.Hours, d3.Amount
FROM       EmployeePay AS e
INNER JOIN PayDetailWithRank AS d1
      ON   d1.ClientID = e.ClientID
      AND  d1.PayDate  = e.PayDate
      AND  d1.Rank % 3 = 0
LEFT  JOIN PayDetailWithRank AS d2
      ON   d2.ClientID = e.ClientID
      AND  d2.PayDate  = e.PayDate
      AND  d2.Rank / 3 = d1.Rank / 3
      AND  d2.Rank % 3 = 1
LEFT  JOIN PayDetailWithRank AS d3
      ON   d3.ClientID = e.ClientID
      AND  d3.PayDate  = e.PayDate
      AND  d3.Rank / 3 = d1.Rank / 3
      AND  d3.Rank % 3 = 2
ORDER BY   e.ClientID, e.PayDate, d1.Rank
go
DROP VIEW PayDetailWithRank
go


This will give you 1 duplicate check.  Can I change this to show only
duplicate, somehow?

Thanks,

Tom

Author
15 Sep 2005 6:40 PM
tshad
Never mind.

I figure out how.  I have to modify the statement slightly

I need to change the Order by to Group By and add a "Having Count(*) > 1"
clause.

Tom

Show quote
"tshad" <tscheider***@ftsolutions.com> wrote in message
news:%23tVBdqhuFHA.1472@TK2MSFTNGP15.phx.gbl...
>I have a query that I used that was based on the following query that Hugo
>gave me yesterday (I added CheckNumber in for the problem).
>
> I had a problem with finding duplicate check numbers in the table
> (voids,re-issued checks etc).  It does show in my results, but out of
> 4300+checks it is hard to find all the duplicates for auditing purposes
> while setting up my script.
>
> How do I go about changing my query to show ONLY duplicates (or vice
> versa)?
>
> Here is the data:
>
> drop table EmployeePay
> drop table PayDetail
> CREATE TABLE [dbo].[EmployeePay] (
> [EmployeePayID] [int]  IDENTITY (1, 1),
> [ClientID] [int] Not NULL ,
> [CheckNumber] [int] Not Null,
> [PayDate] [smalldatetime] Not NULL,
> PRIMARY KEY (EmployeePayID)
> ) ON [PRIMARY]
> CREATE TABLE [dbo].[PayDetail] (
> [PayDetailID] [int]  IDENTITY (1, 1),
> [ClientID] [int] Not NULL ,
> [PayDate] [smalldatetime] Not NULL ,
> [Code] [varchar] (5) Not NULL ,
> [Hours] [int] Not NULL ,
> [Amount] [money] Not NULL,
> PRIMARY KEY (PayDetailID)
> ) ON [PRIMARY]
> insert EmployeePay values (1,1234,'01-01-05')
> insert EmployeePay values (1,2255,'02-03-05')
> insert EmployeePay values (2,1331,'01-01-05')
> insert EmployeePay values (2,3011,'03-06-05')
> insert EmployeePay values (2,1331,'05-06-05')
> insert PayDetail values (1,'01-01-05','5',20,200.50)
> insert PayDetail values (1,'01-01-05','6',25,10.50)
> insert PayDetail values (1,'01-01-05','13',50,120.25)
> insert PayDetail values (1,'02-03-05','5',5,110)
> insert PayDetail values (1,'02-03-05','9',18,250.50)
> insert PayDetail values (2,'01-01-05','5',50,120.25)
> insert PayDetail values (2,'01-01-05','44',10,320.32)
> insert PayDetail values (2,'01-01-05','32',50,120.25)
> insert PayDetail values (2,'03-06-05','46',18,235.75)
> insert PayDetail values (2,'05-06-05','5',-50,-120.25)
> insert PayDetail values (2,'05-06-05','44',-10,-320.32)
> insert PayDetail values (2,'05-06-05','32',-50,-120.25)
>
>
> and the script I am using:
>
> CREATE VIEW PayDetailWithRank
> AS
> SELECT pd1.ClientID, pd1.PayDate, pd1.Code, pd1.Hours, pd1.Amount,
>      (SELECT COUNT(*)
>       FROM   PayDetail AS pd2
>       WHERE  pd2.ClientID = pd1.ClientID
>       AND    pd2.PayDate = pd1.PayDate
>       AND    pd2.Code < pd1.Code) AS Rank
> FROM   PayDetail AS pd1
> go
> SELECT     e.ClientID, e.PayDate, e.CheckNumber,
>           d1.Code, d1.Hours, d1.Amount,
>           d2.Code, d2.Hours, d2.Amount,
>           d3.Code, d3.Hours, d3.Amount
> FROM       EmployeePay AS e
> INNER JOIN PayDetailWithRank AS d1
>      ON   d1.ClientID = e.ClientID
>      AND  d1.PayDate  = e.PayDate
>      AND  d1.Rank % 3 = 0
> LEFT  JOIN PayDetailWithRank AS d2
>      ON   d2.ClientID = e.ClientID
>      AND  d2.PayDate  = e.PayDate
>      AND  d2.Rank / 3 = d1.Rank / 3
>      AND  d2.Rank % 3 = 1
> LEFT  JOIN PayDetailWithRank AS d3
>      ON   d3.ClientID = e.ClientID
>      AND  d3.PayDate  = e.PayDate
>      AND  d3.Rank / 3 = d1.Rank / 3
>      AND  d3.Rank % 3 = 2
> ORDER BY   e.ClientID, e.PayDate, d1.Rank
> go
> DROP VIEW PayDetailWithRank
> go
>
>
> This will give you 1 duplicate check.  Can I change this to show only
> duplicate, somehow?
>
> Thanks,
>
> Tom
>
Author
15 Sep 2005 6:46 PM
John Bell
Hi

Changing the first left outer join to be an inner join will ensure at least
one duplicate.

SELECT     e.ClientID, e.PayDate, e.CheckNumber,
           d1.Code, d1.Hours, d1.Amount, d1.rank,
           d2.Code, d2.Hours, d2.Amount, d2.rank,
           d3.Code, d3.Hours, d3.Amount, d3.rank
FROM       EmployeePay AS e
JOIN PayDetailWithRank AS d1
      ON   d1.ClientID = e.ClientID
      AND  d1.PayDate  = e.PayDate
      AND  d1.Rank % 3 = 0
JOIN PayDetailWithRank AS d2
      ON   d2.ClientID = e.ClientID
      AND  d2.PayDate  = e.PayDate
      AND  d2.Rank / 3 = d1.Rank / 3
      AND  d2.Rank % 3 = 1
LEFT  JOIN PayDetailWithRank AS d3
      ON   d3.ClientID = e.ClientID
      AND  d3.PayDate  = e.PayDate
      AND  d3.Rank / 3 = d1.Rank / 3
      AND  d3.Rank % 3 = 2
ORDER BY   e.ClientID, e.PayDate, d1.Rank

What are you expecting if there are 4 duplicates?

John

Show quote
"tshad" <tscheider***@ftsolutions.com> wrote in message
news:%23tVBdqhuFHA.1472@TK2MSFTNGP15.phx.gbl...
>I have a query that I used that was based on the following query that Hugo
>gave me yesterday (I added CheckNumber in for the problem).
>
> I had a problem with finding duplicate check numbers in the table
> (voids,re-issued checks etc).  It does show in my results, but out of
> 4300+checks it is hard to find all the duplicates for auditing purposes
> while setting up my script.
>
> How do I go about changing my query to show ONLY duplicates (or vice
> versa)?
>
> Here is the data:
>
> drop table EmployeePay
> drop table PayDetail
> CREATE TABLE [dbo].[EmployeePay] (
> [EmployeePayID] [int]  IDENTITY (1, 1),
> [ClientID] [int] Not NULL ,
> [CheckNumber] [int] Not Null,
> [PayDate] [smalldatetime] Not NULL,
> PRIMARY KEY (EmployeePayID)
> ) ON [PRIMARY]
> CREATE TABLE [dbo].[PayDetail] (
> [PayDetailID] [int]  IDENTITY (1, 1),
> [ClientID] [int] Not NULL ,
> [PayDate] [smalldatetime] Not NULL ,
> [Code] [varchar] (5) Not NULL ,
> [Hours] [int] Not NULL ,
> [Amount] [money] Not NULL,
> PRIMARY KEY (PayDetailID)
> ) ON [PRIMARY]
> insert EmployeePay values (1,1234,'01-01-05')
> insert EmployeePay values (1,2255,'02-03-05')
> insert EmployeePay values (2,1331,'01-01-05')
> insert EmployeePay values (2,3011,'03-06-05')
> insert EmployeePay values (2,1331,'05-06-05')
> insert PayDetail values (1,'01-01-05','5',20,200.50)
> insert PayDetail values (1,'01-01-05','6',25,10.50)
> insert PayDetail values (1,'01-01-05','13',50,120.25)
> insert PayDetail values (1,'02-03-05','5',5,110)
> insert PayDetail values (1,'02-03-05','9',18,250.50)
> insert PayDetail values (2,'01-01-05','5',50,120.25)
> insert PayDetail values (2,'01-01-05','44',10,320.32)
> insert PayDetail values (2,'01-01-05','32',50,120.25)
> insert PayDetail values (2,'03-06-05','46',18,235.75)
> insert PayDetail values (2,'05-06-05','5',-50,-120.25)
> insert PayDetail values (2,'05-06-05','44',-10,-320.32)
> insert PayDetail values (2,'05-06-05','32',-50,-120.25)
>
>
> and the script I am using:
>
> CREATE VIEW PayDetailWithRank
> AS
> SELECT pd1.ClientID, pd1.PayDate, pd1.Code, pd1.Hours, pd1.Amount,
>      (SELECT COUNT(*)
>       FROM   PayDetail AS pd2
>       WHERE  pd2.ClientID = pd1.ClientID
>       AND    pd2.PayDate = pd1.PayDate
>       AND    pd2.Code < pd1.Code) AS Rank
> FROM   PayDetail AS pd1
> go
> SELECT     e.ClientID, e.PayDate, e.CheckNumber,
>           d1.Code, d1.Hours, d1.Amount,
>           d2.Code, d2.Hours, d2.Amount,
>           d3.Code, d3.Hours, d3.Amount
> FROM       EmployeePay AS e
> INNER JOIN PayDetailWithRank AS d1
>      ON   d1.ClientID = e.ClientID
>      AND  d1.PayDate  = e.PayDate
>      AND  d1.Rank % 3 = 0
> LEFT  JOIN PayDetailWithRank AS d2
>      ON   d2.ClientID = e.ClientID
>      AND  d2.PayDate  = e.PayDate
>      AND  d2.Rank / 3 = d1.Rank / 3
>      AND  d2.Rank % 3 = 1
> LEFT  JOIN PayDetailWithRank AS d3
>      ON   d3.ClientID = e.ClientID
>      AND  d3.PayDate  = e.PayDate
>      AND  d3.Rank / 3 = d1.Rank / 3
>      AND  d3.Rank % 3 = 2
> ORDER BY   e.ClientID, e.PayDate, d1.Rank
> go
> DROP VIEW PayDetailWithRank
> go
>
>
> This will give you 1 duplicate check.  Can I change this to show only
> duplicate, somehow?
>
> Thanks,
>
> Tom
>
Author
15 Sep 2005 7:53 PM
--CELKO--
>>  had a problem with finding duplicate check numbers in the table (voids,re-issued checks etc).  <<

I have a better question for you: why are you creating duplicate check
numbers?  Do you think that perhaps not having a proper relational key
could be part of the problem?

If you will learn to write proper DDL, you will save a lot of complex
DML.

Also, look up the ISO-8601 format for dates and times.
Author
15 Sep 2005 8:40 PM
tshad
"--CELKO--" <jcelko***@earthlink.net> wrote in message
news:1126814005.385526.122020@z14g2000cwz.googlegroups.com...
>>>  had a problem with finding duplicate check numbers in the table
>>> (voids,re-issued checks etc).  <<
>
> I have a better question for you: why are you creating duplicate check
> numbers?  Do you think that perhaps not having a proper relational key
> could be part of the problem?

As I had mentioned, this is not my database.  I built a table to mimic that
tables I am extracting information from to create a CSV file that I can use
to import into our system.  I am not creating Duplicate Check numbers - that
is what is in their system and I have to deal with it.  I have over 5000
checks and as it turned out 5 duplicate check numbers.  I used my group
by/having solutions to find them.  As I mentioned, they were voids (hence
the duplicate check number) as well as a check that was re-issued (we don't
know why yet).

Are you suggesting I should spend my time creating a PROPER database schema,
even if it doesn't mirror what I am trying to accomplish?

And BTW, you have pounded incessantly about the evils of the IDENTITY.

I for one am grateful they used it, as you would have seen in my other post.
Hugos solution with the Views worked great, but hadn't taken into account
multiple codes on one check (which there would have been no way for him to
know this).  I found that their tables used identities and that was the only
field I could replace the Code field with to make it work.  It had to be
sequential and unique - Gaps would not have been an issue.

Tom
Show quote
>
> If you will learn to write proper DDL, you will save a lot of complex
> DML.
>
> Also, look up the ISO-8601 format for dates and times.
>
Author
15 Sep 2005 9:46 PM
Hugo Kornelis
On Thu, 15 Sep 2005 13:40:38 -0700, tshad wrote:

(snip)
>Hugos solution with the Views worked great, but hadn't taken into account
>multiple codes on one check (which there would have been no way for him to
>know this).  I found that their tables used identities and that was the only
>field I could replace the Code field with to make it work.  It had to be
>sequential and unique - Gaps would not have been an issue.

Hi Tom,

It would also have been solveable without the identity column. If there
was no single column to make each row unique within Client/Paydate, I'd
have used a combination of two columns. Or three. Or ... well, you get
the point. The code would have been more messy, but it would have
worked.

The only thing I would not have been able to handle were complete
duplicates (i.e. two or more rows with the same value in each of their
columns). But if you had those, you'd have much bigger problems to solve
than generating a kind-of-crosstab report. <g>

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Author
15 Sep 2005 10:07 PM
tshad
Show quote
"Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message
news:5oqji11o59rufd33auh0psimi867s65rj7@4ax.com...
> On Thu, 15 Sep 2005 13:40:38 -0700, tshad wrote:
>
> (snip)
>>Hugos solution with the Views worked great, but hadn't taken into account
>>multiple codes on one check (which there would have been no way for him to
>>know this).  I found that their tables used identities and that was the
>>only
>>field I could replace the Code field with to make it work.  It had to be
>>sequential and unique - Gaps would not have been an issue.
>
> Hi Tom,
>
> It would also have been solveable without the identity column. If there
> was no single column to make each row unique within Client/Paydate, I'd
> have used a combination of two columns. Or three. Or ... well, you get
> the point. The code would have been more messy, but it would have
> worked.

I agree.

But in this case, there was no other set of columns (that I could make out)
that would have solved the problem.  I am not sure you could have made
multiple columns work with a surrogate key, in this case.  All the tables
(earnings, taxes and deductions) all could have multiple records with the
same code, date, employee number (SSN) etc on the same check.  That was what
I was looking for, I just happen to notice that they have an identity field
there and it solved the problem.

Thanks,

Tom
Show quote
>
> The only thing I would not have been able to handle were complete
> duplicates (i.e. two or more rows with the same value in each of their
> columns). But if you had those, you'd have much bigger problems to solve
> than generating a kind-of-crosstab report. <g>
>
> Best, Hugo
> --
>
> (Remove _NO_ and _SPAM_ to get my e-mail address)

AddThis Social Bookmark Button