|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Find duplicatesgave me yesterday (I added CheckNumber in for the problem). I had a problem with finding duplicate check numbers in the table (voids,re-issued checks etc). It does show in my results, but out of 4300+checks it is hard to find all the duplicates for auditing purposes while setting up my script. How do I go about changing my query to show ONLY duplicates (or vice versa)? Here is the data: drop table EmployeePay drop table PayDetail CREATE TABLE [dbo].[EmployeePay] ( [EmployeePayID] [int] IDENTITY (1, 1), [ClientID] [int] Not NULL , [CheckNumber] [int] Not Null, [PayDate] [smalldatetime] Not NULL, PRIMARY KEY (EmployeePayID) ) ON [PRIMARY] CREATE TABLE [dbo].[PayDetail] ( [PayDetailID] [int] IDENTITY (1, 1), [ClientID] [int] Not NULL , [PayDate] [smalldatetime] Not NULL , [Code] [varchar] (5) Not NULL , [Hours] [int] Not NULL , [Amount] [money] Not NULL, PRIMARY KEY (PayDetailID) ) ON [PRIMARY] insert EmployeePay values (1,1234,'01-01-05') insert EmployeePay values (1,2255,'02-03-05') insert EmployeePay values (2,1331,'01-01-05') insert EmployeePay values (2,3011,'03-06-05') insert EmployeePay values (2,1331,'05-06-05') insert PayDetail values (1,'01-01-05','5',20,200.50) insert PayDetail values (1,'01-01-05','6',25,10.50) insert PayDetail values (1,'01-01-05','13',50,120.25) insert PayDetail values (1,'02-03-05','5',5,110) insert PayDetail values (1,'02-03-05','9',18,250.50) insert PayDetail values (2,'01-01-05','5',50,120.25) insert PayDetail values (2,'01-01-05','44',10,320.32) insert PayDetail values (2,'01-01-05','32',50,120.25) insert PayDetail values (2,'03-06-05','46',18,235.75) insert PayDetail values (2,'05-06-05','5',-50,-120.25) insert PayDetail values (2,'05-06-05','44',-10,-320.32) insert PayDetail values (2,'05-06-05','32',-50,-120.25) and the script I am using: CREATE VIEW PayDetailWithRank AS SELECT pd1.ClientID, pd1.PayDate, pd1.Code, pd1.Hours, pd1.Amount, (SELECT COUNT(*) FROM PayDetail AS pd2 WHERE pd2.ClientID = pd1.ClientID AND pd2.PayDate = pd1.PayDate AND pd2.Code < pd1.Code) AS Rank FROM PayDetail AS pd1 go SELECT e.ClientID, e.PayDate, e.CheckNumber, d1.Code, d1.Hours, d1.Amount, d2.Code, d2.Hours, d2.Amount, d3.Code, d3.Hours, d3.Amount FROM EmployeePay AS e INNER JOIN PayDetailWithRank AS d1 ON d1.ClientID = e.ClientID AND d1.PayDate = e.PayDate AND d1.Rank % 3 = 0 LEFT JOIN PayDetailWithRank AS d2 ON d2.ClientID = e.ClientID AND d2.PayDate = e.PayDate AND d2.Rank / 3 = d1.Rank / 3 AND d2.Rank % 3 = 1 LEFT JOIN PayDetailWithRank AS d3 ON d3.ClientID = e.ClientID AND d3.PayDate = e.PayDate AND d3.Rank / 3 = d1.Rank / 3 AND d3.Rank % 3 = 2 ORDER BY e.ClientID, e.PayDate, d1.Rank go DROP VIEW PayDetailWithRank go This will give you 1 duplicate check. Can I change this to show only duplicate, somehow? Thanks, Tom Never mind.
I figure out how. I have to modify the statement slightly I need to change the Order by to Group By and add a "Having Count(*) > 1" clause. Tom Show quote "tshad" <tscheider***@ftsolutions.com> wrote in message news:%23tVBdqhuFHA.1472@TK2MSFTNGP15.phx.gbl... >I have a query that I used that was based on the following query that Hugo >gave me yesterday (I added CheckNumber in for the problem). > > I had a problem with finding duplicate check numbers in the table > (voids,re-issued checks etc). It does show in my results, but out of > 4300+checks it is hard to find all the duplicates for auditing purposes > while setting up my script. > > How do I go about changing my query to show ONLY duplicates (or vice > versa)? > > Here is the data: > > drop table EmployeePay > drop table PayDetail > CREATE TABLE [dbo].[EmployeePay] ( > [EmployeePayID] [int] IDENTITY (1, 1), > [ClientID] [int] Not NULL , > [CheckNumber] [int] Not Null, > [PayDate] [smalldatetime] Not NULL, > PRIMARY KEY (EmployeePayID) > ) ON [PRIMARY] > CREATE TABLE [dbo].[PayDetail] ( > [PayDetailID] [int] IDENTITY (1, 1), > [ClientID] [int] Not NULL , > [PayDate] [smalldatetime] Not NULL , > [Code] [varchar] (5) Not NULL , > [Hours] [int] Not NULL , > [Amount] [money] Not NULL, > PRIMARY KEY (PayDetailID) > ) ON [PRIMARY] > insert EmployeePay values (1,1234,'01-01-05') > insert EmployeePay values (1,2255,'02-03-05') > insert EmployeePay values (2,1331,'01-01-05') > insert EmployeePay values (2,3011,'03-06-05') > insert EmployeePay values (2,1331,'05-06-05') > insert PayDetail values (1,'01-01-05','5',20,200.50) > insert PayDetail values (1,'01-01-05','6',25,10.50) > insert PayDetail values (1,'01-01-05','13',50,120.25) > insert PayDetail values (1,'02-03-05','5',5,110) > insert PayDetail values (1,'02-03-05','9',18,250.50) > insert PayDetail values (2,'01-01-05','5',50,120.25) > insert PayDetail values (2,'01-01-05','44',10,320.32) > insert PayDetail values (2,'01-01-05','32',50,120.25) > insert PayDetail values (2,'03-06-05','46',18,235.75) > insert PayDetail values (2,'05-06-05','5',-50,-120.25) > insert PayDetail values (2,'05-06-05','44',-10,-320.32) > insert PayDetail values (2,'05-06-05','32',-50,-120.25) > > > and the script I am using: > > CREATE VIEW PayDetailWithRank > AS > SELECT pd1.ClientID, pd1.PayDate, pd1.Code, pd1.Hours, pd1.Amount, > (SELECT COUNT(*) > FROM PayDetail AS pd2 > WHERE pd2.ClientID = pd1.ClientID > AND pd2.PayDate = pd1.PayDate > AND pd2.Code < pd1.Code) AS Rank > FROM PayDetail AS pd1 > go > SELECT e.ClientID, e.PayDate, e.CheckNumber, > d1.Code, d1.Hours, d1.Amount, > d2.Code, d2.Hours, d2.Amount, > d3.Code, d3.Hours, d3.Amount > FROM EmployeePay AS e > INNER JOIN PayDetailWithRank AS d1 > ON d1.ClientID = e.ClientID > AND d1.PayDate = e.PayDate > AND d1.Rank % 3 = 0 > LEFT JOIN PayDetailWithRank AS d2 > ON d2.ClientID = e.ClientID > AND d2.PayDate = e.PayDate > AND d2.Rank / 3 = d1.Rank / 3 > AND d2.Rank % 3 = 1 > LEFT JOIN PayDetailWithRank AS d3 > ON d3.ClientID = e.ClientID > AND d3.PayDate = e.PayDate > AND d3.Rank / 3 = d1.Rank / 3 > AND d3.Rank % 3 = 2 > ORDER BY e.ClientID, e.PayDate, d1.Rank > go > DROP VIEW PayDetailWithRank > go > > > This will give you 1 duplicate check. Can I change this to show only > duplicate, somehow? > > Thanks, > > Tom > Hi
Changing the first left outer join to be an inner join will ensure at least one duplicate. SELECT e.ClientID, e.PayDate, e.CheckNumber, d1.Code, d1.Hours, d1.Amount, d1.rank, d2.Code, d2.Hours, d2.Amount, d2.rank, d3.Code, d3.Hours, d3.Amount, d3.rank FROM EmployeePay AS e JOIN PayDetailWithRank AS d1 ON d1.ClientID = e.ClientID AND d1.PayDate = e.PayDate AND d1.Rank % 3 = 0 JOIN PayDetailWithRank AS d2 ON d2.ClientID = e.ClientID AND d2.PayDate = e.PayDate AND d2.Rank / 3 = d1.Rank / 3 AND d2.Rank % 3 = 1 LEFT JOIN PayDetailWithRank AS d3 ON d3.ClientID = e.ClientID AND d3.PayDate = e.PayDate AND d3.Rank / 3 = d1.Rank / 3 AND d3.Rank % 3 = 2 ORDER BY e.ClientID, e.PayDate, d1.Rank What are you expecting if there are 4 duplicates? John Show quote "tshad" <tscheider***@ftsolutions.com> wrote in message news:%23tVBdqhuFHA.1472@TK2MSFTNGP15.phx.gbl... >I have a query that I used that was based on the following query that Hugo >gave me yesterday (I added CheckNumber in for the problem). > > I had a problem with finding duplicate check numbers in the table > (voids,re-issued checks etc). It does show in my results, but out of > 4300+checks it is hard to find all the duplicates for auditing purposes > while setting up my script. > > How do I go about changing my query to show ONLY duplicates (or vice > versa)? > > Here is the data: > > drop table EmployeePay > drop table PayDetail > CREATE TABLE [dbo].[EmployeePay] ( > [EmployeePayID] [int] IDENTITY (1, 1), > [ClientID] [int] Not NULL , > [CheckNumber] [int] Not Null, > [PayDate] [smalldatetime] Not NULL, > PRIMARY KEY (EmployeePayID) > ) ON [PRIMARY] > CREATE TABLE [dbo].[PayDetail] ( > [PayDetailID] [int] IDENTITY (1, 1), > [ClientID] [int] Not NULL , > [PayDate] [smalldatetime] Not NULL , > [Code] [varchar] (5) Not NULL , > [Hours] [int] Not NULL , > [Amount] [money] Not NULL, > PRIMARY KEY (PayDetailID) > ) ON [PRIMARY] > insert EmployeePay values (1,1234,'01-01-05') > insert EmployeePay values (1,2255,'02-03-05') > insert EmployeePay values (2,1331,'01-01-05') > insert EmployeePay values (2,3011,'03-06-05') > insert EmployeePay values (2,1331,'05-06-05') > insert PayDetail values (1,'01-01-05','5',20,200.50) > insert PayDetail values (1,'01-01-05','6',25,10.50) > insert PayDetail values (1,'01-01-05','13',50,120.25) > insert PayDetail values (1,'02-03-05','5',5,110) > insert PayDetail values (1,'02-03-05','9',18,250.50) > insert PayDetail values (2,'01-01-05','5',50,120.25) > insert PayDetail values (2,'01-01-05','44',10,320.32) > insert PayDetail values (2,'01-01-05','32',50,120.25) > insert PayDetail values (2,'03-06-05','46',18,235.75) > insert PayDetail values (2,'05-06-05','5',-50,-120.25) > insert PayDetail values (2,'05-06-05','44',-10,-320.32) > insert PayDetail values (2,'05-06-05','32',-50,-120.25) > > > and the script I am using: > > CREATE VIEW PayDetailWithRank > AS > SELECT pd1.ClientID, pd1.PayDate, pd1.Code, pd1.Hours, pd1.Amount, > (SELECT COUNT(*) > FROM PayDetail AS pd2 > WHERE pd2.ClientID = pd1.ClientID > AND pd2.PayDate = pd1.PayDate > AND pd2.Code < pd1.Code) AS Rank > FROM PayDetail AS pd1 > go > SELECT e.ClientID, e.PayDate, e.CheckNumber, > d1.Code, d1.Hours, d1.Amount, > d2.Code, d2.Hours, d2.Amount, > d3.Code, d3.Hours, d3.Amount > FROM EmployeePay AS e > INNER JOIN PayDetailWithRank AS d1 > ON d1.ClientID = e.ClientID > AND d1.PayDate = e.PayDate > AND d1.Rank % 3 = 0 > LEFT JOIN PayDetailWithRank AS d2 > ON d2.ClientID = e.ClientID > AND d2.PayDate = e.PayDate > AND d2.Rank / 3 = d1.Rank / 3 > AND d2.Rank % 3 = 1 > LEFT JOIN PayDetailWithRank AS d3 > ON d3.ClientID = e.ClientID > AND d3.PayDate = e.PayDate > AND d3.Rank / 3 = d1.Rank / 3 > AND d3.Rank % 3 = 2 > ORDER BY e.ClientID, e.PayDate, d1.Rank > go > DROP VIEW PayDetailWithRank > go > > > This will give you 1 duplicate check. Can I change this to show only > duplicate, somehow? > > Thanks, > > Tom > >> had a problem with finding duplicate check numbers in the table (voids,re-issued checks etc). << I have a better question for you: why are you creating duplicate checknumbers? Do you think that perhaps not having a proper relational key could be part of the problem? If you will learn to write proper DDL, you will save a lot of complex DML. Also, look up the ISO-8601 format for dates and times. "--CELKO--" <jcelko***@earthlink.net> wrote in message As I had mentioned, this is not my database. I built a table to mimic that news:1126814005.385526.122020@z14g2000cwz.googlegroups.com... >>> had a problem with finding duplicate check numbers in the table >>> (voids,re-issued checks etc). << > > I have a better question for you: why are you creating duplicate check > numbers? Do you think that perhaps not having a proper relational key > could be part of the problem? tables I am extracting information from to create a CSV file that I can use to import into our system. I am not creating Duplicate Check numbers - that is what is in their system and I have to deal with it. I have over 5000 checks and as it turned out 5 duplicate check numbers. I used my group by/having solutions to find them. As I mentioned, they were voids (hence the duplicate check number) as well as a check that was re-issued (we don't know why yet). Are you suggesting I should spend my time creating a PROPER database schema, even if it doesn't mirror what I am trying to accomplish? And BTW, you have pounded incessantly about the evils of the IDENTITY. I for one am grateful they used it, as you would have seen in my other post. Hugos solution with the Views worked great, but hadn't taken into account multiple codes on one check (which there would have been no way for him to know this). I found that their tables used identities and that was the only field I could replace the Code field with to make it work. It had to be sequential and unique - Gaps would not have been an issue. Tom Show quote > > If you will learn to write proper DDL, you will save a lot of complex > DML. > > Also, look up the ISO-8601 format for dates and times. > On Thu, 15 Sep 2005 13:40:38 -0700, tshad wrote:
(snip) >Hugos solution with the Views worked great, but hadn't taken into account Hi Tom,>multiple codes on one check (which there would have been no way for him to >know this). I found that their tables used identities and that was the only >field I could replace the Code field with to make it work. It had to be >sequential and unique - Gaps would not have been an issue. It would also have been solveable without the identity column. If there was no single column to make each row unique within Client/Paydate, I'd have used a combination of two columns. Or three. Or ... well, you get the point. The code would have been more messy, but it would have worked. The only thing I would not have been able to handle were complete duplicates (i.e. two or more rows with the same value in each of their columns). But if you had those, you'd have much bigger problems to solve than generating a kind-of-crosstab report. <g> Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address)
Show quote
"Hugo Kornelis" <hugo@pe_NO_rFact.in_SPAM_fo> wrote in message I agree.news:5oqji11o59rufd33auh0psimi867s65rj7@4ax.com... > On Thu, 15 Sep 2005 13:40:38 -0700, tshad wrote: > > (snip) >>Hugos solution with the Views worked great, but hadn't taken into account >>multiple codes on one check (which there would have been no way for him to >>know this). I found that their tables used identities and that was the >>only >>field I could replace the Code field with to make it work. It had to be >>sequential and unique - Gaps would not have been an issue. > > Hi Tom, > > It would also have been solveable without the identity column. If there > was no single column to make each row unique within Client/Paydate, I'd > have used a combination of two columns. Or three. Or ... well, you get > the point. The code would have been more messy, but it would have > worked. But in this case, there was no other set of columns (that I could make out) that would have solved the problem. I am not sure you could have made multiple columns work with a surrogate key, in this case. All the tables (earnings, taxes and deductions) all could have multiple records with the same code, date, employee number (SSN) etc on the same check. That was what I was looking for, I just happen to notice that they have an identity field there and it solved the problem. Thanks, Tom Show quote > > The only thing I would not have been able to handle were complete > duplicates (i.e. two or more rows with the same value in each of their > columns). But if you had those, you'd have much bigger problems to solve > than generating a kind-of-crosstab report. <g> > > Best, Hugo > -- > > (Remove _NO_ and _SPAM_ to get my e-mail address) |
|||||||||||||||||||||||