Home All Groups Group Topic Archive Search About

Selecting Duplicates - all data

Author
13 Jan 2006 8:50 PM
Simon
Hi all,

I have found many solutions for selecting duplicates with the most
popular being:

SELECT ColA, COUNT(ColA)
FROM SomeTable
GROUP BY ColA
HAVING count(ColA) > 1

However, what I am after is a query that returns all of columns and all
of the duplicate rows where duplicates exist on a number of columns.

eg

MYID    FirstName    LastName    DOB
1    Jon        Smith        12/12/2000
2    Brian        Smith        12/12/2000

For my logic, duplicates are defined as any rows where LastName and DOB
are the same.

I need to return a result set that displays all rows that are duplicates
including all the fields in these rows. Not just a group by scenario
where I get the last name and a count of how many times it is duplicated.

Hope this makes sense, and thanks in advance.

Simon

Author
13 Jan 2006 9:18 PM
Trey Walpole
you could use a derived table, e.g.


select <column list>
from yourtable
join (
select lastname, dob, count(*) as dup_count
from yourtable
group by lastname, dob
having count(*)>1
) x
on yourtable.lastname = x.lastname
and yourtable.dob = x.dob
order by yourtable.lastname, yourtable.dob

Simon wrote:
Show quote
> Hi all,
>
> I have found many solutions for selecting duplicates with the most
> popular being:
>
> SELECT ColA, COUNT(ColA)
> FROM SomeTable
> GROUP BY ColA
> HAVING count(ColA) > 1
>
> However, what I am after is a query that returns all of columns and all
> of the duplicate rows where duplicates exist on a number of columns.
>
> eg
>
> MYID    FirstName    LastName    DOB
> 1    Jon        Smith        12/12/2000
> 2    Brian        Smith        12/12/2000
>
> For my logic, duplicates are defined as any rows where LastName and DOB
> are the same.
>
> I need to return a result set that displays all rows that are duplicates
> including all the fields in these rows. Not just a group by scenario
> where I get the last name and a count of how many times it is duplicated.
>
> Hope this makes sense, and thanks in advance.
>
> Simon
Author
13 Jan 2006 9:41 PM
Jim Underwood
There may be an easier way to do this... but this should do the trick...

Select * from SomeTable
where ColA in
(
SELECT ColA
FROM SomeTable
GROUP BY ColA
HAVING count(ColA) > 1
)
or ColB in
(
SELECT ColB
FROM SomeTable
GROUP BY ColB
HAVING count(ColB) > 1
)

Now just add the additional "Or Col# in (subquery)" for each additional
duplicate that you are looking for.  The OR statements and (possibly) large
in clauses may cause performance problems, so you may want to change teh
subqueries to exists or joins...

The exist would look like...

Select * from SomeTable Tab
where exists
(
SELECT count(ColA)
FROM SomeTable Tab1
where Tab1.ColA = Tab.ColA
GROUP BY ColA
HAVING count(ColA) > 1
)
or exists
(
SELECT count(ColB)
FROM SomeTable Tab2
where Tab2.ColB = Tab.ColB
GROUP BY ColB
HAVING count(ColB) > 1
)


Show quote
"Simon" <simon_nospam_rigby@nodomain.com> wrote in message
news:e1u2LMIGGHA.2064@TK2MSFTNGP09.phx.gbl...
> Hi all,
>
> I have found many solutions for selecting duplicates with the most
> popular being:
>
> SELECT ColA, COUNT(ColA)
> FROM SomeTable
> GROUP BY ColA
> HAVING count(ColA) > 1
>
> However, what I am after is a query that returns all of columns and all
> of the duplicate rows where duplicates exist on a number of columns.
>
> eg
>
> MYID FirstName LastName DOB
> 1 Jon Smith 12/12/2000
> 2 Brian Smith 12/12/2000
>
> For my logic, duplicates are defined as any rows where LastName and DOB
> are the same.
>
> I need to return a result set that displays all rows that are duplicates
> including all the fields in these rows. Not just a group by scenario
> where I get the last name and a count of how many times it is duplicated.
>
> Hope this makes sense, and thanks in advance.
>
> Simon
Author
13 Jan 2006 10:37 PM
Louis Davidson
It is almost the same idea, just use a derived table via an EXISTS criteria.
I used my own table names since you didn't give us a create script.  The
idea is that I am checking the last two columns for dups, and returning all
rows:

declare @findDups table
(
    findDupsId  int primary key,
    somecolumn  varchar(10),
    checkColumn1 int,
    checkColumn2 int
)
insert into @findDups
select 1,'barney', 1,1
union all
select 2,'barney', 1,2
union all
select 3,'bam-bam',1,1
union all
select 4,'pebbles',1,3
union all
select 5,'pebbles',1,4
union all
select 6,'fred',1,2

--Something like:

select *
from   @findDups as mainTable
where  exists (   select *
                  from  (
                            select checkColumn1, checkColumn2
                            from  @findDups
                            group by checkColumn1, checkColumn2
                            having count(*) > 1) as dups
                  where dups.checkColumn1 = mainTable.checkColumn1
                      and dups.checkColumn2 = mainTable.checkColumn2)
order by checkColumn1, checkColumn2

Which returns:

findDupsId somecolumn checkColumn1 checkColumn2
----------- ----------     ------------     ------------
1               barney           1                     1
3               bam-bam       1                    1
6               fred               1                     2
2               barney           1                     2

--
----------------------------------------------------------------------------
Louis Davidson - http://spaces.msn.com/members/drsql/
SQL Server MVP
"Arguments are to be avoided: they are always vulgar and often convincing."
(Oscar Wilde)

Show quote
"Simon" <simon_nospam_rigby@nodomain.com> wrote in message
news:e1u2LMIGGHA.2064@TK2MSFTNGP09.phx.gbl...
> Hi all,
>
> I have found many solutions for selecting duplicates with the most popular
> being:
>
> SELECT ColA, COUNT(ColA)
> FROM SomeTable
> GROUP BY ColA
> HAVING count(ColA) > 1
>
> However, what I am after is a query that returns all of columns and all of
> the duplicate rows where duplicates exist on a number of columns.
>
> eg
>
> MYID FirstName LastName DOB
> 1 Jon Smith 12/12/2000
> 2 Brian Smith 12/12/2000
>
> For my logic, duplicates are defined as any rows where LastName and DOB
> are the same.
>
> I need to return a result set that displays all rows that are duplicates
> including all the fields in these rows. Not just a group by scenario where
> I get the last name and a count of how many times it is duplicated.
>
> Hope this makes sense, and thanks in advance.
>
> Simon
Author
13 Jan 2006 11:27 PM
Simon
Thanks to all of the respondents. Just what I was looking for and the
problem is solved.

Thanks

Simon

Louis Davidson wrote:
Show quote
> It is almost the same idea, just use a derived table via an EXISTS criteria.
> I used my own table names since you didn't give us a create script.  The
> idea is that I am checking the last two columns for dups, and returning all
> rows:
>
> declare @findDups table
> (
>     findDupsId  int primary key,
>     somecolumn  varchar(10),
>     checkColumn1 int,
>     checkColumn2 int
> )
> insert into @findDups
> select 1,'barney', 1,1
> union all
> select 2,'barney', 1,2
> union all
> select 3,'bam-bam',1,1
> union all
> select 4,'pebbles',1,3
> union all
> select 5,'pebbles',1,4
> union all
> select 6,'fred',1,2
>
> --Something like:
>
> select *
> from   @findDups as mainTable
> where  exists (   select *
>                   from  (
>                             select checkColumn1, checkColumn2
>                             from  @findDups
>                             group by checkColumn1, checkColumn2
>                             having count(*) > 1) as dups
>                   where dups.checkColumn1 = mainTable.checkColumn1
>                       and dups.checkColumn2 = mainTable.checkColumn2)
> order by checkColumn1, checkColumn2
>
> Which returns:
>
> findDupsId somecolumn checkColumn1 checkColumn2
> ----------- ----------     ------------     ------------
> 1               barney           1                     1
> 3               bam-bam       1                    1
> 6               fred               1                     2
> 2               barney           1                     2
>

AddThis Social Bookmark Button