Home All Groups Group Topic Archive Search About

how to identify only the duplicates of duplicate rows?

Author
7 Jul 2005 4:30 PM
Rich
I want to mark the bit field for only the duplicates of duplicate rows below:

CREATE TABLE tbl1(
rownum int,
fname varchar(50),
lname varchar(50),
yesno bit
)

set nocount on
insert into tbl1(rownum, fname, lname) values(1, 'joe', 'smith')
insert into tbl1(rownum, fname, lname) values(2, 'joe', 'smith')
insert into tbl1(rownum, fname, lname) values(3, 'joe', 'smith')
insert into tbl1(rownum, fname, lname) values(4, 'joe', 'mon')
insert into tbl1(rownum, fname, lname) values(5, 'tom', 'jones')
insert into tbl1(rownum, fname, lname) values(6, 'tom', 'jones')
insert into tbl1(rownum, fname, lname) values(7, 'sue', 'kent')
insert into tbl1(rownum, fname, lname) values(8, 'sue', 'kent')
insert into tbl1(rownum, fname, lname) values(9, 'sue', 'sue')
insert into tbl1(rownum, fname, lname) values(10, 'mark', 'allen')

SELECT t1.rownum, t1.fname, t1.lname, t1.yesno
FROM tbl1 t1
WHERE t1.fname In (SELECT t2.fname FROM tbl1 t2 GROUP BY t2.fname, t2.lname
HAVING Count(*)>1  And t2.lname = t1.lname)

--this yields
1    joe    smith    NULL
2    joe    smith    NULL
3    joe    smith    NULL
5    tom    jones    NULL
6    tom    jones    NULL
7    sue    kent    NULL
8    sue    kent    NULL

I want to mark rows 2, 3, 6, 8 with a 1;  I do not want to mark rows 1, 5,
and 7.  Rows 1, 5, and 7 and the first rows in the duplicates.  How can I do
this?

Thanks,
Rich

Author
7 Jul 2005 4:53 PM
Daniel Wilson
I have not tested this exact solution, but have done this sort of thing many
times.

Update T1 set YesNo = 1 From
    Tbl1 T1 Inner Join Tbl1 T2 ON
        T1.fname = T2.fname and T1.lname = T2.lname And T1.RowNum >
T2.RowNum

hth,
--
Daniel Wilson
Senior Software Solutions Developer
Embtrak Development Team
http://www.Embtrak.com
DVBrown Company

Show quote
"Rich" <R***@discussions.microsoft.com> wrote in message
news:D4FFF77B-52A9-4389-9C0B-5748A0F7A561@microsoft.com...
> I want to mark the bit field for only the duplicates of duplicate rows
below:
>
> CREATE TABLE tbl1(
> rownum int,
> fname varchar(50),
> lname varchar(50),
> yesno bit
> )
>
> set nocount on
> insert into tbl1(rownum, fname, lname) values(1, 'joe', 'smith')
> insert into tbl1(rownum, fname, lname) values(2, 'joe', 'smith')
> insert into tbl1(rownum, fname, lname) values(3, 'joe', 'smith')
> insert into tbl1(rownum, fname, lname) values(4, 'joe', 'mon')
> insert into tbl1(rownum, fname, lname) values(5, 'tom', 'jones')
> insert into tbl1(rownum, fname, lname) values(6, 'tom', 'jones')
> insert into tbl1(rownum, fname, lname) values(7, 'sue', 'kent')
> insert into tbl1(rownum, fname, lname) values(8, 'sue', 'kent')
> insert into tbl1(rownum, fname, lname) values(9, 'sue', 'sue')
> insert into tbl1(rownum, fname, lname) values(10, 'mark', 'allen')
>
> SELECT t1.rownum, t1.fname, t1.lname, t1.yesno
> FROM tbl1 t1
> WHERE t1.fname In (SELECT t2.fname FROM tbl1 t2 GROUP BY t2.fname,
t2.lname
> HAVING Count(*)>1  And t2.lname = t1.lname)
>
> --this yields
> 1 joe smith NULL
> 2 joe smith NULL
> 3 joe smith NULL
> 5 tom jones NULL
> 6 tom jones NULL
> 7 sue kent NULL
> 8 sue kent NULL
>
> I want to mark rows 2, 3, 6, 8 with a 1;  I do not want to mark rows 1, 5,
> and 7.  Rows 1, 5, and 7 and the first rows in the duplicates.  How can I
do
> this?
>
> Thanks,
> Rich
>
>
Author
7 Jul 2005 4:58 PM
Stu
Try:

UPDATE t1
SET yesno = 1
FROM tbl1 t1
WHERE t1.rownum NOT IN (SELECT MIN(rownum)
            FROM tbl1 t1
            GROUP BY fname, lname)

HTH,
Stu
Author
7 Jul 2005 5:28 PM
Rich
Thank you all for your replies.  Problem solved.

Rich

Show quote
"Rich" wrote:

> I want to mark the bit field for only the duplicates of duplicate rows below:
>
> CREATE TABLE tbl1(
> rownum int,
> fname varchar(50),
> lname varchar(50),
> yesno bit
> )
>
> set nocount on
> insert into tbl1(rownum, fname, lname) values(1, 'joe', 'smith')
> insert into tbl1(rownum, fname, lname) values(2, 'joe', 'smith')
> insert into tbl1(rownum, fname, lname) values(3, 'joe', 'smith')
> insert into tbl1(rownum, fname, lname) values(4, 'joe', 'mon')
> insert into tbl1(rownum, fname, lname) values(5, 'tom', 'jones')
> insert into tbl1(rownum, fname, lname) values(6, 'tom', 'jones')
> insert into tbl1(rownum, fname, lname) values(7, 'sue', 'kent')
> insert into tbl1(rownum, fname, lname) values(8, 'sue', 'kent')
> insert into tbl1(rownum, fname, lname) values(9, 'sue', 'sue')
> insert into tbl1(rownum, fname, lname) values(10, 'mark', 'allen')
>
> SELECT t1.rownum, t1.fname, t1.lname, t1.yesno
> FROM tbl1 t1
> WHERE t1.fname In (SELECT t2.fname FROM tbl1 t2 GROUP BY t2.fname, t2.lname
> HAVING Count(*)>1  And t2.lname = t1.lname)
>
> --this yields
> 1    joe    smith    NULL
> 2    joe    smith    NULL
> 3    joe    smith    NULL
> 5    tom    jones    NULL
> 6    tom    jones    NULL
> 7    sue    kent    NULL
> 8    sue    kent    NULL
>
> I want to mark rows 2, 3, 6, 8 with a 1;  I do not want to mark rows 1, 5,
> and 7.  Rows 1, 5, and 7 and the first rows in the duplicates.  How can I do
> this?
>
> Thanks,
> Rich
>
>
Author
7 Jul 2005 6:18 PM
--CELKO--
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. This thing has no keys!  And since everything is
NULL-able, it cannot ever have a key.

And SQL does not have row numbers because there is no ordering to a
table; is this a column with a realllllly bad name?  Was it supposed to
be the key?  We also do not use bits in SQL; that is assemblyl
language.

Try something like this

SELECT first_name, last_name, MIN(row_nbr), COUNT(*)
  FROM Foobar
GROUP BY first_name, last_name;

then throw out your non-table and start over:

CREATE TABLE Foobar
(last_name VARCHAR(30) NOT NULL,
first name VARCHAR(30) NOT NULL,
occurrences INTEGER DEFAULT 1 NOT NULL
  CHECK (occurrences > 0),
PRIMARY KEY (last_name, first name ));
Author
7 Jul 2005 11:17 PM
Rich
I inherited the actual table which contains about 180 fields - and nothing -
no combination of uniqueness.  I can take it or leave it, but as long as I
get a paycheck - I take it.  As for the bit, If sql server provides it, it
must be there for a reason.  At least the creators of the table thought so. 
If you don't like a mess and politics, don't come work at my place.  When it
is high noon over here, and they say it is night time - IT's NIGHT TIME.  I
have to make due with what I have.

Rich

Show quote
"--CELKO--" wrote:

> Please post DDL, so that people do not have to guess what the keys,
> constraints, Declarative Referential Integrity, data types, etc. in
> your schema are. This thing has no keys!  And since everything is
> NULL-able, it cannot ever have a key.
>
> And SQL does not have row numbers because there is no ordering to a
> table; is this a column with a realllllly bad name?  Was it supposed to
> be the key?  We also do not use bits in SQL; that is assemblyl
> language.
>
> Try something like this
>
> SELECT first_name, last_name, MIN(row_nbr), COUNT(*)
>   FROM Foobar
>  GROUP BY first_name, last_name;
>
> then throw out your non-table and start over:
>
> CREATE TABLE Foobar
> (last_name VARCHAR(30) NOT NULL,
>  first name VARCHAR(30) NOT NULL,
>  occurrences INTEGER DEFAULT 1 NOT NULL
>   CHECK (occurrences > 0),
>  PRIMARY KEY (last_name, first name ));
>
>
Author
8 Jul 2005 6:48 PM
--CELKO--
>> I  inherited the actual table which contains about 180 fields - and nothing - no combination of uniqueness. <<

I'd think about using a file for staging this mess before it goes into
the database.  It is easier to scrub raw data at the bits and bytes
level outside of an RDBMS, sorting is faster, you have allo kidns of
scripting languages for edits, and it will not clog up the database
server.

>>  As for the bit, If sql server provides it, it must be there for a reason. <<

Historical reasons, like your appendix.  Ever work with assembly
language or C on UNIX about 30 years ago?

>> If you don't like a mess and politics, don't come work at my place. <<

Been there, done that.  Always keep an updated resume and a llist of
headhunter's email addresses on a diskette in such places.
Author
8 Jul 2005 3:55 PM
Rich
I see your point about referential integrity.  The thing with this table is
that it collects raw data on an unattended bassis from a free text source -
the goal being to put this free text data into a relational system.  So the
integrity of the data/pull is at the mercy of the elements (the network). 
The goal is to collect the data.  I have to massage the data afterwards when
we retrieve that data for usage.   The data usage tables do have referential
integrity and performance mechanisms in place, primary keys, relationships,
indexes, etc

Rich

Show quote
"--CELKO--" wrote:

> Please post DDL, so that people do not have to guess what the keys,
> constraints, Declarative Referential Integrity, data types, etc. in
> your schema are. This thing has no keys!  And since everything is
> NULL-able, it cannot ever have a key.
>
> And SQL does not have row numbers because there is no ordering to a
> table; is this a column with a realllllly bad name?  Was it supposed to
> be the key?  We also do not use bits in SQL; that is assemblyl
> language.
>
> Try something like this
>
> SELECT first_name, last_name, MIN(row_nbr), COUNT(*)
>   FROM Foobar
>  GROUP BY first_name, last_name;
>
> then throw out your non-table and start over:
>
> CREATE TABLE Foobar
> (last_name VARCHAR(30) NOT NULL,
>  first name VARCHAR(30) NOT NULL,
>  occurrences INTEGER DEFAULT 1 NOT NULL
>   CHECK (occurrences > 0),
>  PRIMARY KEY (last_name, first name ));
>
>

AddThis Social Bookmark Button