Home All Groups Group Topic Archive Search About

Deleting Duplicate Data

Author
1 Sep 2005 11:22 AM
Ghulam Farid
Hi to All!

I have a table with 60 columns and more than one million rows. i have to
implement composite primary key but it gives me error of duplicate data.
i have used following query to detect the duplicate rows

select NID,output_No from tbl_Data
group by NID,output_No
having count(*) > 1

it gives me 2526 duplicate dows. Now i wants to delete the duplicate
rows what will be the query for deleting the duplicate records.


Thanx



*** Sent via Developersdex http://www.developersdex.com ***

Author
1 Sep 2005 11:30 AM
Uri Dimant
This script has written by Itzik Ben-Gan
CREATE TABLE #Demo (
    idNo int identity(1,1),
    colA int,
    colB int
)

INSERT INTO #Demo(colA,colB) VALUES (1,6)
INSERT INTO #Demo(colA,colB) VALUES (1,6)
INSERT INTO #Demo(colA,colB) VALUES (2,4)
INSERT INTO #Demo(colA,colB) VALUES (3,3)
INSERT INTO #Demo(colA,colB) VALUES (4,2)
INSERT INTO #Demo(colA,colB) VALUES (3,3)
INSERT INTO #Demo(colA,colB) VALUES (5,1)
INSERT INTO #Demo(colA,colB) VALUES (8,1)

PRINT 'Table'
SELECT * FROM #Demo

PRINT 'Duplicates in Table'
SELECT * FROM #Demo
WHERE idNo IN
  (SELECT B.idNo
     FROM #Demo A JOIN #Demo B
       ON A.idNo <> B.idNo
      AND A.colA = B.colA
      AND A.colB = B.colB)

PRINT 'Duplicates to Delete'
SELECT * FROM #Demo
WHERE idNo IN
  (SELECT B.idNo
     FROM #Demo A JOIN #Demo B
       ON A.idNo < B.idNo -- < this time, not <>
      AND A.colA = B.colA
      AND A.colB = B.colB)

DELETE FROM #Demo
WHERE idNo IN
  (SELECT B.idNo
     FROM #Demo A JOIN #Demo B
       ON A.idNo < B.idNo -- < this time, not <>
      AND A.colA = B.colA
      AND A.colB = B.colB)

PRINT 'Cleaned-up Table'
SELECT * FROM #Demo

DROP TABLE #Demo

Show quote
"Ghulam Farid" <gfa***@yahoo.com> wrote in message
news:umVO5durFHA.3444@TK2MSFTNGP12.phx.gbl...
>
>
> Hi to All!
>
> I have a table with 60 columns and more than one million rows. i have to
> implement composite primary key but it gives me error of duplicate data.
> i have used following query to detect the duplicate rows
>
> select NID,output_No from tbl_Data
> group by NID,output_No
> having count(*) > 1
>
> it gives me 2526 duplicate dows. Now i wants to delete the duplicate
> rows what will be the query for deleting the duplicate records.
>
>
> Thanx
>
>
>
> *** Sent via Developersdex http://www.developersdex.com ***
Author
1 Sep 2005 1:08 PM
Brian Selzer
Try this:

(1) SELECT columnList INTO workTable FROM tableName GROUP BY keyColumns
HAVING COUNT(*) > 1
(2) DELETE tableName FROM workTable WHERE tableName.keyColumns =
workTable.keyColumns
(3) INSERT tableName (columnList) SELECT columnList FROM workTable
(4) DROP workTable

You might want to wrap this in a transaction, but if you don't then a temp
table for the work table is contraindicated because if power goes out
between steps 2 and 3, you will lose the duplicate rows altogether.

If the table were tiny, you could use something like:

SET ROWCOUNT 1
AGAIN:
DELETE tableName FROM (SELECT keyColumns FROM tableName GROUP BY keyColumns
HAVING COUNT(*) > 1) a WHERE tableName.keyColumns = a.keyColumns
IF @@ROWCOUNT > 0 GOTO AGAIN
SET ROWCOUNT 0


Show quote
"Ghulam Farid" <gfa***@yahoo.com> wrote in message
news:umVO5durFHA.3444@TK2MSFTNGP12.phx.gbl...
>
>
> Hi to All!
>
> I have a table with 60 columns and more than one million rows. i have to
> implement composite primary key but it gives me error of duplicate data.
> i have used following query to detect the duplicate rows
>
> select NID,output_No from tbl_Data
> group by NID,output_No
> having count(*) > 1
>
> it gives me 2526 duplicate dows. Now i wants to delete the duplicate
> rows what will be the query for deleting the duplicate records.
>
>
> Thanx
>
>
>
> *** Sent via Developersdex http://www.developersdex.com ***

AddThis Social Bookmark Button