Home All Groups Group Topic Archive Search About

Better way to do a large delete?

Author
26 Jan 2006 9:40 PM
Nancy Lytle
I have to do a delete from a table with about 25 million rows. I need to
delete about 4 million rows, based on the date in another table

Table Session

SessionID                     SessionDate

1                                     1/25/2004

2                                     7/25/2001

I have created a view (SessionView) containing all sessionid's where the
sessiondate less than 3 years old.

(Create view SessionView As Select SessionID From Session Where
Session.sessiondate >= getdate() - 1095)

Table Claims

ClaimID             SessionID             etc             etc
etc

24578                     1                         15
HealthPractice             5554441234

6548975                 2                         52             ChirocPract
5551234567

What I need is to delete all rows from Table Claims that have sessionids in
SessionView.

Delete  From Claims   Where Claims.sessionID In (Select SessionID from
SessionView)

-In this case ClaimID 6548975 would be deleted but ClaimID 24578 would not.

But I must be doing something wrong (I am very new at SQL Server, and have
never done a large scale delete that wasn't just straight forward), because
the delete seems to take forever. Last time I had to stop it at just over 5
hours because the network admin needed to do some work on the server and
reboot it.

I have set the recovery to simple, and the server has no other active
databases or other applications running on it.

Any assistance appreciated.

Nancy Lytle

N_Ly***@terplaum.umd.edu

Author
26 Jan 2006 9:52 PM
Alexander Kuznetsov
you could also copy the rows you want to keep to another table, drop
the orginal one, and rename the copy
Author
27 Jan 2006 1:16 AM
Nancy Lytle
Thanks, I was trying to avoid that because I actually have about 10 tables
that I have to handle in a similiar fashion, and copying was also going
slowly.
I think I will end up using Dean's suggesting and chunking it out.
Thanks,
Nancy

Show quote
"Alexander Kuznetsov" <AK_TIREDOFSPAM@hotmail.COM> wrote in message
news:1138312336.363053.109410@o13g2000cwo.googlegroups.com...
> you could also copy the rows you want to keep to another table, drop
> the orginal one, and rename the copy
>
Author
26 Jan 2006 9:54 PM
Dean
nancy,

try deleting in smaller chunks, say 1000 rows each. this should help.

dean

Show quote
"Nancy Lytle" <word_d***@hotmail.com> wrote in message
news:er8ntEsIGHA.2668@tk2msftngp13.phx.gbl...
>
> I have to do a delete from a table with about 25 million rows. I need to
> delete about 4 million rows, based on the date in another table
>
> Table Session
>
> SessionID                     SessionDate
>
> 1                                     1/25/2004
>
> 2                                     7/25/2001
>
> I have created a view (SessionView) containing all sessionid's where the
> sessiondate less than 3 years old.
>
> (Create view SessionView As Select SessionID From Session Where
> Session.sessiondate >= getdate() - 1095)
>
> Table Claims
>
> ClaimID             SessionID             etc             etc etc
>
> 24578                     1                         15 HealthPractice
> 5554441234
>
> 6548975                 2                         52
> ChirocPract 5551234567
>
> What I need is to delete all rows from Table Claims that have sessionids
> in SessionView.
>
> Delete  From Claims   Where Claims.sessionID In (Select SessionID from
> SessionView)
>
> -In this case ClaimID 6548975 would be deleted but ClaimID 24578 would
> not.
>
> But I must be doing something wrong (I am very new at SQL Server, and have
> never done a large scale delete that wasn't just straight forward),
> because the delete seems to take forever. Last time I had to stop it at
> just over 5 hours because the network admin needed to do some work on the
> server and reboot it.
>
> I have set the recovery to simple, and the server has no other active
> databases or other applications running on it.
>
> Any assistance appreciated.
>
> Nancy Lytle
>
> N_Ly***@terplaum.umd.edu
>
>
Author
27 Jan 2006 1:17 AM
Nancy Lytle
I think this is the way I will have to go, so I will just use smaller date
intervals, until I get everything deleted.
Thanks,
Nancy
Show quote
"Dean" <dvitner@nospam.gmail.com> wrote in message
news:OvslVLsIGHA.2828@TK2MSFTNGP12.phx.gbl...
> nancy,
>
> try deleting in smaller chunks, say 1000 rows each. this should help.
>
> dean
>
> "Nancy Lytle" <word_d***@hotmail.com> wrote in message
> news:er8ntEsIGHA.2668@tk2msftngp13.phx.gbl...
>>
>> I have to do a delete from a table with about 25 million rows. I need to
>> delete about 4 million rows, based on the date in another table
>>
>> Table Session
>>
>> SessionID                     SessionDate
>>
>> 1                                     1/25/2004
>>
>> 2                                     7/25/2001
>>
>> I have created a view (SessionView) containing all sessionid's where the
>> sessiondate less than 3 years old.
>>
>> (Create view SessionView As Select SessionID From Session Where
>> Session.sessiondate >= getdate() - 1095)
>>
>> Table Claims
>>
>> ClaimID             SessionID             etc             etc etc
>>
>> 24578                     1                         15 HealthPractice
>> 5554441234
>>
>> 6548975                 2                         52 ChirocPract
>> 5551234567
>>
>> What I need is to delete all rows from Table Claims that have sessionids
>> in SessionView.
>>
>> Delete  From Claims   Where Claims.sessionID In (Select SessionID from
>> SessionView)
>>
>> -In this case ClaimID 6548975 would be deleted but ClaimID 24578 would
>> not.
>>
>> But I must be doing something wrong (I am very new at SQL Server, and
>> have never done a large scale delete that wasn't just straight forward),
>> because the delete seems to take forever. Last time I had to stop it at
>> just over 5 hours because the network admin needed to do some work on the
>> server and reboot it.
>>
>> I have set the recovery to simple, and the server has no other active
>> databases or other applications running on it.
>>
>> Any assistance appreciated.
>>
>> Nancy Lytle
>>
>> N_Ly***@terplaum.umd.edu
>>
>>
>
>
Author
26 Jan 2006 10:08 PM
Patrik
Another thing could be to drop all indexes, delete your rows, recreate
the indexes.

Its all about how much you can do with the table. Do you have a
maintenance window where no one are using the database or is it always
online?
Author
26 Jan 2006 11:38 PM
Erland Sommarskog
Nancy Lytle (word_d***@hotmail.com) writes:
Show quote
> I have created a view (SessionView) containing all sessionid's where the
> sessiondate less than 3 years old.
>
> (Create view SessionView As Select SessionID From Session Where
> Session.sessiondate >= getdate() - 1095)
>...
> What I need is to delete all rows from Table Claims that have sessionids
> in SessionView.
>
> Delete  From Claims   Where Claims.sessionID In (Select SessionID from
> SessionView)
>
> -In this case ClaimID 6548975 would be deleted but ClaimID 24578 would
> not.
>
> But I must be doing something wrong (I am very new at SQL Server, and
> have never done a large scale delete that wasn't just straight forward),
> because the delete seems to take forever. Last time I had to stop it at
> just over 5 hours because the network admin needed to do some work on
> the server and reboot it.

Rather than creating a view, insert all the sessionids to delete into
a temp table.

You could also try this syntax:

   DELETE Claims
   FROM   Claims c
   JOIN   #temp t ON c.session_id = t.session_id

If Claims does not have its clustered index on SessionId, it could be
worth to do

   CREATE CLUSTERED INDEX session_id_ix ON Claims(session_id)
   WITH DROP_EXISTING

for this task, and then restore the original index.


--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
27 Jan 2006 2:10 PM
JT
When bulk deleting 4 million rows, the slowness you are experiencing
probably has more to do with transaction logging than anything else. I
assume this must a one time purge, but if it is a recurring process, then
you will probably want to reconsider your data model and workflow.

Start by dropping any indexes from the table that would not assist in
qualifying the rows needed for deletion. For example, if you are deleting
based on transaction date, then you would not need to retain an index on
LastName or AccountType. This will reduce the amount of transaction logging
during the process, and besides, all the indexes will need to be
defragmented after the process has completed anyway.

You can try creating a loop and delete the rows based on a date column,
however, this usally results in batches that vary greatly in size. For
example, your company have a lot more transactions in 2005 than in 2004.
Using the technique below, you can delete in batches of exactly 1000 rows.

set rowcount 1000
while
    delete from mytable where . . .
    if @@rowcount = 0 break
    checkpoint
end


Show quote
"Nancy Lytle" <word_d***@hotmail.com> wrote in message
news:er8ntEsIGHA.2668@tk2msftngp13.phx.gbl...
>
> I have to do a delete from a table with about 25 million rows. I need to
> delete about 4 million rows, based on the date in another table
>
> Table Session
>
> SessionID                     SessionDate
>
> 1                                     1/25/2004
>
> 2                                     7/25/2001
>
> I have created a view (SessionView) containing all sessionid's where the
> sessiondate less than 3 years old.
>
> (Create view SessionView As Select SessionID From Session Where
> Session.sessiondate >= getdate() - 1095)
>
> Table Claims
>
> ClaimID             SessionID             etc             etc etc
>
> 24578                     1                         15 HealthPractice
> 5554441234
>
> 6548975                 2                         52
> ChirocPract 5551234567
>
> What I need is to delete all rows from Table Claims that have sessionids
> in SessionView.
>
> Delete  From Claims   Where Claims.sessionID In (Select SessionID from
> SessionView)
>
> -In this case ClaimID 6548975 would be deleted but ClaimID 24578 would
> not.
>
> But I must be doing something wrong (I am very new at SQL Server, and have
> never done a large scale delete that wasn't just straight forward),
> because the delete seems to take forever. Last time I had to stop it at
> just over 5 hours because the network admin needed to do some work on the
> server and reboot it.
>
> I have set the recovery to simple, and the server has no other active
> databases or other applications running on it.
>
> Any assistance appreciated.
>
> Nancy Lytle
>
> N_Ly***@terplaum.umd.edu
>
>

AddThis Social Bookmark Button