Home All Groups Group Topic Archive Search About

best way to bulk update

Author
11 Sep 2006 12:02 AM
kenimojo
Hi I am new to sql server 2005.
I have a flat file with about 10 million rows of data which I need to
use to update the values in a table every day. The table has a single
clustered index which is the column ID. I do not need to insert new
data, just update the data on this table. The file has about 10 columns
with tab separators. I just need someone to tell me what I should look
into or what is the best way to accomplish this task?

Thank you.

Author
11 Sep 2006 12:13 AM
Hari Prasad
Hi,

1. Use BULK insert with  BATCHSIZE option or BCP IN with -b option to load
the data into temporary Table (Real table)
2. Create a Index on key column in temp table
3. Using the primary key to update the required columns in batches..Use
while loop
4. Truncate the temp table...
5. You could schedule this a SQL Agent job

Thanks
Hari
SQL Server MVP

<kenim***@gmail.com> wrote in message
Show quote
news:1157932970.916946.152660@i3g2000cwc.googlegroups.com...
> Hi I am new to sql server 2005.
> I have a flat file with about 10 million rows of data which I need to
> use to update the values in a table every day. The table has a single
> clustered index which is the column ID. I do not need to insert new
> data, just update the data on this table. The file has about 10 columns
> with tab separators. I just need someone to tell me what I should look
> into or what is the best way to accomplish this task?
>
> Thank you.
>
Author
11 Sep 2006 1:08 AM
Steve Kass
You haven't really said enough to allow too much of a
recommendation, since we don't know how the "column ID"
is related to any of the information in the flat file. However,
I think it's likely you will need to import the contents of the
entire file into a table separate from the one you are updating.

BULK INSERT and bcp are very fast ways to do this.  (I
suppose SQL Server Integration Services also is, but I have
not used it much.). How you update the big table once you
import the data is something you'll still have to figure out - what
indexes are best, or whether you should break the update up into
batches, depend on things we don't know from your post.

-- Steve Kass
-- Drew University
-- http://www.stevekass.com

kenim***@gmail.com wrote:

Show quote
>Hi I am new to sql server 2005.
>I have a flat file with about 10 million rows of data which I need to
>use to update the values in a table every day. The table has a single
>clustered index which is the column ID. I do not need to insert new
>data, just update the data on this table. The file has about 10 columns
>with tab separators. I just need someone to tell me what I should look
>into or what is the best way to accomplish this task?
>
>Thank you.
>

>
Author
11 Sep 2006 5:20 AM
kenimojo
Thanks so much for your answers. It helps me a lot. I will try bcp.

The key column is the cluster index, and it is one of the columns in
the flat file. Simple table, no triggers.

Why would I have to update in batches doing the while loop, if I am not
keeping a log file (simple recovery)? Is there any advantage in speed?

Do I create an index the temp table after or before I insert the data
into it ?

Thanks.
Author
11 Sep 2006 7:26 AM
Erland Sommarskog
(kenim***@gmail.com) writes:
> Why would I have to update in batches doing the while loop, if I am not
> keeping a log file (simple recovery)? Is there any advantage in speed?

Actually, batchings makes more sense with simple recovery than with full
recovery. You may have space enough for your log file, but if you want to
keep it down in size, batching helps. If you update all in one go, the
entire transaction must be logged, so it can be rolled back.

As for speed, the best is to benchmark on your own. But generally, if it
works OK with one big update, go for that.


--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
11 Sep 2006 6:42 PM
Tracy McKibben
kenim***@gmail.com wrote:
> Thanks so much for your answers. It helps me a lot. I will try bcp.
>
> The key column is the cluster index, and it is one of the columns in
> the flat file. Simple table, no triggers.
>
> Why would I have to update in batches doing the while loop, if I am not
> keeping a log file (simple recovery)? Is there any advantage in speed?
>
> Do I create an index the temp table after or before I insert the data
> into it ?
>
> Thanks.
>

Because even in Simple mode, the transaction log still has to be large
enough to hold the rollback info for the transaction.  A 10-million row
transaction will be pretty big.  See
http://realsqlguy.com/serendipity/archives/14-When-Is-A-Transaction-Log-Not-A-Transaction-Log.html


--
Tracy McKibben
MCDBA
http://www.realsqlguy.com
Author
11 Sep 2006 1:14 PM
Alexander Kuznetsov
How many rows does the target table have? It's important - if you
typically update a significant share of the rows in your target table,
you might be better off if you:
1. Create a new table and populate it with modified values.
2. Drop the old table.
3. Rename the new one.

This approach may be faster, and your brand new table will have no
fragmentation.

AddThis Social Bookmark Button