Home All Groups Group Topic Archive Search About

Trying to insert a LOT of records

Author
16 Dec 2005 4:07 PM
David Jessee
I have a table that has 3 fields..there's a clustered index on all 3
fields.....

The data in this table can change a lot depending on the changing state of
the system.  For performance purposes, I can't have a bunch of triggers
throughout the system that fire every time the DB state changes in such a way
that the data in this table becomes invalid...so I have to use a batch
process.

I'm not using DTS to execute the batch process...when I tried, it took 40
minutes to update the data in the table....that's too long.  So, I have a
..NET app that runs the update code asynchronously on 8 threads...the time
gets reduced to 5 minutes....life is good.

But I can't help but wonder if there's a better way to approach what I'm
doing from a SQL side to increase the efficiency.

When I update the data in the table, I delete a block of data and then
re-insert it.  There are times when, between the insert and update, a record
might be inserted that would cause a PK constraint error. so just doing a
insert...select won't work because the entire insert statement is aborted,
not just the one erroneous record.  I have to create a cursor, fetch through
it and then check each record then insert it...performance NIGHTMARE!!!

If I were importing a text file, I could do a bulk load and tell the thing
to ignore any errors...because if there's an error, its because the record
already exists....but you can't do a bulk load from a result set, can you??

I need some help here....is there a way to insert a result set in a way that
SQL ignore the erroneous rows and accepts the rows that can be inserted?

Author
16 Dec 2005 4:41 PM
Trey Walpole
see [NOT] EXISTS

David Jessee wrote:
Show quote
> I have a table that has 3 fields..there's a clustered index on all 3
> fields.....
>
> The data in this table can change a lot depending on the changing state of
> the system.  For performance purposes, I can't have a bunch of triggers
> throughout the system that fire every time the DB state changes in such a way
> that the data in this table becomes invalid...so I have to use a batch
> process.
>
> I'm not using DTS to execute the batch process...when I tried, it took 40
> minutes to update the data in the table....that's too long.  So, I have a
> .NET app that runs the update code asynchronously on 8 threads...the time
> gets reduced to 5 minutes....life is good.
>
> But I can't help but wonder if there's a better way to approach what I'm
> doing from a SQL side to increase the efficiency.
>
> When I update the data in the table, I delete a block of data and then
> re-insert it.  There are times when, between the insert and update, a record
> might be inserted that would cause a PK constraint error. so just doing a
> insert...select won't work because the entire insert statement is aborted,
> not just the one erroneous record.  I have to create a cursor, fetch through
> it and then check each record then insert it...performance NIGHTMARE!!!
>
> If I were importing a text file, I could do a bulk load and tell the thing
> to ignore any errors...because if there's an error, its because the record
> already exists....but you can't do a bulk load from a result set, can you??
>
> I need some help here....is there a way to insert a result set in a way that
> SQL ignore the erroneous rows and accepts the rows that can be inserted?
Author
16 Dec 2005 5:00 PM
David Jessee
That's how I'm checking to see f the records I'm cursoring through already
exist.
Is there a yaw where I don't have to do that?

Show quote
"Trey Walpole" wrote:

> see [NOT] EXISTS
>
> David Jessee wrote:
> > I have a table that has 3 fields..there's a clustered index on all 3
> > fields.....
> >
> > The data in this table can change a lot depending on the changing state of
> > the system.  For performance purposes, I can't have a bunch of triggers
> > throughout the system that fire every time the DB state changes in such a way
> > that the data in this table becomes invalid...so I have to use a batch
> > process.
> >
> > I'm not using DTS to execute the batch process...when I tried, it took 40
> > minutes to update the data in the table....that's too long.  So, I have a
> > .NET app that runs the update code asynchronously on 8 threads...the time
> > gets reduced to 5 minutes....life is good.
> >
> > But I can't help but wonder if there's a better way to approach what I'm
> > doing from a SQL side to increase the efficiency.
> >
> > When I update the data in the table, I delete a block of data and then
> > re-insert it.  There are times when, between the insert and update, a record
> > might be inserted that would cause a PK constraint error. so just doing a
> > insert...select won't work because the entire insert statement is aborted,
> > not just the one erroneous record.  I have to create a cursor, fetch through
> > it and then check each record then insert it...performance NIGHTMARE!!!
> >
> > If I were importing a text file, I could do a bulk load and tell the thing
> > to ignore any errors...because if there's an error, its because the record
> > already exists....but you can't do a bulk load from a result set, can you??
> >
> > I need some help here....is there a way to insert a result set in a way that
> > SQL ignore the erroneous rows and accepts the rows that can be inserted?
>
Author
16 Dec 2005 6:02 PM
Trey Walpole
not sure why you'd need a cursor...

insert into target_table (<column list>)
select <column list>
from <source tables>
where <batch limiting where clause>
AND NOT EXISTS (
    select *
    from target_table
    where pkcol1=<source table>.collated_column1
    and pkcol2=<source table>.collated_column2
    and pkcol3=<source table>.collated_column3
)



David Jessee wrote:
Show quote
> That's how I'm checking to see f the records I'm cursoring through already
> exist.
> Is there a yaw where I don't have to do that?
>
> "Trey Walpole" wrote:
>
>
>>see [NOT] EXISTS
>>
>>David Jessee wrote:
>>
>>>I have a table that has 3 fields..there's a clustered index on all 3
>>>fields.....
>>>
>>>The data in this table can change a lot depending on the changing state of
>>>the system.  For performance purposes, I can't have a bunch of triggers
>>>throughout the system that fire every time the DB state changes in such a way
>>>that the data in this table becomes invalid...so I have to use a batch
>>>process.
>>>
>>>I'm not using DTS to execute the batch process...when I tried, it took 40
>>>minutes to update the data in the table....that's too long.  So, I have a
>>>.NET app that runs the update code asynchronously on 8 threads...the time
>>>gets reduced to 5 minutes....life is good.
>>>
>>>But I can't help but wonder if there's a better way to approach what I'm
>>>doing from a SQL side to increase the efficiency.
>>>
>>>When I update the data in the table, I delete a block of data and then
>>>re-insert it.  There are times when, between the insert and update, a record
>>>might be inserted that would cause a PK constraint error. so just doing a
>>>insert...select won't work because the entire insert statement is aborted,
>>>not just the one erroneous record.  I have to create a cursor, fetch through
>>>it and then check each record then insert it...performance NIGHTMARE!!!
>>>
>>>If I were importing a text file, I could do a bulk load and tell the thing
>>>to ignore any errors...because if there's an error, its because the record
>>>already exists....but you can't do a bulk load from a result set, can you??
>>>
>>>I need some help here....is there a way to insert a result set in a way that
>>>SQL ignore the erroneous rows and accepts the rows that can be inserted?
>>
Author
16 Dec 2005 6:23 PM
JT
Are you basically inserting new data or just updating existing data in an
indirect way? It sounds like perhaps you need to be performing updates,
rather than selecting out, deleting, and then re-inserting. This would
require less I/O and mimimize the time window where conflicting rows can be
inserted by other applications.

Show quote
"David Jessee" <DavidJes***@discussions.microsoft.com> wrote in message
news:22F9D3D6-AAB4-440A-832C-B2E53B2B484D@microsoft.com...
>I have a table that has 3 fields..there's a clustered index on all 3
> fields.....
>
> The data in this table can change a lot depending on the changing state of
> the system.  For performance purposes, I can't have a bunch of triggers
> throughout the system that fire every time the DB state changes in such a
> way
> that the data in this table becomes invalid...so I have to use a batch
> process.
>
> I'm not using DTS to execute the batch process...when I tried, it took 40
> minutes to update the data in the table....that's too long.  So, I have a
> .NET app that runs the update code asynchronously on 8 threads...the time
> gets reduced to 5 minutes....life is good.
>
> But I can't help but wonder if there's a better way to approach what I'm
> doing from a SQL side to increase the efficiency.
>
> When I update the data in the table, I delete a block of data and then
> re-insert it.  There are times when, between the insert and update, a
> record
> might be inserted that would cause a PK constraint error. so just doing a
> insert...select won't work because the entire insert statement is aborted,
> not just the one erroneous record.  I have to create a cursor, fetch
> through
> it and then check each record then insert it...performance NIGHTMARE!!!
>
> If I were importing a text file, I could do a bulk load and tell the thing
> to ignore any errors...because if there's an error, its because the record
> already exists....but you can't do a bulk load from a result set, can
> you??
>
> I need some help here....is there a way to insert a result set in a way
> that
> SQL ignore the erroneous rows and accepts the rows that can be inserted?

AddThis Social Bookmark Button