Home All Groups Group Topic Archive Search About
Author
3 Aug 2006 9:33 PM
Curtis
Is it possible to insert records into the database from a CLR function?

I need to insert thousands of records based on For Each loops. I have my
function done except for the inserting part. I was hoping to complete the
task inside SQL, so I can schedule the task using Agent.

If I can't construct the insert statement inside my function is there any
other way to do it?

Author
3 Aug 2006 10:20 PM
Erland Sommarskog
Curtis (Cur***@discussions.microsoft.com) writes:
> Is it possible to insert records into the database from a CLR function?
>
> I need to insert thousands of records based on For Each loops. I have my
> function done except for the inserting part. I was hoping to complete the
> task inside SQL, so I can schedule the task using Agent.
>
> If I can't construct the insert statement inside my function is there any
> other way to do it?

You can't insert data from a function. Functions are for data-retrieval
only, and are not permitted to change database state.

But you could insert data from a stored procedure in the CLR. However,
I get a little worrried when you say For Each loops. That is not the
most effective way to do it. Try to insert all at once. This can be
done by creating an XML document that you shred with .nodes(). Or
you can generate an INSERT statement that goes:

    INSERT tbl (col1, col2, col3, ...)
       EXEC ('SELECT 1, ''AD'', 3, ...
              SELECT 12, ''PO'', 24, ...
              ...')


From where does the data come? Since you plan to run it from Agent,
it does not sound like comes from the outside (although it's possible
to do external access from a CLR procedure).


--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
3 Aug 2006 10:53 PM
Curtis
The data is coming from tables in the database.

Here is my code:
        Dim ClientID As Integer
        Dim ScanJobHeaderID As Integer
        Dim SearchEngineGroupID As Integer
        Dim SearchEngineID As Integer
        Dim PageSyntaxID As Integer
        Dim EndJob As Integer
        Dim ScanJobHeader As New DataTable
        Dim SearchEngine As New DataTable
        Dim KeywordSearch As New DataTable
        Do
            ScanJobHeader.Rows.Clear()
            SearchEngine.Rows.Clear()
            KeywordSearch.Rows.Clear()
            'getScanJobHeader()
            Using sql As New SqlConnection("context connection=true")

                'Dump Scan Job Header info into datatable
                '>ClientID
                '>ScanJobJeaderID
                '>SearchEngineGroupID

                Using adapterScanJobHeader As New SqlDataAdapter
                    Using selectScanJobHeader As SqlCommand = New
SqlCommand("Select * From dbo.getScanJobHeader( )", sql)
                        sql.Open()
                        selectScanJobHeader.Connection = sql

                        adapterScanJobHeader.SelectCommand =
selectScanJobHeader

                        adapterScanJobHeader.Fill(ScanJobHeader)

                    End Using
                End Using



                For Each dr As DataRow In ScanJobHeader.Rows
                    If Not dr.IsNull(0) Then
                        ClientID = ScanJobHeader.Rows(0).Item(0)
                        ScanJobHeaderID = ScanJobHeader.Rows(0).Item(1)
                        SearchEngineGroupID = ScanJobHeader.Rows(0).Item(2)
                        EndJob = 1

                        Dim i As Integer = 0
                        Dim j As Integer = 0
                        'Dump Search Engine info into datatable
                        '>SearchEngineID
                        '>PageSyntaxID

                        Using selectSearchEngine As SqlCommand = New
SqlCommand("Select * From dbo.getSearchEngineID(" & SearchEngineGroupID &
",1)", sql)

                            Dim adapterSearchEngine As New SqlDataAdapter
                            adapterSearchEngine.SelectCommand =
selectSearchEngine
                            adapterSearchEngine.Fill(SearchEngine)
                        End Using
                        'Dump Keyword info into datatable
                        '>KeywordSearchID

                        Using selectKeywordSearch As SqlCommand = New
SqlCommand("Select * From dbo.getKeywordSearchID(" & ClientID & ")", sql)
                            Dim adapterKeywordSearch As New SqlDataAdapter
                            adapterKeywordSearch.SelectCommand =
selectKeywordSearch
                            adapterKeywordSearch.Fill(KeywordSearch)
                        End Using
                        If SearchEngine.Rows.Count > 0 Then
                            For Each dr1 As DataRow In SearchEngine.Rows
                                SearchEngineID = SearchEngine.Rows(i).Item(0)
                                PageSyntaxID = SearchEngine.Rows(i).Item(1)
                                For Each dr2 As DataRow In KeywordSearch.Rows

                                    Dim insertCommand As SqlCommand = New
SqlCommand("exec dbo.insertBillingScanJobDetails 1, 1, 1, 3", sql)
                                    insertCommand.Connection = sql
                                    insertCommand.ExecuteNonQuery()

                                    j += 1
                                Next

                                i += 1
                                j = 0

                            Next
                        End If
                        i = 0
                    Else
                        EndJob = 0
                    End If
                Next



            End Using
        Loop Until EndJob = 0

I tried executing a stored procedure to insert the records, but I get:
Invalid use of side-effecting or time-dependent operator in 'INSERT' within
a function.

Based on my code do you still think looping is not the best option?

Could you explain more on how I would create the XML document? I assume I
would pass the XML as a parameter to the insert procedure and then use a
Select statement on the XML to insert the records into the database. Is it
possible to send the XML without creating a file on the server?

Show quote
"Erland Sommarskog" wrote:

> Curtis (Cur***@discussions.microsoft.com) writes:
> > Is it possible to insert records into the database from a CLR function?
> >
> > I need to insert thousands of records based on For Each loops. I have my
> > function done except for the inserting part. I was hoping to complete the
> > task inside SQL, so I can schedule the task using Agent.
> >
> > If I can't construct the insert statement inside my function is there any
> > other way to do it?

> You can't insert data from a function. Functions are for data-retrieval
> only, and are not permitted to change database state.
>
> But you could insert data from a stored procedure in the CLR. However,
> I get a little worrried when you say For Each loops. That is not the
> most effective way to do it. Try to insert all at once. This can be
> done by creating an XML document that you shred with .nodes(). Or
> you can generate an INSERT statement that goes:
>
>     INSERT tbl (col1, col2, col3, ...)
>        EXEC ('SELECT 1, ''AD'', 3, ...
>               SELECT 12, ''PO'', 24, ...
>               ...')
>           
>
> From where does the data come? Since you plan to run it from Agent,
> it does not sound like comes from the outside (although it's possible
> to do external access from a CLR procedure).
>
>
> --
> Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se
>
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
>
Author
4 Aug 2006 7:24 AM
Erland Sommarskog
Curtis (Cur***@discussions.microsoft.com) writes:
> The data is coming from tables in the database.

If you are to insert table into one table from other tables in the
database, then you should perform the selection and insertion of the
data from a stored procedure in T-SQL. If there is some complex processing,
it may be worth using a CLR function for that.

Permit me a few more comments on your code:

>                     Using selectScanJobHeader As SqlCommand = New
> SqlCommand("Select * From dbo.getScanJobHeader( )", sql)

Never use SELECT * in production code, but explicitly list the columns
you need. This makes tracking easier, is there is a need to change to
change the function/table/view that you are selecting from.

>                         Using selectKeywordSearch As SqlCommand = New
> SqlCommand("Select * From dbo.getKeywordSearchID(" & ClientID & ")", sql)

Always use parameterised commands, never interpolate parameter values
directly. This is a basic protection against SQL injection, and it makes
more effecient use of the plan cache in SQL Server. See the section
http://www.sommarskog.se/dynamic_sql.html#queryplans, which also has an
example of a parameterised command with SqlClient.

>                                     Dim insertCommand As SqlCommand = New
> SqlCommand("exec dbo.insertBillingScanJobDetails 1, 1, 1, 3", sql)

When calling stored procedures you should use CommandType.StoredProcedure
and not use EXEC statements for efficientcy.

> I tried executing a stored procedure to insert the records, but I get:
> Invalid use of side-effecting or time-dependent operator in 'INSERT'
> within a function.

Yes, you need to use a stored procedure to perform updates. And that
you *can* do in the CLR, but I think it's a poor design.

> Based on my code do you still think looping is not the best option?

I was not able to grasp if the code was doing anything in particular than
just moving the data, but I did not analyse it in detail. But I'm fairly
confident that it can all be done in T-SQL.

> Could you explain more on how I would create the XML document? I assume I
> would pass the XML as a parameter to the insert procedure and then use a
> Select statement on the XML to insert the records into the database. Is it
> possible to send the XML without creating a file on the server?

You don't need to create a file, no. I don't really know who to create
XML documents from data, as I am an SQL programmer, not a .Net person.
But our GUI developers frequently employ this technique when they want
to insert lots of data in one go. And they are even using VB6.

Anyway using XML would be an overkill in this case, as the entire CLR
thing is an overkill.


--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
8 Aug 2006 10:01 PM
ML
Instead of posting code you should first explain what you actually need. And
as Erland already mentioned - you can't use SQL functions to alter database
state (e.g. modify data in tables).

As far as I can see a set-based solution is possible here or am I missing
something?

Anyway, it's difficult to say based on your post(s), so provide proper DDL,
sample data and expected results to get a better answer.


ML

---
http://milambda.blogspot.com/
Author
10 Aug 2006 8:25 PM
Curtis
Let me see if I can explain what I am trying to do effectively.

TableA
ClientID int,
ScanJobHeaderID int,
SearchEngineGroupID int

TableB
SearchEngineID int,
PageSyntaxID int

TableC
KeywordSearchID int

TableD
KeywordSearchID int,
SearchEngineID int,
PageSyntaxID int

TableE
ScanJobHeaderID int,
TableDID int

Query 1:
Select ClientID, ScanJobHeaderID, SearchEngineGroupID from TableA
Returns 1 Row

Query 2:
Select SearchEngineID, PageSyntaxID from TableB based on the
SearchEngineGroupID
Returns multiple rows

Query 3:
Select ClientID from TableC based on the ClientID
Returns multiple rows

Do
Query 1

For each row in Query 2
i <advance to the next row each time and assign the correct value to be used
in the insert statement>

For each row in Query 3
j <advance to the next row each time and assign the correct value to be
used in the insert statement>

Insert KeywordSearchID, SearchEngineID, PageSyntaxID, ScanJobHeaderID
j +=1
Next
1+=1
Next

Loop

I would like to do this in SQL, so I can trigger it when a new record is
entered into Table A. I use Scop_Identity() that is used in TableE

The result is over 1 million records entered into TableD and TableE

Show quote
"ML" wrote:

> Instead of posting code you should first explain what you actually need. And
> as Erland already mentioned - you can't use SQL functions to alter database
> state (e.g. modify data in tables).
>
> As far as I can see a set-based solution is possible here or am I missing
> something?
>
> Anyway, it's difficult to say based on your post(s), so provide proper DDL,
> sample data and expected results to get a better answer.
>
>
> ML
>
> ---
> http://milambda.blogspot.com/
Author
10 Aug 2006 8:56 PM
ML
You're either not thinking in sets or I'm missing something.

For-each row in set one do a transformation in sets two and three? That's
what JOINs are for.

We can help you better if you follow these instructions:
http://www.aspfaq.com/etiquette.asp?id=5006

Anyway, what you need to do is something like this:

1) create one query where TableA and TableB are joined on the common column
(SearchEngineGroupID) and then use values from the tables to come up with one
set of rows to insert;

2) do pretty much the same for TableA joined to TableC on the common column
(ClientID) to come up with another set of rows.

OR:

Join all three tables together on appropriate columns to come up with a
single set of rows.


This is SQL - we carry water in buckets, not spoons.


ML

---
http://milambda.blogspot.com/
Author
10 Aug 2006 10:33 PM
Erland Sommarskog
Curtis (Cur***@discussions.microsoft.com) writes:
> Query 1:
> Select ClientID, ScanJobHeaderID, SearchEngineGroupID from TableA
> Returns 1 Row
>
> Query 2:
> Select SearchEngineID, PageSyntaxID from TableB based on the
> SearchEngineGroupID
> Returns multiple rows
>
> Query 3:
> Select ClientID from TableC based on the ClientID
> Returns multiple rows

SELECT A.ClientID, A.ScanJobHeaderID, B.SearchEngineID, B.PageSyntaxID
FROM   TableA A
JOIN   TableB B ON A.SearchEngineGroupID = B.SearchEngineGroupID
JOIN   TableC C ON A.ClientID = C.ClientID
WHERE  A.xxx = @

So suddently three queries are one. If I had better description of
the tables you are inserting into, I would have written the INSERT
statements as well. There is a mention of scope_identity() that
makes me a little nervous. IDENTITY columns are usually a cause
for problems when you need to insert referring rows. The simplest
remedy is often to simple get rid of the IDENTITY property.

> I would like to do this in SQL, so I can trigger it when a new record is
> entered into Table A.

Ah, so change the query into:

SELECT A.ClientID, A.ScanJobHeaderID, B.SearchEngineID, B.PageSyntaxID
FROM   inserted A
JOIN   TableB B ON A.SearchEngineGroupID = B.SearchEngineGroupID
JOIN   TableC C ON A.ClientID = C.ClientID

(It should appear in a trigger.) Note that this handle the case that many
rows are inserted at once.

> The result is over 1 million records entered into TableD and TableE

Don't loop to insert one million rows! That's completely irresponsible!
With those volumes of data, you must work set-based! Running a loop
could take hours! A single insert should complete in a minute or two.

--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
11 Aug 2006 1:59 PM
Curtis
CREATE TABLE [dbo].[TableA](
    [ScanJobHeaderID] [int] IDENTITY(1,1) NOT NULL,
    [ClientID] [smallint] NOT NULL
    [SearchEngineGroupID] [smallint] NOT NULL
CONSTRAINT [PK_TableA] PRIMARY KEY CLUSTERED
(
    [ScanJobHeaderID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[j-TableA](
    [ScanJobHeaderID] [int] NOT NULL,
    [ScanJobDetailID] [int] NOT NULL
) ON [PRIMARY]

    Select    Top 1 t1.ClientID,  t1.ScanJobHeaderID,  t1.SearchEngineGroupID
    From    dbo.TableA t1
        LEFT JOIN    dbo.[j-TableA] t2 ON t1.ScanJobHeaderID = t2.ScanJobHeaderID
    Where    t2.ScanJobHeaderID IS NULL
    Order by t1.ScanJobHeaderID

Above is the tables and query for the first part (Query1)

CREATE TABLE [dbo].[TableB](
    [PageSyntaxID] [int] NOT NULL,
    [SearchEngineID] [int] NULL,
    [PageNumber] [int] NULL
) ON [PRIMARY]

CREATE TABLE [dbo].[j-TableB](
    [SearchEngineID] [tinyint] NOT NULL,
    [SearchEngineGroupID] [smallint] NOT NULL
) ON [PRIMARY]


    Select    t1.SearchEngineID, t2.PageSyntaxID
    From    dbo.[j-tblSearchEngineGroup] t1
        INNER JOIN
            dbo.tblSearchEnginePageSyntax t2 on t2.SearchEngineID = t1.SearchEngineID
    Where    (SearchEngineGroupID = @SearchEngineGroupID) and (t2.PageNumber =
@PageNumber)

Above is the tables and query for the first part (Query2)

CREATE TABLE [dbo].[TableC](
    [KeywordSearchID] [int] NOT NULL
CONSTRAINT [PK_TableC] PRIMARY KEY CLUSTERED
(
    [KeywordSearchID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[j-TableC](
    [ClientID] [smallint] NOT NULL,
    [KeywordSearchID] [int] NOT NULL
) ON [PRIMARY]

    Select    t1.KeywordSearchID
    From    dbo.TableC t1
        INNER JOIN
            dbo.[j-TableC] t2 on t2.KeywordSearchID = t1.KeywordSearchID
    Where    (t2.ClientID = @ClientID) and t1.Billable = '1'


Above is the tables and query for the first part (Query3)

CREATE TABLE [dbo].[TableD](
    [ScanJobDetailID] [int] IDENTITY(1,1) NOT NULL,
    [KeywordSearchID] [int] NULL,
    [SearchEngineID] [tinyint] NULL,
    [SearchEnginePageID] [tinyint] NULL
CONSTRAINT [PK_TableD] PRIMARY KEY CLUSTERED
(
    [ScanJobDetailID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[j-TableE](
    [ScanJobHeaderID] [int] NOT NULL,
    [ScanJobDetailID] [int] NOT NULL
) ON [PRIMARY]

Declare @ScanJobDetailID int

    Insert    dbo.TableD (KeywordSearchID, SearchEngineID, SearchEnginePageID)
                            Values (@KeyWordSearchID, @SearchEngineID, @SearchEnginePageID)

    Set @ScanJobDetailID = SCOPE_IDENTITY()

    Insert dbo.[j-TableE] (ScanJobHeaderID, ScanJobDetailID) Values
(@ScanJobHeaderID, @ScanJobDetailID)

I don't understand how I can get this into one set of results.

Thank you both for your assistance.

Show quote
"Erland Sommarskog" wrote:

> Curtis (Cur***@discussions.microsoft.com) writes:
> > Query 1:
> > Select ClientID, ScanJobHeaderID, SearchEngineGroupID from TableA
> > Returns 1 Row
> >
> > Query 2:
> > Select SearchEngineID, PageSyntaxID from TableB based on the
> > SearchEngineGroupID
> > Returns multiple rows
> >
> > Query 3:
> > Select ClientID from TableC based on the ClientID
> > Returns multiple rows
>
> SELECT A.ClientID, A.ScanJobHeaderID, B.SearchEngineID, B.PageSyntaxID
> FROM   TableA A
> JOIN   TableB B ON A.SearchEngineGroupID = B.SearchEngineGroupID
> JOIN   TableC C ON A.ClientID = C.ClientID
> WHERE  A.xxx = @
>
> So suddently three queries are one. If I had better description of
> the tables you are inserting into, I would have written the INSERT
> statements as well. There is a mention of scope_identity() that
> makes me a little nervous. IDENTITY columns are usually a cause
> for problems when you need to insert referring rows. The simplest
> remedy is often to simple get rid of the IDENTITY property.

> > I would like to do this in SQL, so I can trigger it when a new record is
> > entered into Table A.
>
> Ah, so change the query into:
>
> SELECT A.ClientID, A.ScanJobHeaderID, B.SearchEngineID, B.PageSyntaxID
> FROM   inserted A
> JOIN   TableB B ON A.SearchEngineGroupID = B.SearchEngineGroupID
> JOIN   TableC C ON A.ClientID = C.ClientID
>
> (It should appear in a trigger.) Note that this handle the case that many
> rows are inserted at once.
>
> > The result is over 1 million records entered into TableD and TableE

> Don't loop to insert one million rows! That's completely irresponsible!
> With those volumes of data, you must work set-based! Running a loop
> could take hours! A single insert should complete in a minute or two.
>
> --
> Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se
>
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
>
Author
12 Aug 2006 9:09 PM
Erland Sommarskog
Curtis (Cur***@discussions.microsoft.com) writes:
> I don't understand how I can get this into one set of results.
>
> Thank you both for your assistance.

In SQL 2000, the IDENTITY column would had caused problems, but on SQL 2005,
the new OUTPUT clause can sometimes save the situation, and this is such a
case. As I understand it, the second INSERT into j-TableE just inserts
all generated ids from the first query, gathered under the ScanJobHeaderID
from the starting query.

Below is a query batch with your tables and one SELECT, and two INSERT.
It compiles, but without test data I cannot guarantee that it produces
the correct result. Also, you obfustication of the original table names
was not complete, so I had to make some assumptions.

Still there is one thing that nags me: why the TOP 1 in the first query?
Don't you ever want to handle multiple clients at the same time?

DECLARE @ClientID smallint,
        @PageNumber int,
        @SearchEngineGroupID int,
        @ScanJobDetailID int,
        @KeyWordSearchID int,
        @SearchEngineID int,
        @SearchEnginePageID int,
        @ScanJobHeaderID int

CREATE TABLE [dbo].[TableA](
    [ScanJobHeaderID] [int] IDENTITY(1,1) NOT NULL,
    [ClientID] [smallint] NOT NULL,
    [SearchEngineGroupID] [smallint] NOT NULL
CONSTRAINT [PK_TableA] PRIMARY KEY CLUSTERED
(
    [ScanJobHeaderID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[j-TableA](
    [ScanJobHeaderID] [int] NOT NULL,
    [ScanJobDetailID] [int] NOT NULL
) ON [PRIMARY]


--Above is the tables and query for the first part (Query1)

CREATE TABLE [dbo].[TableB](
    [PageSyntaxID] [int] NOT NULL,
    [SearchEngineID] [int] NULL,
    [PageNumber] [int] NULL
) ON [PRIMARY]

CREATE TABLE [dbo].[j-TableB](
    [SearchEngineID] [tinyint] NOT NULL,
    [SearchEngineGroupID] [smallint] NOT NULL
) ON [PRIMARY]


--Above is the tables and query for the first part (Query2)

CREATE TABLE [dbo].[TableC](
    [KeywordSearchID] [int] NOT NULL,
    Billable char(1) NOT NULL
CONSTRAINT [PK_TableC] PRIMARY KEY CLUSTERED
(
    [KeywordSearchID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[j-TableC](
    [ClientID] [smallint] NOT NULL,
    [KeywordSearchID] [int] NOT NULL
) ON [PRIMARY]


--Above is the tables and query for the first part (Query3)

CREATE TABLE [dbo].[TableD](
    [ScanJobDetailID] [int] IDENTITY(1,1) NOT NULL,
    [KeywordSearchID] [int] NULL,
    [SearchEngineID] [tinyint] NULL,
    [SearchEnginePageID] [tinyint] NULL
CONSTRAINT [PK_TableD] PRIMARY KEY CLUSTERED
(
    [ScanJobDetailID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

CREATE TABLE [dbo].[j-TableE](
    [ScanJobHeaderID] [int] NOT NULL,
    [ScanJobDetailID] [int] NOT NULL
) ON [PRIMARY]

DECLARE @IDs TABLE (id int NOT NULL)

Select    Top 1 @ClientID = t1.ClientID,  @ScanJobHeaderID = t1.ScanJobHeaderID, 
               @SearchEngineGroupID = t1.SearchEngineGroupID
From        dbo.TableA t1
LEFT JOIN dbo.[j-TableA] t2 ON t1.ScanJobHeaderID = t2.ScanJobHeaderID
Where    t2.ScanJobHeaderID IS NULL
Order by t1.ScanJobHeaderID

IF @@rowcount > 0
BEGIN
    Insert    dbo.TableD (KeywordSearchID, SearchEngineID, SearchEnginePageID)
    OUTPUT inserted.ScanJobDetailID INTO @IDs(id)
    Select  C.KeyWordSerarchID, jB.SearchEngineID, jb.PageSyntaxID
    From      dbo.[j-TableB] jB
    JOIN      dbo.TableB B on jB.SearchEngineID = B.SearchEngineID
    CROSS   JOIN (dbo.TableC C
                   JOIN dbo.[j-TableC] C on jC.KeywordSearchID = C.KeywordSearchID)
    Where      B.SearchEngineGroupID = @SearchEngineGroupID
      AND   B.PageNumber = @PageNumber
      AND   C.Billlable = '1'

    Insert dbo.[j-TableE] (ScanJobHeaderID, ScanJobDetailID)
      SELECT @ScanJobHeaderID, id
      FROM  @IDs
END     

go
drop Table TableA, [j-TableA], TableB, [j-TableB], TableC, [j-TableC], TableD, [j-TableE]

--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
14 Aug 2006 6:16 PM
Curtis
I am working on digesting your last response, so I can understand how it
works.

Yes, I will be working with more than one client, but in my loop scenario I
was working with one client at a time. After the loop finished the work for
one client that client wasn't available in Query1. Basically if
ScanJobHeaderID exists in TableA and j-TableA then I don't want it included
in Query1.

Show quote
"Erland Sommarskog" wrote:

> Curtis (Cur***@discussions.microsoft.com) writes:
> > I don't understand how I can get this into one set of results.
> >
> > Thank you both for your assistance.

> In SQL 2000, the IDENTITY column would had caused problems, but on SQL 2005,
> the new OUTPUT clause can sometimes save the situation, and this is such a
> case. As I understand it, the second INSERT into j-TableE just inserts
> all generated ids from the first query, gathered under the ScanJobHeaderID
> from the starting query.
>
> Below is a query batch with your tables and one SELECT, and two INSERT.
> It compiles, but without test data I cannot guarantee that it produces
> the correct result. Also, you obfustication of the original table names
> was not complete, so I had to make some assumptions.
>
> Still there is one thing that nags me: why the TOP 1 in the first query?
> Don't you ever want to handle multiple clients at the same time?
>
> DECLARE @ClientID smallint,
>         @PageNumber int,
>         @SearchEngineGroupID int,
>         @ScanJobDetailID int,
>         @KeyWordSearchID int,
>         @SearchEngineID int,
>         @SearchEnginePageID int,
>         @ScanJobHeaderID int
>
> CREATE TABLE [dbo].[TableA](
>     [ScanJobHeaderID] [int] IDENTITY(1,1) NOT NULL,
>     [ClientID] [smallint] NOT NULL,
>     [SearchEngineGroupID] [smallint] NOT NULL
>  CONSTRAINT [PK_TableA] PRIMARY KEY CLUSTERED
> (
>     [ScanJobHeaderID] ASC
> )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
> ) ON [PRIMARY]
>
> CREATE TABLE [dbo].[j-TableA](
>     [ScanJobHeaderID] [int] NOT NULL,
>     [ScanJobDetailID] [int] NOT NULL
> ) ON [PRIMARY]
>
>
> --Above is the tables and query for the first part (Query1)
>
> CREATE TABLE [dbo].[TableB](
>     [PageSyntaxID] [int] NOT NULL,
>     [SearchEngineID] [int] NULL,
>     [PageNumber] [int] NULL
> ) ON [PRIMARY]
>
> CREATE TABLE [dbo].[j-TableB](
>     [SearchEngineID] [tinyint] NOT NULL,
>     [SearchEngineGroupID] [smallint] NOT NULL
> ) ON [PRIMARY]
>
>
> --Above is the tables and query for the first part (Query2)
>
> CREATE TABLE [dbo].[TableC](
>     [KeywordSearchID] [int] NOT NULL,
>     Billable char(1) NOT NULL
>  CONSTRAINT [PK_TableC] PRIMARY KEY CLUSTERED
> (
>     [KeywordSearchID] ASC
> )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
> ) ON [PRIMARY]
>
> CREATE TABLE [dbo].[j-TableC](
>     [ClientID] [smallint] NOT NULL,
>     [KeywordSearchID] [int] NOT NULL
> ) ON [PRIMARY]
>
>
> --Above is the tables and query for the first part (Query3)
>
> CREATE TABLE [dbo].[TableD](
>     [ScanJobDetailID] [int] IDENTITY(1,1) NOT NULL,
>     [KeywordSearchID] [int] NULL,
>     [SearchEngineID] [tinyint] NULL,
>     [SearchEnginePageID] [tinyint] NULL
>  CONSTRAINT [PK_TableD] PRIMARY KEY CLUSTERED
> (
>     [ScanJobDetailID] ASC
> )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
> ) ON [PRIMARY]
>
> CREATE TABLE [dbo].[j-TableE](
>     [ScanJobHeaderID] [int] NOT NULL,
>     [ScanJobDetailID] [int] NOT NULL
> ) ON [PRIMARY]
>
> DECLARE @IDs TABLE (id int NOT NULL)
>
> Select    Top 1 @ClientID = t1.ClientID,  @ScanJobHeaderID = t1.ScanJobHeaderID, 
>                @SearchEngineGroupID = t1.SearchEngineGroupID
> From        dbo.TableA t1
> LEFT JOIN dbo.[j-TableA] t2 ON t1.ScanJobHeaderID = t2.ScanJobHeaderID
> Where    t2.ScanJobHeaderID IS NULL
> Order by t1.ScanJobHeaderID
>
> IF @@rowcount > 0
> BEGIN
>     Insert    dbo.TableD (KeywordSearchID, SearchEngineID, SearchEnginePageID)
>     OUTPUT inserted.ScanJobDetailID INTO @IDs(id)
>     Select  C.KeyWordSerarchID, jB.SearchEngineID, jb.PageSyntaxID
>     From      dbo.[j-TableB] jB
>     JOIN      dbo.TableB B on jB.SearchEngineID = B.SearchEngineID
>     CROSS   JOIN (dbo.TableC C
>                    JOIN dbo.[j-TableC] C on jC.KeywordSearchID = C.KeywordSearchID)
>     Where      B.SearchEngineGroupID = @SearchEngineGroupID
>       AND   B.PageNumber = @PageNumber
>       AND   C.Billlable = '1'
>    
>     Insert dbo.[j-TableE] (ScanJobHeaderID, ScanJobDetailID)
>       SELECT @ScanJobHeaderID, id
>       FROM  @IDs
> END     
>
> go
> drop Table TableA, [j-TableA], TableB, [j-TableB], TableC, [j-TableC], TableD, [j-TableE]
>
> --
> Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se
>
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
>
>
Author
14 Aug 2006 10:50 PM
Erland Sommarskog
Curtis (Cur***@discussions.microsoft.com) writes:
> I am working on digesting your last response, so I can understand how it
> works.

If you are used to think in loops, then it may take some to digest.
But it will be well-spent time. Because this is really how operate in
SQL. You operate on sets. Sometimes looping may be called for, but you
should always try to avoid it. There is a dramatic difference in
execution time for the two approaches. And I mean dramatic: you mentioned
a million rows. Looping could take an hour or two. A set-based operation
could complete within the minute.

> Yes, I will be working with more than one client, but in my loop
> scenario I was working with one client at a time. After the loop
> finished the work for one client that client wasn't available in Query1.
> Basically if ScanJobHeaderID exists in TableA and j-TableA then I don't
> want it included in Query1.

Then probably the next step is to handle all customers at a time as
well.

--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
14 Aug 2006 11:29 PM
Curtis
I am trying my luck with a cursor to deal with all of the clients at one
time. It seems to be taking a long time to complete.

DECLARE @ClientID smallint,
        @PageNumber int,
        @SearchEngineGroupID int,
        @ScanJobDetailID int,
        @KeyWordSearchID int,
        @SearchEngineID int,
        @SearchEnginePageID int,
        @ScanJobHeaderID int



DECLARE @IDs TABLE (id int NOT NULL)
Declare HeaderCursor Cursor FAST_FORWARD FOR
Select    t1.ClientID, t1.ScanJobHeaderID, t1.SearchEngineGroupID
From        dbo.TableA t1
LEFT JOIN dbo.[j-TableA] t2 ON t1.ScanJobHeaderID = t2.ScanJobHeaderID
Where    t2.ScanJobHeaderID IS NULL
Order by t1.ScanJobHeaderID
Open HeaderCursor

FETCH NEXT FROM HeaderCursor into @ClientID, @ScanJobHeaderID,
@SearchEngineGroupID
While @@fetch_status =0

BEGIN

    Insert    dbo.TableD (KeywordSearchID, SearchEngineID, SearchEnginePageID)
    OUTPUT inserted.ScanJobDetailID INTO @IDs(id)
    Select  C.KeyWordSearchID, jB.SearchEngineID, B.PageSyntaxID
    From      dbo.[j-TableB] jB
    JOIN      dbo.TableB B on jB.SearchEngineID = B.SearchEngineID
    CROSS   JOIN (
                      dbo.TableC C
                   JOIN dbo.[j-TableC] jc on jC.KeywordSearchID = C.KeywordSearchID)
    Where      jb.SearchEngineGroupID = @SearchEngineGroupID and jc.ClientID =
@ClientID
      AND   B.PageNumber = '1'/*@PageNumber*/
      AND   C.Billable = '1'

    Insert dbo.[j-TableA] (ScanJobHeaderID, ScanJobDetailID)
      SELECT @ScanJobHeaderID, id
      FROM  @IDs

FETCH NEXT FROM HeaderCursor into @ClientID, @ScanJobHeaderID,
@SearchEngineGroupID

END     
Close HeaderCursor
DEALLOCATE HeaderCursor

I am expecting the final product to be 855,555 rows added. It's been running
for 15 minutes and it's only at 349,630. Is a cursor more efficient then
using a while loop and calling the client one at a time?


Show quote
"Erland Sommarskog" wrote:

> Curtis (Cur***@discussions.microsoft.com) writes:
> > I am working on digesting your last response, so I can understand how it
> > works.
>
> If you are used to think in loops, then it may take some to digest.
> But it will be well-spent time. Because this is really how operate in
> SQL. You operate on sets. Sometimes looping may be called for, but you
> should always try to avoid it. There is a dramatic difference in
> execution time for the two approaches. And I mean dramatic: you mentioned
> a million rows. Looping could take an hour or two. A set-based operation
> could complete within the minute.

> > Yes, I will be working with more than one client, but in my loop
> > scenario I was working with one client at a time. After the loop
> > finished the work for one client that client wasn't available in Query1.
> > Basically if ScanJobHeaderID exists in TableA and j-TableA then I don't
> > want it included in Query1.
>
> Then probably the next step is to handle all customers at a time as
> well.
>
> --
> Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se
>
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
>
Author
15 Aug 2006 10:23 PM
Erland Sommarskog
Curtis (Cur***@discussions.microsoft.com) writes:
> I am trying my luck with a cursor to deal with all of the clients at one
> time. It seems to be taking a long time to complete.

No wonder. Didn't I say that looping is the slow way to access a database?

> I am expecting the final product to be 855,555 rows added. It's been
> running for 15 minutes and it's only at 349,630. Is a cursor more
> efficient then using a while loop and calling the client one at a time?

When you need to loop, a cursor is often less bad than any other sort
of loop. But it's slow.

There is a bug in your code which may affect execution though: you don't
empty the @IDs table, so you are inserting far to much data into the second
table. That could explain some of the delay. (And that also shows the
dangers with looping. You cannot do such mistakes when you work set-
based.)

Obviously all clients should be handled in one go, and for the first
INSERT it's not that difficult:

   Insert   dbo.TableD (KeywordSearchID, SearchEngineID, SearchEnginePageID)
   OUTPUT   inserted.ScanJobDetailID INTO @IDs(id)
   Select   C.KeyWordSearchID, jB.SearchEngineID, B.PageSyntaxID
   From     dbo.TableA A
   JOIN     dbo.[j-TableB] jB
         on jB.SearchEngineGroupID = A.SearchEngineGroupID
   JOIN     dbo.TableB B ON jB.SearchEngineID = B.SearchEngineID
   JOIN     dbo.[j-TableC] jc ON jc.ClientID = A.ClientID
   JOIN     dbo.TableC C ON jC.KeywordSearchID = C.KeywordSearchID
   Where    NOT EXISTS (SELECT *
                        FROM   dbo.[j-TableA]
                        WHERE  A.ScanJobHeaderID = jA.ScanJobHeaderID)
     AND    B.PageNumber = '1'/*@PageNumber*/
     AND    C.Billable = '1'

(With the general disclaimer that I have about zero understanding of
what this stuff is actually doing.)

The problem is with the second table. Since we are now inserting many
clients at the same time, there no longer any apparent relation between
the ScanJobHeaderID and the ids generated in TableD. If many clients
have the same ScanJobHeaderID, you can improve things by looping over
the ScanJobHeaderIDs instead (add "AND A.ScanJobHeaderID =
@ScanJobHeaderIDs".)

If it is possible to change TableD, so that they key column does not
have the IDENTITY property, then it's possible to devise a solution
that handles all clients in one go.

(Actually that solution could be possible with IDENTITY as well, if
you permit SET IDENTITY_INSERT ON. But I don't think SET
IDENTITY_INSERT belongs in production code.)



--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
16 Aug 2006 5:50 PM
Curtis
Thank you for catching the bug. The process completed in a minute after I
corrected the problem.

I can't change the table structure of TableD.

Thank you for all your help.

Show quote
"Erland Sommarskog" wrote:

> Curtis (Cur***@discussions.microsoft.com) writes:
> > I am trying my luck with a cursor to deal with all of the clients at one
> > time. It seems to be taking a long time to complete.
>
> No wonder. Didn't I say that looping is the slow way to access a database?

> > I am expecting the final product to be 855,555 rows added. It's been
> > running for 15 minutes and it's only at 349,630. Is a cursor more
> > efficient then using a while loop and calling the client one at a time?
>
> When you need to loop, a cursor is often less bad than any other sort
> of loop. But it's slow.
>
> There is a bug in your code which may affect execution though: you don't
> empty the @IDs table, so you are inserting far to much data into the second
> table. That could explain some of the delay. (And that also shows the
> dangers with looping. You cannot do such mistakes when you work set-
> based.)
>
> Obviously all clients should be handled in one go, and for the first
> INSERT it's not that difficult:
>
>    Insert   dbo.TableD (KeywordSearchID, SearchEngineID, SearchEnginePageID)
>    OUTPUT   inserted.ScanJobDetailID INTO @IDs(id)
>    Select   C.KeyWordSearchID, jB.SearchEngineID, B.PageSyntaxID
>    From     dbo.TableA A
>    JOIN     dbo.[j-TableB] jB
>          on jB.SearchEngineGroupID = A.SearchEngineGroupID
>    JOIN     dbo.TableB B ON jB.SearchEngineID = B.SearchEngineID
>    JOIN     dbo.[j-TableC] jc ON jc.ClientID = A.ClientID
>    JOIN     dbo.TableC C ON jC.KeywordSearchID = C.KeywordSearchID
>    Where    NOT EXISTS (SELECT *
>                         FROM   dbo.[j-TableA]
>                         WHERE  A.ScanJobHeaderID = jA.ScanJobHeaderID)
>      AND    B.PageNumber = '1'/*@PageNumber*/
>      AND    C.Billable = '1'
>
> (With the general disclaimer that I have about zero understanding of
> what this stuff is actually doing.)
>
> The problem is with the second table. Since we are now inserting many
> clients at the same time, there no longer any apparent relation between
> the ScanJobHeaderID and the ids generated in TableD. If many clients
> have the same ScanJobHeaderID, you can improve things by looping over
> the ScanJobHeaderIDs instead (add "AND A.ScanJobHeaderID =
> @ScanJobHeaderIDs".)
>
> If it is possible to change TableD, so that they key column does not
> have the IDENTITY property, then it's possible to devise a solution
> that handles all clients in one go.
>
> (Actually that solution could be possible with IDENTITY as well, if
> you permit SET IDENTITY_INSERT ON. But I don't think SET
> IDENTITY_INSERT belongs in production code.)
>
>
>
> --
> Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se
>
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
>
Author
16 Aug 2006 10:07 PM
Erland Sommarskog
Curtis (Cur***@discussions.microsoft.com) writes:
> Thank you for catching the bug. The process completed in a minute after I
> corrected the problem.

A minute is still a tad slow, but if it's acceptable to you, maybe it's
time to look for new feats.

> I can't change the table structure of TableD.

Too bad. (But tell the DBA or whoever designed it that IDENTITY was no
blessing this time, but a curse.)

--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
16 Aug 2006 11:11 PM
Curtis
I am rather new to SQL, so I was thinking 1 minute for that many records was
pretty good. I want to do this the best way. 

If I change TableD so that it does not have the IDENTITY then how would I
assign a value to that column each time I run the query?

Show quote
"Erland Sommarskog" wrote:

> Curtis (Cur***@discussions.microsoft.com) writes:
> > Thank you for catching the bug. The process completed in a minute after I
> > corrected the problem.
>
> A minute is still a tad slow, but if it's acceptable to you, maybe it's
> time to look for new feats.

> > I can't change the table structure of TableD.
>
> Too bad. (But tell the DBA or whoever designed it that IDENTITY was no
> blessing this time, but a curse.)
>
> --
> Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se
>
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
>
Author
19 Aug 2006 7:39 PM
Erland Sommarskog
Curtis (Cur***@discussions.microsoft.com) writes:
> I am rather new to SQL, so I was thinking 1 minute for that many records
> was pretty good. I want to do this the best way. 
>
> If I change TableD so that it does not have the IDENTITY then how would I
> assign a value to that column each time I run the query?

Actually, when I look at your tables and the queries, I don't understand
why you have this [j-TableE]. There seems to be a one-to-one relation to
TableD, so if you just added ScanJobHeaderID to that table, there would
be no problem at all.

But it could very well be that I have mangled the table relations when
I have rearranged your original code. Keep in mind that while I know SQL,
I know zero about the business your tables relates to. All that I am
able to is mechanical changes.

Anyway, if there has two be two different tables, you could do as below.
That's not the only way to skin the cat. I'm bouncing the data over an
intermediate table, so that you don't have to read the base tables twice.
But there is a cost for the intermediate table, so I cannot say which is
the best.

And, please, please! Test everything that I post very carefully. It does
not help if it runs faster than your original solution, if it produces an
incorrect result.

DECLARE @temp TABLE (incr int NOT NULL PRIMARY KEY,
                     KeywordSearchID int NULL,
                     SearchEngineID  tinyint NULL,
                     SearchEnginePageID tinyint NULL,
                     ScanJobHeaderID int NOT NULL)

Insert   @temp (incr, KeywordSearchID, SearchEngineID,
                SearchEnginePageID, ScanJobHeaderID)
   Select   row_number() OVER (ORDER BY A.ClientID), C.KeywordSearchID,
            jB.SearchEngineID, B.PageSyntaxID, A.ScanJobHeaderID
   From     dbo.TableA A
   JOIN     dbo.[j-TableB] jB
         on jB.SearchEngineGroupID = A.SearchEngineGroupID
   JOIN     dbo.TableB B ON jB.SearchEngineID = B.SearchEngineID
   JOIN     dbo.[j-TableC] jC ON jC.ClientID = A.ClientID
   JOIN     dbo.TableC C ON jC.KeywordSearchID = C.KeywordSearchID
   Where    NOT EXISTS (SELECT *
                        FROM   dbo.[j-TableA] jA
                        WHERE  A.ScanJobHeaderID = jA.ScanJobHeaderID)
     AND    B.PageNumber = '1'/*@PageNumber*/
     AND    C.Billable = '1'

DECLARE @maxid int

SELECT @maxid = coalesce(MAX(ScanJobDetailID), 0) FROM [TableD]

INSERT [TableD] (ScanJobDetailID, KeywordSearchID, SearchEngineID,
                  SearchEnginePageID)
   SELECT @maxid + incr, KeywordSearchID, SearchEngineID,
          SearchEnginePageID
   FROM   @temp

INSERT [j-TableE] (ScanJobDetailID, ScanJobHeaderID)
   SELECT @maxid + incr, ScanJobHeaderID
   FROM   @temp


--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
21 Aug 2006 10:33 PM
Curtis
You are correct, [j-TableE] is not needed.

Your code has been dead on.

Thank you for your patience and assistance.

The query runs in about 35 seconds the first time. Table D ie empty the
first time I run it.

Insert    dbo.TableD(ScanJobHeaderID, KeywordSearchID, SearchEngineID,
SearchEnginePageID)
Select   A.ScanJobHeaderID, C.KeywordSearchID, jB.SearchEngineID,
B.PageSyntaxID
From         dbo.TableA A
   JOIN     dbo.[j-TableB] jB ON jB.SearchEngineGroupID =
A.SearchEngineGroupID
   JOIN     dbo.TableB B ON jB.SearchEngineID = B.SearchEngineID

   JOIN     dbo.[j-TableC] jC ON jC.ClientID = A.ClientID
   JOIN     dbo.TableC C ON jC.KeywordSearchID = C.KeywordSearchID
Where   NOT EXISTS (SELECT ScanJobHeaderID
                        FROM   dbo.TableD jA
                        WHERE  JA.ScanJobHeaderID = A.ScanJobHeaderID)

     AND    B.PageNumber = '1'/*@PageNumber*/
    AND    C.Billable = '1'

It takes 1:35 the second time I run it. I placed a new record in TableA to
make sure it was just grabbing new records. It takes just as long to complete
if I don't add a new record to TableA. I would expect it to complete quickly
because there is nothing to add to TableD. Does this seem abnormal to you? 

I ran
Select *
From dbo.TableA
Where not exists(Select ScanJobHeaderID From dbo.TableD ja Where
ja.ScanJobDetailID = a.ScanJobDetailID)

To see if it was the source of the delay and it's not. The delay comes from
the Joins.

Show quote
"Erland Sommarskog" wrote:

> Curtis (Cur***@discussions.microsoft.com) writes:
> > I am rather new to SQL, so I was thinking 1 minute for that many records
> > was pretty good. I want to do this the best way. 
> >
> > If I change TableD so that it does not have the IDENTITY then how would I
> > assign a value to that column each time I run the query?

> Actually, when I look at your tables and the queries, I don't understand
> why you have this [j-TableE]. There seems to be a one-to-one relation to
> TableD, so if you just added ScanJobHeaderID to that table, there would
> be no problem at all.
>
> But it could very well be that I have mangled the table relations when
> I have rearranged your original code. Keep in mind that while I know SQL,
> I know zero about the business your tables relates to. All that I am
> able to is mechanical changes.
>
> Anyway, if there has two be two different tables, you could do as below.
> That's not the only way to skin the cat. I'm bouncing the data over an
> intermediate table, so that you don't have to read the base tables twice.
> But there is a cost for the intermediate table, so I cannot say which is
> the best.
>
> And, please, please! Test everything that I post very carefully. It does
> not help if it runs faster than your original solution, if it produces an
> incorrect result.
>
> DECLARE @temp TABLE (incr int NOT NULL PRIMARY KEY,
>                      KeywordSearchID int NULL,
>                      SearchEngineID  tinyint NULL,
>                      SearchEnginePageID tinyint NULL,
>                      ScanJobHeaderID int NOT NULL)
>
> Insert   @temp (incr, KeywordSearchID, SearchEngineID,
>                 SearchEnginePageID, ScanJobHeaderID)
>    Select   row_number() OVER (ORDER BY A.ClientID), C.KeywordSearchID,
>             jB.SearchEngineID, B.PageSyntaxID, A.ScanJobHeaderID
>    From     dbo.TableA A
>    JOIN     dbo.[j-TableB] jB
>          on jB.SearchEngineGroupID = A.SearchEngineGroupID
>    JOIN     dbo.TableB B ON jB.SearchEngineID = B.SearchEngineID
>    JOIN     dbo.[j-TableC] jC ON jC.ClientID = A.ClientID
>    JOIN     dbo.TableC C ON jC.KeywordSearchID = C.KeywordSearchID
>    Where    NOT EXISTS (SELECT *
>                         FROM   dbo.[j-TableA] jA
>                         WHERE  A.ScanJobHeaderID = jA.ScanJobHeaderID)
>      AND    B.PageNumber = '1'/*@PageNumber*/
>      AND    C.Billable = '1'
>
> DECLARE @maxid int
>
> SELECT @maxid = coalesce(MAX(ScanJobDetailID), 0) FROM [TableD]
>
> INSERT [TableD] (ScanJobDetailID, KeywordSearchID, SearchEngineID,
>                   SearchEnginePageID)
>    SELECT @maxid + incr, KeywordSearchID, SearchEngineID,
>           SearchEnginePageID
>    FROM   @temp
>
> INSERT [j-TableE] (ScanJobDetailID, ScanJobHeaderID)
>    SELECT @maxid + incr, ScanJobHeaderID
>    FROM   @temp
>
>
> --
> Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se
>
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
>
Author
21 Aug 2006 10:50 PM
Erland Sommarskog
Curtis (Cur***@discussions.microsoft.com) writes:
> It takes 1:35 the second time I run it. I placed a new record in TableA
> to make sure it was just grabbing new records. It takes just as long to
> complete if I don't add a new record to TableA. I would expect it to
> complete quickly because there is nothing to add to TableD. Does this
> seem abnormal to you? 

Abnormal and abnormal, it indicates that there is some indexing work to
do. It is not all that uncommon to have queries that takes over a minute
to find out rhat there is nothing all to return.

> Select *
> From dbo.TableA
> Where not exists(Select ScanJobHeaderID From dbo.TableD ja Where
> ja.ScanJobDetailID = a.ScanJobDetailID)
>
> To see if it was the source of the delay and it's not. The delay comes
> from the Joins.

The more tables you throw into the mix, the more choices the optimizer gets,
and the more chances for bad choices.

I don't remember the tables exactly. Or rather, I recall that they looked
a bit incomplete, as not all had primary keys, and I don't know about
other indexes.

--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
22 Aug 2006 10:08 PM
Curtis
Indexing helped. I assumed my indexes were included in the create statements,
but they weren't.

I have spent days trying to optimize the query that works with these records
and I can't think of any other way to make it more efficient. It executes
quickly until I add Order by NewID(), but I have to grab a random sample of
search engines. Would you mind looking at my query to see if your experience
can see a better way? The maximum time it takes to complete is 4 seconds.

create PROCEDURE [dbo].[getJobByScanMachineID]

    (
    @ScanMachineID int
    )

AS

Declare @ScanJobHeaderID int,
        @ScanJobDetailID int

Declare @Parameters table(
            ScanJobDetailID int not null,
            ScanJobHeaderID int not null,
            SearchEngineID int not null,
            URL varchar(max) not null)


Insert @Parameters(ScanJobDetailID, ScanJobHeaderID, SearchEngineID, URL)
Select Top 10 t1.ScanJobDetailID, t1.ScanJobHeaderID, t1.SearchEngineID,
replace(URLSyntax, '<<keywords>>', t5.SearchPhrase) as URL
From dbo.TableD t1
    JOIN
        (Select    ScanJobHeaderID, SortPriority, SubmissionTime
            From    dbo.TableA
            Where    (HoldUntilTime is null) or (HoldUntilTime <= getdate())) t3 on
t3.ScanJobHeaderID = t1.ScanJobHeaderID
    JOIN
            (Select Top 10 SearchEngineID
            From dbo.getAvailableSearchEngines(@ScanMachineID)
            Order by NewID()) t2 ON t2.SearchEngineID = t1.SearchEngineID
        CROSS APPLY
            (Select    PageSyntaxID, SearchEngineID, PageNumber, URLSyntax
            From    dbo.TableB
            Where    SearchEngineID = t2.SearchEngineID) t4
    JOIN
            (Select KeywordSearchID,
        Case
            When (t1.LocalizationID is null)
                Then t2.RootPhrase
            Else
                t2.RootPhrase + ' ' + t3.Localization
            End  As SearchPhrase
    From    dbo.TableC t1
        LEFT OUTER JOIN
            dbo.TableF t2 on t2.RootPhraseID = t1.RootPhraseID
        LEFT OUTER JOIN
            dbo.TableG t3 on t3.LocalizationID = t1.LocalizationID) as t5 on
t5.KeywordSearchID = t1.KeyWordSearchID
Where (AssignmentTime is null) and (t4.PageSyntaxID = t1.SearchEnginePageID)
Order by SortPriority, SubmissionTime

Select    *
From    @Parameters

While (Select Count(ScanJobDetailID) From @Parameters) > 0
Begin
Select Top 1 @ScanJobHeaderID = ScanJobHeaderID, @ScanJobDetailID =
ScanJobDetailID
From @Parameters

exec dbo.updateAssignJobByScanMachineID @ScanJobHeaderID, @ScanMachineID,
@ScanjobDetailID

Delete @Parameters
Where ScanJobDetailID = @ScanJobDetailID
END

CREATE TABLE [dbo].[TableF](
    [RootPhraseID] [int] IDENTITY(1,1) NOT NULL,
    [RootPhrase] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
CONSTRAINT [PK_tblKeywordSearch] PRIMARY KEY CLUSTERED
(
    [RootPhraseID] ASC
)WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF
GO
/****** Object:  Statistic [_dta_stat_481436789_2_1]    Script Date:
08/22/2006 16:02:55 ******/
CREATE STATISTICS [_dta_stat_481436789_2_1] ON
[dbo].[tblRootPhrase]([RootPhrase], [RootPhraseID])


CREATE TABLE [dbo].[TableG](
    [LocalizationID] [smallint] IDENTITY(1,1) NOT NULL,
    [Localization] [varchar](100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF

Show quote
"Erland Sommarskog" wrote:

> Curtis (Cur***@discussions.microsoft.com) writes:
> > It takes 1:35 the second time I run it. I placed a new record in TableA
> > to make sure it was just grabbing new records. It takes just as long to
> > complete if I don't add a new record to TableA. I would expect it to
> > complete quickly because there is nothing to add to TableD. Does this
> > seem abnormal to you? 
>
> Abnormal and abnormal, it indicates that there is some indexing work to
> do. It is not all that uncommon to have queries that takes over a minute
> to find out rhat there is nothing all to return.
>
> > Select *
> > From dbo.TableA
> > Where not exists(Select ScanJobHeaderID From dbo.TableD ja Where
> > ja.ScanJobDetailID = a.ScanJobDetailID)
> >
> > To see if it was the source of the delay and it's not. The delay comes
> > from the Joins.
>
> The more tables you throw into the mix, the more choices the optimizer gets,
> and the more chances for bad choices.
>
> I don't remember the tables exactly. Or rather, I recall that they looked
> a bit incomplete, as not all had primary keys, and I don't know about
> other indexes.
>
> --
> Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se
>
> Books Online for SQL Server 2005 at
> http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
> Books Online for SQL Server 2000 at
> http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
>
Author
24 Aug 2006 10:09 PM
Erland Sommarskog
Curtis (Cur***@discussions.microsoft.com) writes:
> I have spent days trying to optimize the query that works with these
> records and I can't think of any other way to make it more efficient. It
> executes quickly until I add Order by NewID(), but I have to grab a
> random sample of search engines. Would you mind looking at my query to
> see if your experience can see a better way? The maximum time it takes
> to complete is 4 seconds.

It's difficult to suggest improvements with knowledge about all parts
of the queries, indexes, and not talking about the business logic.

The ORDER BY newid() appears here as far as I can see:

    Select Top 10 SearchEngineID
    From dbo.getAvailableSearchEngines(@ScanMachineID)
    Order by NewID()

Does this query run quickly on its own? In such case, it may be an idea
to have a temp table to where you insert the search engines. Temp tables
has statistics, and that can help you get a better query plan for the
total query.

--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
25 Aug 2006 12:33 AM
Alexander Kuznetsov
Hi Erland,

I sentyou an E-mail to the e-mail account that shows on your website.
Did you get it?

AddThis Social Bookmark Button