|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
CLR and InsertIs it possible to insert records into the database from a CLR function?
I need to insert thousands of records based on For Each loops. I have my function done except for the inserting part. I was hoping to complete the task inside SQL, so I can schedule the task using Agent. If I can't construct the insert statement inside my function is there any other way to do it? Curtis (Cur***@discussions.microsoft.com) writes:
> Is it possible to insert records into the database from a CLR function? You can't insert data from a function. Functions are for data-retrieval> > I need to insert thousands of records based on For Each loops. I have my > function done except for the inserting part. I was hoping to complete the > task inside SQL, so I can schedule the task using Agent. > > If I can't construct the insert statement inside my function is there any > other way to do it? only, and are not permitted to change database state. But you could insert data from a stored procedure in the CLR. However, I get a little worrried when you say For Each loops. That is not the most effective way to do it. Try to insert all at once. This can be done by creating an XML document that you shred with .nodes(). Or you can generate an INSERT statement that goes: INSERT tbl (col1, col2, col3, ...) EXEC ('SELECT 1, ''AD'', 3, ... SELECT 12, ''PO'', 24, ... ...') From where does the data come? Since you plan to run it from Agent, it does not sound like comes from the outside (although it's possible to do external access from a CLR procedure). -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx The data is coming from tables in the database.
Here is my code: Dim ClientID As Integer Dim ScanJobHeaderID As Integer Dim SearchEngineGroupID As Integer Dim SearchEngineID As Integer Dim PageSyntaxID As Integer Dim EndJob As Integer Dim ScanJobHeader As New DataTable Dim SearchEngine As New DataTable Dim KeywordSearch As New DataTable Do ScanJobHeader.Rows.Clear() SearchEngine.Rows.Clear() KeywordSearch.Rows.Clear() 'getScanJobHeader() Using sql As New SqlConnection("context connection=true") 'Dump Scan Job Header info into datatable '>ClientID '>ScanJobJeaderID '>SearchEngineGroupID Using adapterScanJobHeader As New SqlDataAdapter Using selectScanJobHeader As SqlCommand = New SqlCommand("Select * From dbo.getScanJobHeader( )", sql) sql.Open() selectScanJobHeader.Connection = sql adapterScanJobHeader.SelectCommand = selectScanJobHeader adapterScanJobHeader.Fill(ScanJobHeader) End Using End Using For Each dr As DataRow In ScanJobHeader.Rows If Not dr.IsNull(0) Then ClientID = ScanJobHeader.Rows(0).Item(0) ScanJobHeaderID = ScanJobHeader.Rows(0).Item(1) SearchEngineGroupID = ScanJobHeader.Rows(0).Item(2) EndJob = 1 Dim i As Integer = 0 Dim j As Integer = 0 'Dump Search Engine info into datatable '>SearchEngineID '>PageSyntaxID Using selectSearchEngine As SqlCommand = New SqlCommand("Select * From dbo.getSearchEngineID(" & SearchEngineGroupID & ",1)", sql) Dim adapterSearchEngine As New SqlDataAdapter adapterSearchEngine.SelectCommand = selectSearchEngine adapterSearchEngine.Fill(SearchEngine) End Using 'Dump Keyword info into datatable '>KeywordSearchID Using selectKeywordSearch As SqlCommand = New SqlCommand("Select * From dbo.getKeywordSearchID(" & ClientID & ")", sql) Dim adapterKeywordSearch As New SqlDataAdapter adapterKeywordSearch.SelectCommand = selectKeywordSearch adapterKeywordSearch.Fill(KeywordSearch) End Using If SearchEngine.Rows.Count > 0 Then For Each dr1 As DataRow In SearchEngine.Rows SearchEngineID = SearchEngine.Rows(i).Item(0) PageSyntaxID = SearchEngine.Rows(i).Item(1) For Each dr2 As DataRow In KeywordSearch.Rows Dim insertCommand As SqlCommand = New SqlCommand("exec dbo.insertBillingScanJobDetails 1, 1, 1, 3", sql) insertCommand.Connection = sql insertCommand.ExecuteNonQuery() j += 1 Next i += 1 j = 0 Next End If i = 0 Else EndJob = 0 End If Next End Using Loop Until EndJob = 0 I tried executing a stored procedure to insert the records, but I get: Invalid use of side-effecting or time-dependent operator in 'INSERT' within a function. Based on my code do you still think looping is not the best option? Could you explain more on how I would create the XML document? I assume I would pass the XML as a parameter to the insert procedure and then use a Select statement on the XML to insert the records into the database. Is it possible to send the XML without creating a file on the server? Show quote "Erland Sommarskog" wrote: > Curtis (Cur***@discussions.microsoft.com) writes: > > Is it possible to insert records into the database from a CLR function? > > > > I need to insert thousands of records based on For Each loops. I have my > > function done except for the inserting part. I was hoping to complete the > > task inside SQL, so I can schedule the task using Agent. > > > > If I can't construct the insert statement inside my function is there any > > other way to do it? > > You can't insert data from a function. Functions are for data-retrieval > only, and are not permitted to change database state. > > But you could insert data from a stored procedure in the CLR. However, > I get a little worrried when you say For Each loops. That is not the > most effective way to do it. Try to insert all at once. This can be > done by creating an XML document that you shred with .nodes(). Or > you can generate an INSERT statement that goes: > > INSERT tbl (col1, col2, col3, ...) > EXEC ('SELECT 1, ''AD'', 3, ... > SELECT 12, ''PO'', 24, ... > ...') > > > From where does the data come? Since you plan to run it from Agent, > it does not sound like comes from the outside (although it's possible > to do external access from a CLR procedure). > > > -- > Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se > > Books Online for SQL Server 2005 at > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx > Books Online for SQL Server 2000 at > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > Curtis (Cur***@discussions.microsoft.com) writes:
> The data is coming from tables in the database. If you are to insert table into one table from other tables in the database, then you should perform the selection and insertion of the data from a stored procedure in T-SQL. If there is some complex processing, it may be worth using a CLR function for that. Permit me a few more comments on your code: > Using selectScanJobHeader As SqlCommand = New Never use SELECT * in production code, but explicitly list the columns> SqlCommand("Select * From dbo.getScanJobHeader( )", sql) you need. This makes tracking easier, is there is a need to change to change the function/table/view that you are selecting from. > Using selectKeywordSearch As SqlCommand = New Always use parameterised commands, never interpolate parameter values> SqlCommand("Select * From dbo.getKeywordSearchID(" & ClientID & ")", sql) directly. This is a basic protection against SQL injection, and it makes more effecient use of the plan cache in SQL Server. See the section http://www.sommarskog.se/dynamic_sql.html#queryplans, which also has an example of a parameterised command with SqlClient. > Dim insertCommand As SqlCommand = New When calling stored procedures you should use CommandType.StoredProcedure> SqlCommand("exec dbo.insertBillingScanJobDetails 1, 1, 1, 3", sql) and not use EXEC statements for efficientcy. > I tried executing a stored procedure to insert the records, but I get: Yes, you need to use a stored procedure to perform updates. And that > Invalid use of side-effecting or time-dependent operator in 'INSERT' > within a function. you *can* do in the CLR, but I think it's a poor design. > Based on my code do you still think looping is not the best option? I was not able to grasp if the code was doing anything in particular thanjust moving the data, but I did not analyse it in detail. But I'm fairly confident that it can all be done in T-SQL. > Could you explain more on how I would create the XML document? I assume I You don't need to create a file, no. I don't really know who to create> would pass the XML as a parameter to the insert procedure and then use a > Select statement on the XML to insert the records into the database. Is it > possible to send the XML without creating a file on the server? XML documents from data, as I am an SQL programmer, not a .Net person. But our GUI developers frequently employ this technique when they want to insert lots of data in one go. And they are even using VB6. Anyway using XML would be an overkill in this case, as the entire CLR thing is an overkill. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx Instead of posting code you should first explain what you actually need. And
as Erland already mentioned - you can't use SQL functions to alter database state (e.g. modify data in tables). As far as I can see a set-based solution is possible here or am I missing something? Anyway, it's difficult to say based on your post(s), so provide proper DDL, sample data and expected results to get a better answer. ML --- http://milambda.blogspot.com/ Let me see if I can explain what I am trying to do effectively.
TableA ClientID int, ScanJobHeaderID int, SearchEngineGroupID int TableB SearchEngineID int, PageSyntaxID int TableC KeywordSearchID int TableD KeywordSearchID int, SearchEngineID int, PageSyntaxID int TableE ScanJobHeaderID int, TableDID int Query 1: Select ClientID, ScanJobHeaderID, SearchEngineGroupID from TableA Returns 1 Row Query 2: Select SearchEngineID, PageSyntaxID from TableB based on the SearchEngineGroupID Returns multiple rows Query 3: Select ClientID from TableC based on the ClientID Returns multiple rows Do Query 1 For each row in Query 2 i <advance to the next row each time and assign the correct value to be used in the insert statement> For each row in Query 3 j <advance to the next row each time and assign the correct value to be used in the insert statement> Insert KeywordSearchID, SearchEngineID, PageSyntaxID, ScanJobHeaderID j +=1 Next 1+=1 Next Loop I would like to do this in SQL, so I can trigger it when a new record is entered into Table A. I use Scop_Identity() that is used in TableE The result is over 1 million records entered into TableD and TableE Show quote "ML" wrote: > Instead of posting code you should first explain what you actually need. And > as Erland already mentioned - you can't use SQL functions to alter database > state (e.g. modify data in tables). > > As far as I can see a set-based solution is possible here or am I missing > something? > > Anyway, it's difficult to say based on your post(s), so provide proper DDL, > sample data and expected results to get a better answer. > > > ML > > --- > http://milambda.blogspot.com/ You're either not thinking in sets or I'm missing something.
For-each row in set one do a transformation in sets two and three? That's what JOINs are for. We can help you better if you follow these instructions: http://www.aspfaq.com/etiquette.asp?id=5006 Anyway, what you need to do is something like this: 1) create one query where TableA and TableB are joined on the common column (SearchEngineGroupID) and then use values from the tables to come up with one set of rows to insert; 2) do pretty much the same for TableA joined to TableC on the common column (ClientID) to come up with another set of rows. OR: Join all three tables together on appropriate columns to come up with a single set of rows. This is SQL - we carry water in buckets, not spoons. ML --- http://milambda.blogspot.com/ Curtis (Cur***@discussions.microsoft.com) writes:
> Query 1: SELECT A.ClientID, A.ScanJobHeaderID, B.SearchEngineID, B.PageSyntaxID> Select ClientID, ScanJobHeaderID, SearchEngineGroupID from TableA > Returns 1 Row > > Query 2: > Select SearchEngineID, PageSyntaxID from TableB based on the > SearchEngineGroupID > Returns multiple rows > > Query 3: > Select ClientID from TableC based on the ClientID > Returns multiple rows FROM TableA A JOIN TableB B ON A.SearchEngineGroupID = B.SearchEngineGroupID JOIN TableC C ON A.ClientID = C.ClientID WHERE A.xxx = @ So suddently three queries are one. If I had better description of the tables you are inserting into, I would have written the INSERT statements as well. There is a mention of scope_identity() that makes me a little nervous. IDENTITY columns are usually a cause for problems when you need to insert referring rows. The simplest remedy is often to simple get rid of the IDENTITY property. > I would like to do this in SQL, so I can trigger it when a new record is Ah, so change the query into:> entered into Table A. SELECT A.ClientID, A.ScanJobHeaderID, B.SearchEngineID, B.PageSyntaxID FROM inserted A JOIN TableB B ON A.SearchEngineGroupID = B.SearchEngineGroupID JOIN TableC C ON A.ClientID = C.ClientID (It should appear in a trigger.) Note that this handle the case that many rows are inserted at once. > The result is over 1 million records entered into TableD and TableE Don't loop to insert one million rows! That's completely irresponsible!With those volumes of data, you must work set-based! Running a loop could take hours! A single insert should complete in a minute or two. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx CREATE TABLE [dbo].[TableA](
[ScanJobHeaderID] [int] IDENTITY(1,1) NOT NULL, [ClientID] [smallint] NOT NULL [SearchEngineGroupID] [smallint] NOT NULL CONSTRAINT [PK_TableA] PRIMARY KEY CLUSTERED ( [ScanJobHeaderID] ASC )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] CREATE TABLE [dbo].[j-TableA]( [ScanJobHeaderID] [int] NOT NULL, [ScanJobDetailID] [int] NOT NULL ) ON [PRIMARY] Select Top 1 t1.ClientID, t1.ScanJobHeaderID, t1.SearchEngineGroupID From dbo.TableA t1 LEFT JOIN dbo.[j-TableA] t2 ON t1.ScanJobHeaderID = t2.ScanJobHeaderID Where t2.ScanJobHeaderID IS NULL Order by t1.ScanJobHeaderID Above is the tables and query for the first part (Query1) CREATE TABLE [dbo].[TableB]( [PageSyntaxID] [int] NOT NULL, [SearchEngineID] [int] NULL, [PageNumber] [int] NULL ) ON [PRIMARY] CREATE TABLE [dbo].[j-TableB]( [SearchEngineID] [tinyint] NOT NULL, [SearchEngineGroupID] [smallint] NOT NULL ) ON [PRIMARY] Select t1.SearchEngineID, t2.PageSyntaxID From dbo.[j-tblSearchEngineGroup] t1 INNER JOIN dbo.tblSearchEnginePageSyntax t2 on t2.SearchEngineID = t1.SearchEngineID Where (SearchEngineGroupID = @SearchEngineGroupID) and (t2.PageNumber = @PageNumber) Above is the tables and query for the first part (Query2) CREATE TABLE [dbo].[TableC]( [KeywordSearchID] [int] NOT NULL CONSTRAINT [PK_TableC] PRIMARY KEY CLUSTERED ( [KeywordSearchID] ASC )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] CREATE TABLE [dbo].[j-TableC]( [ClientID] [smallint] NOT NULL, [KeywordSearchID] [int] NOT NULL ) ON [PRIMARY] Select t1.KeywordSearchID From dbo.TableC t1 INNER JOIN dbo.[j-TableC] t2 on t2.KeywordSearchID = t1.KeywordSearchID Where (t2.ClientID = @ClientID) and t1.Billable = '1' Above is the tables and query for the first part (Query3) CREATE TABLE [dbo].[TableD]( [ScanJobDetailID] [int] IDENTITY(1,1) NOT NULL, [KeywordSearchID] [int] NULL, [SearchEngineID] [tinyint] NULL, [SearchEnginePageID] [tinyint] NULL CONSTRAINT [PK_TableD] PRIMARY KEY CLUSTERED ( [ScanJobDetailID] ASC )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] CREATE TABLE [dbo].[j-TableE]( [ScanJobHeaderID] [int] NOT NULL, [ScanJobDetailID] [int] NOT NULL ) ON [PRIMARY] Declare @ScanJobDetailID int Insert dbo.TableD (KeywordSearchID, SearchEngineID, SearchEnginePageID) Values (@KeyWordSearchID, @SearchEngineID, @SearchEnginePageID) Set @ScanJobDetailID = SCOPE_IDENTITY() Insert dbo.[j-TableE] (ScanJobHeaderID, ScanJobDetailID) Values (@ScanJobHeaderID, @ScanJobDetailID) I don't understand how I can get this into one set of results. Thank you both for your assistance. Show quote "Erland Sommarskog" wrote: > Curtis (Cur***@discussions.microsoft.com) writes: > > Query 1: > > Select ClientID, ScanJobHeaderID, SearchEngineGroupID from TableA > > Returns 1 Row > > > > Query 2: > > Select SearchEngineID, PageSyntaxID from TableB based on the > > SearchEngineGroupID > > Returns multiple rows > > > > Query 3: > > Select ClientID from TableC based on the ClientID > > Returns multiple rows > > SELECT A.ClientID, A.ScanJobHeaderID, B.SearchEngineID, B.PageSyntaxID > FROM TableA A > JOIN TableB B ON A.SearchEngineGroupID = B.SearchEngineGroupID > JOIN TableC C ON A.ClientID = C.ClientID > WHERE A.xxx = @ > > So suddently three queries are one. If I had better description of > the tables you are inserting into, I would have written the INSERT > statements as well. There is a mention of scope_identity() that > makes me a little nervous. IDENTITY columns are usually a cause > for problems when you need to insert referring rows. The simplest > remedy is often to simple get rid of the IDENTITY property. > > > I would like to do this in SQL, so I can trigger it when a new record is > > entered into Table A. > > Ah, so change the query into: > > SELECT A.ClientID, A.ScanJobHeaderID, B.SearchEngineID, B.PageSyntaxID > FROM inserted A > JOIN TableB B ON A.SearchEngineGroupID = B.SearchEngineGroupID > JOIN TableC C ON A.ClientID = C.ClientID > > (It should appear in a trigger.) Note that this handle the case that many > rows are inserted at once. > > > The result is over 1 million records entered into TableD and TableE > > Don't loop to insert one million rows! That's completely irresponsible! > With those volumes of data, you must work set-based! Running a loop > could take hours! A single insert should complete in a minute or two. > > -- > Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se > > Books Online for SQL Server 2005 at > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx > Books Online for SQL Server 2000 at > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > Curtis (Cur***@discussions.microsoft.com) writes:
> I don't understand how I can get this into one set of results. In SQL 2000, the IDENTITY column would had caused problems, but on SQL 2005,> > Thank you both for your assistance. the new OUTPUT clause can sometimes save the situation, and this is such a case. As I understand it, the second INSERT into j-TableE just inserts all generated ids from the first query, gathered under the ScanJobHeaderID from the starting query. Below is a query batch with your tables and one SELECT, and two INSERT. It compiles, but without test data I cannot guarantee that it produces the correct result. Also, you obfustication of the original table names was not complete, so I had to make some assumptions. Still there is one thing that nags me: why the TOP 1 in the first query? Don't you ever want to handle multiple clients at the same time? DECLARE @ClientID smallint, @PageNumber int, @SearchEngineGroupID int, @ScanJobDetailID int, @KeyWordSearchID int, @SearchEngineID int, @SearchEnginePageID int, @ScanJobHeaderID int CREATE TABLE [dbo].[TableA]( [ScanJobHeaderID] [int] IDENTITY(1,1) NOT NULL, [ClientID] [smallint] NOT NULL, [SearchEngineGroupID] [smallint] NOT NULL CONSTRAINT [PK_TableA] PRIMARY KEY CLUSTERED ( [ScanJobHeaderID] ASC )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] CREATE TABLE [dbo].[j-TableA]( [ScanJobHeaderID] [int] NOT NULL, [ScanJobDetailID] [int] NOT NULL ) ON [PRIMARY] --Above is the tables and query for the first part (Query1) CREATE TABLE [dbo].[TableB]( [PageSyntaxID] [int] NOT NULL, [SearchEngineID] [int] NULL, [PageNumber] [int] NULL ) ON [PRIMARY] CREATE TABLE [dbo].[j-TableB]( [SearchEngineID] [tinyint] NOT NULL, [SearchEngineGroupID] [smallint] NOT NULL ) ON [PRIMARY] --Above is the tables and query for the first part (Query2) CREATE TABLE [dbo].[TableC]( [KeywordSearchID] [int] NOT NULL, Billable char(1) NOT NULL CONSTRAINT [PK_TableC] PRIMARY KEY CLUSTERED ( [KeywordSearchID] ASC )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] CREATE TABLE [dbo].[j-TableC]( [ClientID] [smallint] NOT NULL, [KeywordSearchID] [int] NOT NULL ) ON [PRIMARY] --Above is the tables and query for the first part (Query3) CREATE TABLE [dbo].[TableD]( [ScanJobDetailID] [int] IDENTITY(1,1) NOT NULL, [KeywordSearchID] [int] NULL, [SearchEngineID] [tinyint] NULL, [SearchEnginePageID] [tinyint] NULL CONSTRAINT [PK_TableD] PRIMARY KEY CLUSTERED ( [ScanJobDetailID] ASC )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] CREATE TABLE [dbo].[j-TableE]( [ScanJobHeaderID] [int] NOT NULL, [ScanJobDetailID] [int] NOT NULL ) ON [PRIMARY] DECLARE @IDs TABLE (id int NOT NULL) Select Top 1 @ClientID = t1.ClientID, @ScanJobHeaderID = t1.ScanJobHeaderID, @SearchEngineGroupID = t1.SearchEngineGroupID From dbo.TableA t1 LEFT JOIN dbo.[j-TableA] t2 ON t1.ScanJobHeaderID = t2.ScanJobHeaderID Where t2.ScanJobHeaderID IS NULL Order by t1.ScanJobHeaderID IF @@rowcount > 0 BEGIN Insert dbo.TableD (KeywordSearchID, SearchEngineID, SearchEnginePageID) OUTPUT inserted.ScanJobDetailID INTO @IDs(id) Select C.KeyWordSerarchID, jB.SearchEngineID, jb.PageSyntaxID From dbo.[j-TableB] jB JOIN dbo.TableB B on jB.SearchEngineID = B.SearchEngineID CROSS JOIN (dbo.TableC C JOIN dbo.[j-TableC] C on jC.KeywordSearchID = C.KeywordSearchID) Where B.SearchEngineGroupID = @SearchEngineGroupID AND B.PageNumber = @PageNumber AND C.Billlable = '1' Insert dbo.[j-TableE] (ScanJobHeaderID, ScanJobDetailID) SELECT @ScanJobHeaderID, id FROM @IDs END go drop Table TableA, [j-TableA], TableB, [j-TableB], TableC, [j-TableC], TableD, [j-TableE] -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx I am working on digesting your last response, so I can understand how it
works. Yes, I will be working with more than one client, but in my loop scenario I was working with one client at a time. After the loop finished the work for one client that client wasn't available in Query1. Basically if ScanJobHeaderID exists in TableA and j-TableA then I don't want it included in Query1. Show quote "Erland Sommarskog" wrote: > Curtis (Cur***@discussions.microsoft.com) writes: > > I don't understand how I can get this into one set of results. > > > > Thank you both for your assistance. > > In SQL 2000, the IDENTITY column would had caused problems, but on SQL 2005, > the new OUTPUT clause can sometimes save the situation, and this is such a > case. As I understand it, the second INSERT into j-TableE just inserts > all generated ids from the first query, gathered under the ScanJobHeaderID > from the starting query. > > Below is a query batch with your tables and one SELECT, and two INSERT. > It compiles, but without test data I cannot guarantee that it produces > the correct result. Also, you obfustication of the original table names > was not complete, so I had to make some assumptions. > > Still there is one thing that nags me: why the TOP 1 in the first query? > Don't you ever want to handle multiple clients at the same time? > > DECLARE @ClientID smallint, > @PageNumber int, > @SearchEngineGroupID int, > @ScanJobDetailID int, > @KeyWordSearchID int, > @SearchEngineID int, > @SearchEnginePageID int, > @ScanJobHeaderID int > > CREATE TABLE [dbo].[TableA]( > [ScanJobHeaderID] [int] IDENTITY(1,1) NOT NULL, > [ClientID] [smallint] NOT NULL, > [SearchEngineGroupID] [smallint] NOT NULL > CONSTRAINT [PK_TableA] PRIMARY KEY CLUSTERED > ( > [ScanJobHeaderID] ASC > )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY] > ) ON [PRIMARY] > > CREATE TABLE [dbo].[j-TableA]( > [ScanJobHeaderID] [int] NOT NULL, > [ScanJobDetailID] [int] NOT NULL > ) ON [PRIMARY] > > > --Above is the tables and query for the first part (Query1) > > CREATE TABLE [dbo].[TableB]( > [PageSyntaxID] [int] NOT NULL, > [SearchEngineID] [int] NULL, > [PageNumber] [int] NULL > ) ON [PRIMARY] > > CREATE TABLE [dbo].[j-TableB]( > [SearchEngineID] [tinyint] NOT NULL, > [SearchEngineGroupID] [smallint] NOT NULL > ) ON [PRIMARY] > > > --Above is the tables and query for the first part (Query2) > > CREATE TABLE [dbo].[TableC]( > [KeywordSearchID] [int] NOT NULL, > Billable char(1) NOT NULL > CONSTRAINT [PK_TableC] PRIMARY KEY CLUSTERED > ( > [KeywordSearchID] ASC > )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY] > ) ON [PRIMARY] > > CREATE TABLE [dbo].[j-TableC]( > [ClientID] [smallint] NOT NULL, > [KeywordSearchID] [int] NOT NULL > ) ON [PRIMARY] > > > --Above is the tables and query for the first part (Query3) > > CREATE TABLE [dbo].[TableD]( > [ScanJobDetailID] [int] IDENTITY(1,1) NOT NULL, > [KeywordSearchID] [int] NULL, > [SearchEngineID] [tinyint] NULL, > [SearchEnginePageID] [tinyint] NULL > CONSTRAINT [PK_TableD] PRIMARY KEY CLUSTERED > ( > [ScanJobDetailID] ASC > )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY] > ) ON [PRIMARY] > > CREATE TABLE [dbo].[j-TableE]( > [ScanJobHeaderID] [int] NOT NULL, > [ScanJobDetailID] [int] NOT NULL > ) ON [PRIMARY] > > DECLARE @IDs TABLE (id int NOT NULL) > > Select Top 1 @ClientID = t1.ClientID, @ScanJobHeaderID = t1.ScanJobHeaderID, > @SearchEngineGroupID = t1.SearchEngineGroupID > From dbo.TableA t1 > LEFT JOIN dbo.[j-TableA] t2 ON t1.ScanJobHeaderID = t2.ScanJobHeaderID > Where t2.ScanJobHeaderID IS NULL > Order by t1.ScanJobHeaderID > > IF @@rowcount > 0 > BEGIN > Insert dbo.TableD (KeywordSearchID, SearchEngineID, SearchEnginePageID) > OUTPUT inserted.ScanJobDetailID INTO @IDs(id) > Select C.KeyWordSerarchID, jB.SearchEngineID, jb.PageSyntaxID > From dbo.[j-TableB] jB > JOIN dbo.TableB B on jB.SearchEngineID = B.SearchEngineID > CROSS JOIN (dbo.TableC C > JOIN dbo.[j-TableC] C on jC.KeywordSearchID = C.KeywordSearchID) > Where B.SearchEngineGroupID = @SearchEngineGroupID > AND B.PageNumber = @PageNumber > AND C.Billlable = '1' > > Insert dbo.[j-TableE] (ScanJobHeaderID, ScanJobDetailID) > SELECT @ScanJobHeaderID, id > FROM @IDs > END > > go > drop Table TableA, [j-TableA], TableB, [j-TableB], TableC, [j-TableC], TableD, [j-TableE] > > -- > Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se > > Books Online for SQL Server 2005 at > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx > Books Online for SQL Server 2000 at > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > > Curtis (Cur***@discussions.microsoft.com) writes:
> I am working on digesting your last response, so I can understand how it If you are used to think in loops, then it may take some to digest.> works. But it will be well-spent time. Because this is really how operate in SQL. You operate on sets. Sometimes looping may be called for, but you should always try to avoid it. There is a dramatic difference in execution time for the two approaches. And I mean dramatic: you mentioned a million rows. Looping could take an hour or two. A set-based operation could complete within the minute. > Yes, I will be working with more than one client, but in my loop Then probably the next step is to handle all customers at a time as> scenario I was working with one client at a time. After the loop > finished the work for one client that client wasn't available in Query1. > Basically if ScanJobHeaderID exists in TableA and j-TableA then I don't > want it included in Query1. well. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx I am trying my luck with a cursor to deal with all of the clients at one
time. It seems to be taking a long time to complete. DECLARE @ClientID smallint, @PageNumber int, @SearchEngineGroupID int, @ScanJobDetailID int, @KeyWordSearchID int, @SearchEngineID int, @SearchEnginePageID int, @ScanJobHeaderID int DECLARE @IDs TABLE (id int NOT NULL) Declare HeaderCursor Cursor FAST_FORWARD FOR Select t1.ClientID, t1.ScanJobHeaderID, t1.SearchEngineGroupID From dbo.TableA t1 LEFT JOIN dbo.[j-TableA] t2 ON t1.ScanJobHeaderID = t2.ScanJobHeaderID Where t2.ScanJobHeaderID IS NULL Order by t1.ScanJobHeaderID Open HeaderCursor FETCH NEXT FROM HeaderCursor into @ClientID, @ScanJobHeaderID, @SearchEngineGroupID While @@fetch_status =0 BEGIN Insert dbo.TableD (KeywordSearchID, SearchEngineID, SearchEnginePageID) OUTPUT inserted.ScanJobDetailID INTO @IDs(id) Select C.KeyWordSearchID, jB.SearchEngineID, B.PageSyntaxID From dbo.[j-TableB] jB JOIN dbo.TableB B on jB.SearchEngineID = B.SearchEngineID CROSS JOIN ( dbo.TableC C JOIN dbo.[j-TableC] jc on jC.KeywordSearchID = C.KeywordSearchID) Where jb.SearchEngineGroupID = @SearchEngineGroupID and jc.ClientID = @ClientID AND B.PageNumber = '1'/*@PageNumber*/ AND C.Billable = '1' Insert dbo.[j-TableA] (ScanJobHeaderID, ScanJobDetailID) SELECT @ScanJobHeaderID, id FROM @IDs FETCH NEXT FROM HeaderCursor into @ClientID, @ScanJobHeaderID, @SearchEngineGroupID END Close HeaderCursor DEALLOCATE HeaderCursor I am expecting the final product to be 855,555 rows added. It's been running for 15 minutes and it's only at 349,630. Is a cursor more efficient then using a while loop and calling the client one at a time? Show quote "Erland Sommarskog" wrote: > Curtis (Cur***@discussions.microsoft.com) writes: > > I am working on digesting your last response, so I can understand how it > > works. > > If you are used to think in loops, then it may take some to digest. > But it will be well-spent time. Because this is really how operate in > SQL. You operate on sets. Sometimes looping may be called for, but you > should always try to avoid it. There is a dramatic difference in > execution time for the two approaches. And I mean dramatic: you mentioned > a million rows. Looping could take an hour or two. A set-based operation > could complete within the minute. > > > Yes, I will be working with more than one client, but in my loop > > scenario I was working with one client at a time. After the loop > > finished the work for one client that client wasn't available in Query1. > > Basically if ScanJobHeaderID exists in TableA and j-TableA then I don't > > want it included in Query1. > > Then probably the next step is to handle all customers at a time as > well. > > -- > Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se > > Books Online for SQL Server 2005 at > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx > Books Online for SQL Server 2000 at > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > Curtis (Cur***@discussions.microsoft.com) writes:
> I am trying my luck with a cursor to deal with all of the clients at one No wonder. Didn't I say that looping is the slow way to access a database?> time. It seems to be taking a long time to complete. > I am expecting the final product to be 855,555 rows added. It's been When you need to loop, a cursor is often less bad than any other sort > running for 15 minutes and it's only at 349,630. Is a cursor more > efficient then using a while loop and calling the client one at a time? of loop. But it's slow. There is a bug in your code which may affect execution though: you don't empty the @IDs table, so you are inserting far to much data into the second table. That could explain some of the delay. (And that also shows the dangers with looping. You cannot do such mistakes when you work set- based.) Obviously all clients should be handled in one go, and for the first INSERT it's not that difficult: Insert dbo.TableD (KeywordSearchID, SearchEngineID, SearchEnginePageID) OUTPUT inserted.ScanJobDetailID INTO @IDs(id) Select C.KeyWordSearchID, jB.SearchEngineID, B.PageSyntaxID From dbo.TableA A JOIN dbo.[j-TableB] jB on jB.SearchEngineGroupID = A.SearchEngineGroupID JOIN dbo.TableB B ON jB.SearchEngineID = B.SearchEngineID JOIN dbo.[j-TableC] jc ON jc.ClientID = A.ClientID JOIN dbo.TableC C ON jC.KeywordSearchID = C.KeywordSearchID Where NOT EXISTS (SELECT * FROM dbo.[j-TableA] WHERE A.ScanJobHeaderID = jA.ScanJobHeaderID) AND B.PageNumber = '1'/*@PageNumber*/ AND C.Billable = '1' (With the general disclaimer that I have about zero understanding of what this stuff is actually doing.) The problem is with the second table. Since we are now inserting many clients at the same time, there no longer any apparent relation between the ScanJobHeaderID and the ids generated in TableD. If many clients have the same ScanJobHeaderID, you can improve things by looping over the ScanJobHeaderIDs instead (add "AND A.ScanJobHeaderID = @ScanJobHeaderIDs".) If it is possible to change TableD, so that they key column does not have the IDENTITY property, then it's possible to devise a solution that handles all clients in one go. (Actually that solution could be possible with IDENTITY as well, if you permit SET IDENTITY_INSERT ON. But I don't think SET IDENTITY_INSERT belongs in production code.) -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx Thank you for catching the bug. The process completed in a minute after I
corrected the problem. I can't change the table structure of TableD. Thank you for all your help. Show quote "Erland Sommarskog" wrote: > Curtis (Cur***@discussions.microsoft.com) writes: > > I am trying my luck with a cursor to deal with all of the clients at one > > time. It seems to be taking a long time to complete. > > No wonder. Didn't I say that looping is the slow way to access a database? > > > I am expecting the final product to be 855,555 rows added. It's been > > running for 15 minutes and it's only at 349,630. Is a cursor more > > efficient then using a while loop and calling the client one at a time? > > When you need to loop, a cursor is often less bad than any other sort > of loop. But it's slow. > > There is a bug in your code which may affect execution though: you don't > empty the @IDs table, so you are inserting far to much data into the second > table. That could explain some of the delay. (And that also shows the > dangers with looping. You cannot do such mistakes when you work set- > based.) > > Obviously all clients should be handled in one go, and for the first > INSERT it's not that difficult: > > Insert dbo.TableD (KeywordSearchID, SearchEngineID, SearchEnginePageID) > OUTPUT inserted.ScanJobDetailID INTO @IDs(id) > Select C.KeyWordSearchID, jB.SearchEngineID, B.PageSyntaxID > From dbo.TableA A > JOIN dbo.[j-TableB] jB > on jB.SearchEngineGroupID = A.SearchEngineGroupID > JOIN dbo.TableB B ON jB.SearchEngineID = B.SearchEngineID > JOIN dbo.[j-TableC] jc ON jc.ClientID = A.ClientID > JOIN dbo.TableC C ON jC.KeywordSearchID = C.KeywordSearchID > Where NOT EXISTS (SELECT * > FROM dbo.[j-TableA] > WHERE A.ScanJobHeaderID = jA.ScanJobHeaderID) > AND B.PageNumber = '1'/*@PageNumber*/ > AND C.Billable = '1' > > (With the general disclaimer that I have about zero understanding of > what this stuff is actually doing.) > > The problem is with the second table. Since we are now inserting many > clients at the same time, there no longer any apparent relation between > the ScanJobHeaderID and the ids generated in TableD. If many clients > have the same ScanJobHeaderID, you can improve things by looping over > the ScanJobHeaderIDs instead (add "AND A.ScanJobHeaderID = > @ScanJobHeaderIDs".) > > If it is possible to change TableD, so that they key column does not > have the IDENTITY property, then it's possible to devise a solution > that handles all clients in one go. > > (Actually that solution could be possible with IDENTITY as well, if > you permit SET IDENTITY_INSERT ON. But I don't think SET > IDENTITY_INSERT belongs in production code.) > > > > -- > Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se > > Books Online for SQL Server 2005 at > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx > Books Online for SQL Server 2000 at > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > Curtis (Cur***@discussions.microsoft.com) writes:
> Thank you for catching the bug. The process completed in a minute after I A minute is still a tad slow, but if it's acceptable to you, maybe it's> corrected the problem. time to look for new feats. > I can't change the table structure of TableD. Too bad. (But tell the DBA or whoever designed it that IDENTITY was noblessing this time, but a curse.) -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx I am rather new to SQL, so I was thinking 1 minute for that many records was
pretty good. I want to do this the best way. If I change TableD so that it does not have the IDENTITY then how would I assign a value to that column each time I run the query? Show quote "Erland Sommarskog" wrote: > Curtis (Cur***@discussions.microsoft.com) writes: > > Thank you for catching the bug. The process completed in a minute after I > > corrected the problem. > > A minute is still a tad slow, but if it's acceptable to you, maybe it's > time to look for new feats. > > > I can't change the table structure of TableD. > > Too bad. (But tell the DBA or whoever designed it that IDENTITY was no > blessing this time, but a curse.) > > -- > Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se > > Books Online for SQL Server 2005 at > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx > Books Online for SQL Server 2000 at > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > Curtis (Cur***@discussions.microsoft.com) writes:
> I am rather new to SQL, so I was thinking 1 minute for that many records Actually, when I look at your tables and the queries, I don't understand> was pretty good. I want to do this the best way. > > If I change TableD so that it does not have the IDENTITY then how would I > assign a value to that column each time I run the query? why you have this [j-TableE]. There seems to be a one-to-one relation to TableD, so if you just added ScanJobHeaderID to that table, there would be no problem at all. But it could very well be that I have mangled the table relations when I have rearranged your original code. Keep in mind that while I know SQL, I know zero about the business your tables relates to. All that I am able to is mechanical changes. Anyway, if there has two be two different tables, you could do as below. That's not the only way to skin the cat. I'm bouncing the data over an intermediate table, so that you don't have to read the base tables twice. But there is a cost for the intermediate table, so I cannot say which is the best. And, please, please! Test everything that I post very carefully. It does not help if it runs faster than your original solution, if it produces an incorrect result. DECLARE @temp TABLE (incr int NOT NULL PRIMARY KEY, KeywordSearchID int NULL, SearchEngineID tinyint NULL, SearchEnginePageID tinyint NULL, ScanJobHeaderID int NOT NULL) Insert @temp (incr, KeywordSearchID, SearchEngineID, SearchEnginePageID, ScanJobHeaderID) Select row_number() OVER (ORDER BY A.ClientID), C.KeywordSearchID, jB.SearchEngineID, B.PageSyntaxID, A.ScanJobHeaderID From dbo.TableA A JOIN dbo.[j-TableB] jB on jB.SearchEngineGroupID = A.SearchEngineGroupID JOIN dbo.TableB B ON jB.SearchEngineID = B.SearchEngineID JOIN dbo.[j-TableC] jC ON jC.ClientID = A.ClientID JOIN dbo.TableC C ON jC.KeywordSearchID = C.KeywordSearchID Where NOT EXISTS (SELECT * FROM dbo.[j-TableA] jA WHERE A.ScanJobHeaderID = jA.ScanJobHeaderID) AND B.PageNumber = '1'/*@PageNumber*/ AND C.Billable = '1' DECLARE @maxid int SELECT @maxid = coalesce(MAX(ScanJobDetailID), 0) FROM [TableD] INSERT [TableD] (ScanJobDetailID, KeywordSearchID, SearchEngineID, SearchEnginePageID) SELECT @maxid + incr, KeywordSearchID, SearchEngineID, SearchEnginePageID FROM @temp INSERT [j-TableE] (ScanJobDetailID, ScanJobHeaderID) SELECT @maxid + incr, ScanJobHeaderID FROM @temp -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx You are correct, [j-TableE] is not needed.
Your code has been dead on. Thank you for your patience and assistance. The query runs in about 35 seconds the first time. Table D ie empty the first time I run it. Insert dbo.TableD(ScanJobHeaderID, KeywordSearchID, SearchEngineID, SearchEnginePageID) Select A.ScanJobHeaderID, C.KeywordSearchID, jB.SearchEngineID, B.PageSyntaxID From dbo.TableA A JOIN dbo.[j-TableB] jB ON jB.SearchEngineGroupID = A.SearchEngineGroupID JOIN dbo.TableB B ON jB.SearchEngineID = B.SearchEngineID JOIN dbo.[j-TableC] jC ON jC.ClientID = A.ClientID JOIN dbo.TableC C ON jC.KeywordSearchID = C.KeywordSearchID Where NOT EXISTS (SELECT ScanJobHeaderID FROM dbo.TableD jA WHERE JA.ScanJobHeaderID = A.ScanJobHeaderID) AND B.PageNumber = '1'/*@PageNumber*/ AND C.Billable = '1' It takes 1:35 the second time I run it. I placed a new record in TableA to make sure it was just grabbing new records. It takes just as long to complete if I don't add a new record to TableA. I would expect it to complete quickly because there is nothing to add to TableD. Does this seem abnormal to you? I ran Select * From dbo.TableA Where not exists(Select ScanJobHeaderID From dbo.TableD ja Where ja.ScanJobDetailID = a.ScanJobDetailID) To see if it was the source of the delay and it's not. The delay comes from the Joins. Show quote "Erland Sommarskog" wrote: > Curtis (Cur***@discussions.microsoft.com) writes: > > I am rather new to SQL, so I was thinking 1 minute for that many records > > was pretty good. I want to do this the best way. > > > > If I change TableD so that it does not have the IDENTITY then how would I > > assign a value to that column each time I run the query? > > Actually, when I look at your tables and the queries, I don't understand > why you have this [j-TableE]. There seems to be a one-to-one relation to > TableD, so if you just added ScanJobHeaderID to that table, there would > be no problem at all. > > But it could very well be that I have mangled the table relations when > I have rearranged your original code. Keep in mind that while I know SQL, > I know zero about the business your tables relates to. All that I am > able to is mechanical changes. > > Anyway, if there has two be two different tables, you could do as below. > That's not the only way to skin the cat. I'm bouncing the data over an > intermediate table, so that you don't have to read the base tables twice. > But there is a cost for the intermediate table, so I cannot say which is > the best. > > And, please, please! Test everything that I post very carefully. It does > not help if it runs faster than your original solution, if it produces an > incorrect result. > > DECLARE @temp TABLE (incr int NOT NULL PRIMARY KEY, > KeywordSearchID int NULL, > SearchEngineID tinyint NULL, > SearchEnginePageID tinyint NULL, > ScanJobHeaderID int NOT NULL) > > Insert @temp (incr, KeywordSearchID, SearchEngineID, > SearchEnginePageID, ScanJobHeaderID) > Select row_number() OVER (ORDER BY A.ClientID), C.KeywordSearchID, > jB.SearchEngineID, B.PageSyntaxID, A.ScanJobHeaderID > From dbo.TableA A > JOIN dbo.[j-TableB] jB > on jB.SearchEngineGroupID = A.SearchEngineGroupID > JOIN dbo.TableB B ON jB.SearchEngineID = B.SearchEngineID > JOIN dbo.[j-TableC] jC ON jC.ClientID = A.ClientID > JOIN dbo.TableC C ON jC.KeywordSearchID = C.KeywordSearchID > Where NOT EXISTS (SELECT * > FROM dbo.[j-TableA] jA > WHERE A.ScanJobHeaderID = jA.ScanJobHeaderID) > AND B.PageNumber = '1'/*@PageNumber*/ > AND C.Billable = '1' > > DECLARE @maxid int > > SELECT @maxid = coalesce(MAX(ScanJobDetailID), 0) FROM [TableD] > > INSERT [TableD] (ScanJobDetailID, KeywordSearchID, SearchEngineID, > SearchEnginePageID) > SELECT @maxid + incr, KeywordSearchID, SearchEngineID, > SearchEnginePageID > FROM @temp > > INSERT [j-TableE] (ScanJobDetailID, ScanJobHeaderID) > SELECT @maxid + incr, ScanJobHeaderID > FROM @temp > > > -- > Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se > > Books Online for SQL Server 2005 at > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx > Books Online for SQL Server 2000 at > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > Curtis (Cur***@discussions.microsoft.com) writes:
> It takes 1:35 the second time I run it. I placed a new record in TableA Abnormal and abnormal, it indicates that there is some indexing work to> to make sure it was just grabbing new records. It takes just as long to > complete if I don't add a new record to TableA. I would expect it to > complete quickly because there is nothing to add to TableD. Does this > seem abnormal to you? do. It is not all that uncommon to have queries that takes over a minute to find out rhat there is nothing all to return. > Select * The more tables you throw into the mix, the more choices the optimizer gets,> From dbo.TableA > Where not exists(Select ScanJobHeaderID From dbo.TableD ja Where > ja.ScanJobDetailID = a.ScanJobDetailID) > > To see if it was the source of the delay and it's not. The delay comes > from the Joins. and the more chances for bad choices. I don't remember the tables exactly. Or rather, I recall that they looked a bit incomplete, as not all had primary keys, and I don't know about other indexes. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx Indexing helped. I assumed my indexes were included in the create statements,
but they weren't. I have spent days trying to optimize the query that works with these records and I can't think of any other way to make it more efficient. It executes quickly until I add Order by NewID(), but I have to grab a random sample of search engines. Would you mind looking at my query to see if your experience can see a better way? The maximum time it takes to complete is 4 seconds. create PROCEDURE [dbo].[getJobByScanMachineID] ( @ScanMachineID int ) AS Declare @ScanJobHeaderID int, @ScanJobDetailID int Declare @Parameters table( ScanJobDetailID int not null, ScanJobHeaderID int not null, SearchEngineID int not null, URL varchar(max) not null) Insert @Parameters(ScanJobDetailID, ScanJobHeaderID, SearchEngineID, URL) Select Top 10 t1.ScanJobDetailID, t1.ScanJobHeaderID, t1.SearchEngineID, replace(URLSyntax, '<<keywords>>', t5.SearchPhrase) as URL From dbo.TableD t1 JOIN (Select ScanJobHeaderID, SortPriority, SubmissionTime From dbo.TableA Where (HoldUntilTime is null) or (HoldUntilTime <= getdate())) t3 on t3.ScanJobHeaderID = t1.ScanJobHeaderID JOIN (Select Top 10 SearchEngineID From dbo.getAvailableSearchEngines(@ScanMachineID) Order by NewID()) t2 ON t2.SearchEngineID = t1.SearchEngineID CROSS APPLY (Select PageSyntaxID, SearchEngineID, PageNumber, URLSyntax From dbo.TableB Where SearchEngineID = t2.SearchEngineID) t4 JOIN (Select KeywordSearchID, Case When (t1.LocalizationID is null) Then t2.RootPhrase Else t2.RootPhrase + ' ' + t3.Localization End As SearchPhrase From dbo.TableC t1 LEFT OUTER JOIN dbo.TableF t2 on t2.RootPhraseID = t1.RootPhraseID LEFT OUTER JOIN dbo.TableG t3 on t3.LocalizationID = t1.LocalizationID) as t5 on t5.KeywordSearchID = t1.KeyWordSearchID Where (AssignmentTime is null) and (t4.PageSyntaxID = t1.SearchEnginePageID) Order by SortPriority, SubmissionTime Select * From @Parameters While (Select Count(ScanJobDetailID) From @Parameters) > 0 Begin Select Top 1 @ScanJobHeaderID = ScanJobHeaderID, @ScanJobDetailID = ScanJobDetailID From @Parameters exec dbo.updateAssignJobByScanMachineID @ScanJobHeaderID, @ScanMachineID, @ScanjobDetailID Delete @Parameters Where ScanJobDetailID = @ScanJobDetailID END CREATE TABLE [dbo].[TableF]( [RootPhraseID] [int] IDENTITY(1,1) NOT NULL, [RootPhrase] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, CONSTRAINT [PK_tblKeywordSearch] PRIMARY KEY CLUSTERED ( [RootPhraseID] ASC )WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] GO SET ANSI_PADDING OFF GO /****** Object: Statistic [_dta_stat_481436789_2_1] Script Date: 08/22/2006 16:02:55 ******/ CREATE STATISTICS [_dta_stat_481436789_2_1] ON [dbo].[tblRootPhrase]([RootPhrase], [RootPhraseID]) CREATE TABLE [dbo].[TableG]( [LocalizationID] [smallint] IDENTITY(1,1) NOT NULL, [Localization] [varchar](100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ) ON [PRIMARY] GO SET ANSI_PADDING OFF Show quote "Erland Sommarskog" wrote: > Curtis (Cur***@discussions.microsoft.com) writes: > > It takes 1:35 the second time I run it. I placed a new record in TableA > > to make sure it was just grabbing new records. It takes just as long to > > complete if I don't add a new record to TableA. I would expect it to > > complete quickly because there is nothing to add to TableD. Does this > > seem abnormal to you? > > Abnormal and abnormal, it indicates that there is some indexing work to > do. It is not all that uncommon to have queries that takes over a minute > to find out rhat there is nothing all to return. > > > Select * > > From dbo.TableA > > Where not exists(Select ScanJobHeaderID From dbo.TableD ja Where > > ja.ScanJobDetailID = a.ScanJobDetailID) > > > > To see if it was the source of the delay and it's not. The delay comes > > from the Joins. > > The more tables you throw into the mix, the more choices the optimizer gets, > and the more chances for bad choices. > > I don't remember the tables exactly. Or rather, I recall that they looked > a bit incomplete, as not all had primary keys, and I don't know about > other indexes. > > -- > Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se > > Books Online for SQL Server 2005 at > http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx > Books Online for SQL Server 2000 at > http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx > Curtis (Cur***@discussions.microsoft.com) writes:
> I have spent days trying to optimize the query that works with these It's difficult to suggest improvements with knowledge about all parts> records and I can't think of any other way to make it more efficient. It > executes quickly until I add Order by NewID(), but I have to grab a > random sample of search engines. Would you mind looking at my query to > see if your experience can see a better way? The maximum time it takes > to complete is 4 seconds. of the queries, indexes, and not talking about the business logic. The ORDER BY newid() appears here as far as I can see: Select Top 10 SearchEngineID From dbo.getAvailableSearchEngines(@ScanMachineID) Order by NewID() Does this query run quickly on its own? In such case, it may be an idea to have a temp table to where you insert the search engines. Temp tables has statistics, and that can help you get a better query plan for the total query. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx |
|||||||||||||||||||||||