|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Table scan on 0 record temp table?across an interesting problem. SQL Server (2000 SP3a) is SLOWLY table-scanning temp tables with 0 records--and performance is severely hindered. A little background... I'm using a temp table that gets data from another table. I may then truncate the temp table if the data isn't needed. I don't drop the table as a craftily-designed SQL statement uses it in an outer join. Throw that in a loop several dozen times, each outer-joining hundreds or thousands of records, and the 0 record temp table is by far the slowest piece of each query (when looking at the execution plan), and takes a huge chunk of the procedure's time. Now, here's the kicker--adding a clustered index to that 0 record temp table, even after it has been truncated, speeds up the execution of the procedure significantly. To me, an index makes sense on almost every table--but not one with 0 records. My question is this... (Forgive me, I don't know any of the internals of SQL Server--my forte' is writing stored procedures for use in reports). Doesn't SQL Server know how many records any given table has without having to scan it, and shouldn't it know that a 0 record (temp or regular) table used in an outer join really has nothing to add to the result set? Call me crazy, but I'd go so far to say that this is a bug that hinders performance... Thanks for anyone's help. --Mark Zadony P.S. To prove my point, try this stored procedure on the "pubs" db... ------------------------------------------------------- CREATE PROCEDURE [guest].[TEST1] AS BEGIN select * into #titles_temp from titles truncate table #titles_temp --create clustered index CIDX_TT_TitleID -- on #titles_temp (Title_ID) select t1.title, t2.title from titles t1, #titles_temp t2 where t1.title_id *= t2.title_id END ------------------------------------------------------- Try a "before & after" without the clustered index then with. In my case, the index significantly helps with execution time since in my procedure the looping happens 40-50 times... (BTW--Yes, I know I should use 'LEFT OUTER JOIN' and not '*=', but old habits die hard). Mark Zadony wrote:
> My question is this... (Forgive me, I don't know any of the internals SQL Server doesn't know the number of rows. That information is not > of SQL Server--my forte' is writing stored procedures for use in > reports). Doesn't SQL Server know how many records any given table > has without having to scan it, and shouldn't it know that a 0 record > (temp or regular) table used in an outer join really has nothing to > add to the result set? stored in completely accurate fashion. Although there is a rowcount column in sysindexes, even that figure may not be current (requires running DBCC UPDATEUSAGE to update). If adding a clustered index on the temp table helps with zero rows, then I would suggest you keep it in place as it's likely to help even more with many rows in the table. Not trying to pick apart a design I haven't even seen, but I would
imagine that you would get even greater benefit without the loop. If you can't get rid of the loop, have you considered using conditional logic to avoid using the temp table except when you need it? Stu As David pointed out, no, SQL Server doesn't keep a current
count of the number of rows. The overhead would be worse than the benefit, in the big picture. Even when statistics suggest there are zero rows at some point in a query plan, the query optimizer assumes there is at least one row, since statistics are not known to be current, and query plans that are optimal for zero rows (ignoring a table, for example) mean wrong answers when there are not zero rows. This is a great example, by the way, to point out some aspects of query processing and optimization. I didn't run any timings, but the reasons I can think of for this to be slower without the index make most sense if the table once had rows, but they were deleted. Is that the situation you're seeing? Without the index, the best query plan is a nested loop join against a table scan. So the 0-row table must be scanned many times. For a table that once had rows but does no more, the table scan may have to read a few (even many) data pages following dead-end pointers, or passing by deleted rows. With the index, the plan is a nested loop join against a clustered index seek. While the table may still contain pages full of nothing, at least the seek allows the processor to discover that a particular row sought doesn't exist without scanning all those pages. Put another way, you may really be doing this: select t1.title, t2.title from titles t1, #titles_temp_including_deleted_rows t2 where t1.title_id *= t2.title_id AND t2.THIS_ROW is not a deleted row A clustered index may make little or no sense on a table with no data pages, but that's not the same thing as a table with no rows. What does sp_spaceused have to say about the "empty" table you are having this trouble with? Steve Kass Drew University Mark Zadony wrote: Show quote >In trying to tweak the performance of a slow-running stored proc, I've come >across an interesting problem. SQL Server (2000 SP3a) is SLOWLY >table-scanning temp tables with 0 records--and performance is severely >hindered. > >A little background... I'm using a temp table that gets data from another >table. I may then truncate the temp table if the data isn't needed. I don't >drop the table as a craftily-designed SQL statement uses it in an outer join. > Throw that in a loop several dozen times, each outer-joining hundreds or >thousands of records, and the 0 record temp table is by far the slowest piece >of each query (when looking at the execution plan), and takes a huge chunk of >the procedure's time. > >Now, here's the kicker--adding a clustered index to that 0 record temp >table, even after it has been truncated, speeds up the execution of the >procedure significantly. To me, an index makes sense on almost every >table--but not one with 0 records. > >My question is this... (Forgive me, I don't know any of the internals of SQL >Server--my forte' is writing stored procedures for use in reports). Doesn't >SQL Server know how many records any given table has without having to scan >it, and shouldn't it know that a 0 record (temp or regular) table used in an >outer join really has nothing to add to the result set? > >Call me crazy, but I'd go so far to say that this is a bug that hinders >performance... Thanks for anyone's help. > >--Mark Zadony > >P.S. To prove my point, try this stored procedure on the "pubs" db... >------------------------------------------------------- >CREATE PROCEDURE [guest].[TEST1] AS > BEGIN > >select * >into #titles_temp >from titles > >truncate table #titles_temp > >--create clustered index CIDX_TT_TitleID >-- on #titles_temp (Title_ID) > >select t1.title, t2.title >from titles t1, #titles_temp t2 >where t1.title_id *= t2.title_id > > END >------------------------------------------------------- >Try a "before & after" without the clustered index then with. In my case, >the index significantly helps with execution time since in my procedure the >looping happens 40-50 times... (BTW--Yes, I know I should use 'LEFT OUTER >JOIN' and not '*=', but old habits die hard). > > Wow,Steve.
Thanks, very useful info Show quote "Steve Kass" <sk***@drew.edu> wrote in message news:O%23MM3TvnFHA.1148@TK2MSFTNGP12.phx.gbl... > As David pointed out, no, SQL Server doesn't keep a current > count of the number of rows. The overhead would be worse > than the benefit, in the big picture. > > Even when statistics suggest there are zero rows at > some point in a query plan, the query optimizer assumes there > is at least one row, since statistics are not known to be current, > and query plans that are optimal for zero rows (ignoring a table, > for example) mean wrong answers when there are not zero rows. > > This is a great example, by the way, to point out some aspects of > query processing and optimization. > > I didn't run any timings, but the reasons I can think of for this to > be slower without the index make most sense if the table once > had rows, but they were deleted. Is that the situation you're seeing? > > Without the index, the best query plan is a nested loop join against a > table scan. So the 0-row table must be scanned many times. For a > table that once had rows but does no more, the table scan may have > to read a few (even many) data pages following dead-end pointers, > or passing by deleted rows. > > With the index, the plan is a nested loop join against a clustered index > seek. While the table may still contain pages full of nothing, at least > the seek allows the processor to discover that a particular row sought > doesn't exist without scanning all those pages. > > Put another way, you may really be doing this: > > select t1.title, t2.title > from titles t1, > #titles_temp_including_deleted_rows t2 > where t1.title_id *= t2.title_id > AND t2.THIS_ROW is not a deleted row > > > A clustered index may make little or no sense on a table with > no data pages, but that's not the same thing as a table with > no rows. > > What does sp_spaceused have to say about the "empty" > table you are having this trouble with? > Steve Kass > Drew University > > Mark Zadony wrote: > >>In trying to tweak the performance of a slow-running stored proc, I've >>come across an interesting problem. SQL Server (2000 SP3a) is SLOWLY >>table-scanning temp tables with 0 records--and performance is severely >>hindered. >> >>A little background... I'm using a temp table that gets data from another >>table. I may then truncate the temp table if the data isn't needed. I >>don't drop the table as a craftily-designed SQL statement uses it in an >>outer join. Throw that in a loop several dozen times, each outer-joining >>hundreds or thousands of records, and the 0 record temp table is by far >>the slowest piece of each query (when looking at the execution plan), and >>takes a huge chunk of the procedure's time. >> >>Now, here's the kicker--adding a clustered index to that 0 record temp >>table, even after it has been truncated, speeds up the execution of the >>procedure significantly. To me, an index makes sense on almost every >>table--but not one with 0 records. >> >>My question is this... (Forgive me, I don't know any of the internals of >>SQL Server--my forte' is writing stored procedures for use in reports). >>Doesn't SQL Server know how many records any given table has without >>having to scan it, and shouldn't it know that a 0 record (temp or regular) >>table used in an outer join really has nothing to add to the result set? >> >>Call me crazy, but I'd go so far to say that this is a bug that hinders >>performance... Thanks for anyone's help. >> >>--Mark Zadony >> >>P.S. To prove my point, try this stored procedure on the "pubs" db... >>------------------------------------------------------- >>CREATE PROCEDURE [guest].[TEST1] AS >> BEGIN >> >>select * >>into #titles_temp >>from titles >> >>truncate table #titles_temp >> >>--create clustered index CIDX_TT_TitleID >>-- on #titles_temp (Title_ID) >> >>select t1.title, t2.title >>from titles t1, #titles_temp t2 >>where t1.title_id *= t2.title_id >> >> END >>------------------------------------------------------- >>Try a "before & after" without the clustered index then with. In my case, >>the index significantly helps with execution time since in my procedure >>the looping happens 40-50 times... (BTW--Yes, I know I should use 'LEFT >>OUTER JOIN' and not '*=', but old habits die hard). >> Steve--I never figured the fact that a table that HAD pages would still be
scanned. I can't remember since this goes back to my SQL Server 6.5 training, but is the table definition in the same page as any of its data? If not, then SQL Server should know there are no data pages associated with that table--hence 0 records. This should especially hold true since the table was TRUNCATEd (as opposed to a DELETE FROM...) As for your question regarding sp_spaceused, it's giving me a "The object '#TEMP_Hierarchies' does not exist in database 'xxxxxxxx'." error, even though I embedded it into the proc right after the truncation. (I know I can't call it from one session into another--am I hitting against that within the same proc?). Am I doing something wrong? My next option would be to cut/paste the code into Query Analyzer and see what shows up there. (I'll leave that for Monday...) Stu--The code I'm using to get this (sometimes) 0-record temp table is actually called from an EXEC within a cursor. I have a table that tells this query what fields to pull from, hence the need for the EXEC. Not cursoring/looping is not an option since the fields to populate the report may change at any given time. You bring up a good point, however, now that I think about it. Since my EXEC code is dynamic (and not precompiled), I could include the temp table only when needed as a conditional statement and adjust the statement accordingly, which would have the same effect as outer joining this 0-record temp table. I've never tried this--can I use IF or CASE statements _within_ an EXEC? If not, EXEC will be running a varchar(8000) variable... If anyone at Microsoft is listening... Here's one way to tweak SQL Server's performance some more--find a way to know when a table has 0 records and to ignore them in an outer join--or ignore all other tables in a regular join... (SP5--please???) Thanks for everyone's help........ --Mark Zadony Show quote "Steve Kass" wrote: > As David pointed out, no, SQL Server doesn't keep a current > count of the number of rows. The overhead would be worse > than the benefit, in the big picture. > > Even when statistics suggest there are zero rows at > some point in a query plan, the query optimizer assumes there > is at least one row, since statistics are not known to be current, > and query plans that are optimal for zero rows (ignoring a table, > for example) mean wrong answers when there are not zero rows. > > This is a great example, by the way, to point out some aspects of > query processing and optimization. > > I didn't run any timings, but the reasons I can think of for this to > be slower without the index make most sense if the table once > had rows, but they were deleted. Is that the situation you're seeing? > > Without the index, the best query plan is a nested loop join against a > table scan. So the 0-row table must be scanned many times. For a > table that once had rows but does no more, the table scan may have > to read a few (even many) data pages following dead-end pointers, > or passing by deleted rows. > > With the index, the plan is a nested loop join against a clustered index > seek. While the table may still contain pages full of nothing, at least > the seek allows the processor to discover that a particular row sought > doesn't exist without scanning all those pages. > > Put another way, you may really be doing this: > > select t1.title, t2.title > from titles t1, > #titles_temp_including_deleted_rows t2 > where t1.title_id *= t2.title_id > AND t2.THIS_ROW is not a deleted row > > > A clustered index may make little or no sense on a table with > no data pages, but that's not the same thing as a table with > no rows. > > What does sp_spaceused have to say about the "empty" > table you are having this trouble with? > > Steve Kass > Drew University On Fri, 12 Aug 2005 15:59:02 -0700, Mark Zadony wrote:
(snip) >As for your question regarding sp_spaceused, it's giving me a "The object Hi Mark,>'#TEMP_Hierarchies' does not exist in database 'xxxxxxxx'." error, even >though I embedded it into the proc right after the truncation. Temporary tables live in the tempdb database. USE tempdb EXEC sp_spaceused #TEMP_Hierarchies USE xxxxxxxx >If anyone at Microsoft is listening... Here's one way to tweak SQL Server's One place where the MS folk are definitely listening for requests like>performance some more--find a way to know when a table has 0 records and to >ignore them in an outer join--or ignore all other tables in a regular join... > (SP5--please???) this is sqlw***@microsoft.com. Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address) Poke around this group a bit looking for other's opinions on cursors
and dynamic SQL, and you'll find that there are a lot of issues with using them. If you must use dynamic SQL, however, you can certainly include any valid batch of SQL commands (as long as you don't exceed the length of the varchar). IF and CASE statements will work just fine, eg: DECLARE @sql varchar(8000) SET @sql = 'IF <some condition> BEGIN SELECT 1 END ELSE BEGIN SELECT 2 END ' EXEC (@SQL) --you could also use sp_execute_sql with parameters I would encourage you, however, to rethink your design, and try to find a way to achive your goals without dynamic SQL or a cursor. Perhaps, you may want to post your proc and we'll help you think through it (be prepared for some extrememly critical postings, however). HTH, Stu |
|||||||||||||||||||||||