Home All Groups Group Topic Archive Search About

Table scan on 0 record temp table?

Author
11 Aug 2005 11:47 PM
Mark Zadony
In trying to tweak the performance of a slow-running stored proc, I've come
across an interesting problem.  SQL Server (2000 SP3a) is SLOWLY
table-scanning temp tables with 0 records--and performance is severely
hindered.

A little background...  I'm using a temp table that gets data from another
table.  I may then truncate the temp table if the data isn't needed.  I don't
drop the table as a craftily-designed SQL statement uses it in an outer join.
Throw that in a loop several dozen times, each outer-joining hundreds or
thousands of records, and the 0 record temp table is by far the slowest piece
of each query (when looking at the execution plan), and takes a huge chunk of
the procedure's time.

Now, here's the kicker--adding a clustered index to that 0 record temp
table, even after it has been truncated, speeds up the execution of the
procedure significantly.  To me, an index makes sense on almost every
table--but not one with 0 records.

My question is this... (Forgive me, I don't know any of the internals of SQL
Server--my forte' is writing stored procedures for use in reports).  Doesn't
SQL Server know how many records any given table has without having to scan
it, and shouldn't it know that a 0 record (temp or regular) table used in an
outer join really has nothing to add to the result set?

Call me crazy, but I'd go so far to say that this is a bug that hinders
performance...  Thanks for anyone's help.

--Mark Zadony

P.S.  To prove my point, try this stored procedure on the "pubs" db...
-------------------------------------------------------
CREATE PROCEDURE [guest].[TEST1] AS
    BEGIN

select *
into #titles_temp
from titles

truncate table #titles_temp

--create clustered index CIDX_TT_TitleID
--  on #titles_temp (Title_ID)

select t1.title, t2.title
from titles t1, #titles_temp t2
where t1.title_id *= t2.title_id

    END
-------------------------------------------------------
Try a "before & after" without the clustered index then with.  In my case,
the index significantly helps with execution time since in my procedure the
looping happens 40-50 times...  (BTW--Yes, I know I should use 'LEFT OUTER
JOIN' and not '*=', but old habits die hard).

Author
12 Aug 2005 12:09 AM
David Gugick
Mark Zadony wrote:
> My question is this... (Forgive me, I don't know any of the internals
> of SQL Server--my forte' is writing stored procedures for use in
> reports).  Doesn't SQL Server know how many records any given table
> has without having to scan it, and shouldn't it know that a 0 record
> (temp or regular) table used in an outer join really has nothing to
> add to the result set?

SQL Server doesn't know the number of rows. That information is not
stored in completely accurate fashion. Although there is a rowcount
column in sysindexes, even that figure may not be current (requires
running DBCC UPDATEUSAGE to update).

If adding a clustered index on the temp table helps with zero rows, then
I would suggest you keep it in place as it's likely to help even more
with many rows in the table.


--
David Gugick
Quest Software
www.imceda.com
www.quest.com
Author
12 Aug 2005 12:34 AM
Stu
Not trying to pick apart a design I haven't even seen, but I would
imagine that you would get even greater benefit without the loop.  If
you can't get rid of the loop, have you considered using conditional
logic to avoid using the temp table except when you need it?

Stu
Author
12 Aug 2005 4:19 AM
Steve Kass
As David pointed out, no, SQL Server doesn't keep a current
count of the number of rows.  The overhead would be worse
than the benefit, in the big picture.

Even when statistics suggest there are zero rows at
some point in a query plan, the query optimizer assumes there
is at least one row, since statistics are not known to be current,
and query plans that are optimal for zero rows (ignoring a table,
for example) mean wrong answers when there are not zero rows.

This is a great example, by the way, to point out some aspects of
query processing and optimization.

I didn't run any timings, but the reasons I can think of for this to
be slower without the index make most sense if the table once
had rows, but they were deleted.  Is that the situation you're seeing?

Without the index, the best query plan is a nested loop join against a
table scan.  So the 0-row table must be scanned many times.  For a
table that once had rows but does no more, the table scan may have
to read a few (even many) data pages following dead-end pointers,
or passing by deleted rows.

With the index, the plan is a nested loop join against a clustered index
seek.  While the table may still contain pages full of nothing, at least
the seek allows the processor to discover that a particular row sought
doesn't exist without scanning all those pages.

Put another way, you may really be doing this:

select t1.title, t2.title
from titles t1,
#titles_temp_including_deleted_rows t2
where t1.title_id *= t2.title_id
AND t2.THIS_ROW is not a deleted row


A clustered index may make little or no sense on a table with
no data pages, but that's not the same thing as a table with
no rows.

What does sp_spaceused have to say about the "empty"
table you are having this trouble with?

Steve Kass
Drew University

Mark Zadony wrote:

Show quote
>In trying to tweak the performance of a slow-running stored proc, I've come
>across an interesting problem.  SQL Server (2000 SP3a) is SLOWLY
>table-scanning temp tables with 0 records--and performance is severely
>hindered.
>
>A little background...  I'm using a temp table that gets data from another
>table.  I may then truncate the temp table if the data isn't needed.  I don't
>drop the table as a craftily-designed SQL statement uses it in an outer join.
> Throw that in a loop several dozen times, each outer-joining hundreds or
>thousands of records, and the 0 record temp table is by far the slowest piece
>of each query (when looking at the execution plan), and takes a huge chunk of
>the procedure's time.
>
>Now, here's the kicker--adding a clustered index to that 0 record temp
>table, even after it has been truncated, speeds up the execution of the
>procedure significantly.  To me, an index makes sense on almost every
>table--but not one with 0 records.
>
>My question is this... (Forgive me, I don't know any of the internals of SQL
>Server--my forte' is writing stored procedures for use in reports).  Doesn't
>SQL Server know how many records any given table has without having to scan
>it, and shouldn't it know that a 0 record (temp or regular) table used in an
>outer join really has nothing to add to the result set?
>
>Call me crazy, but I'd go so far to say that this is a bug that hinders
>performance...  Thanks for anyone's help.
>
>--Mark Zadony
>
>P.S.  To prove my point, try this stored procedure on the "pubs" db...
>-------------------------------------------------------
>CREATE PROCEDURE [guest].[TEST1] AS
>     BEGIN
>
>select *
>into #titles_temp
>from titles
>
>truncate table #titles_temp
>
>--create clustered index CIDX_TT_TitleID
>--  on #titles_temp (Title_ID)
>
>select t1.title, t2.title
>from titles t1, #titles_temp t2
>where t1.title_id *= t2.title_id
>
>    END
>-------------------------------------------------------
>Try a "before & after" without the clustered index then with.  In my case,
>the index significantly helps with execution time since in my procedure the
>looping happens 40-50 times...  (BTW--Yes, I know I should use 'LEFT OUTER
>JOIN' and not '*=', but old habits die hard).

>
Author
12 Aug 2005 4:43 AM
Uri Dimant
Wow,Steve.
Thanks, very useful info



Show quote
"Steve Kass" <sk***@drew.edu> wrote in message
news:O%23MM3TvnFHA.1148@TK2MSFTNGP12.phx.gbl...
> As David pointed out, no, SQL Server doesn't keep a current
> count of the number of rows.  The overhead would be worse
> than the benefit, in the big picture.
>
> Even when statistics suggest there are zero rows at
> some point in a query plan, the query optimizer assumes there
> is at least one row, since statistics are not known to be current,
> and query plans that are optimal for zero rows (ignoring a table,
> for example) mean wrong answers when there are not zero rows.
>
> This is a great example, by the way, to point out some aspects of
> query processing and optimization.
>
> I didn't run any timings, but the reasons I can think of for this to
> be slower without the index make most sense if the table once
> had rows, but they were deleted.  Is that the situation you're seeing?
>
> Without the index, the best query plan is a nested loop join against a
> table scan.  So the 0-row table must be scanned many times.  For a
> table that once had rows but does no more, the table scan may have
> to read a few (even many) data pages following dead-end pointers,
> or passing by deleted rows.
>
> With the index, the plan is a nested loop join against a clustered index
> seek.  While the table may still contain pages full of nothing, at least
> the seek allows the processor to discover that a particular row sought
> doesn't exist without scanning all those pages.
>
> Put another way, you may really be doing this:
>
> select t1.title, t2.title
> from titles t1,
> #titles_temp_including_deleted_rows t2
> where t1.title_id *= t2.title_id
> AND t2.THIS_ROW is not a deleted row
>
>
> A clustered index may make little or no sense on a table with
> no data pages, but that's not the same thing as a table with
> no rows.
>
> What does sp_spaceused have to say about the "empty"
> table you are having this trouble with?
> Steve Kass
> Drew University
>
> Mark Zadony wrote:
>
>>In trying to tweak the performance of a slow-running stored proc, I've
>>come across an interesting problem.  SQL Server (2000 SP3a) is SLOWLY
>>table-scanning temp tables with 0 records--and performance is severely
>>hindered.
>>
>>A little background...  I'm using a temp table that gets data from another
>>table.  I may then truncate the temp table if the data isn't needed.  I
>>don't drop the table as a craftily-designed SQL statement uses it in an
>>outer join. Throw that in a loop several dozen times, each outer-joining
>>hundreds or thousands of records, and the 0 record temp table is by far
>>the slowest piece of each query (when looking at the execution plan), and
>>takes a huge chunk of the procedure's time.
>>
>>Now, here's the kicker--adding a clustered index to that 0 record temp
>>table, even after it has been truncated, speeds up the execution of the
>>procedure significantly.  To me, an index makes sense on almost every
>>table--but not one with 0 records.
>>
>>My question is this... (Forgive me, I don't know any of the internals of
>>SQL Server--my forte' is writing stored procedures for use in reports).
>>Doesn't SQL Server know how many records any given table has without
>>having to scan it, and shouldn't it know that a 0 record (temp or regular)
>>table used in an outer join really has nothing to add to the result set?
>>
>>Call me crazy, but I'd go so far to say that this is a bug that hinders
>>performance...  Thanks for anyone's help.
>>
>>--Mark Zadony
>>
>>P.S.  To prove my point, try this stored procedure on the "pubs" db...
>>-------------------------------------------------------
>>CREATE PROCEDURE [guest].[TEST1] AS
>> BEGIN
>>
>>select *
>>into #titles_temp
>>from titles
>>
>>truncate table #titles_temp
>>
>>--create clustered index CIDX_TT_TitleID
>>--  on #titles_temp (Title_ID)
>>
>>select t1.title, t2.title
>>from titles t1, #titles_temp t2
>>where t1.title_id *= t2.title_id
>>
>> END
>>-------------------------------------------------------
>>Try a "before & after" without the clustered index then with.  In my case,
>>the index significantly helps with execution time since in my procedure
>>the looping happens 40-50 times...  (BTW--Yes, I know I should use 'LEFT
>>OUTER JOIN' and not '*=', but old habits die hard).
>>
Author
12 Aug 2005 10:59 PM
Mark Zadony
Steve--I never figured the fact that a table that HAD pages would still be
scanned.  I can't remember since this goes back to my SQL Server 6.5
training, but is the table definition in the same page as any of its data? 
If not, then SQL Server should know there are no data pages associated with
that table--hence 0 records.  This should especially hold true since the
table was TRUNCATEd (as opposed to a DELETE FROM...)

As for your question regarding sp_spaceused, it's giving me a "The object
'#TEMP_Hierarchies' does not exist in database 'xxxxxxxx'." error, even
though I embedded it into the proc right after the truncation.  (I know I
can't call it from one session into another--am I hitting against that within
the same proc?).  Am I doing something wrong?  My next option would be to
cut/paste the code into Query Analyzer and see what shows up there.  (I'll
leave that for Monday...)

Stu--The code I'm using to get this (sometimes) 0-record temp table is
actually called from an EXEC within a cursor.  I have a table that tells this
query what fields to pull from, hence the need for the EXEC.  Not
cursoring/looping is not an option since the fields to populate the report
may change at any given time.  You bring up a good point, however, now that I
think about it.  Since my EXEC code is dynamic (and not precompiled), I could
include the temp table only when needed as a conditional statement and adjust
the statement accordingly, which would have the same effect as outer joining
this 0-record temp table.  I've never tried this--can I use IF or CASE
statements _within_ an EXEC?  If not, EXEC will be running a varchar(8000)
variable...

If anyone at Microsoft is listening... Here's one way to tweak SQL Server's
performance some more--find a way to know when a table has 0 records and to
ignore them in an outer join--or ignore all other tables in a regular join...
(SP5--please???)

Thanks for everyone's help........

--Mark Zadony


Show quote
"Steve Kass" wrote:

> As David pointed out, no, SQL Server doesn't keep a current
> count of the number of rows.  The overhead would be worse
> than the benefit, in the big picture.
>
> Even when statistics suggest there are zero rows at
> some point in a query plan, the query optimizer assumes there
> is at least one row, since statistics are not known to be current,
> and query plans that are optimal for zero rows (ignoring a table,
> for example) mean wrong answers when there are not zero rows.
>
> This is a great example, by the way, to point out some aspects of
> query processing and optimization.
>
> I didn't run any timings, but the reasons I can think of for this to
> be slower without the index make most sense if the table once
> had rows, but they were deleted.  Is that the situation you're seeing?
>
> Without the index, the best query plan is a nested loop join against a
> table scan.  So the 0-row table must be scanned many times.  For a
> table that once had rows but does no more, the table scan may have
> to read a few (even many) data pages following dead-end pointers,
> or passing by deleted rows.
>
> With the index, the plan is a nested loop join against a clustered index
> seek.  While the table may still contain pages full of nothing, at least
> the seek allows the processor to discover that a particular row sought
> doesn't exist without scanning all those pages.
>
> Put another way, you may really be doing this:
>
> select t1.title, t2.title
> from titles t1,
> #titles_temp_including_deleted_rows t2
> where t1.title_id *= t2.title_id
> AND t2.THIS_ROW is not a deleted row
>
>
> A clustered index may make little or no sense on a table with
> no data pages, but that's not the same thing as a table with
> no rows.
>
> What does sp_spaceused have to say about the "empty"
> table you are having this trouble with?
>
> Steve Kass
> Drew University
Author
14 Aug 2005 11:41 AM
Hugo Kornelis
On Fri, 12 Aug 2005 15:59:02 -0700, Mark Zadony wrote:

(snip)
>As for your question regarding sp_spaceused, it's giving me a "The object
>'#TEMP_Hierarchies' does not exist in database 'xxxxxxxx'." error, even
>though I embedded it into the proc right after the truncation.

Hi Mark,

Temporary tables live in the tempdb database.

USE tempdb
EXEC sp_spaceused #TEMP_Hierarchies
USE xxxxxxxx

>If anyone at Microsoft is listening... Here's one way to tweak SQL Server's
>performance some more--find a way to know when a table has 0 records and to
>ignore them in an outer join--or ignore all other tables in a regular join...
> (SP5--please???)

One place where the MS folk are definitely listening for requests like
this is sqlw***@microsoft.com.

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)
Author
14 Aug 2005 1:12 PM
Stu
Poke around this group a bit looking for other's opinions on cursors
and dynamic SQL, and you'll find that there are a lot of issues with
using them.  If you must use dynamic SQL, however, you can certainly
include any valid batch of SQL commands (as long as you don't exceed
the length of the varchar).  IF and CASE statements will work just
fine, eg:

DECLARE @sql varchar(8000)

SET @sql = 'IF <some condition>
               BEGIN
                   SELECT 1
               END
            ELSE
               BEGIN
                   SELECT 2
               END '

EXEC (@SQL)  --you could also use sp_execute_sql with parameters

I would encourage you, however, to rethink your design, and try to find
a way to achive your goals without dynamic SQL or a cursor.  Perhaps,
you may want to post your proc and we'll help you think through it (be
prepared for some extrememly critical postings, however).

HTH,
Stu

AddThis Social Bookmark Button