Home All Groups Group Topic Archive Search About

need help with execution plan

Author
12 Aug 2005 8:45 PM
vuht2000
a simplified problem:

create table tbl1 (c1 int)
create clustered index c1_idx on tbl1(c1)

create table tbl2 (c1 int)
create clustered index c1_idx on tbl2(c1)

-- method 1
select a.c1
from tbl1 a where not exists(select * from tbl2 b where a.c1 = b.c1)

-- method 2
select a.c1
from tbl1 a left  join tbl2 b on a.c1 = b.c1
where b.c1 is null

my intuition is, the 2 methods are the same, until I read in a book
that says method 2 is better (no more explanation why it is better).
But when I try on my SQL Server 2000 (SP4) both queries in a batch, the
execution plan shows that query 1 takes 27% of the batch cost, the
second 73%, and the 2 plans are different, as below:
-- query 1
StmtText

--------------------------------------------------------------------------------------------------------------

  |--Merge Join(Right Anti Semi Join, MANY-TO-MANY
MERGE:([b].[c1])=([a].[c1]), RESIDUAL:([a].[c1]=[b].[c1]))
       |--Clustered Index Scan(OBJECT:([laureate].[dbo].[tbl2].[c1_idx]
AS [b]), ORDERED FORWARD)
       |--Clustered Index Scan(OBJECT:([laureate].[dbo].[tbl1].[c1_idx]
AS [a]), ORDERED FORWARD)

-- query 2
-------------------------------------------------------------------------------------------------

  |--Filter(WHERE:([b].[c1]=NULL))
       |--Hash Match(Right Outer Join, HASH:([b].[c1])=([a].[c1]),
RESIDUAL:([a].[c1]=[b].[c1]))
            |--Clustered Index
Scan(OBJECT:([laureate].[dbo].[tbl2].[c1_idx] AS [b]))
            |--Clustered Index
Scan(OBJECT:([laureate].[dbo].[tbl1].[c1_idx] AS [a]))

Yet the execution times are about the same, and other statistics show
no big difference. As a matter of size, each has about 500K rows,
returning about 100K rows. These all lost me. My questions:
1) is there a method better than the other? which one and why?
2) why do the 2 queries show different execution plans (one use merge,
one use hash/match?
3) is the "show execution plan" in query analyser a reliable source
about relative cost of a query in a batch? in my run, it shows that the
first query costs about 1/3 of the other, while actual running times
and logical/physical reads are about the same

thanks,
Tam

Author
12 Aug 2005 10:41 PM
Adam Machanic
I think the lesson to be learned here is to always, always, always test many
different techniques when presented with performance problems.

I generally lean towards your method 1 when I'm doing this kind of thing
(because I like it better for readability), and I would have expected the
merge join to be much faster than a hash join.  The query optimizer seems to
agree with me, estimating a much cheaper plan.  Yet on my system, after
populating both tables with 500000 rows (integers 1 to 500000 in both
cases), the first method, with an estimated cost of 4.55, runs in 9 seconds.
The second method, with an estimated cost of 13.4, runs in 1 second!

I use the estimated cost as a rough guide, but I always use careful testing
to get the final, best answer.


--
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--


Show quote
"vuht2000" <vuht2***@yahoo.com> wrote in message
news:1123879527.282056.121120@g49g2000cwa.googlegroups.com...
> a simplified problem:
>
> create table tbl1 (c1 int)
> create clustered index c1_idx on tbl1(c1)
>
> create table tbl2 (c1 int)
> create clustered index c1_idx on tbl2(c1)
>
> -- method 1
> select a.c1
> from tbl1 a where not exists(select * from tbl2 b where a.c1 = b.c1)
>
> -- method 2
> select a.c1
> from tbl1 a left  join tbl2 b on a.c1 = b.c1
> where b.c1 is null
>
> my intuition is, the 2 methods are the same, until I read in a book
> that says method 2 is better (no more explanation why it is better).
> But when I try on my SQL Server 2000 (SP4) both queries in a batch, the
> execution plan shows that query 1 takes 27% of the batch cost, the
> second 73%, and the 2 plans are different, as below:
> -- query 1
> StmtText
>
> --------------------------------------------------------------------------
------------------------------------
>
>   |--Merge Join(Right Anti Semi Join, MANY-TO-MANY
> MERGE:([b].[c1])=([a].[c1]), RESIDUAL:([a].[c1]=[b].[c1]))
>        |--Clustered Index Scan(OBJECT:([laureate].[dbo].[tbl2].[c1_idx]
> AS [b]), ORDERED FORWARD)
>        |--Clustered Index Scan(OBJECT:([laureate].[dbo].[tbl1].[c1_idx]
> AS [a]), ORDERED FORWARD)
>
> -- query 2
> --------------------------------------------------------------------------
-----------------------
>
>   |--Filter(WHERE:([b].[c1]=NULL))
>        |--Hash Match(Right Outer Join, HASH:([b].[c1])=([a].[c1]),
> RESIDUAL:([a].[c1]=[b].[c1]))
>             |--Clustered Index
> Scan(OBJECT:([laureate].[dbo].[tbl2].[c1_idx] AS [b]))
>             |--Clustered Index
> Scan(OBJECT:([laureate].[dbo].[tbl1].[c1_idx] AS [a]))
>
> Yet the execution times are about the same, and other statistics show
> no big difference. As a matter of size, each has about 500K rows,
> returning about 100K rows. These all lost me. My questions:
> 1) is there a method better than the other? which one and why?
> 2) why do the 2 queries show different execution plans (one use merge,
> one use hash/match?
> 3) is the "show execution plan" in query analyser a reliable source
> about relative cost of a query in a batch? in my run, it shows that the
> first query costs about 1/3 of the other, while actual running times
> and logical/physical reads are about the same
>
> thanks,
> Tam
>
Author
13 Aug 2005 3:05 AM
vuht2000
Adam,
in this case (your test), estimated cost is not a rough guide, it's
misleading, and in my case it's misleading too. Is it a bug/limited
feature or is it something one has to be aware of when working in SQL
Server ?
Colin,
I run both queries several times and the number of logical reads are
constant, at about 1100 for both (zero physical read). Actual running
times are about 2 seconds.
It will be very interesting for me to know how to tweak the indexes of
these 2 tables so that index seeks can be used. What are your ideas?

thanks a lot guys and have a good weekend,

Tam
Author
13 Aug 2005 8:59 AM
Colin Dawson
That's interesting.   Wish I had a SQL Server handy to test stuff out on.
(It's at work)

Wish I'd looked closer at your tables when writing that post last night,
you've already got the best indexing for these tables.

Normally, on a real table, there will be more columns, so creating an index
will produce the same performance as you've just achieved.

Regards

Colin Dawson
www.cjdawson.com

Show quote
"vuht2000" <vuht2***@yahoo.com> wrote in message
news:1123902359.078910.103400@g14g2000cwa.googlegroups.com...
> Adam,
> in this case (your test), estimated cost is not a rough guide, it's
> misleading, and in my case it's misleading too. Is it a bug/limited
> feature or is it something one has to be aware of when working in SQL
> Server ?
> Colin,
> I run both queries several times and the number of logical reads are
> constant, at about 1100 for both (zero physical read). Actual running
> times are about 2 seconds.
> It will be very interesting for me to know how to tweak the indexes of
> these 2 tables so that index seeks can be used. What are your ideas?
>
> thanks a lot guys and have a good weekend,
>
> Tam
>
Author
12 Aug 2005 11:01 PM
Colin Dawson
Looking at this I'd also think that method 1 would be better. It's
interesting that the execution times are about the same.  I'd run SQL
Profiler, to see what the cost is for each query in terms of the number of
database reads and writes needed to complete the operation.  This can often
be dramatically different.
Something else worth noting is that your execution plans show that clustered
index scans are being performed.  It would be worth adding the correct
indexes, so that index seeks can be used.  This will change the execution
plan too as SQL will compile the query to make use of the indexes.

Regards

Colin Dawson
www.cjdawson.com

Show quote
"vuht2000" <vuht2***@yahoo.com> wrote in message
news:1123879527.282056.121120@g49g2000cwa.googlegroups.com...
>a simplified problem:
>
> create table tbl1 (c1 int)
> create clustered index c1_idx on tbl1(c1)
>
> create table tbl2 (c1 int)
> create clustered index c1_idx on tbl2(c1)
>
> -- method 1
> select a.c1
> from tbl1 a where not exists(select * from tbl2 b where a.c1 = b.c1)
>
> -- method 2
> select a.c1
> from tbl1 a left  join tbl2 b on a.c1 = b.c1
> where b.c1 is null
>
> my intuition is, the 2 methods are the same, until I read in a book
> that says method 2 is better (no more explanation why it is better).
> But when I try on my SQL Server 2000 (SP4) both queries in a batch, the
> execution plan shows that query 1 takes 27% of the batch cost, the
> second 73%, and the 2 plans are different, as below:
> -- query 1
> StmtText
>
> --------------------------------------------------------------------------------------------------------------
>
>  |--Merge Join(Right Anti Semi Join, MANY-TO-MANY
> MERGE:([b].[c1])=([a].[c1]), RESIDUAL:([a].[c1]=[b].[c1]))
>       |--Clustered Index Scan(OBJECT:([laureate].[dbo].[tbl2].[c1_idx]
> AS [b]), ORDERED FORWARD)
>       |--Clustered Index Scan(OBJECT:([laureate].[dbo].[tbl1].[c1_idx]
> AS [a]), ORDERED FORWARD)
>
> -- query 2
> -------------------------------------------------------------------------------------------------
>
>  |--Filter(WHERE:([b].[c1]=NULL))
>       |--Hash Match(Right Outer Join, HASH:([b].[c1])=([a].[c1]),
> RESIDUAL:([a].[c1]=[b].[c1]))
>            |--Clustered Index
> Scan(OBJECT:([laureate].[dbo].[tbl2].[c1_idx] AS [b]))
>            |--Clustered Index
> Scan(OBJECT:([laureate].[dbo].[tbl1].[c1_idx] AS [a]))
>
> Yet the execution times are about the same, and other statistics show
> no big difference. As a matter of size, each has about 500K rows,
> returning about 100K rows. These all lost me. My questions:
> 1) is there a method better than the other? which one and why?
> 2) why do the 2 queries show different execution plans (one use merge,
> one use hash/match?
> 3) is the "show execution plan" in query analyser a reliable source
> about relative cost of a query in a batch? in my run, it shows that the
> first query costs about 1/3 of the other, while actual running times
> and logical/physical reads are about the same
>
> thanks,
> Tam
>

AddThis Social Bookmark Button