|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
need help with execution plancreate table tbl1 (c1 int) create clustered index c1_idx on tbl1(c1) create table tbl2 (c1 int) create clustered index c1_idx on tbl2(c1) -- method 1 select a.c1 from tbl1 a where not exists(select * from tbl2 b where a.c1 = b.c1) -- method 2 select a.c1 from tbl1 a left join tbl2 b on a.c1 = b.c1 where b.c1 is null my intuition is, the 2 methods are the same, until I read in a book that says method 2 is better (no more explanation why it is better). But when I try on my SQL Server 2000 (SP4) both queries in a batch, the execution plan shows that query 1 takes 27% of the batch cost, the second 73%, and the 2 plans are different, as below: -- query 1 StmtText -------------------------------------------------------------------------------------------------------------- |--Merge Join(Right Anti Semi Join, MANY-TO-MANY MERGE:([b].[c1])=([a].[c1]), RESIDUAL:([a].[c1]=[b].[c1])) |--Clustered Index Scan(OBJECT:([laureate].[dbo].[tbl2].[c1_idx] AS [b]), ORDERED FORWARD) |--Clustered Index Scan(OBJECT:([laureate].[dbo].[tbl1].[c1_idx] AS [a]), ORDERED FORWARD)-- query 2 ------------------------------------------------------------------------------------------------- |--Filter(WHERE:([b].[c1]=NULL)) RESIDUAL:([a].[c1]=[b].[c1]))|--Hash Match(Right Outer Join, HASH:([b].[c1])=([a].[c1]), |--Clustered Index Scan(OBJECT:([laureate].[dbo].[tbl2].[c1_idx] AS [b])) |--Clustered Index Scan(OBJECT:([laureate].[dbo].[tbl1].[c1_idx] AS [a]))Yet the execution times are about the same, and other statistics show no big difference. As a matter of size, each has about 500K rows, returning about 100K rows. These all lost me. My questions: 1) is there a method better than the other? which one and why? 2) why do the 2 queries show different execution plans (one use merge, one use hash/match? 3) is the "show execution plan" in query analyser a reliable source about relative cost of a query in a batch? in my run, it shows that the first query costs about 1/3 of the other, while actual running times and logical/physical reads are about the same thanks, Tam I think the lesson to be learned here is to always, always, always test many
different techniques when presented with performance problems. I generally lean towards your method 1 when I'm doing this kind of thing (because I like it better for readability), and I would have expected the merge join to be much faster than a hash join. The query optimizer seems to agree with me, estimating a much cheaper plan. Yet on my system, after populating both tables with 500000 rows (integers 1 to 500000 in both cases), the first method, with an estimated cost of 4.55, runs in 9 seconds. The second method, with an estimated cost of 13.4, runs in 1 second! I use the estimated cost as a rough guide, but I always use careful testing to get the final, best answer. Show quote "vuht2000" <vuht2***@yahoo.com> wrote in message news:1123879527.282056.121120@g49g2000cwa.googlegroups.com... > a simplified problem: > > create table tbl1 (c1 int) > create clustered index c1_idx on tbl1(c1) > > create table tbl2 (c1 int) > create clustered index c1_idx on tbl2(c1) > > -- method 1 > select a.c1 > from tbl1 a where not exists(select * from tbl2 b where a.c1 = b.c1) > > -- method 2 > select a.c1 > from tbl1 a left join tbl2 b on a.c1 = b.c1 > where b.c1 is null > > my intuition is, the 2 methods are the same, until I read in a book > that says method 2 is better (no more explanation why it is better). > But when I try on my SQL Server 2000 (SP4) both queries in a batch, the > execution plan shows that query 1 takes 27% of the batch cost, the > second 73%, and the 2 plans are different, as below: > -- query 1 > StmtText > > -------------------------------------------------------------------------- ------------------------------------ > > |--Merge Join(Right Anti Semi Join, MANY-TO-MANY > MERGE:([b].[c1])=([a].[c1]), RESIDUAL:([a].[c1]=[b].[c1])) > |--Clustered Index Scan(OBJECT:([laureate].[dbo].[tbl2].[c1_idx] > AS [b]), ORDERED FORWARD) > |--Clustered Index Scan(OBJECT:([laureate].[dbo].[tbl1].[c1_idx] > AS [a]), ORDERED FORWARD) > > -- query 2 > -------------------------------------------------------------------------- ----------------------- > > |--Filter(WHERE:([b].[c1]=NULL)) > |--Hash Match(Right Outer Join, HASH:([b].[c1])=([a].[c1]), > RESIDUAL:([a].[c1]=[b].[c1])) > |--Clustered Index > Scan(OBJECT:([laureate].[dbo].[tbl2].[c1_idx] AS [b])) > |--Clustered Index > Scan(OBJECT:([laureate].[dbo].[tbl1].[c1_idx] AS [a])) > > Yet the execution times are about the same, and other statistics show > no big difference. As a matter of size, each has about 500K rows, > returning about 100K rows. These all lost me. My questions: > 1) is there a method better than the other? which one and why? > 2) why do the 2 queries show different execution plans (one use merge, > one use hash/match? > 3) is the "show execution plan" in query analyser a reliable source > about relative cost of a query in a batch? in my run, it shows that the > first query costs about 1/3 of the other, while actual running times > and logical/physical reads are about the same > > thanks, > Tam > Adam,
in this case (your test), estimated cost is not a rough guide, it's misleading, and in my case it's misleading too. Is it a bug/limited feature or is it something one has to be aware of when working in SQL Server ? Colin, I run both queries several times and the number of logical reads are constant, at about 1100 for both (zero physical read). Actual running times are about 2 seconds. It will be very interesting for me to know how to tweak the indexes of these 2 tables so that index seeks can be used. What are your ideas? thanks a lot guys and have a good weekend, Tam That's interesting. Wish I had a SQL Server handy to test stuff out on.
(It's at work) Wish I'd looked closer at your tables when writing that post last night, you've already got the best indexing for these tables. Normally, on a real table, there will be more columns, so creating an index will produce the same performance as you've just achieved. Regards Colin Dawson www.cjdawson.com Show quote "vuht2000" <vuht2***@yahoo.com> wrote in message news:1123902359.078910.103400@g14g2000cwa.googlegroups.com... > Adam, > in this case (your test), estimated cost is not a rough guide, it's > misleading, and in my case it's misleading too. Is it a bug/limited > feature or is it something one has to be aware of when working in SQL > Server ? > Colin, > I run both queries several times and the number of logical reads are > constant, at about 1100 for both (zero physical read). Actual running > times are about 2 seconds. > It will be very interesting for me to know how to tweak the indexes of > these 2 tables so that index seeks can be used. What are your ideas? > > thanks a lot guys and have a good weekend, > > Tam > Looking at this I'd also think that method 1 would be better. It's
interesting that the execution times are about the same. I'd run SQL Profiler, to see what the cost is for each query in terms of the number of database reads and writes needed to complete the operation. This can often be dramatically different. Something else worth noting is that your execution plans show that clustered index scans are being performed. It would be worth adding the correct indexes, so that index seeks can be used. This will change the execution plan too as SQL will compile the query to make use of the indexes. Regards Colin Dawson www.cjdawson.com Show quote "vuht2000" <vuht2***@yahoo.com> wrote in message news:1123879527.282056.121120@g49g2000cwa.googlegroups.com... >a simplified problem: > > create table tbl1 (c1 int) > create clustered index c1_idx on tbl1(c1) > > create table tbl2 (c1 int) > create clustered index c1_idx on tbl2(c1) > > -- method 1 > select a.c1 > from tbl1 a where not exists(select * from tbl2 b where a.c1 = b.c1) > > -- method 2 > select a.c1 > from tbl1 a left join tbl2 b on a.c1 = b.c1 > where b.c1 is null > > my intuition is, the 2 methods are the same, until I read in a book > that says method 2 is better (no more explanation why it is better). > But when I try on my SQL Server 2000 (SP4) both queries in a batch, the > execution plan shows that query 1 takes 27% of the batch cost, the > second 73%, and the 2 plans are different, as below: > -- query 1 > StmtText > > -------------------------------------------------------------------------------------------------------------- > > |--Merge Join(Right Anti Semi Join, MANY-TO-MANY > MERGE:([b].[c1])=([a].[c1]), RESIDUAL:([a].[c1]=[b].[c1])) > |--Clustered Index Scan(OBJECT:([laureate].[dbo].[tbl2].[c1_idx] > AS [b]), ORDERED FORWARD) > |--Clustered Index Scan(OBJECT:([laureate].[dbo].[tbl1].[c1_idx] > AS [a]), ORDERED FORWARD) > > -- query 2 > ------------------------------------------------------------------------------------------------- > > |--Filter(WHERE:([b].[c1]=NULL)) > |--Hash Match(Right Outer Join, HASH:([b].[c1])=([a].[c1]), > RESIDUAL:([a].[c1]=[b].[c1])) > |--Clustered Index > Scan(OBJECT:([laureate].[dbo].[tbl2].[c1_idx] AS [b])) > |--Clustered Index > Scan(OBJECT:([laureate].[dbo].[tbl1].[c1_idx] AS [a])) > > Yet the execution times are about the same, and other statistics show > no big difference. As a matter of size, each has about 500K rows, > returning about 100K rows. These all lost me. My questions: > 1) is there a method better than the other? which one and why? > 2) why do the 2 queries show different execution plans (one use merge, > one use hash/match? > 3) is the "show execution plan" in query analyser a reliable source > about relative cost of a query in a batch? in my run, it shows that the > first query costs about 1/3 of the other, while actual running times > and logical/physical reads are about the same > > thanks, > Tam > |
|||||||||||||||||||||||