Home All Groups Group Topic Archive Search About

JOIN Process Order and Performance Comparisons

Author
29 Jul 2005 2:12 AM
HardKhor
Hi all,

A common SQL that I do is joining parent and child tables together (1-M
relationship), e.g. Invoice and InvoiceItem tables. These tables have huge
number of rows.

Q1) Compare the two statements (that give the same result) below, from a
programming point of view, which one is more efficient?

Statement 1
--------------
SELECT *
FROM Invoice Ivo
INNER JOIN InvoiceItem IvoItem ON Ivo.RecNum = IvoItm.InvRecNum
WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59'
AND IvoItm.ProductType = 1    --This line is processed in WHERE.

Statement 2
--------------
SELECT *
FROM Invoice Ivo
INNER JOIN InvoiceItem IvoItm ON Ivo.RecNum = IvoItm.InvRecNum
AND IvoItm.ProductType = 1    --This line is processed in JOIN.
WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59'

This is something which I have been wondering for quite sometime. After
reading MSDN article "Join Fundamentals" stating, it says the JOIN statements
are processed first.

Q2) So in statement 2, does SQL Server process the JOIN 1st, then process
this filter "AND IvoItm.ProductType = 1", OR process that filter 1st, then
process the JOIN?

Q3) If it does the latter 1st, would it filter out the MANY rows in IvoItm,
before doing the JOINS? Therefore improving performance, as the amount of
data to join is reduced in the IvoItm?

Q4) Using the same analogy in Q3, would there be performance gain if I
rewrite the statement using sub-query to do the filtering 1st?

SELECT *
FROM Invoice Ivo
INNER JOIN (
    SELECT *
    FROM InvoiceItem
    WHERE ProductType = 1    --This line is processed in sub-query.
) IvoItm ON Ivo.RecNum = IvoItm.InvRecNum
WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59'

Q5) And would above be efficient than using the Statement 1 and 2?

Author
29 Jul 2005 3:08 AM
Louis Davidson
Answers inline:
--
----------------------------------------------------------------------------
Louis Davidson - http://spaces.msn.com/members/drsql/
SQL Server MVP


Show quote
"HardKhor" <HardK***@discussions.microsoft.com> wrote in message
news:3E92F8CB-3E8A-4A4C-A0D9-71B707658D1A@microsoft.com...
> Hi all,
>
> A common SQL that I do is joining parent and child tables together (1-M
> relationship), e.g. Invoice and InvoiceItem tables. These tables have huge
> number of rows.
>
> Q1) Compare the two statements (that give the same result) below, from a
> programming point of view, which one is more efficient?
>
> Statement 1
> --------------
> SELECT *
> FROM Invoice Ivo
> INNER JOIN InvoiceItem IvoItem ON Ivo.RecNum = IvoItm.InvRecNum
> WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59'
> AND IvoItm.ProductType = 1 --This line is processed in WHERE.
>
> Statement 2
> --------------
> SELECT *
> FROM Invoice Ivo
> INNER JOIN InvoiceItem IvoItm ON Ivo.RecNum = IvoItm.InvRecNum
> AND IvoItm.ProductType = 1 --This line is processed in JOIN.
> WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59'

These two statements are probably going to perform equivalently, as they are
mathematically equivalent.  If this was an outer join, then it will make a
difference. First thing to do is to check the plan using Query analyzer.  It
should be the exact same plan.

Logically, all of the JOIN operators will be dealt with, building the set
with Invoice.*  + InvoiceLineItem, eliminating rows where the join criteria
fails.  Then for every row in the output you apply the where clause.
However, the optimizer can can reorganize the query to make it execute
better as long as the same results would be the same.

For the rest of your questions, try looking at the plan first. It may answer
the questions for you as the answers may be different based on the number of
rows in each table.

>
> This is something which I have been wondering for quite sometime. After
> reading MSDN article "Join Fundamentals" stating, it says the JOIN
> statements
> are processed first.
>
This is true logically, but it is not required if the results are the same

Show quote
> Q2) So in statement 2, does SQL Server process the JOIN 1st, then process
> this filter "AND IvoItm.ProductType = 1", OR process that filter 1st, then
> process the JOIN?
>
> Q3) If it does the latter 1st, would it filter out the MANY rows in
> IvoItm,
> before doing the JOINS? Therefore improving performance, as the amount of
> data to join is reduced in the IvoItm?
>
> Q4) Using the same analogy in Q3, would there be performance gain if I
> rewrite the statement using sub-query to do the filtering 1st?
>
> SELECT *
> FROM Invoice Ivo
> INNER JOIN (
> SELECT *
> FROM InvoiceItem
> WHERE ProductType = 1 --This line is processed in sub-query.
> ) IvoItm ON Ivo.RecNum = IvoItm.InvRecNum
> WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59'
>
> Q5) And would above be efficient than using the Statement 1 and 2?
Author
29 Jul 2005 6:52 PM
Gert-Jan Strik
As noted by Louis, for Inner Joins it doesn't matter whether you specify
the predicates in the WHERE clause or in the JOIN ON clause. For Outer
Joins the meaning is different.

What the query optimizer will do, is analyse which indexes your tables
have and whether they can be seeked or scanned in order to reduce the
I/O needed to retrieve the actual data. Then it will do an access path
analysis to see in which order the joins would be fastest. If there is
an appropriate index, then physically, the (partial) filtering will
occur before the join.

Suppose you have a clustered index on Invoice(Date). Then you will
probably see a clustered index seek on table Invoice, regardless whether
you used syntax 1 or 2. BTW: the only way to really tell is check the
query plan.

HTH,
Gert-Jan


HardKhor wrote:
Show quote
>
> Hi all,
>
> A common SQL that I do is joining parent and child tables together (1-M
> relationship), e.g. Invoice and InvoiceItem tables. These tables have huge
> number of rows.
>
> Q1) Compare the two statements (that give the same result) below, from a
> programming point of view, which one is more efficient?
>
> Statement 1
> --------------
> SELECT *
> FROM Invoice Ivo
> INNER JOIN InvoiceItem IvoItem ON Ivo.RecNum = IvoItm.InvRecNum
> WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59'
> AND IvoItm.ProductType = 1      --This line is processed in WHERE.
>
> Statement 2
> --------------
> SELECT *
> FROM Invoice Ivo
> INNER JOIN InvoiceItem IvoItm ON Ivo.RecNum = IvoItm.InvRecNum
> AND IvoItm.ProductType = 1      --This line is processed in JOIN.
> WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59'
>
> This is something which I have been wondering for quite sometime. After
> reading MSDN article "Join Fundamentals" stating, it says the JOIN statements
> are processed first.
>
> Q2) So in statement 2, does SQL Server process the JOIN 1st, then process
> this filter "AND IvoItm.ProductType = 1", OR process that filter 1st, then
> process the JOIN?
>
> Q3) If it does the latter 1st, would it filter out the MANY rows in IvoItm,
> before doing the JOINS? Therefore improving performance, as the amount of
> data to join is reduced in the IvoItm?
>
> Q4) Using the same analogy in Q3, would there be performance gain if I
> rewrite the statement using sub-query to do the filtering 1st?
>
> SELECT *
> FROM Invoice Ivo
> INNER JOIN (
>         SELECT *
>         FROM InvoiceItem
>         WHERE ProductType = 1   --This line is processed in sub-query.
> ) IvoItm ON Ivo.RecNum = IvoItm.InvRecNum
> WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59'
>
> Q5) And would above be efficient than using the Statement 1 and 2?

AddThis Social Bookmark Button