|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
JOIN Process Order and Performance ComparisonsA common SQL that I do is joining parent and child tables together (1-M relationship), e.g. Invoice and InvoiceItem tables. These tables have huge number of rows. Q1) Compare the two statements (that give the same result) below, from a programming point of view, which one is more efficient? Statement 1 -------------- SELECT * FROM Invoice Ivo INNER JOIN InvoiceItem IvoItem ON Ivo.RecNum = IvoItm.InvRecNum WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59' AND IvoItm.ProductType = 1 --This line is processed in WHERE. Statement 2 -------------- SELECT * FROM Invoice Ivo INNER JOIN InvoiceItem IvoItm ON Ivo.RecNum = IvoItm.InvRecNum AND IvoItm.ProductType = 1 --This line is processed in JOIN. WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59' This is something which I have been wondering for quite sometime. After reading MSDN article "Join Fundamentals" stating, it says the JOIN statements are processed first. Q2) So in statement 2, does SQL Server process the JOIN 1st, then process this filter "AND IvoItm.ProductType = 1", OR process that filter 1st, then process the JOIN? Q3) If it does the latter 1st, would it filter out the MANY rows in IvoItm, before doing the JOINS? Therefore improving performance, as the amount of data to join is reduced in the IvoItm? Q4) Using the same analogy in Q3, would there be performance gain if I rewrite the statement using sub-query to do the filtering 1st? SELECT * FROM Invoice Ivo INNER JOIN ( SELECT * FROM InvoiceItem WHERE ProductType = 1 --This line is processed in sub-query. ) IvoItm ON Ivo.RecNum = IvoItm.InvRecNum WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59' Q5) And would above be efficient than using the Statement 1 and 2? Answers inline:
-- Show quote---------------------------------------------------------------------------- Louis Davidson - http://spaces.msn.com/members/drsql/ SQL Server MVP "HardKhor" <HardK***@discussions.microsoft.com> wrote in message These two statements are probably going to perform equivalently, as they are news:3E92F8CB-3E8A-4A4C-A0D9-71B707658D1A@microsoft.com... > Hi all, > > A common SQL that I do is joining parent and child tables together (1-M > relationship), e.g. Invoice and InvoiceItem tables. These tables have huge > number of rows. > > Q1) Compare the two statements (that give the same result) below, from a > programming point of view, which one is more efficient? > > Statement 1 > -------------- > SELECT * > FROM Invoice Ivo > INNER JOIN InvoiceItem IvoItem ON Ivo.RecNum = IvoItm.InvRecNum > WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59' > AND IvoItm.ProductType = 1 --This line is processed in WHERE. > > Statement 2 > -------------- > SELECT * > FROM Invoice Ivo > INNER JOIN InvoiceItem IvoItm ON Ivo.RecNum = IvoItm.InvRecNum > AND IvoItm.ProductType = 1 --This line is processed in JOIN. > WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59' mathematically equivalent. If this was an outer join, then it will make a difference. First thing to do is to check the plan using Query analyzer. It should be the exact same plan. Logically, all of the JOIN operators will be dealt with, building the set with Invoice.* + InvoiceLineItem, eliminating rows where the join criteria fails. Then for every row in the output you apply the where clause. However, the optimizer can can reorganize the query to make it execute better as long as the same results would be the same. For the rest of your questions, try looking at the plan first. It may answer the questions for you as the answers may be different based on the number of rows in each table. > This is true logically, but it is not required if the results are the same> This is something which I have been wondering for quite sometime. After > reading MSDN article "Join Fundamentals" stating, it says the JOIN > statements > are processed first. > Show quote > Q2) So in statement 2, does SQL Server process the JOIN 1st, then process > this filter "AND IvoItm.ProductType = 1", OR process that filter 1st, then > process the JOIN? > > Q3) If it does the latter 1st, would it filter out the MANY rows in > IvoItm, > before doing the JOINS? Therefore improving performance, as the amount of > data to join is reduced in the IvoItm? > > Q4) Using the same analogy in Q3, would there be performance gain if I > rewrite the statement using sub-query to do the filtering 1st? > > SELECT * > FROM Invoice Ivo > INNER JOIN ( > SELECT * > FROM InvoiceItem > WHERE ProductType = 1 --This line is processed in sub-query. > ) IvoItm ON Ivo.RecNum = IvoItm.InvRecNum > WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59' > > Q5) And would above be efficient than using the Statement 1 and 2? As noted by Louis, for Inner Joins it doesn't matter whether you specify
the predicates in the WHERE clause or in the JOIN ON clause. For Outer Joins the meaning is different. What the query optimizer will do, is analyse which indexes your tables have and whether they can be seeked or scanned in order to reduce the I/O needed to retrieve the actual data. Then it will do an access path analysis to see in which order the joins would be fastest. If there is an appropriate index, then physically, the (partial) filtering will occur before the join. Suppose you have a clustered index on Invoice(Date). Then you will probably see a clustered index seek on table Invoice, regardless whether you used syntax 1 or 2. BTW: the only way to really tell is check the query plan. HTH, Gert-Jan HardKhor wrote: Show quote > > Hi all, > > A common SQL that I do is joining parent and child tables together (1-M > relationship), e.g. Invoice and InvoiceItem tables. These tables have huge > number of rows. > > Q1) Compare the two statements (that give the same result) below, from a > programming point of view, which one is more efficient? > > Statement 1 > -------------- > SELECT * > FROM Invoice Ivo > INNER JOIN InvoiceItem IvoItem ON Ivo.RecNum = IvoItm.InvRecNum > WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59' > AND IvoItm.ProductType = 1 --This line is processed in WHERE. > > Statement 2 > -------------- > SELECT * > FROM Invoice Ivo > INNER JOIN InvoiceItem IvoItm ON Ivo.RecNum = IvoItm.InvRecNum > AND IvoItm.ProductType = 1 --This line is processed in JOIN. > WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59' > > This is something which I have been wondering for quite sometime. After > reading MSDN article "Join Fundamentals" stating, it says the JOIN statements > are processed first. > > Q2) So in statement 2, does SQL Server process the JOIN 1st, then process > this filter "AND IvoItm.ProductType = 1", OR process that filter 1st, then > process the JOIN? > > Q3) If it does the latter 1st, would it filter out the MANY rows in IvoItm, > before doing the JOINS? Therefore improving performance, as the amount of > data to join is reduced in the IvoItm? > > Q4) Using the same analogy in Q3, would there be performance gain if I > rewrite the statement using sub-query to do the filtering 1st? > > SELECT * > FROM Invoice Ivo > INNER JOIN ( > SELECT * > FROM InvoiceItem > WHERE ProductType = 1 --This line is processed in sub-query. > ) IvoItm ON Ivo.RecNum = IvoItm.InvRecNum > WHERE Ivo.Date IS BETWEEN '2004-01-01 00:00:00' TO '2004-12-31 23:59:59' > > Q5) And would above be efficient than using the Statement 1 and 2? |
|||||||||||||||||||||||