|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Subquery performance concernI have 2 tables: x and y. this is my query select x.abc, (select y.def from y where y.id=123) as comment1, (select y.def from y where y.id=456) as comment2 from x where x.id=789 let's say, there are 1 million records in table x, how many times this sub-query will be executed? what is the better way to do it? select y.def from y where y.id=123 Thanks. Dean Each subquery will be executed just once, as they do not depend on the
outer query at all. Roy Harvey Beacon Falls, CT On 22 Jun 2006 06:30:19 -0700, trtran***@yahoo.ca wrote: Show quote >Hi Group, > >I have 2 tables: x and y. this is my query > >select x.abc, > (select y.def from y where y.id=123) as comment1, > (select y.def from y where y.id=456) as comment2 >from x >where x.id=789 > >let's say, there are 1 million records in table x, how many times this >sub-query >will be executed? what is the better way to do it? > >select y.def from y where y.id=123 > >Thanks. > >Dean Hello, trtranger
If the number of rows in the table x is small (less than a few hundred rows) and if there is an index on the y.id column, SQL Server 2000 would perform an index seek for each row in the table x. If the number of rows in the table x is greater (such as your case), SQL Server 2000, would perform the index seek only once and use a "Table spool" to use the same result for each row from the table x. If there is no index on the y.id column, the threshold (for using a table spool) would be much smaller (around 10 rows, instead of a few hundred rows), because the index scans are much more costly than index seeks. To be sure what happens in your particular case, run your query with the "Include actual execution plan" option in Query Analyzer / Management Studio. Another way to do this would be: SELECT x.abc, y1.def as comment1, y2.def as comment2 FROM x, y as y1, y as y2 WHERE y1.id=123 and y2.id=456 But this may give different results, in the following cases: a) if there are more than one row in the table y with the specified id, the original query would return an error, but the above query would multiply the number of rows in the resultset b) if there is no row in the table y for one (or both) of the specified id-s, the original query would return a NULL in that column, but the above query would return no rows. The later problem can be solved by using a query like this: SELECT x.abc, y1.def as comment1, y2.def as comment2 FROM x LEFT JOIN y as y1 ON y1.id=123 LEFT JOIN y as y2 ON y2.id=456 But this query may still give different results if the id column is not a unique key in the y table. In some circumstances, the first query may give a better performance than the original query, but the second query usually has worse performance. However, I wouldn't say any of the above queries are better than the original query (because the different results can cause a lot of problems and the overall comprehensibility is much worse). Conclusion: use the original query, but make sure that you have an index on the y.id column. Razvan Thank you for the clarification.
Just want to better understand how sub-query works, if I modify the original script like this: select x.abc from x where x.id in (select y.id from y where y.def like 'a%') Let's assume there are 1 million records in table x, both tables have index. will sub-query (select id from y where def like 'a%') will be executed milliion times? if I make it a correlated sub query, will things be any different? select x.abc from x where x.id in (select y.id from y where y.def like 'a%' and y.date = x.date) Thanks in advance. The optimizer does not always create a plan that operates the way you
expect based on the way the query is structred. So subqueries can execute in all sorts of different ways. The only constraint it has is to produce the correct results. Some more specifc comments are included below. On 22 Jun 2006 08:15:36 -0700, trtran***@yahoo.ca wrote: Show quote >Thank you for the clarification. It depends. Since the subquery does not depend on the outer query the> >Just want to better understand how sub-query works, if I modify >the original script like this: > >select x.abc >from x >where x.id in (select y.id > from y > where y.def like 'a%') > >Let's assume there are 1 million records in table x, both tables have >index. >will sub-query (select id from y where def like 'a%') will be executed >milliion times? IN list can be resolved once. So, the optimizer could extract the list of matching values once and search it 1 million times. But suppose that the subquery against table Y returns 100000 rows. The optimizer probably can check statistics on Y.def to estimate this. earching through that many rows 1 million times probably is not a good idea. One alternative would be for it to build them into a temp table with an index. If table Y already is indexed on Y.id it might simply perform an index lookup 1 million times. >if I make it a correlated sub query, will things be any different? This will be completely different, if only because it defines a> >select x.abc >from x >where x.id in (select y.id > from y > where y.def like 'a%' > and y.date = x.date) different query! But lets consider that this query is equivelent to: select x.abc from x where EXISTS (select * from y where x.id = y.id and y.date = x.date and y.def like 'a%') The optimizer generally will come up with the same execution plan for both versions. But the plan will vary according to what indexes are in place, and what statistics it has about the data. >Thanks in advance. Good luck!Roy Harvey Beacon Falls, CT trtran***@yahoo.ca wrote:
Show quote > Thank you for the clarification. No, it won't. If the number of rows in the table x is so high, then SQL> > Just want to better understand how sub-query works, if I modify > the original script like this: > > select x.abc > from x > where x.id in (select y.id > from y > where y.def like 'a%') > > Let's assume there are 1 million records in table x, both tables have > index. > will sub-query (select id from y where def like 'a%') will be executed > milliion times? Server will use a "Table spool" operation so it performs the operation on table y only one time and then reuse the results for each row in table x. > if I make it a correlated sub query, will things be any different? Yes. If the subquery is corelated, the results cannot be reused (asthey can be different for each row of the table x). However, instead of executing the correlated subquery for a million times, the query optimizer may choose to: a) change the order in which the tables are considered (so it executes an operation on table x, for the number of rows from table y) or b) use some other method of joining (instead of "Nested Loops"), for example "Hash Join" or "Merge Join" (so it executes the operation on each table only one time). This join types are possible only if the condition that correlates the subquery is an "=", not other types of comparisons. Razvan Razvan Socol wrote:
> trtran***@yahoo.ca wrote: I wrote the above assuming the subquery will return a small number of> > will sub-query will be executed milliion times? > > No, it won't. If the number of rows in the table x is so high, then SQL > Server will use a "Table spool" operation so it performs the operation > on table y only one time and then reuse the results for each row in > table x. rows. If the optimizer estimates that the subquery will return a greater number of rows, then it will probably use a merge or hash join. The exact execution plan is influenced by many factors. To see what happens in a particular condition, execute the query in Query Analyzer / Management Studio with the "Include actual execution plan" option turned on (after making sure that the statistics are up-to-date, by running sp_updatestats). Razvan trtran***@yahoo.ca wrote:
Show quote > Hi Group, Another way would be:> > I have 2 tables: x and y. this is my query > > select x.abc, > (select y.def from y where y.id=123) as comment1, > (select y.def from y where y.id=456) as comment2 > from x > where x.id=789 > > let's say, there are 1 million records in table x, how many times this > sub-query > will be executed? what is the better way to do it? > > select y.def from y where y.id=123 > > Thanks. > > Dean > SELECT x.abc, y1.def AS comment1, y2.def AS comment2 FROM x INNER JOIN y AS y1 ON y1.id = 123 INNER JOIN y AS y2 ON y2.id = 456 WHERE x.id = 789 This assumes both records (id = 123 and id = 456) exist. Compare the query plan for this to your original query, determine which is best. Impossible to predict without knowing your schema and what indexes are available. |
|||||||||||||||||||||||