Home All Groups Group Topic Archive Search About

Subquery performance concern

Author
22 Jun 2006 1:30 PM
trtranger
Hi Group,

I have 2 tables: x and y. this is my query

select x.abc,
         (select y.def from y where y.id=123) as comment1,
         (select y.def from y where y.id=456) as comment2
from   x
where x.id=789

let's say, there are 1 million records in table x, how many times this
sub-query
will be executed? what is the better way to do it?

select y.def from y where y.id=123

Thanks.

Dean

Author
22 Jun 2006 1:56 PM
Roy Harvey
Each subquery will be executed just once, as they do not depend on the
outer query at all.

Roy Harvey
Beacon Falls, CT


On 22 Jun 2006 06:30:19 -0700, trtran***@yahoo.ca wrote:

Show quote
>Hi Group,
>
>I have 2 tables: x and y. this is my query
>
>select x.abc,
>         (select y.def from y where y.id=123) as comment1,
>         (select y.def from y where y.id=456) as comment2
>from   x
>where x.id=789
>
>let's say, there are 1 million records in table x, how many times this
>sub-query
>will be executed? what is the better way to do it?
>
>select y.def from y where y.id=123
>
>Thanks.
>
>Dean
Author
22 Jun 2006 2:04 PM
Razvan Socol
Hello, trtranger

If the number of rows in the table x is small (less than a few hundred
rows) and if there is an index on the y.id column, SQL Server 2000
would perform an index seek for each row in the table x. If the number
of rows in the table x is greater (such as your case), SQL Server 2000,
would perform the index seek only once and use a "Table spool" to use
the same result for each row from the table x. If there is no index on
the y.id column, the threshold (for using a table spool) would be much
smaller (around 10 rows, instead of a few hundred rows), because the
index scans are much more costly than index seeks.

To be sure what happens in your particular case, run your query with
the "Include actual execution plan" option in Query Analyzer /
Management Studio.

Another way to do this would be:

SELECT x.abc, y1.def as comment1, y2.def as comment2
FROM x, y as y1, y as y2
WHERE y1.id=123 and y2.id=456

But this may give different results, in the following cases:
a) if there are more than one row in the table y with the specified id,
the original query would return an error, but the above query would
multiply the number of rows in the resultset
b) if there is no row in the table y for one (or both) of the specified
id-s, the original query would return a NULL in that column, but the
above query would return no rows.

The later problem can be solved by using a query like this:

SELECT x.abc, y1.def as comment1, y2.def as comment2
FROM x
LEFT JOIN y as y1 ON y1.id=123
LEFT JOIN y as y2 ON y2.id=456

But this query may still give different results if the id column is not
a unique key in the y table.

In some circumstances, the first query may give a better performance
than the original query, but the second query usually has worse
performance.

However, I wouldn't say any of the above queries are better than the
original query (because the different results can cause a lot of
problems and the overall comprehensibility is much worse). Conclusion:
use the original query, but make sure that you have an index on the
y.id column.

Razvan
Author
22 Jun 2006 3:15 PM
trtranger
Thank you for the clarification.

Just want to better understand how sub-query works,  if I modify
the original script like this:

select x.abc
from   x
where x.id in (select y.id
               from    y
               where  y.def like 'a%')

Let's assume there are 1 million records in table x, both tables have
index.
will sub-query (select id from y where  def like 'a%') will be executed
milliion times?

if I make it a correlated sub query, will things be any different?

select x.abc
from   x
where x.id in (select y.id
               from    y
               where  y.def like 'a%'
               and     y.date = x.date)

Thanks in advance.
Author
22 Jun 2006 3:51 PM
Roy Harvey
The optimizer does not always create a plan that operates the way you
expect based on the way the query is structred.  So subqueries can
execute in all sorts of different ways.  The only constraint it has is
to produce the correct results.  Some more specifc comments are
included below.

On 22 Jun 2006 08:15:36 -0700, trtran***@yahoo.ca wrote:

Show quote
>Thank you for the clarification.
>
>Just want to better understand how sub-query works,  if I modify
>the original script like this:
>
>select x.abc
>from   x
>where x.id in (select y.id
>               from    y
>               where  y.def like 'a%')
>
>Let's assume there are 1 million records in table x, both tables have
>index.
>will sub-query (select id from y where  def like 'a%') will be executed
>milliion times?

It depends.  Since the subquery does not depend on the outer query the
IN list can be resolved once.  So, the optimizer could extract the
list of matching values once and search it 1 million times.

But suppose that the subquery against table Y returns 100000 rows. The
optimizer probably can check statistics on Y.def to estimate this.
earching through that many rows 1 million times probably is not a good
idea.  One alternative would be for it to build them into a temp table
with an index.  If table Y already is indexed on Y.id it might simply
perform an index lookup 1 million times.

>if I make it a correlated sub query, will things be any different?
>
>select x.abc
>from   x
>where x.id in (select y.id
>               from    y
>               where  y.def like 'a%'
>               and     y.date = x.date)

This will be completely different, if only because it defines a
different query!  But lets consider that this query is equivelent to:

select x.abc
from   x
where EXISTS
      (select * from y
        where x.id = y.id
          and y.date = x.date
          and y.def like 'a%')

The optimizer generally will come up with the same execution plan for
both versions.  But the plan will vary according to what indexes are
in place, and what statistics it has about the data.

>Thanks in advance.

Good luck!

Roy Harvey
Beacon Falls, CT
Author
22 Jun 2006 3:56 PM
Razvan Socol
trtran***@yahoo.ca wrote:
Show quote
> Thank you for the clarification.
>
> Just want to better understand how sub-query works,  if I modify
> the original script like this:
>
> select x.abc
> from   x
> where x.id in (select y.id
>                from    y
>                where  y.def like 'a%')
>
> Let's assume there are 1 million records in table x, both tables have
> index.
> will sub-query (select id from y where  def like 'a%') will be executed
> milliion times?

No, it won't. If the number of rows in the table x is so high, then SQL
Server will use a "Table spool" operation so it performs the operation
on table y only one time and then reuse the results for each row in
table x.


> if I make it a correlated sub query, will things be any different?

Yes. If the subquery is corelated, the results cannot be reused (as
they can be different for each row of the table x). However, instead of
executing the correlated subquery for a million times, the query
optimizer may choose to:
a) change the order in which the tables are considered (so it executes
an operation on table x, for the number of rows from table y)
or
b) use some other method of joining (instead of "Nested Loops"), for
example "Hash Join" or "Merge Join" (so it executes the operation on
each table only one time). This join types are possible only if the
condition that correlates the subquery is an "=", not other types of
comparisons.

Razvan
Author
22 Jun 2006 4:08 PM
Razvan Socol
Razvan Socol wrote:
> trtran***@yahoo.ca wrote:
> > will sub-query will be executed milliion times?
>
> No, it won't. If the number of rows in the table x is so high, then SQL
> Server will use a "Table spool" operation so it performs the operation
> on table y only one time and then reuse the results for each row in
> table x.

I wrote the above assuming the subquery will return a small number of
rows. If the optimizer estimates that the subquery will return a
greater number of rows, then it will probably use a merge or hash join.

The exact execution plan is influenced by many factors. To see what
happens in a particular condition, execute the query in Query Analyzer
/ Management Studio with the "Include actual execution plan" option
turned on (after making sure that the statistics are up-to-date, by
running sp_updatestats).

Razvan
Author
22 Jun 2006 2:36 PM
Tracy McKibben
trtran***@yahoo.ca wrote:
Show quote
> Hi Group,
>
> I have 2 tables: x and y. this is my query
>
> select x.abc,
>          (select y.def from y where y.id=123) as comment1,
>          (select y.def from y where y.id=456) as comment2
> from   x
> where x.id=789
>
> let's say, there are 1 million records in table x, how many times this
> sub-query
> will be executed? what is the better way to do it?
>
> select y.def from y where y.id=123
>
> Thanks.
>
> Dean
>

Another way would be:

SELECT
    x.abc,
    y1.def AS comment1,
    y2.def AS comment2
FROM x
INNER JOIN y AS y1
    ON y1.id = 123
INNER JOIN y AS y2
    ON y2.id = 456
WHERE x.id = 789

This assumes both records (id = 123 and id = 456) exist.  Compare the
query plan for this to your original query, determine which is best.
Impossible to predict without knowing your schema and what indexes are
available.

AddThis Social Bookmark Button