Home All Groups Group Topic Archive Search About
Author
3 Sep 2005 8:09 PM
Paul Pedersen
Working my way through t-sql here...

In a book, I was given this example:

SELECT Pub_name
FROM Publishers
WHERE EXISTS
    (SELECT * FROM Titles
    WHERE Titles.Pub_id = Publishers.Pub_id
    AND Type = 'business')


I found that a little difficult to interpret at first, and also it seems
inefficient, with that "SELECT *" clause - I don't care about all the fields
after all; only Pub_id.

In my inexperience, it seems to me that this query would be preferable from
a couple of standpoints:

SELECT Pub_name FROM Publishers
WHERE Pub_id IN
(SELECT Pub_id FROM Titles WHERE Type = 'business')


Will these two queries return the same result? Unfortunately I do not have
the data set, so I cannot run them.

Is there a difference in efficiency between the two? Is either preferable to
the other for some reason?

Any other considerations?

Author
3 Sep 2005 8:22 PM
Tibor Karaszi
They will probably give same execution plan = same performance. But you are never 100% sure without
trying. They will produce same result (NOT IN can give surprising result if inner query has NULL in
result). SELECT * is not inefficient, as SQL Server will never materialize any rows from the inner
SELECT, it is only an existence check.

Show quote
"Paul Pedersen" <no-re***@swen.com> wrote in message news:udIYyNMsFHA.3264@TK2MSFTNGP12.phx.gbl...
> Working my way through t-sql here...
>
> In a book, I was given this example:
>
> SELECT Pub_name
> FROM Publishers
> WHERE EXISTS
>    (SELECT * FROM Titles
>    WHERE Titles.Pub_id = Publishers.Pub_id
>    AND Type = 'business')
>
>
> I found that a little difficult to interpret at first, and also it seems inefficient, with that
> "SELECT *" clause - I don't care about all the fields after all; only Pub_id.
>
> In my inexperience, it seems to me that this query would be preferable from a couple of
> standpoints:
>
> SELECT Pub_name FROM Publishers
> WHERE Pub_id IN
> (SELECT Pub_id FROM Titles WHERE Type = 'business')
>
>
> Will these two queries return the same result? Unfortunately I do not have the data set, so I
> cannot run them.
>
> Is there a difference in efficiency between the two? Is either preferable to the other for some
> reason?
>
> Any other considerations?
>
>
>
Author
3 Sep 2005 9:22 PM
Brian Selzer
Both produce the same execution plan.  In general, you should avoid
correlated subqueries whenever possible, because they frequently insert an
iteration step into the execution plan.  Often a correlated subquery can be
decomposed into a simple inner join, which gives the optimizer many
different options on how to satisfy the query.  It is also easier to read
and understand a T-SQL statement that contains a simple join than one that
contains a correlated subquery.  Having had the optimizer blow up on me a
couple times, I've begun to lose faith with its ability to choose a correct
plan in the face of an increasingly complex T-SQL statement.  I therefore
recommend keeping the queries as simple as possible.  If it's possible to
decompose a correlated subquery into a simple join, then you should by all
means do it.  This makes it possible to coerce the optimizer with join hints
(LOOP, HASH, MERGE) if necessary.

This produces the same result as both of the queries you specified, but
removes duplicates after the join.

SELECT DISTINCT Pub_name
    FROM Publishers
        JOIN Titles
            ON (Titles.Pub_id = Publishers.Pub_id)
    WHERE Titles.Type = 'business'

This produces the same execution plan as both of the queries you specified,
but enables the use of join hints:

SELECT Pub_name
    FROM Publishers
        JOIN (SELECT DISTINCT Pub_id
                    FROM Titles
                    WHERE Titles.Type = 'business') Titles
            ON (Titles.Pub_id = Publishers.Pub_id)



Show quote
"Paul Pedersen" <no-re***@swen.com> wrote in message
news:udIYyNMsFHA.3264@TK2MSFTNGP12.phx.gbl...
> Working my way through t-sql here...
>
> In a book, I was given this example:
>
> SELECT Pub_name
> FROM Publishers
> WHERE EXISTS
>     (SELECT * FROM Titles
>     WHERE Titles.Pub_id = Publishers.Pub_id
>     AND Type = 'business')
>
>
> I found that a little difficult to interpret at first, and also it seems
> inefficient, with that "SELECT *" clause - I don't care about all the
fields
> after all; only Pub_id.
>
> In my inexperience, it seems to me that this query would be preferable
from
> a couple of standpoints:
>
> SELECT Pub_name FROM Publishers
> WHERE Pub_id IN
> (SELECT Pub_id FROM Titles WHERE Type = 'business')
>
>
> Will these two queries return the same result? Unfortunately I do not have
> the data set, so I cannot run them.
>
> Is there a difference in efficiency between the two? Is either preferable
to
> the other for some reason?
>
> Any other considerations?
>
>
>
Author
4 Sep 2005 10:40 PM
Daniel Walzenbach
Brian,

thanks for your detailed explanation! Can you tell me why your second query
enables the use of join hints?

Thank you!

   Daniel Walzenbach

Show quote
"Brian Selzer" <br***@selzer-software.com> schrieb im Newsbeitrag
news:e9LzN2MsFHA.912@TK2MSFTNGP11.phx.gbl...
> Both produce the same execution plan.  In general, you should avoid
> correlated subqueries whenever possible, because they frequently insert an
> iteration step into the execution plan.  Often a correlated subquery can
> be
> decomposed into a simple inner join, which gives the optimizer many
> different options on how to satisfy the query.  It is also easier to read
> and understand a T-SQL statement that contains a simple join than one that
> contains a correlated subquery.  Having had the optimizer blow up on me a
> couple times, I've begun to lose faith with its ability to choose a
> correct
> plan in the face of an increasingly complex T-SQL statement.  I therefore
> recommend keeping the queries as simple as possible.  If it's possible to
> decompose a correlated subquery into a simple join, then you should by all
> means do it.  This makes it possible to coerce the optimizer with join
> hints
> (LOOP, HASH, MERGE) if necessary.
>
> This produces the same result as both of the queries you specified, but
> removes duplicates after the join.
>
> SELECT DISTINCT Pub_name
>    FROM Publishers
>        JOIN Titles
>            ON (Titles.Pub_id = Publishers.Pub_id)
>    WHERE Titles.Type = 'business'
>
> This produces the same execution plan as both of the queries you
> specified,
> but enables the use of join hints:
>
> SELECT Pub_name
>    FROM Publishers
>        JOIN (SELECT DISTINCT Pub_id
>                    FROM Titles
>                    WHERE Titles.Type = 'business') Titles
>            ON (Titles.Pub_id = Publishers.Pub_id)
>
>
>
> "Paul Pedersen" <no-re***@swen.com> wrote in message
> news:udIYyNMsFHA.3264@TK2MSFTNGP12.phx.gbl...
>> Working my way through t-sql here...
>>
>> In a book, I was given this example:
>>
>> SELECT Pub_name
>> FROM Publishers
>> WHERE EXISTS
>>     (SELECT * FROM Titles
>>     WHERE Titles.Pub_id = Publishers.Pub_id
>>     AND Type = 'business')
>>
>>
>> I found that a little difficult to interpret at first, and also it seems
>> inefficient, with that "SELECT *" clause - I don't care about all the
> fields
>> after all; only Pub_id.
>>
>> In my inexperience, it seems to me that this query would be preferable
> from
>> a couple of standpoints:
>>
>> SELECT Pub_name FROM Publishers
>> WHERE Pub_id IN
>> (SELECT Pub_id FROM Titles WHERE Type = 'business')
>>
>>
>> Will these two queries return the same result? Unfortunately I do not
>> have
>> the data set, so I cannot run them.
>>
>> Is there a difference in efficiency between the two? Is either preferable
> to
>> the other for some reason?
>>
>> Any other considerations?
>>
>>
>>
>
>
Author
5 Sep 2005 1:08 AM
Brian Selzer
Instead of

SELECT Pub_name
    FROM Publishers
        JOIN (SELECT DISTINCT Pub_id
                    FROM Titles
                    WHERE Titles.Type = 'business') Titles
            ON (Titles.Pub_id = Publishers.Pub_id)

you can use something like:

SELECT Pub_name
    FROM Publishers
        INNER MERGE JOIN (SELECT DISTINCT Pub_id
                    FROM Titles
                    WHERE Titles.Type = 'business') Titles
            ON (Titles.Pub_id = Publishers.Pub_id)

to coerce the optimizer.

Note: this method is more granular than the OPTION clause if you're joining
more than two tables.


Show quote
"Daniel Walzenbach" <daniel.walzenbach@newsgroup.nospam> wrote in message
news:OacQtGasFHA.2212@TK2MSFTNGP15.phx.gbl...
> Brian,
>
> thanks for your detailed explanation! Can you tell me why your second
query
> enables the use of join hints?
>
> Thank you!
>
>    Daniel Walzenbach
>
> "Brian Selzer" <br***@selzer-software.com> schrieb im Newsbeitrag
> news:e9LzN2MsFHA.912@TK2MSFTNGP11.phx.gbl...
> > Both produce the same execution plan.  In general, you should avoid
> > correlated subqueries whenever possible, because they frequently insert
an
> > iteration step into the execution plan.  Often a correlated subquery can
> > be
> > decomposed into a simple inner join, which gives the optimizer many
> > different options on how to satisfy the query.  It is also easier to
read
> > and understand a T-SQL statement that contains a simple join than one
that
> > contains a correlated subquery.  Having had the optimizer blow up on me
a
> > couple times, I've begun to lose faith with its ability to choose a
> > correct
> > plan in the face of an increasingly complex T-SQL statement.  I
therefore
> > recommend keeping the queries as simple as possible.  If it's possible
to
> > decompose a correlated subquery into a simple join, then you should by
all
> > means do it.  This makes it possible to coerce the optimizer with join
> > hints
> > (LOOP, HASH, MERGE) if necessary.
> >
> > This produces the same result as both of the queries you specified, but
> > removes duplicates after the join.
> >
> > SELECT DISTINCT Pub_name
> >    FROM Publishers
> >        JOIN Titles
> >            ON (Titles.Pub_id = Publishers.Pub_id)
> >    WHERE Titles.Type = 'business'
> >
> > This produces the same execution plan as both of the queries you
> > specified,
> > but enables the use of join hints:
> >
> > SELECT Pub_name
> >    FROM Publishers
> >        JOIN (SELECT DISTINCT Pub_id
> >                    FROM Titles
> >                    WHERE Titles.Type = 'business') Titles
> >            ON (Titles.Pub_id = Publishers.Pub_id)
> >
> >
> >
> > "Paul Pedersen" <no-re***@swen.com> wrote in message
> > news:udIYyNMsFHA.3264@TK2MSFTNGP12.phx.gbl...
> >> Working my way through t-sql here...
> >>
> >> In a book, I was given this example:
> >>
> >> SELECT Pub_name
> >> FROM Publishers
> >> WHERE EXISTS
> >>     (SELECT * FROM Titles
> >>     WHERE Titles.Pub_id = Publishers.Pub_id
> >>     AND Type = 'business')
> >>
> >>
> >> I found that a little difficult to interpret at first, and also it
seems
> >> inefficient, with that "SELECT *" clause - I don't care about all the
> > fields
> >> after all; only Pub_id.
> >>
> >> In my inexperience, it seems to me that this query would be preferable
> > from
> >> a couple of standpoints:
> >>
> >> SELECT Pub_name FROM Publishers
> >> WHERE Pub_id IN
> >> (SELECT Pub_id FROM Titles WHERE Type = 'business')
> >>
> >>
> >> Will these two queries return the same result? Unfortunately I do not
> >> have
> >> the data set, so I cannot run them.
> >>
> >> Is there a difference in efficiency between the two? Is either
preferable
> > to
> >> the other for some reason?
> >>
> >> Any other considerations?
> >>
> >>
> >>
> >
> >
>
>
Author
6 Sep 2005 12:15 AM
Paul Pedersen
It wasn't as simple a question as I imagined... Thanks to both of you.


Show quote
"Brian Selzer" <br***@selzer-software.com> wrote in message
news:e9LzN2MsFHA.912@TK2MSFTNGP11.phx.gbl...
> Both produce the same execution plan.  In general, you should avoid
> correlated subqueries whenever possible, because they frequently insert an
> iteration step into the execution plan.  Often a correlated subquery can
> be
> decomposed into a simple inner join, which gives the optimizer many
> different options on how to satisfy the query.  It is also easier to read
> and understand a T-SQL statement that contains a simple join than one that
> contains a correlated subquery.  Having had the optimizer blow up on me a
> couple times, I've begun to lose faith with its ability to choose a
> correct
> plan in the face of an increasingly complex T-SQL statement.  I therefore
> recommend keeping the queries as simple as possible.  If it's possible to
> decompose a correlated subquery into a simple join, then you should by all
> means do it.  This makes it possible to coerce the optimizer with join
> hints
> (LOOP, HASH, MERGE) if necessary.
>
> This produces the same result as both of the queries you specified, but
> removes duplicates after the join.
>
> SELECT DISTINCT Pub_name
>    FROM Publishers
>        JOIN Titles
>            ON (Titles.Pub_id = Publishers.Pub_id)
>    WHERE Titles.Type = 'business'
>
> This produces the same execution plan as both of the queries you
> specified,
> but enables the use of join hints:
>
> SELECT Pub_name
>    FROM Publishers
>        JOIN (SELECT DISTINCT Pub_id
>                    FROM Titles
>                    WHERE Titles.Type = 'business') Titles
>            ON (Titles.Pub_id = Publishers.Pub_id)
>
>
>
> "Paul Pedersen" <no-re***@swen.com> wrote in message
> news:udIYyNMsFHA.3264@TK2MSFTNGP12.phx.gbl...
>> Working my way through t-sql here...
>>
>> In a book, I was given this example:
>>
>> SELECT Pub_name
>> FROM Publishers
>> WHERE EXISTS
>>     (SELECT * FROM Titles
>>     WHERE Titles.Pub_id = Publishers.Pub_id
>>     AND Type = 'business')
>>
>>
>> I found that a little difficult to interpret at first, and also it seems
>> inefficient, with that "SELECT *" clause - I don't care about all the
> fields
>> after all; only Pub_id.
>>
>> In my inexperience, it seems to me that this query would be preferable
> from
>> a couple of standpoints:
>>
>> SELECT Pub_name FROM Publishers
>> WHERE Pub_id IN
>> (SELECT Pub_id FROM Titles WHERE Type = 'business')
>>
>>
>> Will these two queries return the same result? Unfortunately I do not
>> have
>> the data set, so I cannot run them.
>>
>> Is there a difference in efficiency between the two? Is either preferable
> to
>> the other for some reason?
>>
>> Any other considerations?
>>
>>
>>
>
>

AddThis Social Bookmark Button