|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Understanding EXISTSIn a book, I was given this example: SELECT Pub_name FROM Publishers WHERE EXISTS (SELECT * FROM Titles WHERE Titles.Pub_id = Publishers.Pub_id AND Type = 'business') I found that a little difficult to interpret at first, and also it seems inefficient, with that "SELECT *" clause - I don't care about all the fields after all; only Pub_id. In my inexperience, it seems to me that this query would be preferable from a couple of standpoints: SELECT Pub_name FROM Publishers WHERE Pub_id IN (SELECT Pub_id FROM Titles WHERE Type = 'business') Will these two queries return the same result? Unfortunately I do not have the data set, so I cannot run them. Is there a difference in efficiency between the two? Is either preferable to the other for some reason? Any other considerations? They will probably give same execution plan = same performance. But you are never 100% sure without
trying. They will produce same result (NOT IN can give surprising result if inner query has NULL in result). SELECT * is not inefficient, as SQL Server will never materialize any rows from the inner SELECT, it is only an existence check. -- Show quoteTibor Karaszi, SQL Server MVP http://www.karaszi.com/sqlserver/default.asp http://www.solidqualitylearning.com/ Blog: http://solidqualitylearning.com/blogs/tibor/ "Paul Pedersen" <no-re***@swen.com> wrote in message news:udIYyNMsFHA.3264@TK2MSFTNGP12.phx.gbl... > Working my way through t-sql here... > > In a book, I was given this example: > > SELECT Pub_name > FROM Publishers > WHERE EXISTS > (SELECT * FROM Titles > WHERE Titles.Pub_id = Publishers.Pub_id > AND Type = 'business') > > > I found that a little difficult to interpret at first, and also it seems inefficient, with that > "SELECT *" clause - I don't care about all the fields after all; only Pub_id. > > In my inexperience, it seems to me that this query would be preferable from a couple of > standpoints: > > SELECT Pub_name FROM Publishers > WHERE Pub_id IN > (SELECT Pub_id FROM Titles WHERE Type = 'business') > > > Will these two queries return the same result? Unfortunately I do not have the data set, so I > cannot run them. > > Is there a difference in efficiency between the two? Is either preferable to the other for some > reason? > > Any other considerations? > > > Both produce the same execution plan. In general, you should avoid
correlated subqueries whenever possible, because they frequently insert an iteration step into the execution plan. Often a correlated subquery can be decomposed into a simple inner join, which gives the optimizer many different options on how to satisfy the query. It is also easier to read and understand a T-SQL statement that contains a simple join than one that contains a correlated subquery. Having had the optimizer blow up on me a couple times, I've begun to lose faith with its ability to choose a correct plan in the face of an increasingly complex T-SQL statement. I therefore recommend keeping the queries as simple as possible. If it's possible to decompose a correlated subquery into a simple join, then you should by all means do it. This makes it possible to coerce the optimizer with join hints (LOOP, HASH, MERGE) if necessary. This produces the same result as both of the queries you specified, but removes duplicates after the join. SELECT DISTINCT Pub_name FROM Publishers JOIN Titles ON (Titles.Pub_id = Publishers.Pub_id) WHERE Titles.Type = 'business' This produces the same execution plan as both of the queries you specified, but enables the use of join hints: SELECT Pub_name FROM Publishers JOIN (SELECT DISTINCT Pub_id FROM Titles WHERE Titles.Type = 'business') Titles ON (Titles.Pub_id = Publishers.Pub_id) Show quote "Paul Pedersen" <no-re***@swen.com> wrote in message news:udIYyNMsFHA.3264@TK2MSFTNGP12.phx.gbl... > Working my way through t-sql here... > > In a book, I was given this example: > > SELECT Pub_name > FROM Publishers > WHERE EXISTS > (SELECT * FROM Titles > WHERE Titles.Pub_id = Publishers.Pub_id > AND Type = 'business') > > > I found that a little difficult to interpret at first, and also it seems > inefficient, with that "SELECT *" clause - I don't care about all the fields > after all; only Pub_id. > > In my inexperience, it seems to me that this query would be preferable from > a couple of standpoints: > > SELECT Pub_name FROM Publishers > WHERE Pub_id IN > (SELECT Pub_id FROM Titles WHERE Type = 'business') > > > Will these two queries return the same result? Unfortunately I do not have > the data set, so I cannot run them. > > Is there a difference in efficiency between the two? Is either preferable to > the other for some reason? > > Any other considerations? > > > Brian,
thanks for your detailed explanation! Can you tell me why your second query enables the use of join hints? Thank you! Daniel Walzenbach Show quote "Brian Selzer" <br***@selzer-software.com> schrieb im Newsbeitrag news:e9LzN2MsFHA.912@TK2MSFTNGP11.phx.gbl... > Both produce the same execution plan. In general, you should avoid > correlated subqueries whenever possible, because they frequently insert an > iteration step into the execution plan. Often a correlated subquery can > be > decomposed into a simple inner join, which gives the optimizer many > different options on how to satisfy the query. It is also easier to read > and understand a T-SQL statement that contains a simple join than one that > contains a correlated subquery. Having had the optimizer blow up on me a > couple times, I've begun to lose faith with its ability to choose a > correct > plan in the face of an increasingly complex T-SQL statement. I therefore > recommend keeping the queries as simple as possible. If it's possible to > decompose a correlated subquery into a simple join, then you should by all > means do it. This makes it possible to coerce the optimizer with join > hints > (LOOP, HASH, MERGE) if necessary. > > This produces the same result as both of the queries you specified, but > removes duplicates after the join. > > SELECT DISTINCT Pub_name > FROM Publishers > JOIN Titles > ON (Titles.Pub_id = Publishers.Pub_id) > WHERE Titles.Type = 'business' > > This produces the same execution plan as both of the queries you > specified, > but enables the use of join hints: > > SELECT Pub_name > FROM Publishers > JOIN (SELECT DISTINCT Pub_id > FROM Titles > WHERE Titles.Type = 'business') Titles > ON (Titles.Pub_id = Publishers.Pub_id) > > > > "Paul Pedersen" <no-re***@swen.com> wrote in message > news:udIYyNMsFHA.3264@TK2MSFTNGP12.phx.gbl... >> Working my way through t-sql here... >> >> In a book, I was given this example: >> >> SELECT Pub_name >> FROM Publishers >> WHERE EXISTS >> (SELECT * FROM Titles >> WHERE Titles.Pub_id = Publishers.Pub_id >> AND Type = 'business') >> >> >> I found that a little difficult to interpret at first, and also it seems >> inefficient, with that "SELECT *" clause - I don't care about all the > fields >> after all; only Pub_id. >> >> In my inexperience, it seems to me that this query would be preferable > from >> a couple of standpoints: >> >> SELECT Pub_name FROM Publishers >> WHERE Pub_id IN >> (SELECT Pub_id FROM Titles WHERE Type = 'business') >> >> >> Will these two queries return the same result? Unfortunately I do not >> have >> the data set, so I cannot run them. >> >> Is there a difference in efficiency between the two? Is either preferable > to >> the other for some reason? >> >> Any other considerations? >> >> >> > > Instead of
SELECT Pub_name FROM Publishers JOIN (SELECT DISTINCT Pub_id FROM Titles WHERE Titles.Type = 'business') Titles ON (Titles.Pub_id = Publishers.Pub_id) you can use something like: SELECT Pub_name FROM Publishers INNER MERGE JOIN (SELECT DISTINCT Pub_id FROM Titles WHERE Titles.Type = 'business') Titles ON (Titles.Pub_id = Publishers.Pub_id) to coerce the optimizer. Note: this method is more granular than the OPTION clause if you're joining more than two tables. Show quote "Daniel Walzenbach" <daniel.walzenbach@newsgroup.nospam> wrote in message news:OacQtGasFHA.2212@TK2MSFTNGP15.phx.gbl... > Brian, > > thanks for your detailed explanation! Can you tell me why your second query > enables the use of join hints? > > Thank you! > > Daniel Walzenbach > > "Brian Selzer" <br***@selzer-software.com> schrieb im Newsbeitrag > news:e9LzN2MsFHA.912@TK2MSFTNGP11.phx.gbl... > > Both produce the same execution plan. In general, you should avoid > > correlated subqueries whenever possible, because they frequently insert an > > iteration step into the execution plan. Often a correlated subquery can > > be > > decomposed into a simple inner join, which gives the optimizer many > > different options on how to satisfy the query. It is also easier to read > > and understand a T-SQL statement that contains a simple join than one that > > contains a correlated subquery. Having had the optimizer blow up on me a > > couple times, I've begun to lose faith with its ability to choose a > > correct > > plan in the face of an increasingly complex T-SQL statement. I therefore > > recommend keeping the queries as simple as possible. If it's possible to > > decompose a correlated subquery into a simple join, then you should by all > > means do it. This makes it possible to coerce the optimizer with join > > hints > > (LOOP, HASH, MERGE) if necessary. > > > > This produces the same result as both of the queries you specified, but > > removes duplicates after the join. > > > > SELECT DISTINCT Pub_name > > FROM Publishers > > JOIN Titles > > ON (Titles.Pub_id = Publishers.Pub_id) > > WHERE Titles.Type = 'business' > > > > This produces the same execution plan as both of the queries you > > specified, > > but enables the use of join hints: > > > > SELECT Pub_name > > FROM Publishers > > JOIN (SELECT DISTINCT Pub_id > > FROM Titles > > WHERE Titles.Type = 'business') Titles > > ON (Titles.Pub_id = Publishers.Pub_id) > > > > > > > > "Paul Pedersen" <no-re***@swen.com> wrote in message > > news:udIYyNMsFHA.3264@TK2MSFTNGP12.phx.gbl... > >> Working my way through t-sql here... > >> > >> In a book, I was given this example: > >> > >> SELECT Pub_name > >> FROM Publishers > >> WHERE EXISTS > >> (SELECT * FROM Titles > >> WHERE Titles.Pub_id = Publishers.Pub_id > >> AND Type = 'business') > >> > >> > >> I found that a little difficult to interpret at first, and also it seems > >> inefficient, with that "SELECT *" clause - I don't care about all the > > fields > >> after all; only Pub_id. > >> > >> In my inexperience, it seems to me that this query would be preferable > > from > >> a couple of standpoints: > >> > >> SELECT Pub_name FROM Publishers > >> WHERE Pub_id IN > >> (SELECT Pub_id FROM Titles WHERE Type = 'business') > >> > >> > >> Will these two queries return the same result? Unfortunately I do not > >> have > >> the data set, so I cannot run them. > >> > >> Is there a difference in efficiency between the two? Is either preferable > > to > >> the other for some reason? > >> > >> Any other considerations? > >> > >> > >> > > > > > > It wasn't as simple a question as I imagined... Thanks to both of you.
Show quote "Brian Selzer" <br***@selzer-software.com> wrote in message news:e9LzN2MsFHA.912@TK2MSFTNGP11.phx.gbl... > Both produce the same execution plan. In general, you should avoid > correlated subqueries whenever possible, because they frequently insert an > iteration step into the execution plan. Often a correlated subquery can > be > decomposed into a simple inner join, which gives the optimizer many > different options on how to satisfy the query. It is also easier to read > and understand a T-SQL statement that contains a simple join than one that > contains a correlated subquery. Having had the optimizer blow up on me a > couple times, I've begun to lose faith with its ability to choose a > correct > plan in the face of an increasingly complex T-SQL statement. I therefore > recommend keeping the queries as simple as possible. If it's possible to > decompose a correlated subquery into a simple join, then you should by all > means do it. This makes it possible to coerce the optimizer with join > hints > (LOOP, HASH, MERGE) if necessary. > > This produces the same result as both of the queries you specified, but > removes duplicates after the join. > > SELECT DISTINCT Pub_name > FROM Publishers > JOIN Titles > ON (Titles.Pub_id = Publishers.Pub_id) > WHERE Titles.Type = 'business' > > This produces the same execution plan as both of the queries you > specified, > but enables the use of join hints: > > SELECT Pub_name > FROM Publishers > JOIN (SELECT DISTINCT Pub_id > FROM Titles > WHERE Titles.Type = 'business') Titles > ON (Titles.Pub_id = Publishers.Pub_id) > > > > "Paul Pedersen" <no-re***@swen.com> wrote in message > news:udIYyNMsFHA.3264@TK2MSFTNGP12.phx.gbl... >> Working my way through t-sql here... >> >> In a book, I was given this example: >> >> SELECT Pub_name >> FROM Publishers >> WHERE EXISTS >> (SELECT * FROM Titles >> WHERE Titles.Pub_id = Publishers.Pub_id >> AND Type = 'business') >> >> >> I found that a little difficult to interpret at first, and also it seems >> inefficient, with that "SELECT *" clause - I don't care about all the > fields >> after all; only Pub_id. >> >> In my inexperience, it seems to me that this query would be preferable > from >> a couple of standpoints: >> >> SELECT Pub_name FROM Publishers >> WHERE Pub_id IN >> (SELECT Pub_id FROM Titles WHERE Type = 'business') >> >> >> Will these two queries return the same result? Unfortunately I do not >> have >> the data set, so I cannot run them. >> >> Is there a difference in efficiency between the two? Is either preferable > to >> the other for some reason? >> >> Any other considerations? >> >> >> > > |
|||||||||||||||||||||||