Home All Groups Group Topic Archive Search About
Author
12 Aug 2005 4:30 AM
psandler70
Hello all,

I was writing a demo case for instructional purposes on the virtues of
using joins instead of subqueries, and I was planning on using test
cases like these (using Northwind):

Code:
SELECT
    o.orderid,
    o.orderdate,
    (SELECT companyname FROM shippers s WHERE s.shipperid = o.shipvia)
companyname,
    (SELECT lastname + ', ' + firstname FROM employees e WHERE
e.employeeid = o.employeeid) FullName
FROM
    orders o
WHERE
    o.orderdate between '1-1-1996' and '12-31-1996'
ORDER BY
    o.orderdate



Code:
SELECT
    o.orderid,
    o.orderdate,
    s.companyname,
    e.lastname + ', ' + e.firstname FullName
FROM
    orders o
    INNER JOIN shippers s ON s.shipperid = o.shipvia
    INNER JOIN employees e ON e.employeeid = o.employeeid
WHERE
    o.orderdate between '1-1-1996' and '12-31-1996'
ORDER BY
    o.orderdate


The first uses subqueries, the second uses joins.  (Please ignore for
the moment that I would probably never return first + last name in this
manner, or that this query is inane to begin with.)

The point was going to be: don't use a subquery when you can use a
join, as it's less efficient.

If you run these queries as-is, the execution plan shows a (slightly)
better "cost" efficiency for the one that uses joins.

However, if you remove the WHERE and ORDER BY clauses, the subquery has
a (significantly!) better cost efficiency (running them together in QA
and then checking the combined execution plan is the easiest way to get
an idea of relative costs, if you want to try this yourself).

Anyone know why the subqueries would be more efficient in this case?

If the tables had 50,000 records in them, do you think the joins would
be more efficient?

Anyone know of a good source on this subject?


Thanks for any insight,

Phil

Author
12 Aug 2005 4:36 AM
Uri Dimant
Hi
The point is  does an optimizer available to use any indexes defined on the
tables , doesn't it?
Personally, I'd prefer using JOIN's with an index on FK columns .

http://www.sql-server-performance.com/tuning_joins.asp
http://www.sql-server-performance.com/query_execution_plan_analysis.asp
http://www.sql-server-performance.com/transact_sql_where.asp



<psandle***@hotmail.com> wrote in message
Show quote
news:1123821016.890254.39970@z14g2000cwz.googlegroups.com...
> Hello all,
>
> I was writing a demo case for instructional purposes on the virtues of
> using joins instead of subqueries, and I was planning on using test
> cases like these (using Northwind):
>
> Code:
> SELECT
>    o.orderid,
>    o.orderdate,
>    (SELECT companyname FROM shippers s WHERE s.shipperid = o.shipvia)
> companyname,
>    (SELECT lastname + ', ' + firstname FROM employees e WHERE
> e.employeeid = o.employeeid) FullName
> FROM
>    orders o
> WHERE
>    o.orderdate between '1-1-1996' and '12-31-1996'
> ORDER BY
>    o.orderdate
>
>
>
> Code:
> SELECT
>    o.orderid,
>    o.orderdate,
>    s.companyname,
>    e.lastname + ', ' + e.firstname FullName
> FROM
>    orders o
>    INNER JOIN shippers s ON s.shipperid = o.shipvia
>    INNER JOIN employees e ON e.employeeid = o.employeeid
> WHERE
>    o.orderdate between '1-1-1996' and '12-31-1996'
> ORDER BY
>    o.orderdate
>
>
> The first uses subqueries, the second uses joins.  (Please ignore for
> the moment that I would probably never return first + last name in this
> manner, or that this query is inane to begin with.)
>
> The point was going to be: don't use a subquery when you can use a
> join, as it's less efficient.
>
> If you run these queries as-is, the execution plan shows a (slightly)
> better "cost" efficiency for the one that uses joins.
>
> However, if you remove the WHERE and ORDER BY clauses, the subquery has
> a (significantly!) better cost efficiency (running them together in QA
> and then checking the combined execution plan is the easiest way to get
> an idea of relative costs, if you want to try this yourself).
>
> Anyone know why the subqueries would be more efficient in this case?
>
> If the tables had 50,000 records in them, do you think the joins would
> be more efficient?
>
> Anyone know of a good source on this subject?
>
>
> Thanks for any insight,
>
> Phil
>
Author
12 Aug 2005 6:02 AM
Razvan Socol
Phil wrote:
> If you run these queries as-is, the execution plan shows a (slightly)
> better "cost" efficiency for the one that uses joins.

Not in my Northwind: the first query has a cost of 0.0952 and the
second query has a cost of 0.101 (I'm using SQL Server 2000 SP4). The
first query seems to have a better cost because it uses a "Hash
Match/Cache" to minimize the number of "Clustered Index Seek"-s in the
Employees table.

Anyway, I think the difference in performance is too small to be
significant in this case. I think that joins are more appropriate (for
this query), because they seem easier to read (at least, for me). The
optimizer may perform better in later versions of SQL Server, i.e. it
may choose the same (best) plan for both queries.

Razvan
Author
14 Aug 2005 2:38 AM
Brian Selzer
The queries below are not just subqueries, they are correlated subqueries.
There is a big difference.  Sometimes I will use a subquery in the FROM
clause--usually when I need to aggregate interim results before joining.

The optimizer can most of the time decompose a correlated subquery into a
join, so most of the time a similar execution plan will be generated whether
you use a join or a correlated subquery.  In my experience the optimizer
seems to have the most trouble when one or more self-joins are involved in a
query, and many times in those cases a cursor will actually perform better
because you can eliminate one or more iteration steps from the execution
plan.

The questions posed below have no precise answer.  The execution plan that
the optimizer chooses depends on a number of different factors: the
cardinality of each interim result, the availability of indexes and whether
or not they're clustered or covering, the existence of statistics and how
current they are, etc.

I disagree with your point.  The optimizer can most of the time decompose a
correlated subquery into a join, so most of the time a similar execution
plan will be generated whether you use a join or a correlated subquery.  In
my experience the optimizer seems to have the most trouble when one or more
self-joins are involved in a query, and many times in those cases a cursor
will actually perform better because you can eliminate one or more iteration
steps from the execution plan.

Query optimization is a really difficult subject to cover in a newsgroup
post.  There is some information in BOL, and there are a number of books on
the subject, but I find that the best way to optimize queries is on a
case-by-case basis by using Query Analyzer and Profiler.

<psandle***@hotmail.com> wrote in message
Show quote
news:1123821016.890254.39970@z14g2000cwz.googlegroups.com...
> Hello all,
>
> I was writing a demo case for instructional purposes on the virtues of
> using joins instead of subqueries, and I was planning on using test
> cases like these (using Northwind):
>
> Code:
> SELECT
>     o.orderid,
>     o.orderdate,
>     (SELECT companyname FROM shippers s WHERE s.shipperid = o.shipvia)
> companyname,
>     (SELECT lastname + ', ' + firstname FROM employees e WHERE
> e.employeeid = o.employeeid) FullName
> FROM
>     orders o
> WHERE
>     o.orderdate between '1-1-1996' and '12-31-1996'
> ORDER BY
>     o.orderdate
>
>
>
> Code:
> SELECT
>     o.orderid,
>     o.orderdate,
>     s.companyname,
>     e.lastname + ', ' + e.firstname FullName
> FROM
>     orders o
>     INNER JOIN shippers s ON s.shipperid = o.shipvia
>     INNER JOIN employees e ON e.employeeid = o.employeeid
> WHERE
>     o.orderdate between '1-1-1996' and '12-31-1996'
> ORDER BY
>     o.orderdate
>
>
> The first uses subqueries, the second uses joins.  (Please ignore for
> the moment that I would probably never return first + last name in this
> manner, or that this query is inane to begin with.)
>
> The point was going to be: don't use a subquery when you can use a
> join, as it's less efficient.
>
> If you run these queries as-is, the execution plan shows a (slightly)
> better "cost" efficiency for the one that uses joins.
>
> However, if you remove the WHERE and ORDER BY clauses, the subquery has
> a (significantly!) better cost efficiency (running them together in QA
> and then checking the combined execution plan is the easiest way to get
> an idea of relative costs, if you want to try this yourself).
>
> Anyone know why the subqueries would be more efficient in this case?
>
> If the tables had 50,000 records in them, do you think the joins would
> be more efficient?
>
> Anyone know of a good source on this subject?
>
>
> Thanks for any insight,
>
> Phil
>

AddThis Social Bookmark Button