Home All Groups Group Topic Archive Search About

Grouping in Derived Tables

Author
13 Jul 2006 6:21 PM
IraG
Suppose you have a query involving an aggregate on a derived table such
as:

SELECT C.CustomerID, C.CustomerName,  C.CustomerType, C.Address1,
C.City, C.State, S.TotalSales
FROM  Customers C
INNER JOIN
  (SELECT
     CustomerID, SUM(SalesAmount) as TotalSales
   FROM Sales
   GROUP BY  CustomerID) S ON C.CustomerID = S.CustomerID
WHERE C.CustomerID = 12345

Will the SUM(SalesAmount) aggregation in the derived table be performed
for *all* rows in in the Sales table, just as if the derived table is a
standalone query?  Or will the summation be limited to the single
CustomerID specified in the WHERE of the overall query?

If the the SUM is limited to just CustomerID 12345, is this at the
mercy of the optimizer, or is there a documented guarantee that this
will happen?

If the aggregation in the derived table is limited to the single
CustomerID, the query would seem to be fairly efficient.  But not if
all rows in the Sales table need to be processed and aggregated.

Author
13 Jul 2006 6:38 PM
David Browne
Show quote
"IraG" <ij***@clas.ucsb.edu> wrote in message
news:1152814865.959236.296640@m73g2000cwd.googlegroups.com...
> Suppose you have a query involving an aggregate on a derived table such
> as:
>
> SELECT C.CustomerID, C.CustomerName,  C.CustomerType, C.Address1,
> C.City, C.State, S.TotalSales
> FROM  Customers C
> INNER JOIN
>  (SELECT
>     CustomerID, SUM(SalesAmount) as TotalSales
>   FROM Sales
>   GROUP BY  CustomerID) S ON C.CustomerID = S.CustomerID
> WHERE C.CustomerID = 12345
>
> Will the SUM(SalesAmount) aggregation in the derived table be performed
> for *all* rows in in the Sales table, just as if the derived table is a
> standalone query?  Or will the summation be limited to the single
> CustomerID specified in the WHERE of the overall query?
>
> If the the SUM is limited to just CustomerID 12345, is this at the
> mercy of the optimizer, or is there a documented guarantee that this
> will happen?
>
> If the aggregation in the derived table is limited to the single
> CustomerID, the query would seem to be fairly efficient.  But not if
> all rows in the Sales table need to be processed and aggregated.
>

While there's no guarantee about what the optimizer will do, I've done this
alot (usually with views).  So long as teh GROUP BY columns appear in the
SELECT list of the view or derived table, I've never had a problem.

David
Author
13 Jul 2006 6:39 PM
Alejandro Mesa
IraG,

All of them. You can include the filter also in the derived table.

....
   (SELECT
      CustomerID, SUM(SalesAmount) as TotalSales
    FROM Sales
    where CustomerID = 12345
    GROUP BY  CustomerID) S
....

or join both tables and group by.

SELECT
   C.CustomerID, C.CustomerName,  C.CustomerType, C.Address1,
   C.City, C.State, SUM(SalesAmount) as TotalSales
FROM
   Customers C
   INNER JOIN
   Sales as S
   ON C.CustomerID = S.CustomerID
WHERE
   C.CustomerID = 12345
   and S.CustomerID = 12345
GROUP BY
   C.CustomerID, C.CustomerName,  C.CustomerType, C.Address1,
   C.City, C.State
go


AMB

Show quote
"IraG" wrote:

> Suppose you have a query involving an aggregate on a derived table such
> as:
>
> SELECT C.CustomerID, C.CustomerName,  C.CustomerType, C.Address1,
> C.City, C.State, S.TotalSales
> FROM  Customers C
> INNER JOIN
>   (SELECT
>      CustomerID, SUM(SalesAmount) as TotalSales
>    FROM Sales
>    GROUP BY  CustomerID) S ON C.CustomerID = S.CustomerID
> WHERE C.CustomerID = 12345
>
> Will the SUM(SalesAmount) aggregation in the derived table be performed
> for *all* rows in in the Sales table, just as if the derived table is a
> standalone query?  Or will the summation be limited to the single
> CustomerID specified in the WHERE of the overall query?
>
> If the the SUM is limited to just CustomerID 12345, is this at the
> mercy of the optimizer, or is there a documented guarantee that this
> will happen?
>
> If the aggregation in the derived table is limited to the single
> CustomerID, the query would seem to be fairly efficient.  But not if
> all rows in the Sales table need to be processed and aggregated.
>
>
Author
13 Jul 2006 6:49 PM
Alexander Kuznetsov
Ira,

Good point. That is a well known weakness of SQL Server optimizer - it
is quite likely that it will aggregate all the data in the table, then
apply the "C.CustomerID = 12345".
Even if the predicate is highly selective, the optimizer may fail to
push it down the subquery. Even if you have seen otherwise, do not make
the assumption that the optimizer will always push down a highly
selective predicate. You want it done, do it yourself...
Author
13 Jul 2006 7:32 PM
David Browne
"Alexander Kuznetsov" <AK_TIREDOFSPAM@hotmail.COM> wrote in message
news:1152816542.759726.270770@75g2000cwc.googlegroups.com...
> Ira,
>
> Good point. That is a well known weakness of SQL Server optimizer
> . . .

Well known by who?  Any refernces?  Any examples?

I've been relying on this behavior since SQL 6.5.

A quick test on 2005 shows the critera pushed down to the inner table:

SELECT C.CustomerID, C.AccountNumber, C.CustomerType, S.TotalSales
FROM  Sales.Customer C
INNER JOIN
  (SELECT
     CustomerID, SUM(TotalDue) as TotalSales
   FROM Sales.SalesOrderHeader H
   GROUP BY  CustomerID) S ON C.CustomerID = S.CustomerID
WHERE C.CustomerID = 12345

David
Author
13 Jul 2006 7:55 PM
Tracy McKibben
Alexander Kuznetsov wrote:
> Ira,
>
> Good point. That is a well known weakness of SQL Server optimizer - it
> is quite likely that it will aggregate all the data in the table, then
> apply the "C.CustomerID = 12345".
> Even if the predicate is highly selective, the optimizer may fail to
> push it down the subquery. Even if you have seen otherwise, do not make
> the assumption that the optimizer will always push down a highly
> selective predicate. You want it done, do it yourself...
>

Ummm, no, I wrote a query just today that did something similar,
unioning together two similar recordsets, then applying a WHERE clause
to the result, filtering out a specific ID.  The optimizer applied the
filter to each table in the union, not to the unioned result set.  Am I
misunderstanding what you wrote?



--
Tracy McKibben
MCDBA
http://www.realsqlguy.com
Author
13 Jul 2006 7:07 PM
Arnie Rowland
Add the WHERE criteria to the derived table sub-query as well.

--
Arnie Rowland*
"To be successful, your heart must accompany your knowledge."



Show quote
"IraG" <ij***@clas.ucsb.edu> wrote in message
news:1152814865.959236.296640@m73g2000cwd.googlegroups.com...
> Suppose you have a query involving an aggregate on a derived table such
> as:
>
> SELECT C.CustomerID, C.CustomerName,  C.CustomerType, C.Address1,
> C.City, C.State, S.TotalSales
> FROM  Customers C
> INNER JOIN
>  (SELECT
>     CustomerID, SUM(SalesAmount) as TotalSales
>   FROM Sales
>   GROUP BY  CustomerID) S ON C.CustomerID = S.CustomerID
> WHERE C.CustomerID = 12345
>
> Will the SUM(SalesAmount) aggregation in the derived table be performed
> for *all* rows in in the Sales table, just as if the derived table is a
> standalone query?  Or will the summation be limited to the single
> CustomerID specified in the WHERE of the overall query?
>
> If the the SUM is limited to just CustomerID 12345, is this at the
> mercy of the optimizer, or is there a documented guarantee that this
> will happen?
>
> If the aggregation in the derived table is limited to the single
> CustomerID, the query would seem to be fairly efficient.  But not if
> all rows in the Sales table need to be processed and aggregated.
>
Author
13 Jul 2006 7:28 PM
IraG
I should have been clearer in my original posting:

Clearly, yes, the WHERE should be applied to the derived table in the
simple scenario that I presented.

I was considering the more general situation, such as creating a view
that will be used as part of a larger query where the match might be on
a single CustomerID or on multiple CustomerID's.

I believe I read somewhere that the ANSI standard allows for correlated
derived tables, which would precisely take care of this kind of
situation.

Likewise, my understanding is that SQL Server 2005 allows you to pass a
column reference to a table-valued udf used in a join, which would also
be a solutions except for the fact that my shop is using SQL 2000.

Arnie Rowland wrote:
Show quote
> Add the WHERE criteria to the derived table sub-query as well.
>
> --
> Arnie Rowland*
> "To be successful, your heart must accompany your knowledge."
>

AddThis Social Bookmark Button