|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Grouping in Derived Tablesas: SELECT C.CustomerID, C.CustomerName, C.CustomerType, C.Address1, C.City, C.State, S.TotalSales FROM Customers C INNER JOIN (SELECT CustomerID, SUM(SalesAmount) as TotalSales FROM Sales GROUP BY CustomerID) S ON C.CustomerID = S.CustomerID WHERE C.CustomerID = 12345 Will the SUM(SalesAmount) aggregation in the derived table be performed for *all* rows in in the Sales table, just as if the derived table is a standalone query? Or will the summation be limited to the single CustomerID specified in the WHERE of the overall query? If the the SUM is limited to just CustomerID 12345, is this at the mercy of the optimizer, or is there a documented guarantee that this will happen? If the aggregation in the derived table is limited to the single CustomerID, the query would seem to be fairly efficient. But not if all rows in the Sales table need to be processed and aggregated.
Show quote
"IraG" <ij***@clas.ucsb.edu> wrote in message While there's no guarantee about what the optimizer will do, I've done this news:1152814865.959236.296640@m73g2000cwd.googlegroups.com... > Suppose you have a query involving an aggregate on a derived table such > as: > > SELECT C.CustomerID, C.CustomerName, C.CustomerType, C.Address1, > C.City, C.State, S.TotalSales > FROM Customers C > INNER JOIN > (SELECT > CustomerID, SUM(SalesAmount) as TotalSales > FROM Sales > GROUP BY CustomerID) S ON C.CustomerID = S.CustomerID > WHERE C.CustomerID = 12345 > > Will the SUM(SalesAmount) aggregation in the derived table be performed > for *all* rows in in the Sales table, just as if the derived table is a > standalone query? Or will the summation be limited to the single > CustomerID specified in the WHERE of the overall query? > > If the the SUM is limited to just CustomerID 12345, is this at the > mercy of the optimizer, or is there a documented guarantee that this > will happen? > > If the aggregation in the derived table is limited to the single > CustomerID, the query would seem to be fairly efficient. But not if > all rows in the Sales table need to be processed and aggregated. > alot (usually with views). So long as teh GROUP BY columns appear in the SELECT list of the view or derived table, I've never had a problem. David IraG,
All of them. You can include the filter also in the derived table. .... (SELECT CustomerID, SUM(SalesAmount) as TotalSales FROM Sales where CustomerID = 12345 GROUP BY CustomerID) S .... or join both tables and group by. SELECT C.CustomerID, C.CustomerName, C.CustomerType, C.Address1, C.City, C.State, SUM(SalesAmount) as TotalSales FROM Customers C INNER JOIN Sales as S ON C.CustomerID = S.CustomerID WHERE C.CustomerID = 12345 and S.CustomerID = 12345 GROUP BY C.CustomerID, C.CustomerName, C.CustomerType, C.Address1, C.City, C.State go AMB Show quote "IraG" wrote: > Suppose you have a query involving an aggregate on a derived table such > as: > > SELECT C.CustomerID, C.CustomerName, C.CustomerType, C.Address1, > C.City, C.State, S.TotalSales > FROM Customers C > INNER JOIN > (SELECT > CustomerID, SUM(SalesAmount) as TotalSales > FROM Sales > GROUP BY CustomerID) S ON C.CustomerID = S.CustomerID > WHERE C.CustomerID = 12345 > > Will the SUM(SalesAmount) aggregation in the derived table be performed > for *all* rows in in the Sales table, just as if the derived table is a > standalone query? Or will the summation be limited to the single > CustomerID specified in the WHERE of the overall query? > > If the the SUM is limited to just CustomerID 12345, is this at the > mercy of the optimizer, or is there a documented guarantee that this > will happen? > > If the aggregation in the derived table is limited to the single > CustomerID, the query would seem to be fairly efficient. But not if > all rows in the Sales table need to be processed and aggregated. > > Ira,
Good point. That is a well known weakness of SQL Server optimizer - it is quite likely that it will aggregate all the data in the table, then apply the "C.CustomerID = 12345". Even if the predicate is highly selective, the optimizer may fail to push it down the subquery. Even if you have seen otherwise, do not make the assumption that the optimizer will always push down a highly selective predicate. You want it done, do it yourself... "Alexander Kuznetsov" <AK_TIREDOFSPAM@hotmail.COM> wrote in message Well known by who? Any refernces? Any examples?news:1152816542.759726.270770@75g2000cwc.googlegroups.com... > Ira, > > Good point. That is a well known weakness of SQL Server optimizer > . . . I've been relying on this behavior since SQL 6.5. A quick test on 2005 shows the critera pushed down to the inner table: SELECT C.CustomerID, C.AccountNumber, C.CustomerType, S.TotalSales FROM Sales.Customer C INNER JOIN (SELECT CustomerID, SUM(TotalDue) as TotalSales FROM Sales.SalesOrderHeader H GROUP BY CustomerID) S ON C.CustomerID = S.CustomerID WHERE C.CustomerID = 12345 David Alexander Kuznetsov wrote:
> Ira, Ummm, no, I wrote a query just today that did something similar, > > Good point. That is a well known weakness of SQL Server optimizer - it > is quite likely that it will aggregate all the data in the table, then > apply the "C.CustomerID = 12345". > Even if the predicate is highly selective, the optimizer may fail to > push it down the subquery. Even if you have seen otherwise, do not make > the assumption that the optimizer will always push down a highly > selective predicate. You want it done, do it yourself... > unioning together two similar recordsets, then applying a WHERE clause to the result, filtering out a specific ID. The optimizer applied the filter to each table in the union, not to the unioned result set. Am I misunderstanding what you wrote? Add the WHERE criteria to the derived table sub-query as well.
-- Show quoteArnie Rowland* "To be successful, your heart must accompany your knowledge." "IraG" <ij***@clas.ucsb.edu> wrote in message news:1152814865.959236.296640@m73g2000cwd.googlegroups.com... > Suppose you have a query involving an aggregate on a derived table such > as: > > SELECT C.CustomerID, C.CustomerName, C.CustomerType, C.Address1, > C.City, C.State, S.TotalSales > FROM Customers C > INNER JOIN > (SELECT > CustomerID, SUM(SalesAmount) as TotalSales > FROM Sales > GROUP BY CustomerID) S ON C.CustomerID = S.CustomerID > WHERE C.CustomerID = 12345 > > Will the SUM(SalesAmount) aggregation in the derived table be performed > for *all* rows in in the Sales table, just as if the derived table is a > standalone query? Or will the summation be limited to the single > CustomerID specified in the WHERE of the overall query? > > If the the SUM is limited to just CustomerID 12345, is this at the > mercy of the optimizer, or is there a documented guarantee that this > will happen? > > If the aggregation in the derived table is limited to the single > CustomerID, the query would seem to be fairly efficient. But not if > all rows in the Sales table need to be processed and aggregated. > I should have been clearer in my original posting:
Clearly, yes, the WHERE should be applied to the derived table in the simple scenario that I presented. I was considering the more general situation, such as creating a view that will be used as part of a larger query where the match might be on a single CustomerID or on multiple CustomerID's. I believe I read somewhere that the ANSI standard allows for correlated derived tables, which would precisely take care of this kind of situation. Likewise, my understanding is that SQL Server 2005 allows you to pass a column reference to a table-valued udf used in a join, which would also be a solutions except for the fact that my shop is using SQL 2000. Arnie Rowland wrote: Show quote > Add the WHERE criteria to the derived table sub-query as well. > > -- > Arnie Rowland* > "To be successful, your heart must accompany your knowledge." > |
|||||||||||||||||||||||