Author
16 Jul 2005 7:07 AM
Arpan
Should a column used in the HAVING clause always be listed in the
GROUP BY clause, even if that column appears inside an aggregate
function in the SELECT list (as shown in the second example below),
the only exception being when the column in the HAVING clause is itself
wrapped in an aggregate function (as shown in the third example
below)?

For example, consider the following query:

----------------------------------------------
SELECT PID,Product,SUM(Price) AS TotalPrice
FROM Customers
GROUP BY PID
HAVING Price>100
ORDER BY PID
----------------------------------------------

The above query, when executed, generates an error saying

----------------------------------------------
Column 'Customers.Price' is invalid in the HAVING clause because it is
not contained in either an aggregate function or the GROUP BY clause.
----------------------------------------------

The error says that Price isn't contained in an aggregate function, but
in the SELECT list Price is contained in the aggregate function SUM!
So why the error?

If the GROUP BY clause in the above query is slightly modified to this
(with the rest remaining as it is):

----------------------------------------------
SELECT PID,Product,SUM(Price) AS TotalPrice
FROM Customers
GROUP BY PID,Price  /*Price added to the GROUP BY clause*/
HAVING Price>100
ORDER BY PID
----------------------------------------------

then it works fine!

If the very first query is again modified slightly by including the
aggregate function SUM in the HAVING clause:

----------------------------------------------
SELECT PID,Product,SUM(Price) AS TotalPrice
FROM Customers
GROUP BY PID     /*No Price column in the GROUP BY clause; only PID*/
HAVING SUM(Price)>100  /*Using SUM in the HAVING clause*/
ORDER BY PID
----------------------------------------------

this also works fine (of course, the result set won't be the same as the
result set of the second query)!

Confusions galore!!!

Thanks,

Arpan

16 Jul 2005 11:24 AM
David Portas
Logically HAVING is performed AFTER the GROUP BY so it must consist only of
expressions that can be constructed from the grouped columns or from
aggregates. You didn't explain what your requirement is but both of the
following are legal:

/* Total price for the PID is greater than 100 */
SELECT pid, SUM(price)
FROM Customers
GROUP BY pid
HAVING SUM(price)>100
ORDER BY pid

/* Total price per PID, summing only rows whose price is greater than 100 */
SELECT pid, SUM(price)
FROM Customers
WHERE price >100
GROUP BY pid
ORDER BY pid
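To see how the two legal queries differ, here is a minimal sketch using Python's standard sqlite3 module. The sample rows are hypothetical, and the helper name run is illustrative only. SQLite is used just because it ships with Python. It accepts some non-grouped ("bare") columns that SQL Server rejects, so only the two legal queries are run here.

```python
import sqlite3

# Hypothetical sample data (pid, product, price), invented for illustration.
ROWS = [
    (1, "Widget", 60),
    (1, "Widget", 70),
    (2, "Gadget", 150),
    (2, "Gadget", 20),
    (3, "Gizmo", 40),
]

def run(sql):
    """Load ROWS into an in-memory Customers table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (pid INTEGER, product TEXT, price INTEGER)")
    conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", ROWS)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

# HAVING filters groups after aggregation: keep PIDs whose total exceeds 100.
having_rows = run("""
    SELECT pid, SUM(price) FROM Customers
    GROUP BY pid
    HAVING SUM(price) > 100
    ORDER BY pid
""")

# WHERE filters rows before grouping: only prices above 100 are summed.
where_rows = run("""
    SELECT pid, SUM(price) FROM Customers
    WHERE price > 100
    GROUP BY pid
    ORDER BY pid
""")

print(having_rows)  # [(1, 130), (2, 170)]
print(where_rows)   # [(2, 150)]
```

With this data, PID 1 qualifies under HAVING (60 + 70 = 130) but disappears under WHERE, because neither of its individual prices exceeds 100.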

Your query:

SELECT PID,Product,SUM(Price) AS TotalPrice
FROM Customers
GROUP BY PID
HAVING Price>100
ORDER BY PID

isn't legal because there is no Price in the aggregated result (only
SUM(price)) and since multiple prices may theoretically apply to a single
PID there is no logical way that Price can be applied as a selection
condition. Also, you used Product in the SELECT list and not in the GROUP BY
list. For a similar reason this also doesn't make sense and will cause an
error.
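The point about Price (and Product) can be made concrete by looking at what a group actually contains once rows are grouped by PID. This is a minimal Python sketch of the logical model, not of how SQL Server works internally; the rows and the names ROWS, groups and summary are hypothetical.

```python
from collections import defaultdict

# Hypothetical Customers rows (pid, product, price), invented for illustration.
ROWS = [
    (1, "Widget", 60),
    (1, "Widget Pro", 70),
    (2, "Gadget", 150),
    (2, "Gadget", 20),
]

# GROUP BY pid: each group is identified by its key but still "contains"
# every member row, so non-grouped columns can have several values.
groups = defaultdict(list)
for pid, product, price in ROWS:
    groups[pid].append((product, price))

summary = {}
for pid, members in sorted(groups.items()):
    prices = [price for _, price in members]
    summary[pid] = {
        # SUM(price) yields exactly one value per group, so
        # HAVING SUM(price) > 100 has a single answer.
        "total": sum(prices),
        # price > 100 yields one answer per row, not per group: pid 2 holds
        # both 150 and 20, so there is no single value to test.
        "price_gt_100": [price > 100 for price in prices],
        # Product has the same problem: pid 1 maps to two different names.
        "products": sorted({product for product, _ in members}),
    }

print(summary)
```

The per-row list of True/False values is exactly what a bare "HAVING Price > 100" would have to collapse into one answer, and there is no rule for doing that.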

--
David Portas
SQL Server MVP
--
16 Jul 2005 12:17 PM
ML
Instead of testing for 'price > 100' in the having clause, test for it in the
where clause.


ML
17 Jul 2005 1:36 AM
--CELKO--
Here is how a SELECT works in SQL ... at least in theory.  Real
products will optimize things, but the code has to produce the same
results.

a) Start in the FROM clause and build a working table from all of the
joins, unions, intersections, and whatever other table constructors are
there.  The <table expression> AS <correlation name> option allows you
to give a name to this working table, which you then have to use for the
rest of the containing query.

b) Go to the WHERE clause and remove rows that do not pass the criteria;
that is, rows that do not test TRUE (i.e. reject UNKNOWN and FALSE).  The
WHERE clause is applied to the working set in the FROM clause.

c) Go to the optional GROUP BY clause, make groups and reduce each
group to a single row, replacing the original working table with the
new grouped table. The columns of a grouped table must be group
characteristics: (1) a grouping column (2) a statistic about the group
(i.e. aggregate functions) (3) a function or (4) an expression made up
of those three items.

d) Go to the optional HAVING clause and apply it against the grouped
working table; if there was no GROUP BY clause, treat the entire table
as one group.

e) Go to the SELECT clause and construct the expressions in the list.
This means that the scalar subqueries, function calls and expressions
in the SELECT are done after all the other clauses are done.  The
"AS" operator can also give names to expressions in the SELECT
list.  These new names come into existence all at once, but after the
WHERE clause, GROUP BY clause and HAVING clause have been executed; you
cannot use them in the SELECT list or the WHERE clause for that reason.


If there is a SELECT DISTINCT, then redundant duplicate rows are
removed.  For purposes of defining a duplicate row, NULLs are treated
as matching (just like in the GROUP BY).

f) Nested query expressions follow the usual scoping rules you would
expect from a block structured language like C, Pascal, Algol, etc.
Namely, the innermost queries can reference columns and tables in the
queries in which they are contained.

g) The ORDER BY clause is part of a cursor, not a query. The result
set is passed to the cursor, which can only see the names in the SELECT
clause list, and the sorting is done there.  The ORDER BY clause cannot
have expressions in it, or references to other columns, because the
result set has been converted into a sequential file structure and that
is what is being sorted.
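The logical order in steps a) through e) and g) can be sketched as a small pipeline over plain Python data. This is only an illustration of the theoretical model, assuming a single hypothetical table with pid and price columns, where None stands in for NULL. The function names gt and logical_select are made up for this sketch, and real products optimize the steps rather than running them literally.

```python
from collections import defaultdict

# Hypothetical rows (pid, price); None stands in for SQL NULL.
CUSTOMERS = [
    {"pid": 1, "price": 60},
    {"pid": 1, "price": 70},
    {"pid": 2, "price": 150},
    {"pid": 2, "price": None},
    {"pid": None, "price": 120},
    {"pid": None, "price": 130},
]

def gt(value, limit):
    """Three-valued comparison: None (UNKNOWN) when the value is NULL."""
    return None if value is None else value > limit

def logical_select(rows, min_price, min_total):
    # a) FROM: the working table is just the input rows here (no joins).
    working = list(rows)
    # b) WHERE: keep only rows whose test is TRUE; FALSE and UNKNOWN are dropped.
    working = [r for r in working if gt(r["price"], min_price) is True]
    # c) GROUP BY pid: NULL keys land in one group, as in SQL.
    groups = defaultdict(list)
    for r in working:
        groups[r["pid"]].append(r)
    # Each group is reduced to a grouping column plus aggregates.
    grouped = [{"pid": k, "total": sum(r["price"] for r in g), "n": len(g)}
               for k, g in groups.items()]
    # d) HAVING: filter the grouped rows, which only know pid, total and n.
    grouped = [g for g in grouped if g["total"] > min_total]
    # e) SELECT: build the output expressions; the name avg_price exists only now.
    result = [{"pid": g["pid"], "avg_price": g["total"] / g["n"]} for g in grouped]
    # g) ORDER BY on the finished result. NULL pids sort first here, a choice
    #    this sketch makes; SQL products differ on where NULLs sort.
    return sorted(result, key=lambda r: (r["pid"] is not None, r["pid"] or 0))

print(logical_select(CUSTOMERS, 50, 140))
```

For example, logical_select(CUSTOMERS, 50, 140) drops the NULL-priced row in WHERE, then drops PID 1 in HAVING (its total is 130), and returns the NULL-pid group (average 125.0) followed by PID 2 (average 150.0). The HAVING step never sees an individual price, which is why a bare price test cannot go there.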

As you can see, things happen "all at once" in SQL, not "from left to
right" as they would in a sequential file/procedural language model. In
those languages, these two statements produce different results:
  READ (a, b, c) FROM File_X;
  READ (c, a, b) FROM File_X;

while these two statements return the same data:

SELECT a, b, c FROM Table_X;
SELECT c, a, b FROM Table_X;
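A quick way to check that claim is to run both statements against a throwaway table. This sketch uses Python's standard sqlite3 module. The table name Table_X comes from the post, but its rows are invented for illustration. ORDER BY a is added only so the two row orders can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table_X is the name from the post; the rows are invented for illustration.
conn.execute("CREATE TABLE Table_X (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO Table_X VALUES (?, ?, ?)", [(1, 2, 3), (4, 5, 6)])

abc = conn.execute("SELECT a, b, c FROM Table_X ORDER BY a").fetchall()
cab = conn.execute("SELECT c, a, b FROM Table_X ORDER BY a").fetchall()
conn.close()

# Reordering each (c, a, b) row back to (a, b, c) gives identical data.
assert [(a, b, c) for (c, a, b) in cab] == abc
print(abc)  # [(1, 2, 3), (4, 5, 6)]
```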

Think about what a confused mess this statement is in the SQL model.

SELECT f(c2) AS c1, f(c1) AS c2 FROM Foobar;

That is why such nonsense is illegal syntax.
