|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Wrong Number of Rows with Correlated Subquery + Group Bywe ended up with a query like the one below. The query as written returns one row for each row in the temp table (7 rows total). We just want one row for each category. If you uncomment the "distinct", you'll get two rows as desired. But if you instead uncomment the "group by t.category", you get 14 rows (# rows * number categories). I want to know WHY this happens. It doesn't make any sense to me. The group by and distinct should act exactly the same in this case. Is this a bug, or is there something about subqueries that I don't understand? create table #temp (category int, i int) insert #temp select 1, 1 insert #temp select 1, 3 insert #temp select 1, 5 insert #temp select 1, 5 insert #temp select 2, 1 insert #temp select 2, 3 insert #temp select 2, 3 select /*distinct*/ t.category, ((select max(x1.i) from (select top 50 percent i from #temp t1 where t1.category = t.category order by t1.i ) x1) + (select min(x2.i) from (select top 50 percent i from #temp t2 where t2.category = t.category order by t2.i desc ) x2))/2.0 median from #temp t /*group by t.category*/ order by category drop table #temp Try this instead. (If you could change the datatype for the column 'i' to decimal, you would have to CAST in the query.)
NOTE: This provides the 'Financial' median -NOT the true or statistical median. DECLARE @MyTable table ( Category int , i int ) INSERT INTO @MyTable VALUES ( 1, 1 ) INSERT INTO @MyTable VALUES ( 1, 3 ) INSERT INTO @MyTable VALUES ( 1, 5 ) INSERT INTO @MyTable VALUES ( 1, 5 ) INSERT INTO @MyTable VALUES ( 2, 1 ) INSERT INTO @MyTable VALUES ( 2, 3 ) INSERT INTO @MyTable VALUES ( 2, 3 ) SELECT Category , avg( cast( i AS decimal )) AS 'Average' , ceiling( sum( cast(i as decimal) ) / count(1)) AS 'Median' FROM @MyTable GROUP BY Category I included the average for verification purposes. -- Show quoteHide quoteArnie Rowland* "To be successful, your heart must accompany your knowledge." <mbrie***@gmail.com> wrote in message news:1152298174.409940.159850@m79g2000cwm.googlegroups.com... > While working on calculating medians for a table (in SQL server 2000), > we ended up with a query like the one below. The query as written > returns one row for each row in the temp table (7 rows total). We just > want one row for each category. If you uncomment the "distinct", > you'll get two rows as desired. But if you instead uncomment the > "group by t.category", you get 14 rows (# rows * number categories). I > want to know WHY this happens. It doesn't make any sense to me. The > group by and distinct should act exactly the same in this case. Is > this a bug, or is there something about subqueries that I don't > understand? > > create table #temp (category int, i int) > insert #temp select 1, 1 > insert #temp select 1, 3 > insert #temp select 1, 5 > insert #temp select 1, 5 > insert #temp select 2, 1 > insert #temp select 2, 3 > insert #temp select 2, 3 > select /*distinct*/ t.category, > ((select max(x1.i) > from (select top 50 percent i > from #temp t1 > where t1.category = t.category > order by t1.i > ) x1) + > (select min(x2.i) > from (select top 50 percent i > from #temp t2 > where t2.category = t.category > order by t2.i desc > ) x2))/2.0 median > from #temp t > /*group by t.category*/ > order by category > drop table #temp > Arnie Rowland wrote:
> Try this instead. (If you could change the datatype for the column 'i' to decimal, you would have to CAST in the query.) Thanks for your suggestion, but I'm not interested in another way ofdoing this calculation. I know there are many ways to work around this issue, and many ways to calculate the median. What I really want to know is WHY it acts like it does, with too many rows returned when using a group by. Sorry, that wasn't complete. I was trying to do something with this, but it still doesn't work ... -Bummer. (Well, it worked with the limited dataset provided. But once I began expanding the data, I saw that it's just not so simple...-Friday afternoon, what can I say...)
SELECT Category , avg( cast( i AS decimal )) AS 'Average' , CASE ( count(1) % 2 ) WHEN 0 THEN ceiling( sum( cast(i AS decimal)) / count(1)) ELSE floor( sum( cast(i AS decimal)) / count(1)) END AS 'Median' FROM @MyTable GROUP BY Category -- Arnie Rowland* "To be successful, your heart must accompany your knowledge." "Arnie Rowland" <ar***@1568.com> wrote in message news:OPtaZKgoGHA.2268@TK2MSFTNGP04.phx.gbl... Try this instead. (If you could change the datatype for the column 'i' to decimal, you would have to CAST in the query.) NOTE: This provides the 'Financial' median -NOT the true or statistical median. DECLARE @MyTable table ( Category int , i int ) INSERT INTO @MyTable VALUES ( 1, 1 ) INSERT INTO @MyTable VALUES ( 1, 3 ) INSERT INTO @MyTable VALUES ( 1, 5 ) INSERT INTO @MyTable VALUES ( 1, 5 ) INSERT INTO @MyTable VALUES ( 2, 1 ) INSERT INTO @MyTable VALUES ( 2, 3 ) INSERT INTO @MyTable VALUES ( 2, 3 ) SELECT Category , avg( cast( i AS decimal )) AS 'Average' , ceiling( sum( cast(i as decimal) ) / count(1)) AS 'Median' FROM @MyTable GROUP BY Category I included the average for verification purposes. -- Arnie Rowland* "To be successful, your heart must accompany your knowledge." Show quoteHide quote <mbrie***@gmail.com> wrote in message news:1152298174.409940.159850@m79g2000cwm.googlegroups.com... > While working on calculating medians for a table (in SQL server 2000), > we ended up with a query like the one below. The query as written > returns one row for each row in the temp table (7 rows total). We just > want one row for each category. If you uncomment the "distinct", > you'll get two rows as desired. But if you instead uncomment the > "group by t.category", you get 14 rows (# rows * number categories). I > want to know WHY this happens. It doesn't make any sense to me. The > group by and distinct should act exactly the same in this case. Is > this a bug, or is there something about subqueries that I don't > understand? > > create table #temp (category int, i int) > insert #temp select 1, 1 > insert #temp select 1, 3 > insert #temp select 1, 5 > insert #temp select 1, 5 > insert #temp select 2, 1 > insert #temp select 2, 3 > insert #temp select 2, 3 > select /*distinct*/ t.category, > ((select max(x1.i) > from (select top 50 percent i > from #temp t1 > where t1.category = t.category > order by t1.i > ) x1) + > (select min(x2.i) > from (select top 50 percent i > from #temp t2 > where t2.category = t.category > order by t2.i desc > ) x2))/2.0 median > from #temp t > /*group by t.category*/ > order by category > drop table #temp > >> I want to know WHY this happens. It doesn't make any sense to me. The This is a bug and is fixed in SQL 2005.>> group by and distinct should act exactly the same in this case. Is this >> a bug, or is there something about subqueries that I don't understand? In SQL 2000 re-writing is your best option, since the query formulation is error-prone and the order of evaluation is potentially a bug. Just compare the execution plans and it is easy to find out the extra branch in the execution plan introduced by the GROUP BY that treats the correlation as an entirely separate subset. SELECT t.category, ( SELECT MAX( x1.i ) FROM ( SELECT i FROM #temp t1 WHERE t1.category = t.category ) x1 ) FROM #temp t; SELECT t.category, ( SELECT MAX( x1.i ) FROM ( SELECT i FROM #temp t1 WHERE t1.category = t.category ) x1 ) FROM #temp t GROUP BY t.category; Even if it works, it is not easy to give a logical explanation for the behaviour since you have no keys and you use ordered subsets in derived tables -- this is essentially similar to "faking" ORDER BY clause in view with a TOP clause in the SELECT. Also, in SQL 2005, you can simply use the ROW_NUMBER() function to get this done more efficiently. -- Anith Anith Sen wrote:
> >> I want to know WHY this happens. It doesn't make any sense to me. The Thanks very much for verifying that this is a bug. It's not important,> >> group by and distinct should act exactly the same in this case. Is this > >> a bug, or is there something about subqueries that I don't understand? > > This is a bug and is fixed in SQL 2005. > but do you by any chance have a reference to somewhere on the microsoft site where this might be mentioned?
HOWTO: Mimic a constraint from a trigger?
Select data between braces [] Deadlock @@rowcount without displaying select Get Max Value and Second Max Value DDL and DML in same stored procedure Efficient update of column based on earlier records. Microsoft stored procedure aspnet_Users_DeleteUser questions Table function question performance: table value UDF vs view |
|||||||||||||||||||||||