|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Mode and Median functions in SQL 2000I would like to know whether statistical functions like Mode and Median are
available for use in SQL 2000? If Yes, what's the syntax? If No, what do I do now? There are not available as native functions but can be derived.
Thomas Show quoteHide quote "Krishnaprasad Paralikar" <Krishnaprasad Parali***@discussions.microsoft.com> wrote in message news:52D711E0-7FC6-4BF2-9D82-C0AB3CB7E338@microsoft.com... >I would like to know whether statistical functions like Mode and Median are > available for use in SQL 2000? > If Yes, what's the syntax? > If No, what do I do now? > Google for "celko" + "median".
AMB Show quoteHide quote "Krishnaprasad Paralikar" wrote: > I would like to know whether statistical functions like Mode and Median are > available for use in SQL 2000? > If Yes, what's the syntax? > If No, what do I do now? > "Krishnaprasad Paralikar" <Krishnaprasad
Parali***@discussions.microsoft.com> wrote in message news:52D711E0-7FC6-4BF2-9D82-C0AB3CB7E338@microsoft.com... Don't think so.>I would like to know whether statistical functions like Mode and Median are > available for use in SQL 2000? > If Yes, what's the syntax? > If No, what do I do now? SQL programmers have long viewed the problem of finding a "loop-free"> SQL solution to finding a median problem as a big challenge. Assume a table 'data' with one field 'value.' Assume no nulls or duplicates. Then the query is: Select x.value from data x, data y group by x.value having SUM(D(y.value <=x.value)) = ((COUNT(*)+1)/2) where D(A<=B) is a Boolean characteristic function which returns 1 for values A <= B and 0 for all other values. Now consider the expression SUM(D(y.value <=x.value)) For every x.value, this expression counts how many y.values are less than or equal to that value. The having clause then chooses the x.value where this count is equal to half the number of elements. We depend here on the division of two integers returning a truncated integer result, which SQL Server conveniently does. This solution supports the statistical definition of median where the median of a set must be one of the members of the set. Given an even number of values, it will choose the smaller of the middle two. You could also write it to support the financial definition of median and for the case of an even number of values return the median as the average of the middle two numbers. D(A<=B) is defined as (sign(1-sign(A-B))) where sign is a function that returns -1 for values less than 0, 0, or 1 for values greater than 0. Inspection will show that this does in fact return 1 for values A <= B and 0 for all other values. So the full SQL is Select x.value from data x, data y group by x.value having SUM((sign(1-sign(A-B)))) = ((COUNT(*)+1)/2) Credit to David Rozenshtein and Anatoly Abramovich for developing the technique of characteristic functions, published in 1992. I assume by mode you mean the most common value. I'm not seeing right off a way to do modes in a single select, but if you really need it I could probably come up with a way to do it using a subselect. -- HTH, Brad. You might to look at the chapter on Medians in SQL FOR SMARTIES that
gives several versions of this function. >> I'm not seeing right off a way to do modes in a single select, but if you really need it I could probably come up with a way to do it using a subselect. << Try a derived table:WITH X(fleeb, tally) AS SELECT CT COUNT(*) FROM Foobar GROUP BY Fleeb) SELECT Fleeb FROM X GROUP BY fleeb HAVING tally = (SELECT MAX(tally) FROM X); Here is an old newsgroup thread with several interesting solutions
for calculating the median: http://groups.google.co.uk/group/microsoft.public.sqlserver.programming/browse_frm/thread/47895eda324d6b86/6b7214069c8b989e Here are two solutions for finding the mode, one proprietary and one standard: select top 1 with ties Quantity, count(*) as Frequency from Northwind..[Order Details] group by Quantity order by count(*) desc select Quantity, count(*) as Frequency from Northwind..[Order Details] group by Quantity having count(*) = ( select max(ct) from ( select count(*) as ct from Northwind..[Order Details] group by Quantity ) T ) There are other solutions, but they don't generate good query plans in SQL Server 2000: select Quantity, count(*) as Frequency from Northwind..[Order Details] as D1 group by Quantity having not exists ( select * from Northwind..[Order Details] as D2 group by D2.Quantity having count(D2.Quantity) > count(D1.Quantity) ) -- Steve Kass -- Drew University Krishnaprasad Paralikar wrote: Show quoteHide quote >I would like to know whether statistical functions like Mode and Median are >available for use in SQL 2000? >If Yes, what's the syntax? >If No, what do I do now? > > >
Other interesting topics
|
|||||||||||||||||||||||