Home All Groups Group Topic Archive Search About

Mode and Median functions in SQL 2000

Author
27 May 2005 3:28 PM
Krishnaprasad Paralikar
I would like to know whether statistical functions like Mode and Median are
available for use in SQL 2000?
If Yes, what's the syntax?
If No, what do I do now?

Author
27 May 2005 3:40 PM
Thomas Coleman
There are not available as native functions but can be derived.


Thomas


Show quoteHide quote
"Krishnaprasad Paralikar" <Krishnaprasad Parali***@discussions.microsoft.com>
wrote in message news:52D711E0-7FC6-4BF2-9D82-C0AB3CB7E338@microsoft.com...
>I would like to know whether statistical functions like Mode and Median are
> available for use in SQL 2000?
> If Yes, what's the syntax?
> If No, what do I do now?
>
Are all your drivers up to date? click for free checkup

Author
27 May 2005 4:06 PM
Alejandro Mesa
Google for "celko" + "median".


AMB

Show quoteHide quote
"Krishnaprasad Paralikar" wrote:

> I would like to know whether statistical functions like Mode and Median are
> available for use in SQL 2000?
> If Yes, what's the syntax?
> If No, what do I do now?
>
Author
27 May 2005 10:39 PM
Brad White
"Krishnaprasad Paralikar" <Krishnaprasad
Parali***@discussions.microsoft.com> wrote in message
news:52D711E0-7FC6-4BF2-9D82-C0AB3CB7E338@microsoft.com...
>I would like to know whether statistical functions like Mode and Median are
> available for use in SQL 2000?
> If Yes, what's the syntax?
Don't think so.

> If No, what do I do now?
>
SQL programmers have long viewed the problem of finding a "loop-free"
SQL solution to finding a median problem as a big challenge.

Assume a table 'data' with one field 'value.'
Assume no nulls or duplicates.

Then the query is:
Select x.value from data x, data y
group by x.value
having SUM(D(y.value <=x.value)) = ((COUNT(*)+1)/2)

where D(A<=B) is a Boolean characteristic function which returns 1
for values A <= B and 0 for all other values.

Now consider the expression SUM(D(y.value <=x.value))
For every x.value, this expression counts how many y.values are less
than or equal to that value.  The having clause then chooses the x.value
where this count is equal to half the number of elements.  We depend here
on the division of two integers returning a truncated integer result, which
SQL Server conveniently does.  This solution supports the statistical
definition of median where the median of a set must be one of the members
of the set.  Given an even number of values, it will choose the smaller of
the middle two.  You could also write it to support the financial definition
of median and for the case of an even number of values return the median
as the average of the middle two numbers.

D(A<=B) is defined as (sign(1-sign(A-B)))
where sign is a function that returns -1 for values less than 0, 0, or 1
for values greater than 0.
Inspection will  show that this does in fact return 1 for values A <= B
and 0 for all other values.

So the full SQL is

Select x.value from data x, data y
group by x.value
having SUM((sign(1-sign(A-B)))) = ((COUNT(*)+1)/2)

Credit to David Rozenshtein and Anatoly Abramovich for developing
the technique of characteristic functions, published in 1992.

I assume by mode you mean the most common value.
I'm not seeing right off a way to do modes in a single select, but if you
really need it I could probably come up with a way to do it using
a subselect.
--
HTH,
Brad.
Author
28 May 2005 1:22 AM
--CELKO--
You might to look at the chapter on Medians in SQL FOR SMARTIES that
gives several versions of this function.

>> I'm not seeing right off a way to do modes in a single select, but if you really need it I could probably come up with a way to do it using a subselect. <<

Try a derived table:

WITH X(fleeb, tally)
   AS SELECT CT COUNT(*)
                 FROM Foobar
                GROUP BY Fleeb)
SELECT Fleeb
  FROM X
GROUP BY fleeb
HAVING tally = (SELECT MAX(tally) FROM X);
Author
28 May 2005 12:16 AM
Steve Kass
Here is an old newsgroup thread with several interesting solutions
for calculating the median:

http://groups.google.co.uk/group/microsoft.public.sqlserver.programming/browse_frm/thread/47895eda324d6b86/6b7214069c8b989e

Here are two solutions for finding the mode, one proprietary and
one standard:

select top 1 with ties Quantity, count(*) as Frequency
from Northwind..[Order Details]
group by Quantity
order by count(*) desc

select Quantity, count(*) as Frequency
from Northwind..[Order Details]
group by Quantity
having count(*) = (
  select max(ct) from (
    select count(*) as ct
    from Northwind..[Order Details]
    group by Quantity
  ) T
)

There are other solutions, but they don't generate good query plans
in SQL Server 2000:

select Quantity, count(*) as Frequency
from Northwind..[Order Details] as D1
group by Quantity
having not exists (
  select * from Northwind..[Order Details] as D2
  group by D2.Quantity
  having count(D2.Quantity) > count(D1.Quantity)
)

-- Steve Kass
-- Drew University

Krishnaprasad Paralikar wrote:

Show quoteHide quote
>I would like to know whether statistical functions like Mode and Median are
>available for use in SQL 2000?
>If Yes, what's the syntax?
>If No, what do I do now?
>

>

Bookmark and Share