Home All Groups Group Topic Archive Search About
Author
14 Sep 2006 8:04 PM
BK-Chicago
Hi

I would like help with the following query.
The table:
CREATE TABLE [dbo].[MyTable] (
[ID]  [char] (5) AS NOT NULL ,
[period_date]  [datetime] NOT NULL ,
[value] [numeric](18, 6) NULL
) ON [PRIMARY]


Sample Data:
            0004N   9/30/1989                 67.309
            0004N   12/31/1989               68.989
            0004N   3/31/1990                 68.126
            0004N   3/31/1990                 68.126
            0004N   6/30/1990                 66.361
            0004N   9/30/1990                76.689
            0005N  9/25/1989                 1.0
            0005N  12/25/1989               35.0
            0005N  3/25/1992                 25.0
Notes:
        There are no primary keys. ie., the period_date is not unique across
a ID
        There are about 20000 distinct ID's in the table, making this table
Gnormous.

I would like to find out the most efficient way to extract (ID, Value(1),
Value(2), Value(3), Value(4), Value(6))
     Group the table by ID's
     Gather distinct period_dates for each ID
     ORDER (sort-Descending)  by period_dates within each ID
     Select the top N(6 in this case) dates
     Transpose/Crosstab the data for display.

In essence the result for the above sample data will look like

ID  period_date(1) val(1)  period_date(2) Val(2) ...  .... period_date(6)  
val(6)
________________________________________________________________
0004N 9/30/1989   67.309 12/31/1989  68.989.......
0005N 9/25/1989   1         12/25/1989  35   3/25/1992  25  null null


Any help would be apprecaited.  Either a SQL or Stored.Proc would do.

ENV:
SQL Server 2000, Windows2000/XP, No Analysis services

Regards
B

Author
14 Sep 2006 8:38 PM
Arnie Rowland
Steve?

--
Arnie Rowland, Ph.D.
Westwood Consulting, Inc

Most good judgment comes from experience.
Most experience comes from bad judgment.
- Anonymous


Show quote
"BK-Chicago" <BKChic***@discussions.microsoft.com> wrote in message
news:5BB5C232-CAAA-49D4-841E-6AF2874B74D1@microsoft.com...
> Hi
>
> I would like help with the following query.
> The table:
> CREATE TABLE [dbo].[MyTable] (
> [ID]  [char] (5) AS NOT NULL ,
> [period_date]  [datetime] NOT NULL ,
> [value] [numeric](18, 6) NULL
> ) ON [PRIMARY]
>
>
> Sample Data:
>            0004N   9/30/1989                 67.309
>            0004N   12/31/1989               68.989
>            0004N   3/31/1990                 68.126
>            0004N   3/31/1990                 68.126
>            0004N   6/30/1990                 66.361
>            0004N   9/30/1990                76.689
>            0005N  9/25/1989                 1.0
>            0005N  12/25/1989               35.0
>            0005N  3/25/1992                 25.0
> Notes:
>        There are no primary keys. ie., the period_date is not unique
> across
> a ID
>        There are about 20000 distinct ID's in the table, making this table
> Gnormous.
>
> I would like to find out the most efficient way to extract (ID, Value(1),
> Value(2), Value(3), Value(4), Value(6))
>     Group the table by ID's
>     Gather distinct period_dates for each ID
>     ORDER (sort-Descending)  by period_dates within each ID
>     Select the top N(6 in this case) dates
>     Transpose/Crosstab the data for display.
>
> In essence the result for the above sample data will look like
>
> ID  period_date(1) val(1)  period_date(2) Val(2) ...  .... period_date(6)
> val(6)
> ________________________________________________________________
> 0004N 9/30/1989   67.309 12/31/1989  68.989.......
> 0005N 9/25/1989   1         12/25/1989  35   3/25/1992  25  null null
>
>
> Any help would be apprecaited.  Either a SQL or Stored.Proc would do.
>
> ENV:
> SQL Server 2000, Windows2000/XP, No Analysis services
>
> Regards
> B
Author
14 Sep 2006 9:53 PM
BK-Chicago
Is that a question? If you are wondering whether bk-Chicago is steve, i am
sorry to disappoint you.

Regards
Bala

Show quote
"Arnie Rowland" wrote:

> Steve?
>
> --
> Arnie Rowland, Ph.D.
> Westwood Consulting, Inc
>
> Most good judgment comes from experience.
> Most experience comes from bad judgment.
> - Anonymous
>
>
> "BK-Chicago" <BKChic***@discussions.microsoft.com> wrote in message
> news:5BB5C232-CAAA-49D4-841E-6AF2874B74D1@microsoft.com...
> > Hi
> >
> > I would like help with the following query.
> > The table:
> > CREATE TABLE [dbo].[MyTable] (
> > [ID]  [char] (5) AS NOT NULL ,
> > [period_date]  [datetime] NOT NULL ,
> > [value] [numeric](18, 6) NULL
> > ) ON [PRIMARY]
> >
> >
> > Sample Data:
> >            0004N   9/30/1989                 67.309
> >            0004N   12/31/1989               68.989
> >            0004N   3/31/1990                 68.126
> >            0004N   3/31/1990                 68.126
> >            0004N   6/30/1990                 66.361
> >            0004N   9/30/1990                76.689
> >            0005N  9/25/1989                 1.0
> >            0005N  12/25/1989               35.0
> >            0005N  3/25/1992                 25.0
> > Notes:
> >        There are no primary keys. ie., the period_date is not unique
> > across
> > a ID
> >        There are about 20000 distinct ID's in the table, making this table
> > Gnormous.
> >
> > I would like to find out the most efficient way to extract (ID, Value(1),
> > Value(2), Value(3), Value(4), Value(6))
> >     Group the table by ID's
> >     Gather distinct period_dates for each ID
> >     ORDER (sort-Descending)  by period_dates within each ID
> >     Select the top N(6 in this case) dates
> >     Transpose/Crosstab the data for display.
> >
> > In essence the result for the above sample data will look like
> >
> > ID  period_date(1) val(1)  period_date(2) Val(2) ...  .... period_date(6)
> > val(6)
> > ________________________________________________________________
> > 0004N 9/30/1989   67.309 12/31/1989  68.989.......
> > 0005N 9/25/1989   1         12/25/1989  35   3/25/1992  25  null null
> >
> >
> > Any help would be apprecaited.  Either a SQL or Stored.Proc would do.
> >
> > ENV:
> > SQL Server 2000, Windows2000/XP, No Analysis services
> >
> > Regards
> > B
>
>
>
Author
14 Sep 2006 10:27 PM
Anith Sen
>>          0004N   3/31/1990                 68.126
>>          0004N   3/31/1990                 68.126

With duplicate data in the table, you cannot formulate a logical approach.
Once you eliminate the duplicates, you can use the any of the commonly used
cross tabulation routines. Make sure to declare a formal key so that
duplicate rows can be prevented in the future.

To remove duplicates, see : KBA 139444
To apply a cross tabulation method see KBA : 175574

--
Anith
Author
14 Sep 2006 10:41 PM
BK-Chicago
Thank you. but that doesn't help. Eliminating the duplicate record can be
easily accomplished using AVG() or MAX(). and creating a select statement is
not that difficult either. The issue is creating a column that would have the
number of the row for a ID before i can crosstab it.

SELECT    t.ID, t.period_date, MIN(t.value)
FROM        MyTable t
GROUP BY    t.ID, t.period_date
HAVING    t.period_date in
(
    SELECT TOP 3 period_date
    FROM        MyTable I
    WHERE        i.ID = t.ID
    GROUP BY    period_date
    ORDER BY    period_date desc
)
order by t.ID, t.period_date desc


Show quote
"Anith Sen" wrote:

> >>          0004N   3/31/1990                 68.126
> >>          0004N   3/31/1990                 68.126
>
> With duplicate data in the table, you cannot formulate a logical approach.
> Once you eliminate the duplicates, you can use the any of the commonly used
> cross tabulation routines. Make sure to declare a formal key so that
> duplicate rows can be prevented in the future.
>
> To remove duplicates, see : KBA 139444
> To apply a cross tabulation method see KBA : 175574
>
> --
> Anith
>
>
>
Author
14 Sep 2006 10:54 PM
Anith Sen
Once you eliminate the duplicates, the query in SQL 2000 can be along the
lines of:

SELECT id,
        MAX( CASE seq WHEN 1 THEN period_date END ) AS dt_1,
        MAX( CASE seq WHEN 1 THEN value END ) AS val_1,
        MAX( CASE seq WHEN 2 THEN period_date END ) AS dt_2,
        MAX( CASE seq WHEN 2 THEN value END ) AS val_2,
        MAX( CASE seq WHEN 3 THEN period_date END ) AS dt_3,
        MAX( CASE seq WHEN 3 THEN value END ) AS val_3
  FROM ( SELECT id, period_date, value,
                ( SELECT COUNT(*)
                    FROM t t2
                   WHERE t2.id = t1.id
                     AND t2.period_date <= t1.period_date )
           FROM t t1 ) D ( id, period_date, value, seq )
GROUP BY id ;

Using SQL 2005, you can eliminate the entire subquery inside the derived
table using a windowing function like ROW_NUMBER() or RANK(). Also as an
alternative, you could use PIVOT instead of the series of MAX( CASE .. ).

--
Anith
Author
14 Sep 2006 11:10 PM
BK-Chicago
Anith.

Thanks for the query, But unfortunately i will not be able to elimiate
duplicate rows. The dataset can have values that need to be averaged if all
column elements but the value of it match ie, For the data set
ID 30 date 10/10/2004 value 10
ID 30 date 10/10/2004 value 20

my resultset should be

ID30 date 10/10/2004 should have (10+30)/2.

Wouldn't it make sense to ranks the items within a group(in this case ID)
with numbers?

Regards
Bala


Show quote
"Anith Sen" wrote:

> Once you eliminate the duplicates, the query in SQL 2000 can be along the
> lines of:
>
> SELECT id,
>         MAX( CASE seq WHEN 1 THEN period_date END ) AS dt_1,
>         MAX( CASE seq WHEN 1 THEN value END ) AS val_1,
>         MAX( CASE seq WHEN 2 THEN period_date END ) AS dt_2,
>         MAX( CASE seq WHEN 2 THEN value END ) AS val_2,
>         MAX( CASE seq WHEN 3 THEN period_date END ) AS dt_3,
>         MAX( CASE seq WHEN 3 THEN value END ) AS val_3
>   FROM ( SELECT id, period_date, value,
>                 ( SELECT COUNT(*)
>                     FROM t t2
>                    WHERE t2.id = t1.id
>                      AND t2.period_date <= t1.period_date )
>            FROM t t1 ) D ( id, period_date, value, seq )
>  GROUP BY id ;
>
> Using SQL 2005, you can eliminate the entire subquery inside the derived
> table using a windowing function like ROW_NUMBER() or RANK(). Also as an
> alternative, you could use PIVOT instead of the series of MAX( CASE .. ).
>
> --
> Anith
>
>
>
Author
15 Sep 2006 3:09 AM
Steve Dassin
Hello,

Steve (U of I in Urbana)...I am he.

Here is a solution using the Rac utility on XP/S2k sp4.
Take it for what it's worth.

CREATE TABLE ##MyTable (
[ID]  char(5) NOT NULL ,
[period_date] datetime NOT NULL ,
[value] decimal(18, 6) NULL)
go
        insert ##mytable ([ID],period_date,[value])
        select '0004N' as [ID],'9/30/1989' as period_date, 67.309 as [value]
        union
        select '0004N','12/31/1989',68.989
        union
        select '0004N','3/31/1990',68.126
        union
        select '0004N','3/31/1990',68.126
        union
        select '0004N','6/30/1990',66.361  /* dup date */
        union
        select '0004N','9/30/1990',76.689  /* dup date */
        union
        select '0005N','9/25/1989',1.0     /* dup date */
        union
        select '0005N','9/25/1989',2.0     /* dup date */
        union
        select '0005N','12/25/1989',35.0   /* dup date */
        union
        select '0005N','12/25/1989',36.0   /* dup date */
        union
        select '0005N','3/25/1992',25.0

Exec Rac
-- Cells of the xtab are period_date and avg(value).
@transform='convert(varchar(10),period_date,101) as period_date &
            cast(Avg([value]) as decimal(18,2)) as [value]',
-- Row of xtab.
@rows='[ID]',
-- Columns based on period_date in descending order.
@pvtcol='period_date',@pvtdate='y',@pvtsortype='d',
-- Change the pivot columns to ranks of period_date descending.
@rank='',
-- Limit pivoted ranks to 4 (in descending order).
@ranklimit='4',
@from='##MyTable',
-- Rotate the @transform expressions so they appear period_date
-- then value for each rank (column).
@rotate='nest',
-- What you want to appear in an empy cell.
@emptycell='null',
-- Just some other stuff -:)
@grand_totals='n',@row_totals='n',@rowbreak='n',@racheck='y'

ID     period_date_1 value_1    period_date_2 value_2    period_date_3
value_3    period_date_4 value_4
------ ------------- ---------- ------------- ---------- ------------- -----
----- ------------- ----------
0004N  09/30/1990    76.69      06/30/1990    66.36      03/31/1990    68.13
12/31/1989    68.99
0005N  03/25/1992    25.00      12/25/1989    35.50      09/25/1989    1.50
null          null

There are many other options to modify the result.

If you have ~10m rows to process there are other considerations.
Rac does not work like anything in a textbook:) The fastest
option in Rac for processing involves bcp/bulk insert.
You have to have sufficient disk space and you would have
to see about the best batchsize.

More info @
www.rac4sql.net

best,
steve
http://racster.blogspot.com

AddThis Social Bookmark Button