|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
SQL queryI would like help with the following query. The table: CREATE TABLE [dbo].[MyTable] ( [ID] [char] (5) AS NOT NULL , [period_date] [datetime] NOT NULL , [value] [numeric](18, 6) NULL ) ON [PRIMARY] Sample Data: 0004N 9/30/1989 67.309 0004N 12/31/1989 68.989 0004N 3/31/1990 68.126 0004N 3/31/1990 68.126 0004N 6/30/1990 66.361 0004N 9/30/1990 76.689 0005N 9/25/1989 1.0 0005N 12/25/1989 35.0 0005N 3/25/1992 25.0 Notes: There are no primary keys. ie., the period_date is not unique across a ID There are about 20000 distinct ID's in the table, making this table Gnormous. I would like to find out the most efficient way to extract (ID, Value(1), Value(2), Value(3), Value(4), Value(6)) Group the table by ID's Gather distinct period_dates for each ID ORDER (sort-Descending) by period_dates within each ID Select the top N(6 in this case) dates Transpose/Crosstab the data for display. In essence the result for the above sample data will look like ID period_date(1) val(1) period_date(2) Val(2) ... .... period_date(6) val(6) ________________________________________________________________ 0004N 9/30/1989 67.309 12/31/1989 68.989....... 0005N 9/25/1989 1 12/25/1989 35 3/25/1992 25 null null Any help would be apprecaited. Either a SQL or Stored.Proc would do. ENV: SQL Server 2000, Windows2000/XP, No Analysis services Regards B Steve?
-- Show quoteHide quoteArnie Rowland, Ph.D. Westwood Consulting, Inc Most good judgment comes from experience. Most experience comes from bad judgment. - Anonymous "BK-Chicago" <BKChic***@discussions.microsoft.com> wrote in message news:5BB5C232-CAAA-49D4-841E-6AF2874B74D1@microsoft.com... > Hi > > I would like help with the following query. > The table: > CREATE TABLE [dbo].[MyTable] ( > [ID] [char] (5) AS NOT NULL , > [period_date] [datetime] NOT NULL , > [value] [numeric](18, 6) NULL > ) ON [PRIMARY] > > > Sample Data: > 0004N 9/30/1989 67.309 > 0004N 12/31/1989 68.989 > 0004N 3/31/1990 68.126 > 0004N 3/31/1990 68.126 > 0004N 6/30/1990 66.361 > 0004N 9/30/1990 76.689 > 0005N 9/25/1989 1.0 > 0005N 12/25/1989 35.0 > 0005N 3/25/1992 25.0 > Notes: > There are no primary keys. ie., the period_date is not unique > across > a ID > There are about 20000 distinct ID's in the table, making this table > Gnormous. > > I would like to find out the most efficient way to extract (ID, Value(1), > Value(2), Value(3), Value(4), Value(6)) > Group the table by ID's > Gather distinct period_dates for each ID > ORDER (sort-Descending) by period_dates within each ID > Select the top N(6 in this case) dates > Transpose/Crosstab the data for display. > > In essence the result for the above sample data will look like > > ID period_date(1) val(1) period_date(2) Val(2) ... .... period_date(6) > val(6) > ________________________________________________________________ > 0004N 9/30/1989 67.309 12/31/1989 68.989....... > 0005N 9/25/1989 1 12/25/1989 35 3/25/1992 25 null null > > > Any help would be apprecaited. Either a SQL or Stored.Proc would do. > > ENV: > SQL Server 2000, Windows2000/XP, No Analysis services > > Regards > B Is that a question? If you are wondering whether bk-Chicago is steve, i am
sorry to disappoint you. Regards Bala Show quoteHide quote "Arnie Rowland" wrote: > Steve? > > -- > Arnie Rowland, Ph.D. > Westwood Consulting, Inc > > Most good judgment comes from experience. > Most experience comes from bad judgment. > - Anonymous > > > "BK-Chicago" <BKChic***@discussions.microsoft.com> wrote in message > news:5BB5C232-CAAA-49D4-841E-6AF2874B74D1@microsoft.com... > > Hi > > > > I would like help with the following query. > > The table: > > CREATE TABLE [dbo].[MyTable] ( > > [ID] [char] (5) AS NOT NULL , > > [period_date] [datetime] NOT NULL , > > [value] [numeric](18, 6) NULL > > ) ON [PRIMARY] > > > > > > Sample Data: > > 0004N 9/30/1989 67.309 > > 0004N 12/31/1989 68.989 > > 0004N 3/31/1990 68.126 > > 0004N 3/31/1990 68.126 > > 0004N 6/30/1990 66.361 > > 0004N 9/30/1990 76.689 > > 0005N 9/25/1989 1.0 > > 0005N 12/25/1989 35.0 > > 0005N 3/25/1992 25.0 > > Notes: > > There are no primary keys. ie., the period_date is not unique > > across > > a ID > > There are about 20000 distinct ID's in the table, making this table > > Gnormous. > > > > I would like to find out the most efficient way to extract (ID, Value(1), > > Value(2), Value(3), Value(4), Value(6)) > > Group the table by ID's > > Gather distinct period_dates for each ID > > ORDER (sort-Descending) by period_dates within each ID > > Select the top N(6 in this case) dates > > Transpose/Crosstab the data for display. > > > > In essence the result for the above sample data will look like > > > > ID period_date(1) val(1) period_date(2) Val(2) ... .... period_date(6) > > val(6) > > ________________________________________________________________ > > 0004N 9/30/1989 67.309 12/31/1989 68.989....... > > 0005N 9/25/1989 1 12/25/1989 35 3/25/1992 25 null null > > > > > > Any help would be apprecaited. Either a SQL or Stored.Proc would do. > > > > ENV: > > SQL Server 2000, Windows2000/XP, No Analysis services > > > > Regards > > B > > > >> 0004N 3/31/1990 68.126 With duplicate data in the table, you cannot formulate a logical approach. >> 0004N 3/31/1990 68.126 Once you eliminate the duplicates, you can use the any of the commonly used cross tabulation routines. Make sure to declare a formal key so that duplicate rows can be prevented in the future. To remove duplicates, see : KBA 139444 To apply a cross tabulation method see KBA : 175574 -- Anith Thank you. but that doesn't help. Eliminating the duplicate record can be
easily accomplished using AVG() or MAX(). and creating a select statement is not that difficult either. The issue is creating a column that would have the number of the row for a ID before i can crosstab it. SELECT t.ID, t.period_date, MIN(t.value) FROM MyTable t GROUP BY t.ID, t.period_date HAVING t.period_date in ( SELECT TOP 3 period_date FROM MyTable I WHERE i.ID = t.ID GROUP BY period_date ORDER BY period_date desc ) order by t.ID, t.period_date desc Show quoteHide quote "Anith Sen" wrote: > >> 0004N 3/31/1990 68.126 > >> 0004N 3/31/1990 68.126 > > With duplicate data in the table, you cannot formulate a logical approach. > Once you eliminate the duplicates, you can use the any of the commonly used > cross tabulation routines. Make sure to declare a formal key so that > duplicate rows can be prevented in the future. > > To remove duplicates, see : KBA 139444 > To apply a cross tabulation method see KBA : 175574 > > -- > Anith > > > Once you eliminate the duplicates, the query in SQL 2000 can be along the
lines of: SELECT id, MAX( CASE seq WHEN 1 THEN period_date END ) AS dt_1, MAX( CASE seq WHEN 1 THEN value END ) AS val_1, MAX( CASE seq WHEN 2 THEN period_date END ) AS dt_2, MAX( CASE seq WHEN 2 THEN value END ) AS val_2, MAX( CASE seq WHEN 3 THEN period_date END ) AS dt_3, MAX( CASE seq WHEN 3 THEN value END ) AS val_3 FROM ( SELECT id, period_date, value, ( SELECT COUNT(*) FROM t t2 WHERE t2.id = t1.id AND t2.period_date <= t1.period_date ) FROM t t1 ) D ( id, period_date, value, seq ) GROUP BY id ; Using SQL 2005, you can eliminate the entire subquery inside the derived table using a windowing function like ROW_NUMBER() or RANK(). Also as an alternative, you could use PIVOT instead of the series of MAX( CASE .. ). -- Anith Anith.
Thanks for the query, But unfortunately i will not be able to elimiate duplicate rows. The dataset can have values that need to be averaged if all column elements but the value of it match ie, For the data set ID 30 date 10/10/2004 value 10 ID 30 date 10/10/2004 value 20 my resultset should be ID30 date 10/10/2004 should have (10+30)/2. Wouldn't it make sense to ranks the items within a group(in this case ID) with numbers? Regards Bala Show quoteHide quote "Anith Sen" wrote: > Once you eliminate the duplicates, the query in SQL 2000 can be along the > lines of: > > SELECT id, > MAX( CASE seq WHEN 1 THEN period_date END ) AS dt_1, > MAX( CASE seq WHEN 1 THEN value END ) AS val_1, > MAX( CASE seq WHEN 2 THEN period_date END ) AS dt_2, > MAX( CASE seq WHEN 2 THEN value END ) AS val_2, > MAX( CASE seq WHEN 3 THEN period_date END ) AS dt_3, > MAX( CASE seq WHEN 3 THEN value END ) AS val_3 > FROM ( SELECT id, period_date, value, > ( SELECT COUNT(*) > FROM t t2 > WHERE t2.id = t1.id > AND t2.period_date <= t1.period_date ) > FROM t t1 ) D ( id, period_date, value, seq ) > GROUP BY id ; > > Using SQL 2005, you can eliminate the entire subquery inside the derived > table using a windowing function like ROW_NUMBER() or RANK(). Also as an > alternative, you could use PIVOT instead of the series of MAX( CASE .. ). > > -- > Anith > > > Hello,
Steve (U of I in Urbana)...I am he. Here is a solution using the Rac utility on XP/S2k sp4. Take it for what it's worth. CREATE TABLE ##MyTable ( [ID] char(5) NOT NULL , [period_date] datetime NOT NULL , [value] decimal(18, 6) NULL) go insert ##mytable ([ID],period_date,[value]) select '0004N' as [ID],'9/30/1989' as period_date, 67.309 as [value] union select '0004N','12/31/1989',68.989 union select '0004N','3/31/1990',68.126 union select '0004N','3/31/1990',68.126 union select '0004N','6/30/1990',66.361 /* dup date */ union select '0004N','9/30/1990',76.689 /* dup date */ union select '0005N','9/25/1989',1.0 /* dup date */ union select '0005N','9/25/1989',2.0 /* dup date */ union select '0005N','12/25/1989',35.0 /* dup date */ union select '0005N','12/25/1989',36.0 /* dup date */ union select '0005N','3/25/1992',25.0 Exec Rac -- Cells of the xtab are period_date and avg(value). @transform='convert(varchar(10),period_date,101) as period_date & cast(Avg([value]) as decimal(18,2)) as [value]', -- Row of xtab. @rows='[ID]', -- Columns based on period_date in descending order. @pvtcol='period_date',@pvtdate='y',@pvtsortype='d', -- Change the pivot columns to ranks of period_date descending. @rank='', -- Limit pivoted ranks to 4 (in descending order). @ranklimit='4', @from='##MyTable', -- Rotate the @transform expressions so they appear period_date -- then value for each rank (column). @rotate='nest', -- What you want to appear in an empy cell. @emptycell='null', -- Just some other stuff -:) @grand_totals='n',@row_totals='n',@rowbreak='n',@racheck='y' ID period_date_1 value_1 period_date_2 value_2 period_date_3 value_3 period_date_4 value_4 ------ ------------- ---------- ------------- ---------- ------------- ----- ----- ------------- ---------- 0004N 09/30/1990 76.69 06/30/1990 66.36 03/31/1990 68.13 12/31/1989 68.99 0005N 03/25/1992 25.00 12/25/1989 35.50 09/25/1989 1.50 null null There are many other options to modify the result. If you have ~10m rows to process there are other considerations. Rac does not work like anything in a textbook:) The fastest option in Rac for processing involves bcp/bulk insert. You have to have sufficient disk space and you would have to see about the best batchsize. More info @ www.rac4sql.net best, steve http://racster.blogspot.com
Update Statement
Date Range UNDO XP_SENDMAIL Overload in SQL 2005 Problem with repeated use of temp tables Small problem with NOT EXISTS... Union with the results of two SP Statistic Problem Linked Server Error: Login failed for user NT AUTHORITY\ANONYMOUS LOGON How best to create a unique identifier across two or more tables... |
|||||||||||||||||||||||