|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Most recent data point?The general format is: DECLARE @Data TABLE ( DataID int, TimeCollected int, DataValue decimal(9,9) ) INSERT @Data VALUES (1, 1000, 0.75) INSERT @Data VALUES (1, 1001, 0.69) INSERT @Data VALUES (1, 1002, 0.77) INSERT @Data VALUES (2, 1000, 0.45) INSERT @Data VALUES (2, 1001, 0.49) INSERT @Data VALUES (2, 1002, 0.53) INSERT @Data VALUES (3, 1000, 0.50) INSERT @Data VALUES (3, 1001, 0.35) INSERT @Data VALUES (3, 1002, 0.41) I want to pull the most recent datapoint for each of the DataIDs. I can do it like this: SELECT d.* FROM @Data d INNER JOIN ( SELECT DataID, MAX(TimeCollected) AS MaxTime FROM @Data GROUP BY DataID ) dt ON (d.DataID=dt.DataID) AND (d.TimeCollected=dt.MaxTime) Is there a more efficient way than this? I may have hundreds of different DataIDs with tens or hundreds of thousands of data points for each DataID. Thanks! -mdb First, THANK YOU FOR INCLUDING INSERTs. You rock and made this so much more
fun/easy to do! It is good (assuming you have unique values in the combination of columns DataId,TimeCollected, otherwise you might get duplicates.) I might suggest to change to an exists, since you really aren't joining, you are selecting from d, based on a criteria. It makes the query easier to understand, though it does seem to make it more complex. (I redid the data with temp tables so I could rerun the queries and look at the plan sans all of the inserts.) When I ran the two queries together, this one took 32% to the others 68: SELECT d.* FROM #Data d WHERE EXISTS (SELECT * FROM ( SELECT DataID, MAX(TimeCollected) AS MaxTime FROM #Data GROUP BY DataID ) dt WHERE (d.DataID=dt.DataID) AND (d.TimeCollected=dt.MaxTime)) StmtText --------------------------------------------------------------------------------------------------------------------------------------------------------- |--Nested Loops(Left Semi Join, OUTER REFERENCES:([d].[DataID], [d].[TimeCollected]=[Expr1007]))[d].[TimeCollected])) |--Table Scan(OBJECT:([tempdb].[dbo].[#Data] AS [d])) |--Filter(WHERE:([tempdb].[dbo].[#Data].[TimeCollected] as |--Stream Aggregate(DEFINE:([Expr1007]=MAX([tempdb].[dbo].[#Data].[TimeCollected]))) |--Table Scan(OBJECT:([tempdb].[dbo].[#Data]), WHERE:([tempdb].[dbo].[#Data].[DataID] as [d].[DataID]=[tempdb].[dbo].[#Data].[DataID])) The biggest increase in cost was that the orignal query needed a sort, as it took 78% of the execution: StmtText ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- SELECT d.* FROM #Data d INNER JOIN ( SELECT DataID, MAX(TimeCollected) AS MaxTime FROM #Data GROUP BY DataID ) dt ON (d.DataID=dt.DataID) AND (d.TimeCollected=dt.MaxTime) StmtText -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |--Top(TOP EXPRESSION:((1))) WHERE:([tempdb].[dbo].[#Data].[DataID] as [d].[DataID] IS NOT NULL AND |--Segment |--Sort(ORDER BY:([d].[DataID] DESC, [d].[TimeCollected] DESC)) |--Table Scan(OBJECT:([tempdb].[dbo].[#Data] AS [d]), [tempdb].[dbo].[#Data].[TimeCollected] as [d].[TimeCollected] IS NOT NULL)) When I added a unique constraint: alter table #data add unique (dataId, timeCollected) The plans were equivalent. So you might test it out and see for yourself: |--Top(TOP EXPRESSION:((1))) Seek(OBJECT:([tempdb].[dbo].[#Data].[UQ__#Data_____________15502E78] AS |--Segment |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000])) |--Index [d]), SEEK:([d].[DataID] IsNotNull), WHERE:([tempdb].[dbo].[#Data].[TimeCollected] as [d].[TimeCollected] IS NOT NULL) ORDERED BACKWARD) |--RID Lookup(OBJECT:([tempdb].[dbo].[#Data] AS [d]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)---------------------------------------------------------------------------- Louis Davidson - http://spaces.msn.com/members/drsql/ SQL Server MVP "Arguments are to be avoided: they are always vulgar and often convincing." (Oscar Wilde) Show quote "Michael Bray" <mbray@makeDIntoDot_ctiusaDcom> wrote in message news:Xns974A9E81CA7A0mbrayctiusacom@207.46.248.16... >I have a table that stores data points for several different data sources. > > The general format is: > > DECLARE @Data TABLE > ( > DataID int, > TimeCollected int, > DataValue decimal(9,9) > ) > > INSERT @Data VALUES (1, 1000, 0.75) > INSERT @Data VALUES (1, 1001, 0.69) > INSERT @Data VALUES (1, 1002, 0.77) > INSERT @Data VALUES (2, 1000, 0.45) > INSERT @Data VALUES (2, 1001, 0.49) > INSERT @Data VALUES (2, 1002, 0.53) > INSERT @Data VALUES (3, 1000, 0.50) > INSERT @Data VALUES (3, 1001, 0.35) > INSERT @Data VALUES (3, 1002, 0.41) > > I want to pull the most recent datapoint for each of the DataIDs. I can > do > it like this: > > SELECT d.* > FROM @Data d > INNER JOIN > ( > SELECT DataID, MAX(TimeCollected) AS MaxTime > FROM @Data > GROUP BY DataID > ) dt ON (d.DataID=dt.DataID) AND (d.TimeCollected=dt.MaxTime) > > Is there a more efficient way than this? I may have hundreds of different > DataIDs with tens or hundreds of thousands of data points for each DataID. > > Thanks! > > -mdb "Louis Davidson" <dr_dontspamme_sql@hotmail.com> wrote in That's actually quite funny. When I tested your statement against my news:emLVFsIGGHA.2064@TK2MSFTNGP09.phx.gbl: > When I added a unique constraint: > alter table #data add unique (dataId, timeCollected) > The plans were equivalent. > statement on the REAL table (the example I provided was not the real data table - it was just for presentation here in the newsgroup), I found that your statement and mine had exactly the same execution plan. Then I checked the indexes on the table, and indeed, it does have an index on (dataId, timeCollected). Bravo! -mdb Cool. I am always a bit worried when presenting a plan based on execution
on my single user enterprise (equivalent) edition laptop versus a real live running system with disks that run faster than 4800 RPM :) -- Show quote---------------------------------------------------------------------------- Louis Davidson - http://spaces.msn.com/members/drsql/ SQL Server MVP "Arguments are to be avoided: they are always vulgar and often convincing." (Oscar Wilde) "Michael Bray" <mbray@makeDIntoDot_ctiusaDcom> wrote in message news:Xns974AB04EC7F73mbrayctiusacom@207.46.248.16... > "Louis Davidson" <dr_dontspamme_sql@hotmail.com> wrote in > news:emLVFsIGGHA.2064@TK2MSFTNGP09.phx.gbl: > >> When I added a unique constraint: >> alter table #data add unique (dataId, timeCollected) >> The plans were equivalent. >> > > That's actually quite funny. When I tested your statement against my > statement on the REAL table (the example I provided was not the real data > table - it was just for presentation here in the newsgroup), I found that > your statement and mine had exactly the same execution plan. Then I > checked the indexes on the table, and indeed, it does have an index on > (dataId, timeCollected). > > Bravo! > > -mdb |
|||||||||||||||||||||||