|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
How to group on half-hour increments on a datetime field?Hi,
In my table I have a datetime field (called modtime), and I need to build a select statement that groups by half-hour increments, so that it looks something like: modtime SUM(sales) 7:00 - 7:30 9999.99 7:30 - 8:00 9999.99 8:00 - 8:30 9999.99 etc.... Can something like this be done? Thanks, Edgard L. Riba Try:
declare @date datetime, @hh int set @date = getdate() set @hh = datepart (hh, @date) select @hh , case when datepart (mi, @date) < 30 then str (@hh) +':00 - ' + ltrim (str (@hh)) + ':30' else str (@hh) +':30 - ' + ltrim (str (@hh + 1)) + ':00' end You can then modify you query to use the CASE in the GROUP BY. -- Tom ---------------------------------------------------- Thomas A. Moreau, BSc, PhD, MCSE, MCDBA SQL Server MVP Columnist, SQL Server Professional Toronto, ON Canada www.pinpub.com .. "Edgard L. Riba" <elriba at rimith dot com> wrote in message In my table I have a datetime field (called modtime), and I need to build anews:eL8krx4IGHA.3064@TK2MSFTNGP10.phx.gbl... Hi, select statement that groups by half-hour increments, so that it looks something like: modtime SUM(sales) 7:00 - 7:30 9999.99 7:30 - 8:00 9999.99 8:00 - 8:30 9999.99 etc.... Can something like this be done? Thanks, Edgard L. Riba Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in your schema are. Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. Let's get back to the basics of an RDBMS. Rows are not records; fields are not columns; tables are not files; there is no sequential access or ordering in an RDBMS, The best way to do this is to build a table of reporting ranges, with start and finish times. It will port, allow you adjust the times, and JOINs usually run faster than row-by-row compuations. On 27 Jan 2006 17:56:16 -0800, --CELKO-- wrote:
(snip) >The best way to do this is to build a table of reporting ranges, with Hi Joe,>start and finish times. It will port, allow you adjust the times, and >JOINs usually run faster than row-by-row compuations. It's a way, but not the best way. Think about scalability. With half-hour increments, it's not too bad - but what if you have to report in half-minute increments? Sure, you can set up a table with half-minute increments - that's about a million rows for a year. If you want a hassle-free DB that won't break down if you call in sick, I'd recommend pre-populating at least three years - that's three million rows in a table, just to get this report done. And if the manager decides that 5 second intervals would be better still, it'd grow to 18,000,000 rows! Simply do a GROUP BY. Either with the CASE expression suggested by Tom, or by FLOOR(DATEDIFF(minute, @basedate, TheDate) / 30). No need for an extra table at all. And changing the interval is easy. -- Hugo Kornelis, SQL Server MVP >> 'd recommend pre-populating at least three years - that's three million rows in a table, just to get this report done. And if the manager decides that 5second intervals would be better still, it'd grow to 18,000,000 rows! << Yes, but since the SQL Server model of time has funny rounding when you super-precise, you will still have problems if you use a functinal approach instead of a table approach. DB2 goes to more decimal seconds more accurately because it came from mainframes and not Sybase/UNIX/16-bit minicomputers. Nothing is perfect, but the table is faster (as well as portable) until you get down to that level. If I have to track things at that level, I woudl get Kx or StreamBase instead of SQL Server, since they are real-time flow databases, not classical RDBMS. Pick the tool for the job. Off subject: My "wish list" would be to have a Sequence () and Calendar () pseudo-table construct something like: CALENDAR ( FROM <start_timestamp> TO <end_timestamp> BY <step_size_interval>) AS <table_name> where <step_size> is an interval in SQL-92. Likewise, it would nice to have: SEQUENCE (FROM <start_int> TO <end_)integer> BY <step_size_int>) AS <table_name> to construct a pseudo-table on the fly in the engine. Not saying a table would not be fast as I did not test this. But just seems
like in no way would it be faster. You got the disk access and the join, etc. It would be interesting to see a comparison however between the two. -- William Stacey [MVP] "--CELKO--" <jcelko***@earthlink.net> wrote in message rows in anews:1138502782.898131.153840@o13g2000cwo.googlegroups.com... | >> 'd recommend pre-populating at least three years - that's three million Show quote | table, just to get this report done. And if the manager decides that 5 | second intervals would be better still, it'd grow to 18,000,000 rows! | << | | Yes, but since the SQL Server model of time has funny rounding when you | super-precise, you will still have problems if you use a functinal | approach instead of a table approach. DB2 goes to more decimal seconds | more accurately because it came from mainframes and not | Sybase/UNIX/16-bit minicomputers. | | Nothing is perfect, but the table is faster (as well as portable) until | you get down to that level. | | If I have to track things at that level, I woudl get Kx or StreamBase | instead of SQL Server, since they are real-time flow databases, not | classical RDBMS. Pick the tool for the job. | | Off subject: | My "wish list" would be to have a Sequence () and Calendar () | pseudo-table construct something like: | CALENDAR ( FROM <start_timestamp> TO <end_timestamp> BY | <step_size_interval>) AS <table_name> | | where <step_size> is an interval in SQL-92. | | Likewise, it would nice to have: | | SEQUENCE (FROM <start_int> TO <end_)integer> BY <step_size_int>) AS | <table_name> | | to construct a pseudo-table on the fly in the engine. | >> seems like in no way would it be faster. You got the disk access and the join,etc. << No, the relatively small time range table tends to go into mainstorage, where joins are pretty fast. But procedural code tends to be executed one row at a time, it cannot use indexing, etc .. I would still need to see the results. Put something together and maybe
myself or someone else will put procedure method to the test. Procedural Shoot Off #1! :-) -- William Stacey [MVP] "--CELKO--" <jcelko***@earthlink.net> wrote in message the join,etc. <<news:1138582060.514914.218050@g44g2000cwa.googlegroups.com... | >> seems like in no way would it be faster. You got the disk access and Show quote | | No, the relatively small time range table tends to go into main | storage, where joins are pretty fast. But procedural code tends to be | executed one row at a time, it cannot use indexing, etc | . | On 29 Jan 2006 16:47:40 -0800, --CELKO-- wrote:
>>> seems like in no way would it be faster. You got the disk access and the join,etc. << Hi Joe,> >No, the relatively small time range table tends to go into main >storage, where joins are pretty fast. But procedural code tends to be >executed one row at a time, it cannot use indexing, etc Procedural code? What procedural code are you talking about??? The suggestion I posted (and that is, in my belief, subject of this discussion) is as set-based as can be. Here's the relevant quote again: >>Simply do a GROUP BY. Either with the CASE expression suggested by Tom, In case you have trouble understanding this plain-English description,>>or by FLOOR(DATEDIFF(minute, @basedate, TheDate) / 30). No need for an >>extra table at all. And changing the interval is easy. check the Jatinder Singh's message - he posted a complete repro based on my suggestion. Sadly, you didn't reply to my challenge to post the code you used to verify that your table-of-intervals approach is faster than my GROUP BY plus date/time calculations. -- Hugo Kornelis, SQL Server MVP >> Procedural code? What procedural code are you talking about? << The calls to functions like FLOOR(DATEDIFF(minute, @basedate, TheDate)/ 30). as opposed to simple comparisons (=, <, >) on raw data types. You cannot get an index on FLOOR(DATEDIFF(minute, @basedate, TheDate) / 30 easily. >> Sadly, you didn't reply to my challenge to post the code you used to verify that your table-of-intervals approach is faster than my GROUP BYplus date/time calculations. << What we need is a neutral thrid party with a big enough machine toscale to a large DB. Remember when we did this with the code for breaking up a CSV string with a single query versus a procedure? my 2 cents:
If there is a need to speed up the query, I would first of all create a covering index on time, sales. If that is not good enough, I would create a computed column, either with the CASE expression suggested by Tom, or by FLOOR(DATEDIFF(minute, @basedate, TheDate) / 30) (Joe: in MS SQl Server world that's very easy, anybody could do that). Then I'd create a covering index on that computed column and sales. Makes sense? >> FLOOR(DATEDIFF(minute, @basedate, TheDate) / 30) (Joe: in MS SQL Server world that's very easy, anybody could do that). Then I'd create a covering index on that computed column and sales. Makes sense? << It makes sense, but you still lose the indexing anfd have to do thecomputations. Maybe my clients have larger machines, so they can store the "working tables" in the query. --CELKO-- wrote:
>>> FLOOR(DATEDIFF(minute, @basedate, TheDate) / 30) (Joe: in MS SQL Server world that's very easy, anybody could do that). Then I'd create a covering index on that computed column and sales. Makes sense? << Not if you turn it into an indexed view. Then it's essentially the same >>> >>> >It makes sense, but you still lose the indexing anfd have to do the >computations. > as looking up a flat table (except for the extra server resources required to keep the data in the indexed view up to date upon modifications to the underlying base table(s)). If you create an appropriate index including the computed column then the optimiser will consider that when compiling the query plan. > It makes sense, but you still lose the indexing anfd have to do the computations. Wrong, in SQL Server as soon as an index on a computed column iscreated, the computed values are stored in it. BTW, I just took a real life table, added a computed column alter table ***.*** add half_hour as ( FLOOR(DATEDIFF(minute, convert(datetime,'19990526',112), lmod_dt) / 30) ) and created an index on it. The query select half_hour, sum(amount) from ***.*** group by half_hour consistently uses 250% less CPU as compared to the query using the covering index on (lmod_dt, amount) On 30 Jan 2006 18:32:26 -0800, --CELKO-- wrote:
>>> Procedural code? What procedural code are you talking about? << Hi Joe,> >The calls to functions like FLOOR(DATEDIFF(minute, @basedate, TheDate) >/ 30). as opposed to simple comparisons (=, <, >) on raw data types. You seem to have a different definition of procedural code than I have. My definition is derived from meaning #2 of "procedure" in the Marriam-Webster Online Dictionary: 2 a : a series of steps followed in a regular definite order <legal procedure> <a surgical procedure> b : a series of instructions for a computer that has a name by which it can be called into action (http://www.m-w.com/dictionary/procedure) The entry in Wikipedia appears to be somewhat biased (http://en.wikipedia.org/wiki/Procedural_code) If anything that goes beyond simple comparisons on raw data types is procedural code, than almost all the code examples in SQL For Smarties are procedural code. >You cannot get an index on FLOOR(DATEDIFF(minute, @basedate, TheDate) / Others have already proven this statement wrong.>30 easily. > I take it, then, that you did NOT perform any test before claiming that>>> Sadly, you didn't reply to my challenge to post the code you used to verify that your table-of-intervals approach is faster than my GROUP BYplus date/time calculations. << > >What we need is a neutral thrid party with a big enough machine to >scale to a large DB. your table-of-intervals version would be faster than my GROUP BY expression version? That you just posted a wild assumption without any checking? > Remember when we did this with the code for Actually, no. You must be confusing me with someone else.>breaking up a CSV string with a single query versus a procedure? -- Hugo Kornelis, SQL Server MVP | If anything that goes beyond simple comparisons on raw data types is Agreed. SQL itself is nothing but a *lot of procedural code that | procedural code, than almost all the code examples in SQL For Smarties | are procedural code. reads/writes from binary files. -- William Stacey [MVP] On 28 Jan 2006 18:46:22 -0800, --CELKO-- wrote:
>>> 'd recommend pre-populating at least three years - that's three million rows in a Hi Joe,>table, just to get this report done. And if the manager decides that 5 >second intervals would be better still, it'd grow to 18,000,000 rows! ><< > >Yes, but since the SQL Server model of time has funny rounding when you >super-precise, Funny? I don't think so. The accuracy is one three-hundredth of a second and values are rounded to increments of .000, .003, or .007 seconds. > you will still have problems if you use a functinal Could you post an example that will have problems in the functional>approach instead of a table approach. approach but not in the table approach? > DB2 goes to more decimal seconds But since 5-second intervals already require an 18,000,000 row table to>more accurately because it came from mainframes and not >Sybase/UNIX/16-bit minicomputers. do the report with the table approach, I'm not sure if more precision should be labeled a good thing in the context of this discussion <g>. > The table is not portable either, since SQL Server's datetime data type>Nothing is perfect, but the table is faster (as well as portable) until >you get down to that level. itself is not portable. Faster? That would be surprising. I assume that you've done research before posting this - could you post the script you used to test performance of both versions? -- Hugo Kornelis, SQL Server MVP Don't hold your breath Hugo.
Remember celko seems to have a complete lack and understanding of present day development and architecture and is stuck in 80's programming models. Personally, if i wanted precision other than what DATETIME can give I'd now do it using a CLR coded data type. Show quote "Hugo Kornelis" <h***@perFact.REMOVETHIS.info> wrote in message news:4fbqt19lgimvhf96ndbftb9qinulp2l01o@4ax.com... > On 28 Jan 2006 18:46:22 -0800, --CELKO-- wrote: > >>>> 'd recommend pre-populating at least three years - that's three million >>>> rows in a >>table, just to get this report done. And if the manager decides that 5 >>second intervals would be better still, it'd grow to 18,000,000 rows! >><< >> >>Yes, but since the SQL Server model of time has funny rounding when you >>super-precise, > > Hi Joe, > > Funny? I don't think so. The accuracy is one three-hundredth of a second > and values are rounded to increments of .000, .003, or .007 seconds. > >> you will still have problems if you use a functinal >>approach instead of a table approach. > > Could you post an example that will have problems in the functional > approach but not in the table approach? > >> DB2 goes to more decimal seconds >>more accurately because it came from mainframes and not >>Sybase/UNIX/16-bit minicomputers. > > But since 5-second intervals already require an 18,000,000 row table to > do the report with the table approach, I'm not sure if more precision > should be labeled a good thing in the context of this discussion <g>. > >> >>Nothing is perfect, but the table is faster (as well as portable) until >>you get down to that level. > > The table is not portable either, since SQL Server's datetime data type > itself is not portable. > > Faster? That would be surprising. I assume that you've done research > before posting this - could you post the script you used to test > performance of both versions? > > -- > Hugo Kornelis, SQL Server MVP Hi There,
Try this DECLARE @t table( ModTime datetime, Sales int); insert into @t values('1/2/2005 7:01:00', 110) insert into @t values('1/2/2005 7:29:00', 200) insert into @t values('1/2/2005 7:45:00', 220) insert into @t values('1/2/2005 7:59:00', 300) Select Min(dateAdd(minute,30 * Floor(DateDiff(minute,0,Modtime) / 30),0 )) MinGroupTime, Max(dateAdd(minute,30 * Floor(DateDiff(minute,0,Modtime) / 30) + 29 ,0 )) MaxGroupTime, --Min(modTime), Max(ModTime), --Floor(DateDiff(minute,0,Modtime) / 30), Sum(Sales) From @t T1 --, (Select Min(ModTime) as BaseDate From @t) T2 Group By Floor(DateDiff(minute,0,Modtime) / 30) This approach is very easy as pointed out by Hugo. With Warm regards Jatinder Singh I just wanted to see how I would do this using my TDateTime UDT. I think
this does it: DECLARE @t table( ModTime datetime, Sales int); insert into @t values('1/2/2005 7:01:00', 110) insert into @t values('1/2/2005 7:30:00', 200) insert into @t values('1/2/2005 7:45:00', 220) insert into @t values('1/2/2005 7:59:00', 300) select min(TDateTime::FromSqlDateTime(ModTime).AddMinutes(-datepart(mi, ModTime)).AddMinutes( datepart(mi, ModTime)/30*30 )).SqlDateTime, sum(Sales) from @t group by TDateTime::FromSqlDateTime(ModTime).AddMinutes(-datepart(mi, ModTime)).AddMinutes( datepart(mi, ModTime)/30*30 ) Output -------------------------------- 2005-01-02 07:00:00.000 110 2005-01-02 07:30:00.000 720 -- Show quoteWilliam Stacey [MVP] "Edgard L. Riba" <elriba at rimith dot com> wrote in message news:eL8krx4IGHA.3064@TK2MSFTNGP10.phx.gbl... | Hi, | | In my table I have a datetime field (called modtime), and I need to build a | select statement that groups by half-hour increments, so that it looks | something like: | | modtime SUM(sales) | 7:00 - 7:30 9999.99 | 7:30 - 8:00 9999.99 | 8:00 - 8:30 9999.99 | | etc.... | | Can something like this be done? | | Thanks, | Edgard L. Riba | | |
|||||||||||||||||||||||