Home All Groups Group Topic Archive Search About

How to group on half-hour increments on a datetime field?

Author
27 Jan 2006 9:55 PM
Edgard L. Riba
Hi,

In my table I have a datetime field (called modtime), and I need to build a
select statement that groups by half-hour increments, so that it looks
something like:

modtime             SUM(sales)
7:00 - 7:30         9999.99
7:30 - 8:00         9999.99
8:00 - 8:30         9999.99

etc....

Can something like this be done?

Thanks,
Edgard L. Riba

Author
27 Jan 2006 10:07 PM
Tom Moreau
Try:

declare @date datetime, @hh int
set @date = getdate()
set @hh = datepart (hh, @date)

select
@hh
, case
  when datepart (mi, @date) < 30
   then str (@hh) +':00 - ' + ltrim (str (@hh)) + ':30'
   else str (@hh) +':30 - ' + ltrim (str (@hh + 1)) + ':00' end

You can then modify you query to use the CASE in the GROUP BY.

--
   Tom

----------------------------------------------------
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON   Canada
www.pinpub.com
..
"Edgard L. Riba" <elriba at rimith dot com> wrote in message
news:eL8krx4IGHA.3064@TK2MSFTNGP10.phx.gbl...
Hi,

In my table I have a datetime field (called modtime), and I need to build a
select statement that groups by half-hour increments, so that it looks
something like:

modtime             SUM(sales)
7:00 - 7:30         9999.99
7:30 - 8:00         9999.99
8:00 - 8:30         9999.99

etc....

Can something like this be done?

Thanks,
Edgard L. Riba
Author
28 Jan 2006 1:56 AM
--CELKO--
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications.  It is very hard to debug code when you do not let us
see it.

Let's get back to the basics of an RDBMS. Rows are not records; fields
are not columns; tables are not files; there is no sequential access or
ordering in an RDBMS,

The best way to do this is to build a table of reporting ranges, with
start and finish times.  It will port, allow you adjust the times, and
JOINs usually run faster than row-by-row compuations.
Author
28 Jan 2006 10:15 PM
Hugo Kornelis
On 27 Jan 2006 17:56:16 -0800, --CELKO-- wrote:

(snip)
>The best way to do this is to build a table of reporting ranges, with
>start and finish times.  It will port, allow you adjust the times, and
>JOINs usually run faster than row-by-row compuations.

Hi Joe,

It's a way, but not the best way. Think about scalability. With
half-hour increments, it's not too bad - but what if you have to report
in half-minute increments? Sure, you can set up a table with half-minute
increments - that's about a million rows for a year. If you want a
hassle-free DB that won't break down if you call in sick, I'd recommend
pre-populating at least three years - that's three million rows in a
table, just to get this report done. And if the manager decides that 5
second intervals would be better still, it'd grow to 18,000,000 rows!

Simply do a GROUP BY. Either with the CASE expression suggested by Tom,
or by FLOOR(DATEDIFF(minute, @basedate, TheDate) / 30). No need for an
extra table at all. And changing the interval is easy.

--
Hugo Kornelis, SQL Server MVP
Author
29 Jan 2006 2:46 AM
--CELKO--
>> 'd recommend pre-populating at least three years - that's three million rows in a
table, just to get this report done. And if the manager decides that 5
second intervals would be better still, it'd grow to 18,000,000 rows!
<<

Yes, but since the SQL Server model of time has funny rounding when you
super-precise, you will still have problems if you use a functinal
approach instead of a table approach.  DB2 goes to more decimal seconds
more accurately because it came from mainframes and not
Sybase/UNIX/16-bit minicomputers.

Nothing is perfect, but the table is faster (as well as portable) until
you get down to that level.

If I have to track things at that level, I woudl get Kx or StreamBase
instead of SQL Server, since they are real-time flow databases, not
classical RDBMS. Pick the tool for the job.

Off subject:
My "wish list" would be to have a Sequence () and Calendar ()
pseudo-table construct something like:
  CALENDAR ( FROM <start_timestamp> TO <end_timestamp> BY
<step_size_interval>) AS <table_name>

where <step_size> is an interval in SQL-92.

Likewise, it would nice to have:

SEQUENCE (FROM <start_int> TO <end_)integer> BY <step_size_int>)  AS
<table_name>

to construct a pseudo-table on the fly in the engine.
Author
29 Jan 2006 4:38 AM
William Stacey [MVP]
Not saying a table would not be fast as I did not test this.  But just seems
like in no way would it be faster.  You got the disk access and the join,
etc.  It would be interesting to see a comparison however between the two.

--
William Stacey [MVP]

"--CELKO--" <jcelko***@earthlink.net> wrote in message
news:1138502782.898131.153840@o13g2000cwo.googlegroups.com...
| >> 'd recommend pre-populating at least three years - that's three million
rows in a
Show quote
| table, just to get this report done. And if the manager decides that 5
| second intervals would be better still, it'd grow to 18,000,000 rows!
| <<
|
| Yes, but since the SQL Server model of time has funny rounding when you
| super-precise, you will still have problems if you use a functinal
| approach instead of a table approach.  DB2 goes to more decimal seconds
| more accurately because it came from mainframes and not
| Sybase/UNIX/16-bit minicomputers.
|
| Nothing is perfect, but the table is faster (as well as portable) until
| you get down to that level.
|
| If I have to track things at that level, I woudl get Kx or StreamBase
| instead of SQL Server, since they are real-time flow databases, not
| classical RDBMS. Pick the tool for the job.
|
| Off subject:
| My "wish list" would be to have a Sequence () and Calendar ()
| pseudo-table construct something like:
|  CALENDAR ( FROM <start_timestamp> TO <end_timestamp> BY
| <step_size_interval>) AS <table_name>
|
| where <step_size> is an interval in SQL-92.
|
| Likewise, it would nice to have:
|
| SEQUENCE (FROM <start_int> TO <end_)integer> BY <step_size_int>)  AS
| <table_name>
|
| to construct a pseudo-table on the fly in the engine.
|
Author
30 Jan 2006 12:47 AM
--CELKO--
>> seems like in no way would it be faster.  You got the disk access and the join,etc.  <<

No, the relatively small time range table tends to go into main
storage, where joins are pretty fast.  But procedural code tends to be
executed one row at a time, it cannot use indexing, etc 
..
Author
30 Jan 2006 1:39 AM
William Stacey [MVP]
I would still need to see the results.  Put something together and maybe
myself or someone else will put procedure method to the test.  Procedural
Shoot Off #1!  :-)

--
William Stacey [MVP]

"--CELKO--" <jcelko***@earthlink.net> wrote in message
news:1138582060.514914.218050@g44g2000cwa.googlegroups.com...
| >> seems like in no way would it be faster.  You got the disk access and
the join,etc.  <<
Show quote
|
| No, the relatively small time range table tends to go into main
| storage, where joins are pretty fast.  But procedural code tends to be
| executed one row at a time, it cannot use indexing, etc
| .
|
Author
30 Jan 2006 10:52 PM
Hugo Kornelis
On 29 Jan 2006 16:47:40 -0800, --CELKO-- wrote:

>>> seems like in no way would it be faster.  You got the disk access and the join,etc.  <<
>
>No, the relatively small time range table tends to go into main
>storage, where joins are pretty fast.  But procedural code tends to be
>executed one row at a time, it cannot use indexing, etc 

Hi Joe,

Procedural code? What procedural code are you talking about???

The suggestion I posted (and that is, in my belief, subject of this
discussion) is as set-based as can be. Here's the relevant quote again:

>>Simply do a GROUP BY. Either with the CASE expression suggested by Tom,
>>or by FLOOR(DATEDIFF(minute, @basedate, TheDate) / 30). No need for an
>>extra table at all. And changing the interval is easy.

In case you have trouble understanding this plain-English description,
check the Jatinder Singh's message - he posted a complete repro based on
my suggestion.


Sadly, you didn't reply to my challenge to post the code you used to
verify that your table-of-intervals approach is faster than my GROUP BY
plus date/time calculations.

--
Hugo Kornelis, SQL Server MVP
Author
31 Jan 2006 2:32 AM
--CELKO--
>> Procedural code? What procedural code are you talking about? <<

The calls to functions like FLOOR(DATEDIFF(minute, @basedate, TheDate)
/ 30). as opposed to simple comparisons (=, <, >) on raw data types.
You cannot get an index on FLOOR(DATEDIFF(minute, @basedate, TheDate) /
30 easily.

>> Sadly, you didn't reply to my challenge to post the code you used to verify that your table-of-intervals approach is faster than my GROUP BYplus date/time calculations. <<

What we need is a neutral thrid party with a big enough machine to
scale to a large DB.  Remember when we did this with the code for
breaking up a CSV string with a single query versus a procedure?
Author
31 Jan 2006 2:59 AM
Alexander Kuznetsov
my 2 cents:

If there is a need to speed up the query, I would first of all create a
covering index on time, sales. If that is not good enough, I would
create a computed column, either with the CASE expression suggested by
Tom,
or by FLOOR(DATEDIFF(minute, @basedate, TheDate) / 30) (Joe: in MS SQl
Server world that's very easy, anybody could do that). Then I'd create
a covering index on that computed column and sales.

Makes sense?
Author
31 Jan 2006 3:23 AM
--CELKO--
>>  FLOOR(DATEDIFF(minute, @basedate, TheDate) / 30) (Joe: in MS SQL Server world that's very easy, anybody could do that). Then I'd create a covering index on that computed column and sales. Makes sense?  <<

It makes sense, but you still lose the indexing anfd have to do the
computations.  Maybe my clients have larger machines, so they can store
the "working tables" in the query.
Author
31 Jan 2006 5:43 AM
Mike Hodgson
--CELKO-- wrote:

>>> FLOOR(DATEDIFF(minute, @basedate, TheDate) / 30) (Joe: in MS SQL Server world that's very easy, anybody could do that). Then I'd create a covering index on that computed column and sales. Makes sense?  <<
>>>     
>>>
>It makes sense, but you still lose the indexing anfd have to do the
>computations. 
>
Not if you turn it into an indexed view.  Then it's essentially the same
as looking up a flat table (except for the extra server resources
required to keep the data in the indexed view up to date upon
modifications to the underlying base table(s)).  If you create an
appropriate index including the computed column then the optimiser will
consider that when compiling the query plan.

Author
31 Jan 2006 6:12 PM
Alexander Kuznetsov
> It makes sense, but you still lose the indexing anfd have to do the computations.

Wrong, in SQL Server as soon as an index on a computed column is
created, the computed values are stored in it.
BTW, I just took a real life table, added a computed column

alter table ***.***
add half_hour as (
FLOOR(DATEDIFF(minute, convert(datetime,'19990526',112), lmod_dt) / 30)
)

and created an index on it. The query

select half_hour,
sum(amount) from ***.***
group by half_hour

consistently uses 250% less CPU as compared to the query using the
covering index on (lmod_dt, amount)
Author
31 Jan 2006 11:11 PM
Hugo Kornelis
On 30 Jan 2006 18:32:26 -0800, --CELKO-- wrote:

>>> Procedural code? What procedural code are you talking about? <<
>
>The calls to functions like FLOOR(DATEDIFF(minute, @basedate, TheDate)
>/ 30). as opposed to simple comparisons (=, <, >) on raw data types.

Hi Joe,

You seem to have a different definition of procedural code than I have.

My definition is derived from meaning #2 of "procedure" in the
Marriam-Webster Online Dictionary:

2 a : a series of steps followed in a regular definite order <legal
procedure> <a surgical procedure> b : a series of instructions for a
computer that has a name by which it can be called into action

(http://www.m-w.com/dictionary/procedure)

The entry in Wikipedia appears to be somewhat biased
(http://en.wikipedia.org/wiki/Procedural_code)


If anything that goes beyond simple comparisons on raw data types is
procedural code, than almost all the code examples in SQL For Smarties
are procedural code.

>You cannot get an index on FLOOR(DATEDIFF(minute, @basedate, TheDate) /
>30 easily.

Others have already proven this statement wrong.

>
>>> Sadly, you didn't reply to my challenge to post the code you used to verify that your table-of-intervals approach is faster than my GROUP BYplus date/time calculations. <<
>
>What we need is a neutral thrid party with a big enough machine to
>scale to a large DB.

I take it, then, that you did NOT perform any test before claiming that
your table-of-intervals version would be faster than my GROUP BY
expression version? That you just posted a wild assumption without any
checking?

>  Remember when we did this with the code for
>breaking up a CSV string with a single query versus a procedure?

Actually, no. You must be confusing me with someone else.

--
Hugo Kornelis, SQL Server MVP
Author
1 Feb 2006 1:55 AM
William Stacey [MVP]
| If anything that goes beyond simple comparisons on raw data types is
| procedural code, than almost all the code examples in SQL For Smarties
| are procedural code.

Agreed.  SQL itself is nothing but a *lot of procedural code that
reads/writes from binary files.

--
William Stacey [MVP]
Author
29 Jan 2006 9:10 PM
Hugo Kornelis
On 28 Jan 2006 18:46:22 -0800, --CELKO-- wrote:

>>> 'd recommend pre-populating at least three years - that's three million rows in a
>table, just to get this report done. And if the manager decides that 5
>second intervals would be better still, it'd grow to 18,000,000 rows!
><<
>
>Yes, but since the SQL Server model of time has funny rounding when you
>super-precise,

Hi Joe,

Funny? I don't think so. The accuracy is one three-hundredth of a second
and values are rounded to increments of .000, .003, or .007 seconds.

> you will still have problems if you use a functinal
>approach instead of a table approach.

Could you post an example that will have problems in the functional
approach but not in the table approach?

>  DB2 goes to more decimal seconds
>more accurately because it came from mainframes and not
>Sybase/UNIX/16-bit minicomputers.

But since 5-second intervals already require an 18,000,000 row table to
do the report with the table approach, I'm not sure if more precision
should be labeled a good thing in the context of this discussion <g>.

>
>Nothing is perfect, but the table is faster (as well as portable) until
>you get down to that level.

The table is not portable either, since SQL Server's datetime data type
itself is not portable.

Faster? That would be surprising. I assume that you've done research
before posting this - could you post the script you used to test
performance of both versions?

--
Hugo Kornelis, SQL Server MVP
Author
30 Jan 2006 8:05 AM
Tony Rogerson
Don't hold your breath Hugo.

Remember celko seems to have a complete lack and understanding of present
day development and architecture and is stuck in 80's programming models.

Personally, if i wanted precision other than what DATETIME can give I'd now
do it using a CLR coded data type.

--
Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com - free video tutorials


Show quote
"Hugo Kornelis" <h***@perFact.REMOVETHIS.info> wrote in message
news:4fbqt19lgimvhf96ndbftb9qinulp2l01o@4ax.com...
> On 28 Jan 2006 18:46:22 -0800, --CELKO-- wrote:
>
>>>> 'd recommend pre-populating at least three years - that's three million
>>>> rows in a
>>table, just to get this report done. And if the manager decides that 5
>>second intervals would be better still, it'd grow to 18,000,000 rows!
>><<
>>
>>Yes, but since the SQL Server model of time has funny rounding when you
>>super-precise,
>
> Hi Joe,
>
> Funny? I don't think so. The accuracy is one three-hundredth of a second
> and values are rounded to increments of .000, .003, or .007 seconds.
>
>> you will still have problems if you use a functinal
>>approach instead of a table approach.
>
> Could you post an example that will have problems in the functional
> approach but not in the table approach?
>
>>  DB2 goes to more decimal seconds
>>more accurately because it came from mainframes and not
>>Sybase/UNIX/16-bit minicomputers.
>
> But since 5-second intervals already require an 18,000,000 row table to
> do the report with the table approach, I'm not sure if more precision
> should be labeled a good thing in the context of this discussion <g>.
>
>>
>>Nothing is perfect, but the table is faster (as well as portable) until
>>you get down to that level.
>
> The table is not portable either, since SQL Server's datetime data type
> itself is not portable.
>
> Faster? That would be surprising. I assume that you've done research
> before posting this - could you post the script you used to test
> performance of both versions?
>
> --
> Hugo Kornelis, SQL Server MVP
Author
30 Jan 2006 11:38 AM
jsfromynr
Hi There,
Try this

DECLARE @t table(
    ModTime datetime,
    Sales int);
insert into @t values('1/2/2005 7:01:00', 110)
insert into @t values('1/2/2005 7:29:00', 200)
insert into @t values('1/2/2005 7:45:00', 220)
insert into @t values('1/2/2005 7:59:00', 300)


Select
Min(dateAdd(minute,30 * Floor(DateDiff(minute,0,Modtime) / 30),0 ))
MinGroupTime,
Max(dateAdd(minute,30 * Floor(DateDiff(minute,0,Modtime) / 30) + 29 ,0
)) MaxGroupTime,
--Min(modTime), Max(ModTime),
--Floor(DateDiff(minute,0,Modtime) / 30),
Sum(Sales) From @t T1
--, (Select Min(ModTime) as BaseDate From @t) T2
Group By Floor(DateDiff(minute,0,Modtime) / 30)

This approach is very easy as pointed out by Hugo.
With Warm regards
Jatinder Singh
Author
28 Jan 2006 2:13 AM
William Stacey [MVP]
I just wanted to see how I would do this using my TDateTime UDT.  I think
this does it:

DECLARE @t table(
    ModTime datetime,
    Sales int);
insert into @t values('1/2/2005 7:01:00', 110)
insert into @t values('1/2/2005 7:30:00', 200)
insert into @t values('1/2/2005 7:45:00', 220)
insert into @t values('1/2/2005 7:59:00', 300)

select min(TDateTime::FromSqlDateTime(ModTime).AddMinutes(-datepart(mi,
ModTime)).AddMinutes( datepart(mi, ModTime)/30*30 )).SqlDateTime, sum(Sales)
from @t
group by TDateTime::FromSqlDateTime(ModTime).AddMinutes(-datepart(mi,
ModTime)).AddMinutes( datepart(mi, ModTime)/30*30 )

Output
--------------------------------
2005-01-02 07:00:00.000 110
2005-01-02 07:30:00.000 720
--
William Stacey [MVP]

Show quote
"Edgard L. Riba" <elriba at rimith dot com> wrote in message
news:eL8krx4IGHA.3064@TK2MSFTNGP10.phx.gbl...
| Hi,
|
| In my table I have a datetime field (called modtime), and I need to build
a
| select statement that groups by half-hour increments, so that it looks
| something like:
|
| modtime             SUM(sales)
| 7:00 - 7:30         9999.99
| 7:30 - 8:00         9999.99
| 8:00 - 8:30         9999.99
|
| etc....
|
| Can something like this be done?
|
| Thanks,
| Edgard L. Riba
|
|

AddThis Social Bookmark Button