Home All Groups Group Topic Archive Search About

smalldatetime or integer

Author
23 Mar 2006 2:42 PM
simonZ
I have to create some statistics based on datetime entry.
In my table I have datetime column, but all my entries for statistic are day
base(I don't need hours, seconds or minutes).

So, I decided to create another column where I'll put just date without
time.

I have 2 options: use smalldatetime column with hours, minutes and second=0
or create integer column : YYYYMMDD

What is better performance? Both columns are 4 bytes.

regards,S

Author
23 Mar 2006 3:05 PM
Raymond D'Anjou
Option 1: a datetime column (you can use smalldatetime)
A few reasons:
automatic validation of the data
easier use of temporal functions

Choose a datatype that actually represents what the data is.
Besides, if you use an INT, your application will crash in the year 214749.

Show quote
"simonZ" <simon.zu***@studio-moderna.com> wrote in message
news:%23NykIgoTGHA.792@TK2MSFTNGP10.phx.gbl...
>I have to create some statistics based on datetime entry.
> In my table I have datetime column, but all my entries for statistic are
> day base(I don't need hours, seconds or minutes).
>
> So, I decided to create another column where I'll put just date without
> time.
>
> I have 2 options: use smalldatetime column with hours, minutes and
> second=0
> or create integer column : YYYYMMDD
>
> What is better performance? Both columns are 4 bytes.
>
> regards,S
>
>
>
Author
23 Mar 2006 3:15 PM
Aaron Bertrand [SQL Server MVP]
If you're going to be doing any date range queries at all, then
SMALLDATETIME is definitely your best bet.

In fact, I would say that using the datatype that actually represents the
data is almost always your best bet (except in cases where, say, you need to
store nanoseconds).

What is better performance?  Who knows.  We don't know enough about your
hardware, network, concurrency, data structure, size of the table, type of
activity, etc. etc.




Show quote
"simonZ" <simon.zu***@studio-moderna.com> wrote in message
news:%23NykIgoTGHA.792@TK2MSFTNGP10.phx.gbl...
>I have to create some statistics based on datetime entry.
> In my table I have datetime column, but all my entries for statistic are
> day base(I don't need hours, seconds or minutes).
>
> So, I decided to create another column where I'll put just date without
> time.
>
> I have 2 options: use smalldatetime column with hours, minutes and
> second=0
> or create integer column : YYYYMMDD
>
> What is better performance? Both columns are 4 bytes.
>
> regards,S
>
>
>
Author
23 Mar 2006 3:18 PM
ML
Let's put the question of performance aside for a moment.

How are you planning on making sure all yyyymmdd values are valid? (e.g.:
the number 20064078 is a valid integer, but not a valid date (not on Earth,
anyway)).

Why don't you store datetime values simply as yyyy-mm-dd 00:00:00.000?


ML

---
http://milambda.blogspot.com/
Author
23 Mar 2006 3:28 PM
JT
As an alternative to the 8 byte [datetime] datatype, there is the 4 byte
[smalldatetime].

There are other alternatives encoding the date:
http://groups.google.com/group/microsoft.public.sqlserver.programming/browse_thread/thread/437cbf87aebdd6a0/21dd771823ff4cb0

One solution is to use a date dimension table and store each date an an
integer based key. If the range of dates is sparse, for example data is only
imported periodically, then date_id could be a 2 byte [smallint]. This is
technique is typically used in OLAP cubes, but can also be implemented for
relational databases as well when you want to conserve storage space.

[dim_date_imported]
import_date_id    import_date
200    3/22/2006
201    3/28/2006
202    4/3/2006
....    ...

Rather than storing a meta data row for each date, the date_id values could
even be an offset from a starting date. For example, let's assume that your
system starts collecting data on Jan 1, 2005. The date_id for a row imported
on Mar 22, 2006 would be calculated as datediff(day,'Jan 1, 2005','Mar 22,
2006') or 445. A 2 byte [smallint] would store a range of up to 65,534 days
or 179.54 years. The dateadd() function can be used to convert date_id to
datetime format.


Show quote
"simonZ" <simon.zu***@studio-moderna.com> wrote in message
news:%23NykIgoTGHA.792@TK2MSFTNGP10.phx.gbl...
>I have to create some statistics based on datetime entry.
> In my table I have datetime column, but all my entries for statistic are
> day base(I don't need hours, seconds or minutes).
>
> So, I decided to create another column where I'll put just date without
> time.
>
> I have 2 options: use smalldatetime column with hours, minutes and
> second=0
> or create integer column : YYYYMMDD
>
> What is better performance? Both columns are 4 bytes.
>
> regards,S
>
>
>
Author
24 Mar 2006 8:14 AM
simonZ
Thank you JT,

that is exactly what I wan't to hear :)

When the performance is crytical it doesn't make any sence to store date as
datetime value when all I need is just the date, not time.
Automatic validation of the data is not necessary because I create this
column from datetime column and I need this column only for search purpose.
Usual practice is to convert this date to some integer and then have integer
search in my queries.
But idea with date_id is great, it gets only 2byte of data and I suppose
performance would be better.

I have also indexed view with clustered index on that column.

Thanks,
Simon


Show quote
"JT" <some***@microsoft.com> wrote in message
news:%23aUYC8oTGHA.4384@tk2msftngp13.phx.gbl...
> As an alternative to the 8 byte [datetime] datatype, there is the 4 byte
> [smalldatetime].
>
> There are other alternatives encoding the date:
> http://groups.google.com/group/microsoft.public.sqlserver.programming/browse_thread/thread/437cbf87aebdd6a0/21dd771823ff4cb0
>
> One solution is to use a date dimension table and store each date an an
> integer based key. If the range of dates is sparse, for example data is
> only imported periodically, then date_id could be a 2 byte [smallint].
> This is technique is typically used in OLAP cubes, but can also be
> implemented for relational databases as well when you want to conserve
> storage space.
>
> [dim_date_imported]
> import_date_id    import_date
> 200    3/22/2006
> 201    3/28/2006
> 202    4/3/2006
> ...    ...
>
> Rather than storing a meta data row for each date, the date_id values
> could even be an offset from a starting date. For example, let's assume
> that your system starts collecting data on Jan 1, 2005. The date_id for a
> row imported on Mar 22, 2006 would be calculated as datediff(day,'Jan 1,
> 2005','Mar 22, 2006') or 445. A 2 byte [smallint] would store a range of
> up to 65,534 days or 179.54 years. The dateadd() function can be used to
> convert date_id to datetime format.
>
>
> "simonZ" <simon.zu***@studio-moderna.com> wrote in message
> news:%23NykIgoTGHA.792@TK2MSFTNGP10.phx.gbl...
>>I have to create some statistics based on datetime entry.
>> In my table I have datetime column, but all my entries for statistic are
>> day base(I don't need hours, seconds or minutes).
>>
>> So, I decided to create another column where I'll put just date without
>> time.
>>
>> I have 2 options: use smalldatetime column with hours, minutes and
>> second=0
>> or create integer column : YYYYMMDD
>>
>> What is better performance? Both columns are 4 bytes.
>>
>> regards,S
>>
>>
>>
>
>
Author
24 Mar 2006 10:16 AM
Robert Klemme
simonZ wrote:
> Thank you JT,
>
> that is exactly what I wan't to hear :)

Which doesn't necessarily mean that it's the right thing to do.

> When the performance is crytical it doesn't make any sence to store date as
> datetime value when all I need is just the date, not time.
> Automatic validation of the data is not necessary because I create this
> column from datetime column and I need this column only for search purpose.
> Usual practice is to convert this date to some integer and then have integer
> search in my queries.
> But idea with date_id is great, it gets only 2byte of data and I suppose
> performance would be better.

Not so fast.  Remember that inserts will definitely become more complex:
you'll have to do a lookup in the date table and possibly an insert to
get a new id then you'll have to insert your records in the fact table.
I don't see how you can do that with a simple bulk load, so you would
need DTS, store your data in a temp table and convert it with a SP or
something else (can this be solved with triggers?).

Queries will involve joining those two tables which might account for
increased tempdb usage etc.

Cheers

    robert


PS: Please avoid top posting.  Thank you.
Author
24 Mar 2006 1:40 PM
Wayne Snyder
I agree with Robert... Use the smalldatetime, force the time to midnight..
Unless you are doing a full on data warehouse, you'll be better off in the
long run than doing the extra layer of indirection.
--
Wayne Snyder MCDBA, SQL Server MVP
Mariner, Charlotte, NC

I support the Professional Association for SQL Server ( PASS) and it''s
community of SQL Professionals.


Show quote
"Robert Klemme" wrote:

> simonZ wrote:
> > Thank you JT,
> >
> > that is exactly what I wan't to hear :)
>
> Which doesn't necessarily mean that it's the right thing to do.
>
> > When the performance is crytical it doesn't make any sence to store date as
> > datetime value when all I need is just the date, not time.
> > Automatic validation of the data is not necessary because I create this
> > column from datetime column and I need this column only for search purpose.
> > Usual practice is to convert this date to some integer and then have integer
> > search in my queries.
> > But idea with date_id is great, it gets only 2byte of data and I suppose
> > performance would be better.
>
> Not so fast.  Remember that inserts will definitely become more complex:
> you'll have to do a lookup in the date table and possibly an insert to
> get a new id then you'll have to insert your records in the fact table.
> I don't see how you can do that with a simple bulk load, so you would
> need DTS, store your data in a temp table and convert it with a SP or
> something else (can this be solved with triggers?).
>
> Queries will involve joining those two tables which might account for
> increased tempdb usage etc.
>
> Cheers
>
>     robert
>
>
> PS: Please avoid top posting.  Thank you.
>
Author
24 Mar 2006 4:22 PM
Robert Klemme
Wayne Snyder wrote:
> I agree with Robert... Use the smalldatetime, force the time to midnight..
> Unless you are doing a full on data warehouse, you'll be better off in the
> long run than doing the extra layer of indirection.

"Premature optimization" is another term that comes to mind...

    robert
Author
24 Mar 2006 4:31 PM
JT
We don't know how much data he is dealing with or where in the product
lifecycle he is, so we can't really say the optimization is premature. He
just asked which option for storing the date would be better for
performance.

Show quote
"Robert Klemme" <bob.n***@gmx.net> wrote in message
news:%23lJE481TGHA.2276@tk2msftngp13.phx.gbl...
> Wayne Snyder wrote:
>> I agree with Robert... Use the smalldatetime, force the time to
>> midnight.. Unless you are doing a full on data warehouse, you'll be
>> better off in the long run than doing the extra layer of indirection.
>
> "Premature optimization" is another term that comes to mind...
>
> robert
Author
24 Mar 2006 2:05 PM
JT
I described two different methods of encoding a date as an integer. Using
the date offset method below, there is no need for a lookup table for either
the insert or the select. For example, to insert a date_id value for Mar 23,
2006 :

insert into MyTable (date_id) values (datediff(day,'Jan 1, 2005','Mar 23,
2006'))

Selecting a result for the date range Feb 1, 2006 - Feb 28, 2006 would
actually require less overhead than scanning on a datetime column:

select @startdate = datediff(day,'Jan 1, 2005','Feb 1, 2006')
select @enddate = datediff(day,'Jan 1, 2005','Feb 28, 2006')
select * from MyTable where date_id between @startdate and @enddate


Show quote
> Rather than storing a meta data row for each date, the date_id values
> could even be an offset from a starting date. For example, let's assume
> that your system starts collecting data on Jan 1, 2005. The date_id for a
> row imported on Mar 22, 2006 would be calculated as datediff(day,'Jan 1,
> 2005','Mar 22, 2006') or 445. A 2 byte [smallint] would store a range of
> up to 65,534 days or 179.54 years. The dateadd() function can be used to
> convert date_id to datetime format.


"Robert Klemme" <bob.n***@gmx.net> wrote in message
news:%23iEBGwyTGHA.4772@TK2MSFTNGP09.phx.gbl...
> simonZ wrote:
>> Thank you JT,
>>
>> that is exactly what I wan't to hear :)
>
> Which doesn't necessarily mean that it's the right thing to do.
>
>> When the performance is crytical it doesn't make any sence to store date
>> as datetime value when all I need is just the date, not time.
>> Automatic validation of the data is not necessary because I create this
>> column from datetime column and I need this column only for search
>> purpose.
>> Usual practice is to convert this date to some integer and then have
>> integer search in my queries.
>> But idea with date_id is great, it gets only 2byte of data and I suppose
>> performance would be better.
>
> Not so fast.  Remember that inserts will definitely become more complex:
> you'll have to do a lookup in the date table and possibly an insert to get
> a new id then you'll have to insert your records in the fact table. I
> don't see how you can do that with a simple bulk load, so you would need
> DTS, store your data in a temp table and convert it with a SP or something
> else (can this be solved with triggers?).
>
> Queries will involve joining those two tables which might account for
> increased tempdb usage etc.
>
> Cheers
>
> robert
>
>
> PS: Please avoid top posting.  Thank you.
Author
24 Mar 2006 4:24 PM
Robert Klemme
JT wrote:
> I described two different methods of encoding a date as an integer. Using
> the date offset method below, there is no need for a lookup table for either
> the insert or the select.

Yes, of course.  But Simon seemed to want to go with the other approach:

>>> But idea with date_id is great, it gets only 2byte of data and I suppose
>>> performance would be better.

    robert

AddThis Social Bookmark Button