|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
smalldatetime or integerI have to create some statistics based on datetime entry.
In my table I have datetime column, but all my entries for statistic are day base(I don't need hours, seconds or minutes). So, I decided to create another column where I'll put just date without time. I have 2 options: use smalldatetime column with hours, minutes and second=0 or create integer column : YYYYMMDD What is better performance? Both columns are 4 bytes. regards,S Option 1: a datetime column (you can use smalldatetime)
A few reasons: automatic validation of the data easier use of temporal functions Choose a datatype that actually represents what the data is. Besides, if you use an INT, your application will crash in the year 214749. Show quote "simonZ" <simon.zu***@studio-moderna.com> wrote in message news:%23NykIgoTGHA.792@TK2MSFTNGP10.phx.gbl... >I have to create some statistics based on datetime entry. > In my table I have datetime column, but all my entries for statistic are > day base(I don't need hours, seconds or minutes). > > So, I decided to create another column where I'll put just date without > time. > > I have 2 options: use smalldatetime column with hours, minutes and > second=0 > or create integer column : YYYYMMDD > > What is better performance? Both columns are 4 bytes. > > regards,S > > > If you're going to be doing any date range queries at all, then
SMALLDATETIME is definitely your best bet. In fact, I would say that using the datatype that actually represents the data is almost always your best bet (except in cases where, say, you need to store nanoseconds). What is better performance? Who knows. We don't know enough about your hardware, network, concurrency, data structure, size of the table, type of activity, etc. etc. Show quote "simonZ" <simon.zu***@studio-moderna.com> wrote in message news:%23NykIgoTGHA.792@TK2MSFTNGP10.phx.gbl... >I have to create some statistics based on datetime entry. > In my table I have datetime column, but all my entries for statistic are > day base(I don't need hours, seconds or minutes). > > So, I decided to create another column where I'll put just date without > time. > > I have 2 options: use smalldatetime column with hours, minutes and > second=0 > or create integer column : YYYYMMDD > > What is better performance? Both columns are 4 bytes. > > regards,S > > > Let's put the question of performance aside for a moment.
How are you planning on making sure all yyyymmdd values are valid? (e.g.: the number 20064078 is a valid integer, but not a valid date (not on Earth, anyway)). Why don't you store datetime values simply as yyyy-mm-dd 00:00:00.000? ML --- http://milambda.blogspot.com/ As an alternative to the 8 byte [datetime] datatype, there is the 4 byte
[smalldatetime]. There are other alternatives encoding the date: http://groups.google.com/group/microsoft.public.sqlserver.programming/browse_thread/thread/437cbf87aebdd6a0/21dd771823ff4cb0 One solution is to use a date dimension table and store each date an an integer based key. If the range of dates is sparse, for example data is only imported periodically, then date_id could be a 2 byte [smallint]. This is technique is typically used in OLAP cubes, but can also be implemented for relational databases as well when you want to conserve storage space. [dim_date_imported] import_date_id import_date 200 3/22/2006 201 3/28/2006 202 4/3/2006 .... ... Rather than storing a meta data row for each date, the date_id values could even be an offset from a starting date. For example, let's assume that your system starts collecting data on Jan 1, 2005. The date_id for a row imported on Mar 22, 2006 would be calculated as datediff(day,'Jan 1, 2005','Mar 22, 2006') or 445. A 2 byte [smallint] would store a range of up to 65,534 days or 179.54 years. The dateadd() function can be used to convert date_id to datetime format. Show quote "simonZ" <simon.zu***@studio-moderna.com> wrote in message news:%23NykIgoTGHA.792@TK2MSFTNGP10.phx.gbl... >I have to create some statistics based on datetime entry. > In my table I have datetime column, but all my entries for statistic are > day base(I don't need hours, seconds or minutes). > > So, I decided to create another column where I'll put just date without > time. > > I have 2 options: use smalldatetime column with hours, minutes and > second=0 > or create integer column : YYYYMMDD > > What is better performance? Both columns are 4 bytes. > > regards,S > > > Thank you JT,
that is exactly what I wan't to hear :) When the performance is crytical it doesn't make any sence to store date as datetime value when all I need is just the date, not time. Automatic validation of the data is not necessary because I create this column from datetime column and I need this column only for search purpose. Usual practice is to convert this date to some integer and then have integer search in my queries. But idea with date_id is great, it gets only 2byte of data and I suppose performance would be better. I have also indexed view with clustered index on that column. Thanks, Simon Show quote "JT" <some***@microsoft.com> wrote in message news:%23aUYC8oTGHA.4384@tk2msftngp13.phx.gbl... > As an alternative to the 8 byte [datetime] datatype, there is the 4 byte > [smalldatetime]. > > There are other alternatives encoding the date: > http://groups.google.com/group/microsoft.public.sqlserver.programming/browse_thread/thread/437cbf87aebdd6a0/21dd771823ff4cb0 > > One solution is to use a date dimension table and store each date an an > integer based key. If the range of dates is sparse, for example data is > only imported periodically, then date_id could be a 2 byte [smallint]. > This is technique is typically used in OLAP cubes, but can also be > implemented for relational databases as well when you want to conserve > storage space. > > [dim_date_imported] > import_date_id import_date > 200 3/22/2006 > 201 3/28/2006 > 202 4/3/2006 > ... ... > > Rather than storing a meta data row for each date, the date_id values > could even be an offset from a starting date. For example, let's assume > that your system starts collecting data on Jan 1, 2005. The date_id for a > row imported on Mar 22, 2006 would be calculated as datediff(day,'Jan 1, > 2005','Mar 22, 2006') or 445. A 2 byte [smallint] would store a range of > up to 65,534 days or 179.54 years. The dateadd() function can be used to > convert date_id to datetime format. > > > "simonZ" <simon.zu***@studio-moderna.com> wrote in message > news:%23NykIgoTGHA.792@TK2MSFTNGP10.phx.gbl... >>I have to create some statistics based on datetime entry. >> In my table I have datetime column, but all my entries for statistic are >> day base(I don't need hours, seconds or minutes). >> >> So, I decided to create another column where I'll put just date without >> time. >> >> I have 2 options: use smalldatetime column with hours, minutes and >> second=0 >> or create integer column : YYYYMMDD >> >> What is better performance? Both columns are 4 bytes. >> >> regards,S >> >> >> > > simonZ wrote:
> Thank you JT, Which doesn't necessarily mean that it's the right thing to do.> > that is exactly what I wan't to hear :) > When the performance is crytical it doesn't make any sence to store date as Not so fast. Remember that inserts will definitely become more complex: > datetime value when all I need is just the date, not time. > Automatic validation of the data is not necessary because I create this > column from datetime column and I need this column only for search purpose. > Usual practice is to convert this date to some integer and then have integer > search in my queries. > But idea with date_id is great, it gets only 2byte of data and I suppose > performance would be better. you'll have to do a lookup in the date table and possibly an insert to get a new id then you'll have to insert your records in the fact table. I don't see how you can do that with a simple bulk load, so you would need DTS, store your data in a temp table and convert it with a SP or something else (can this be solved with triggers?). Queries will involve joining those two tables which might account for increased tempdb usage etc. Cheers robert PS: Please avoid top posting. Thank you. I agree with Robert... Use the smalldatetime, force the time to midnight..
Unless you are doing a full on data warehouse, you'll be better off in the long run than doing the extra layer of indirection. -- Show quoteWayne Snyder MCDBA, SQL Server MVP Mariner, Charlotte, NC I support the Professional Association for SQL Server ( PASS) and it''s community of SQL Professionals. "Robert Klemme" wrote: > simonZ wrote: > > Thank you JT, > > > > that is exactly what I wan't to hear :) > > Which doesn't necessarily mean that it's the right thing to do. > > > When the performance is crytical it doesn't make any sence to store date as > > datetime value when all I need is just the date, not time. > > Automatic validation of the data is not necessary because I create this > > column from datetime column and I need this column only for search purpose. > > Usual practice is to convert this date to some integer and then have integer > > search in my queries. > > But idea with date_id is great, it gets only 2byte of data and I suppose > > performance would be better. > > Not so fast. Remember that inserts will definitely become more complex: > you'll have to do a lookup in the date table and possibly an insert to > get a new id then you'll have to insert your records in the fact table. > I don't see how you can do that with a simple bulk load, so you would > need DTS, store your data in a temp table and convert it with a SP or > something else (can this be solved with triggers?). > > Queries will involve joining those two tables which might account for > increased tempdb usage etc. > > Cheers > > robert > > > PS: Please avoid top posting. Thank you. > Wayne Snyder wrote:
> I agree with Robert... Use the smalldatetime, force the time to midnight.. "Premature optimization" is another term that comes to mind...> Unless you are doing a full on data warehouse, you'll be better off in the > long run than doing the extra layer of indirection. robert We don't know how much data he is dealing with or where in the product
lifecycle he is, so we can't really say the optimization is premature. He just asked which option for storing the date would be better for performance. Show quote "Robert Klemme" <bob.n***@gmx.net> wrote in message news:%23lJE481TGHA.2276@tk2msftngp13.phx.gbl... > Wayne Snyder wrote: >> I agree with Robert... Use the smalldatetime, force the time to >> midnight.. Unless you are doing a full on data warehouse, you'll be >> better off in the long run than doing the extra layer of indirection. > > "Premature optimization" is another term that comes to mind... > > robert I described two different methods of encoding a date as an integer. Using
the date offset method below, there is no need for a lookup table for either the insert or the select. For example, to insert a date_id value for Mar 23, 2006 : insert into MyTable (date_id) values (datediff(day,'Jan 1, 2005','Mar 23, 2006')) Selecting a result for the date range Feb 1, 2006 - Feb 28, 2006 would actually require less overhead than scanning on a datetime column: select @startdate = datediff(day,'Jan 1, 2005','Feb 1, 2006') select @enddate = datediff(day,'Jan 1, 2005','Feb 28, 2006') select * from MyTable where date_id between @startdate and @enddate Show quote > Rather than storing a meta data row for each date, the date_id values > could even be an offset from a starting date. For example, let's assume > that your system starts collecting data on Jan 1, 2005. The date_id for a > row imported on Mar 22, 2006 would be calculated as datediff(day,'Jan 1, > 2005','Mar 22, 2006') or 445. A 2 byte [smallint] would store a range of > up to 65,534 days or 179.54 years. The dateadd() function can be used to > convert date_id to datetime format. "Robert Klemme" <bob.n***@gmx.net> wrote in message news:%23iEBGwyTGHA.4772@TK2MSFTNGP09.phx.gbl... > simonZ wrote: >> Thank you JT, >> >> that is exactly what I wan't to hear :) > > Which doesn't necessarily mean that it's the right thing to do. > >> When the performance is crytical it doesn't make any sence to store date >> as datetime value when all I need is just the date, not time. >> Automatic validation of the data is not necessary because I create this >> column from datetime column and I need this column only for search >> purpose. >> Usual practice is to convert this date to some integer and then have >> integer search in my queries. >> But idea with date_id is great, it gets only 2byte of data and I suppose >> performance would be better. > > Not so fast. Remember that inserts will definitely become more complex: > you'll have to do a lookup in the date table and possibly an insert to get > a new id then you'll have to insert your records in the fact table. I > don't see how you can do that with a simple bulk load, so you would need > DTS, store your data in a temp table and convert it with a SP or something > else (can this be solved with triggers?). > > Queries will involve joining those two tables which might account for > increased tempdb usage etc. > > Cheers > > robert > > > PS: Please avoid top posting. Thank you. JT wrote:
> I described two different methods of encoding a date as an integer. Using Yes, of course. But Simon seemed to want to go with the other approach:> the date offset method below, there is no need for a lookup table for either > the insert or the select. >>> But idea with date_id is great, it gets only 2byte of data and I suppose robert>>> performance would be better. |
|||||||||||||||||||||||