|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Best Data Type for a Tracking ID?We're in the table design process and could really use outside thoughts on our options. The table in question will grow to several million records. This table will update and insert depending on what the tracking ID does as it lives out it day long life span. Example Stored Proc: UPDATE tbTracking SET myCount = myCount +1 WHERE trackingID = @trackingID AND productID = @productID IF (@@rowcount = 0) BEGIN INSERT INTO tbTracking (...) VALUES (...) ; END Everyday at midnight the table will be truncate just afer many other queries generated reports and store cumulative data into other tables. Our goal is extreme performance, with that in mind what is the best data type for trackingID and productID? We have the luxury to make them whatever is best, as this is a completely new project. Internally debated NVARCHR(50), INT, Binary, UNIQUEIDENTIFIER and the like, but we're in need of advise from those with more experience with these matters. Thank you for your time. Mark Mark S. wrote:
Show quote > Hello, This doesn't make much sense to me. As described it isn't clear whether> > We're in the table design process and could really use outside thoughts on > our options. > > The table in question will grow to several million records. This table will > update and insert depending on what the tracking ID does as it lives out it > day long life span. > > Example Stored Proc: > UPDATE tbTracking > SET myCount = myCount +1 > WHERE trackingID = @trackingID AND productID = @productID > IF (@@rowcount = 0) > BEGIN > INSERT INTO tbTracking (...) VALUES (...) ; > END > > Everyday at midnight the table will be truncate just afer many other queries > generated reports and store cumulative data into other tables. > > Our goal is extreme performance, with that in mind what is the best data > type for trackingID and productID? We have the luxury to make them whatever > is best, as this is a completely new project. > > Internally debated NVARCHR(50), INT, Binary, UNIQUEIDENTIFIER and the like, > but we're in need of advise from those with more experience with these > matters. > > Thank you for your time. > > Mark trackingid is a surrogate key or not. If it isn't then I don't understand what its purpose is. What is the business meaning of trackingid? Why isn't the datatype predetermined? What key or keys exist in this table? Maybe we are talking surrogate keys here, in which case see: http://www.aspfaq.com/show.asp?id=2504 > Everyday at midnight the table will be truncate just afer many other queries A guiding in principle in data warehouse applications is usually to> generated reports and store cumulative data into other tables. capture the data at the finest possible level of granularity. I expect you've considered this, but I just thought it worth restating for the benefit of all. -- David Portas SQL Server MVP -- Who or what determines the value of TrackingID?
If this key is to uniquely identify an event (such as a support call), then perhaps (8 byte) datetime would be the logical choice. It has inherent meaning, and unless you have hundreds of support calls coming in per minute, it is unlikely that this key value would be duplicated, and retry logic on the insert procedure or statement can handle the unlikely event that does. There may even be a need to purge the tbTracking table based on < a specific date/time rather than truncating the entire table at the end of the day. As for ProductID, this should be an existing attribute in your database. Show quote "Mark S." <ma***@yahoo.com> wrote in message news:Ow6CKQ9CGHA.4004@tk2msftngp13.phx.gbl... > Hello, > > We're in the table design process and could really use outside thoughts on > our options. > > The table in question will grow to several million records. This table > will update and insert depending on what the tracking ID does as it lives > out it day long life span. > > Example Stored Proc: > UPDATE tbTracking > SET myCount = myCount +1 > WHERE trackingID = @trackingID AND productID = @productID > IF (@@rowcount = 0) > BEGIN > INSERT INTO tbTracking (...) VALUES (...) ; > END > > Everyday at midnight the table will be truncate just afer many other > queries generated reports and store cumulative data into other tables. > > Our goal is extreme performance, with that in mind what is the best data > type for trackingID and productID? We have the luxury to make them > whatever is best, as this is a completely new project. > > Internally debated NVARCHR(50), INT, Binary, UNIQUEIDENTIFIER and the > like, but we're in need of advise from those with more experience with > these matters. > > Thank you for your time. > > Mark > > In my experiences, int data type is the best performer especially when it
comes to indexing, joins, etc. (This comment relates to 32-bit O/S and SQL Server 2000.) I would make a meaningless key: TrackingRowId int, identity, primary key, clustered ProductRowId int, identity, primary key, clustered SQL Server takes care of assigning the next number. The clustered index will always insert data at the end of the data pages which avoids page splits. If you want to reset the tracking id each midnight, simply change the reseed the tables as part of your processing. Personally... I would NOT use a character string for a key because its slower to compare strings. I would NOT use a UNIQUEIDENTIFIER for a key because it is a pain in the butt when you have to write a manual query to pull data. Finally, I have to reiterate David's point of capturing data at the lowest level. Are you sure you want to summarize and truncate on a nightly basis. What if a bug is discovered? You won't have the original data to recreate the totals. What if a new way of summarizing is added? You won't have the original data to drill into or summarize. And, to reiterate JT's point of having a datetime column that can be very handy. It is debateable whether to use it as the key or not. If you're truncating the tables each night, it may be a waste. Just my two cents, Joe Show quote "Mark S." wrote: > Hello, > > We're in the table design process and could really use outside thoughts on > our options. > > The table in question will grow to several million records. This table will > update and insert depending on what the tracking ID does as it lives out it > day long life span. > > Example Stored Proc: > UPDATE tbTracking > SET myCount = myCount +1 > WHERE trackingID = @trackingID AND productID = @productID > IF (@@rowcount = 0) > BEGIN > INSERT INTO tbTracking (...) VALUES (...) ; > END > > Everyday at midnight the table will be truncate just afer many other queries > generated reports and store cumulative data into other tables. > > Our goal is extreme performance, with that in mind what is the best data > type for trackingID and productID? We have the luxury to make them whatever > is best, as this is a completely new project. > > Internally debated NVARCHR(50), INT, Binary, UNIQUEIDENTIFIER and the like, > but we're in need of advise from those with more experience with these > matters. > > Thank you for your time. > > Mark > > > Gentleman thank you for your feedback.
Joe, other than the UniqueIndentifier being unfriendly for humans, do you find it faster than INTs? As to th other questions and caveats, all that has been considered previously, and wasn't mentioned in my question in the interests of brevity and an attempt to focus my question on a single point. Regardless, thank you for your full consideration. Cheers, Mark Show quote "Joe from WI" <Joefro***@discussions.microsoft.com> wrote in message news:50A0324B-1446-4044-A850-706B59A04169@microsoft.com... > In my experiences, int data type is the best performer especially when it > comes to indexing, joins, etc. (This comment relates to 32-bit O/S and > SQL > Server 2000.) > > I would make a meaningless key: > TrackingRowId int, identity, primary key, clustered > ProductRowId int, identity, primary key, clustered > > SQL Server takes care of assigning the next number. The clustered index > will always insert data at the end of the data pages which avoids page > splits. > > If you want to reset the tracking id each midnight, simply change the > reseed > the tables as part of your processing. > > Personally... > I would NOT use a character string for a key because its slower to compare > strings. > I would NOT use a UNIQUEIDENTIFIER for a key because it is a pain in the > butt when you have to write a manual query to pull data. > > Finally, I have to reiterate David's point of capturing data at the lowest > level. Are you sure you want to summarize and truncate on a nightly > basis. > What if a bug is discovered? You won't have the original data to recreate > the totals. What if a new way of summarizing is added? You won't have > the > original data to drill into or summarize. > > And, to reiterate JT's point of having a datetime column that can be very > handy. It is debateable whether to use it as the key or not. If you're > truncating the tables each night, it may be a waste. > > Just my two cents, > Joe > > "Mark S." wrote: > >> Hello, >> >> We're in the table design process and could really use outside thoughts >> on >> our options. >> >> The table in question will grow to several million records. This table >> will >> update and insert depending on what the tracking ID does as it lives out >> it >> day long life span. >> >> Example Stored Proc: >> UPDATE tbTracking >> SET myCount = myCount +1 >> WHERE trackingID = @trackingID AND productID = @productID >> IF (@@rowcount = 0) >> BEGIN >> INSERT INTO tbTracking (...) VALUES (...) ; >> END >> >> Everyday at midnight the table will be truncate just afer many other >> queries >> generated reports and store cumulative data into other tables. >> >> Our goal is extreme performance, with that in mind what is the best data >> type for trackingID and productID? We have the luxury to make them >> whatever >> is best, as this is a completely new project. >> >> Internally debated NVARCHR(50), INT, Binary, UNIQUEIDENTIFIER and the >> like, >> but we're in need of advise from those with more experience with these >> matters. >> >> Thank you for your time. >> >> Mark >> >> >> The last line of the article David suggested
http://www.aspfaq.com/show.asp?id=2504 Says alot about GUI() not being optimized, if anyone disagrees, feel free to speak up: "the wider datatype leads to a drop in index performance (if clustered, each insert almost guaranteed to 'dirty' a different page), and an increase in storage requirements; " and five other cons. Thank you. To maximize INSERT performance: use INT IDENTITY(1, 1) PRIMARY KEY CLUSTERED
WITH FILLFACTOR = 100 This causes every new row to be added at the end of the table--minimizing index maintenance and eliminating page splits. Use caution, however: since SQL Server doesn't automatically reorganize indexes, a high volume of DELETE activity will cause the index to become sparse, which can reduce SELECT performance, but it will not affect INSERT performance. The INT datatype matches the word size of most Intel processors (32-bit), so comparisons require fewer CPU cycles. The stored procedure is an example of what not to do. It's a recipie for primary key constraint violations. There's nothing that prevents two transactions from trying to INSERT the same information at the same time. Most of the time what will happen is that one connection will succeed with the INSERT and the other will UPDATE the newly inserted row, but a collision will occur if identical UPDATE statements occur simultaneously on two unbound connections followed by (since the row doesn't yet exist) simultaneous identical INSERT statements. The correct method is to use something like this: BEGIN TRAN IF EXISTS (SELECT WITH(UPDLOCK, HOLDLOCK)) UPDATE ELSE INSERT COMMIT Some people use the following instead, but I prefer the above method since it is easier to read and understand BEGIN TRAN INSERT...SELECT...WHERE NOT EXISTS(SELECT WITH(UPDLOCK, HOLDLOCK)) IF @@ROWCOUNT = 0 UPDATE COMMIT Note that there isn't any marked reduction in performance or concurrency between this and your sample, because UPDLOCK doesn't block SELECTs, and any blocking that does occur is necessary to maintain integrity. Without the EXISTS clause, the INSERT or UPDATE will be applying an exclusive lock anyway which involves reading the index page into memory. With the EXISTS clause, the SELECT reads the index page and applies an update lock, and the INSERT or UPDATE simply transition from an update lock to an exclusive lock in memory--no additional physical read is necessary. Show quote "Mark S." <ma***@yahoo.com> wrote in message news:Ow6CKQ9CGHA.4004@tk2msftngp13.phx.gbl... > Hello, > > We're in the table design process and could really use outside thoughts on > our options. > > The table in question will grow to several million records. This table > will update and insert depending on what the tracking ID does as it lives > out it day long life span. > > Example Stored Proc: > UPDATE tbTracking > SET myCount = myCount +1 > WHERE trackingID = @trackingID AND productID = @productID > IF (@@rowcount = 0) > BEGIN > INSERT INTO tbTracking (...) VALUES (...) ; > END > > Everyday at midnight the table will be truncate just afer many other > queries generated reports and store cumulative data into other tables. > > Our goal is extreme performance, with that in mind what is the best data > type for trackingID and productID? We have the luxury to make them > whatever is best, as this is a completely new project. > > Internally debated NVARCHR(50), INT, Binary, UNIQUEIDENTIFIER and the > like, but we're in need of advise from those with more experience with > these matters. > > Thank you for your time. > > Mark > > Thank you very much.
Show quote "Brian Selzer" <br***@selzer-software.com> wrote in message news:%23oG2PTDDGHA.3980@TK2MSFTNGP12.phx.gbl... > To maximize INSERT performance: use INT IDENTITY(1, 1) PRIMARY KEY > CLUSTERED WITH FILLFACTOR = 100 > > This causes every new row to be added at the end of the table--minimizing > index maintenance and eliminating page splits. Use caution, however: > since SQL Server doesn't automatically reorganize indexes, a high volume > of DELETE activity will cause the index to become sparse, which can reduce > SELECT performance, but it will not affect INSERT performance. > > The INT datatype matches the word size of most Intel processors (32-bit), > so comparisons require fewer CPU cycles. > > The stored procedure is an example of what not to do. It's a recipie for > primary key constraint violations. There's nothing that prevents two > transactions from trying to INSERT the same information at the same time. > Most of the time what will happen is that one connection will succeed with > the INSERT and the other will UPDATE the newly inserted row, but a > collision will occur if identical UPDATE statements occur simultaneously > on two unbound connections followed by (since the row doesn't yet exist) > simultaneous identical INSERT statements. > > The correct method is to use something like this: > > BEGIN TRAN > IF EXISTS (SELECT WITH(UPDLOCK, HOLDLOCK)) > UPDATE > ELSE > INSERT > COMMIT > > Some people use the following instead, but I prefer the above method since > it is easier to read and understand > > BEGIN TRAN > INSERT...SELECT...WHERE NOT EXISTS(SELECT WITH(UPDLOCK, HOLDLOCK)) > IF @@ROWCOUNT = 0 > UPDATE > COMMIT > > Note that there isn't any marked reduction in performance or concurrency > between this and your sample, because UPDLOCK doesn't block SELECTs, and > any blocking that does occur is necessary to maintain integrity. Without > the EXISTS clause, the INSERT or UPDATE will be applying an exclusive lock > anyway which involves reading the index page into memory. With the EXISTS > clause, the SELECT reads the index page and applies an update lock, and > the INSERT or UPDATE simply transition from an update lock to an exclusive > lock in memory--no additional physical read is necessary. > > "Mark S." <ma***@yahoo.com> wrote in message > news:Ow6CKQ9CGHA.4004@tk2msftngp13.phx.gbl... >> Hello, >> >> We're in the table design process and could really use outside thoughts >> on our options. >> >> The table in question will grow to several million records. This table >> will update and insert depending on what the tracking ID does as it lives >> out it day long life span. >> >> Example Stored Proc: >> UPDATE tbTracking >> SET myCount = myCount +1 >> WHERE trackingID = @trackingID AND productID = @productID >> IF (@@rowcount = 0) >> BEGIN >> INSERT INTO tbTracking (...) VALUES (...) ; >> END >> >> Everyday at midnight the table will be truncate just afer many other >> queries generated reports and store cumulative data into other tables. >> >> Our goal is extreme performance, with that in mind what is the best data >> type for trackingID and productID? We have the luxury to make them >> whatever is best, as this is a completely new project. >> >> Internally debated NVARCHR(50), INT, Binary, UNIQUEIDENTIFIER and the >> like, but we're in need of advise from those with more experience with >> these matters. >> >> Thank you for your time. >> >> Mark >> >> > > |
|||||||||||||||||||||||