|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Indexing questionI have a database table that stores the history of a data readings taken from hardware devices. The hardware device is queried by software once a minute (or more) and the value stored in the database for trending an analysis. The table structure is: CREATE TABLE DeviceData ( [DataID] [bigint] IDENTITY (1, 1) NOT NULL , [DeviceID] [int] NOT NULL , [DataValue] [decimal](18, 4) NOT NULL , [DataTimestamp] [datetime] NOT NULL ) After running for a few weeks, the number of rows in the table is getting large as expected. I started to notice the queries against this table are taken longer to run. There is currently a primary key (DataID) and an index on DeviceID. I later realized that since most of the queries search for a particular date range, an index on DataTimestamp would definitely help. Once I added the index, query times for newly added data greatly improved. The older data, however, still takes longer then I would like. My question is, once I added the index, is only new data that gets added to the table indexed? Is there any way to optimize the queries for the older data. Being a software developer, not a DBA, any other adivce is greatly appeciated. Thanks. > (DataID) and an index on DeviceID. I later realized that since most of Yofnik,> the queries search for a particular date range, an index on > DataTimestamp would definitely help. try making the index DataTimestamp CLUSTERED - that can make a huge difference. Yofnik wrote:
Show quote > Hello All, Creating an index will index ALL data in the table, not just new data. > I have a database table that stores the history of a data readings > taken from hardware devices. The hardware device is queried by software > once a minute (or more) and the value stored in the database for > trending an analysis. > > The table structure is: > CREATE TABLE DeviceData ( > [DataID] [bigint] IDENTITY (1, 1) NOT NULL , > [DeviceID] [int] NOT NULL , > [DataValue] [decimal](18, 4) NOT NULL , > [DataTimestamp] [datetime] NOT NULL > ) > > After running for a few weeks, the number of rows in the table is > getting large as expected. I started to notice the queries against this > table are taken longer to run. There is currently a primary key > (DataID) and an index on DeviceID. I later realized that since most of > the queries search for a particular date range, an index on > DataTimestamp would definitely help. > > Once I added the index, query times for newly added data greatly > improved. The older data, however, still takes longer then I would > like. My question is, once I added the index, is only new data that > gets added to the table indexed? Is there any way to optimize the > queries for the older data. > > Being a software developer, not a DBA, any other adivce is greatly > appeciated. > > Thanks. > Since most of your queries are against date ranges, you should create a CLUSTERED index on the DataTimeStamp column. That will physically sort the data in date/time order, so that when querying for a specific date range, the records you want are physically close together. Inserts of new rows, assuming they are chronologically later than existing rows, will take place at the end of the table, helping to keep fragmentation in check. Thank you for the advice. Being a developer not a DBA, I wasn't
positive on the difference between clustered and non-clustered indecies. What about the DataID column. I have some queries that use that field. Since it is the primary key, is that sufficient or can that be improved as well? Thanks again. Yofnik wrote:
> Thank you for the advice. Being a developer not a DBA, I wasn't Since it's a primary key, it's indexed by default. Without knowing more > positive on the difference between clustered and non-clustered > indecies. > > What about the DataID column. I have some queries that use that field. > Since it is the primary key, is that sufficient or can that be improved > as well? > > Thanks again. > about how it's used, I'd have to say that it's sufficient. I guess primary keys are clustered by default? I tried making the index
on DataTimestamp clustered as suggested and got an error saying only one clustered index can exist. I suppose I need to analyze my queries and determine which clustered index would make the most sense. If I am searching by the datetime field more frequently, would it make sense to make the primary key (DataID) non-clustered and make the DataTimestamp column clustered? Is there any other impact in having a non-clustered primary key? Yofnik wrote:
> I guess primary keys are clustered by default? I tried making the index Primary keys can be clustered or non-clustered, the default depends on > on DataTimestamp clustered as suggested and got an error saying only > one clustered index can exist. > > I suppose I need to analyze my queries and determine which clustered > index would make the most sense. If I am searching by the datetime > field more frequently, would it make sense to make the primary key > (DataID) non-clustered and make the DataTimestamp column clustered? Is > there any other impact in having a non-clustered primary key? > how it was created (T-SQL or point/click). In your case, it makes much more sense to have the date field clustered, you will (or should) see a tremendous performance gain by doing so. > Primary keys can be clustered or non-clustered, the default depends on how it was created (T-SQL It used to be so that TSQL defaulted the PK to clustered and EM defaulted the PK to non-clustered. > or point/click). This changed in a SQL Server 2000 service pack, however, so that EM behaved consistently with the TSQL regarding default. A quick test show that SSMS is also consistent with TSQL default... I.e., all these seem to cluster the PK by default. -- Show quoteTibor Karaszi, SQL Server MVP http://www.karaszi.com/sqlserver/default.asp http://www.solidqualitylearning.com/ "Tracy McKibben" <tr***@realsqlguy.com> wrote in message news:44F46767.50003@realsqlguy.com... > Yofnik wrote: >> I guess primary keys are clustered by default? I tried making the index >> on DataTimestamp clustered as suggested and got an error saying only >> one clustered index can exist. >> >> I suppose I need to analyze my queries and determine which clustered >> index would make the most sense. If I am searching by the datetime >> field more frequently, would it make sense to make the primary key >> (DataID) non-clustered and make the DataTimestamp column clustered? Is >> there any other impact in having a non-clustered primary key? >> > > Primary keys can be clustered or non-clustered, the default depends on how it was created (T-SQL > or point/click). > > In your case, it makes much more sense to have the date field clustered, you will (or should) see > a tremendous performance gain by doing so. > > > -- > Tracy McKibben > MCDBA > http://www.realsqlguy.com Primary keys are clustered by default. But, it is very often true that you
do not want the primary key index to be your clustered index. Since you only get one clustered index per table, you want to choose it carefully. Some things to consider: 1) Clustered indexes are wonderful for queries that return a range of values (like in your case you want all the the rows with a given date or range of dates) or for a query on a column that is for only one value, but there are many rows in the table which have that value. For a query like yours, making the clustered index on the datetime column can often make dramatic differences in response time - seeing queries that used to take minutes to now return in milliseconds is not uncommon. 2) Since the keys of your clustered index are also contained in every other index on that table, you don't want the clustered index to be too wide since that will make all your other indexes larger. 3) Since the keys of your clustered index are also contained in every other index on that table, you don't want to UPDATE the values in the keys of your clustered index to change very often since every time you do that SQL Server must not only change the value in the table, but clso the value in every index. Since the most important consideration is often 1), primary keys are often bad choices for the clustered index. In your case, the primary key (DataID) fits 2) and 3), but not 1) at all (I'd be willing to bet you have never done a query like SELECT * FROM DeviceData WHERE DataID BETWEEN 35000 AND 35999). But DataTimeStamp nicely fits all three conditions. If at all possible, you want to think about whether or not you want the primary key index to be clustered before you create the table and load it with data. If the table already has been created and you want to change the primary key to be nonclustered, you need the following steps: Drop any foreign keys on other tables that reference this table's primary key Drop this tables primary key constraint Add the new clustered index Add back the primary key constraint (non clustered) Add back any foreign keys on other tables that reference this table's primary key Of course, do this on a test server first, then a QA server, make sure you have adquate backups in production before starting this process, and generally follow other "best practices" before making this change. But I'm 99.99% sure you will find it worth the work. Tom Show quote "Yofnik" <yof***@comcast.net> wrote in message news:1156866965.346569.28080@75g2000cwc.googlegroups.com... >I guess primary keys are clustered by default? I tried making the index > on DataTimestamp clustered as suggested and got an error saying only > one clustered index can exist. > > I suppose I need to analyze my queries and determine which clustered > index would make the most sense. If I am searching by the datetime > field more frequently, would it make sense to make the primary key > (DataID) non-clustered and make the DataTimestamp column clustered? Is > there any other impact in having a non-clustered primary key? > We wrote a blog entry on how/when to create indexes that may be useful to
you: http://blogs.msdn.com/queryoptteam/archive/2006/04/06/570176.aspx Thanks, Conor Show quote "Tracy McKibben" <tr***@realsqlguy.com> wrote in message news:44F4621C.8040300@realsqlguy.com... > Yofnik wrote: >> Thank you for the advice. Being a developer not a DBA, I wasn't >> positive on the difference between clustered and non-clustered >> indecies. >> >> What about the DataID column. I have some queries that use that field. >> Since it is the primary key, is that sufficient or can that be improved >> as well? Thanks again. >> > > Since it's a primary key, it's indexed by default. Without knowing more > about how it's used, I'd have to say that it's sufficient. > > > -- > Tracy McKibben > MCDBA > http://www.realsqlguy.com
Show quote
"Tracy McKibben" <tr***@realsqlguy.com> wrote in message On SQL 2005, I'd probably just add a nonclustered index on DataTimestamp news:44F45D4B.5000404@realsqlguy.com... > Yofnik wrote: >> Hello All, >> I have a database table that stores the history of a data readings >> taken from hardware devices. The hardware device is queried by software >> once a minute (or more) and the value stored in the database for >> trending an analysis. >> >> The table structure is: >> CREATE TABLE DeviceData ( >> [DataID] [bigint] IDENTITY (1, 1) NOT NULL , >> [DeviceID] [int] NOT NULL , >> [DataValue] [decimal](18, 4) NOT NULL , >> [DataTimestamp] [datetime] NOT NULL >> ) >> >> After running for a few weeks, the number of rows in the table is >> getting large as expected. I started to notice the queries against this >> table are taken longer to run. There is currently a primary key >> (DataID) and an index on DeviceID. I later realized that since most of >> the queries search for a particular date range, an index on >> DataTimestamp would definitely help. >> >> Once I added the index, query times for newly added data greatly >> improved. The older data, however, still takes longer then I would >> like. My question is, once I added the index, is only new data that >> gets added to the table indexed? Is there any way to optimize the >> queries for the older data. >> >> Being a software developer, not a DBA, any other adivce is greatly >> appeciated. Thanks. >> > > Creating an index will index ALL data in the table, not just new data. > Since most of your queries are against date ranges, you should create a > CLUSTERED index on the DataTimeStamp column. That will physically sort > the data in date/time order, so that when querying for a specific date > range, the records you want are physically close together. Inserts of new > rows, assuming they are chronologically later than existing rows, will > take place at the end of the table, helping to keep fragmentation in > check. > > with included columns. create table DeviceData ( [DataID] bigint identity (1, 1) NOT NULL , [DeviceID] int NOT NULL , [DataValue] decimal(18, 4) NOT NULL , [DataTimestamp] datetime NOT NULL, constraint PK_DeviceData primary key clustered (DataID) ) create index IX_DeviceDataByTimestamp on DeviceData (DataTimestamp) include (DeviceID, DataValue) David Yofnik wrote:
Show quote > Hello All, What have seen, but apparently do not realize is that your IDENTITY> I have a database table that stores the history of a data readings > taken from hardware devices. The hardware device is queried by software > once a minute (or more) and the value stored in the database for > trending an analysis. > > The table structure is: > CREATE TABLE DeviceData ( > [DataID] [bigint] IDENTITY (1, 1) NOT NULL , > [DeviceID] [int] NOT NULL , > [DataValue] [decimal](18, 4) NOT NULL , > [DataTimestamp] [datetime] NOT NULL > ) > >> After running for a few weeks, the number of rows in the table is getting large as expected. I started to notice the queries against this table are taken longer to run. There is currently a primary key (DataID) and an index on DeviceID. I later realized that since most of the queries search for a particular date range, an index on DataTimestamp would definitely help. << column is redundant and not a key at all. It is an attribute of the hardware and has nothing to do with the data model. The data_timestamp is the natural key and should be so delcared: CREATE TABLE DeviceData (reading_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL PRIMARY KEY, device_id INTEGER NOT NULL REFERENCES Devices(device_id), device_reading DECIMAL (18,4) NOT NULL ); Names like :"data_value" are vague; isn't what you really have the device readings in that column? >> Once I added the index, query times for newly added data greatly improved. The older data, however, still takes longer then I would like. My question is, once I added the index, is only new data that gets added to the table indexed? Is there any way to optimize the queries for the older data.<< If all the data are in the same table, they all are indexed. Firstcheck the data to be sure that you really have unique timestamps. Drop the redundant, exposed locator IDENTITY column. Drop your indexes. Add a primary key constraint. I think that you have the wrong model of the world. In SQL, there are DRI constraints which you **must have** if you want to have data integrity. In SQL Server the uniqueness constraints are enforced by indexes. Other products do it in other ways (hashing, bit vectors, etc.) These are called primary indexes. But implement a LOGICAL concept -- a relational key. To improve performance, you can add optional indexes (secondary indexes) to a table. This is vendor dependent and not part of the SQL Standard at all. You are still thinking about a file system which does not have the concept of relational keys and whose indexes are all of the same kind. In a sequentail file, you locate records (which are not rows) by a physical position number (which newbies fake with IDENTITY). It takes at least a year of full time SQL programming, a lot of reading and a good teacher to get the hang of it. Could be worse; could be APL or LISP :) > CREATE TABLE DeviceData this design does not allow for more than one row per distinct> (reading_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP > NOT NULL PRIMARY KEY, > device_id INTEGER NOT NULL > REFERENCES Devices(device_id), > device_reading DECIMAL (18,4) NOT NULL ); reading_timestamp. Why so? > CREATE TABLE DeviceData I'm suprised you are advocating putting a PRIMARY KEY constraint on a data > (reading_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP > NOT NULL PRIMARY KEY, > device_id INTEGER NOT NULL > REFERENCES Devices(device_id), > device_reading DECIMAL (18,4) NOT NULL ); type that stores APPROXIMATE data. > What have seen, but apparently do not realize is that your IDENTITY IDENTITY is NOT an attribute of the hardware at all, in the majority of > column is redundant and not a key at all. It is an attribute of the > hardware and has nothing to do with the data model. The data_timestamp > is the natural key and should be so delcared: cases it is used as a SURROGATE KEY. It does look like in this instance there is no need for a surrogate key but to make reading_timestamp a unique key is just folly and asking for trouble. Cluster on reading_timestamp, device_id perhaps might make sense but not as a key. > You are still thinking about a file system which does not have the Surrogate key usage - its how we IMPLEMENT a good and well performing > concept of relational keys and whose indexes are all of the same kind. > In a sequentail file, you locate records (which are not rows) by a > physical position number (which newbies fake with IDENTITY). LOGICAL MODEL, something you seem to have little experience actually doing. > To improve performance, you can add optional indexes (secondary This news group is for MICROSOFT SQL SERVER, your posts should coming with a > indexes) to a table. This is vendor dependent and not part of the SQL > Standard at all. warning, something like 'BEWARE, THE CONTENTS OF THIS POST REFER TO A STANDARD VERSION OF SQL AND MAY NOT BE APPLICABLE TO MICROSOFT SQL SERVER'. > It takes at least a year of full time SQL programming, a lot of reading It takes at least 3 year of full time developing with mentorship and > and a good teacher to get the hang of it. Could be worse; could be APL > or LISP :) training to be a good developer and gain the experience necessary to work alone..... Stuff you cannot get in the class room.... -- Show quoteTony Rogerson SQL Server MVP http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL Server Consultant http://sqlserverfaq.com - free video tutorials "--CELKO--" <jcelko***@earthlink.net> wrote in message news:1156868024.628111.250220@m79g2000cwm.googlegroups.com... > Yofnik wrote: >> Hello All, >> I have a database table that stores the history of a data readings >> taken from hardware devices. The hardware device is queried by software >> once a minute (or more) and the value stored in the database for >> trending an analysis. >> >> The table structure is: >> CREATE TABLE DeviceData ( >> [DataID] [bigint] IDENTITY (1, 1) NOT NULL , >> [DeviceID] [int] NOT NULL , >> [DataValue] [decimal](18, 4) NOT NULL , >> [DataTimestamp] [datetime] NOT NULL >> ) >> >>> After running for a few weeks, the number of rows in the table is >>> getting large as expected. I started to notice the queries against this >>> table are taken longer to run. There is currently a primary key (DataID) >>> and an index on DeviceID. I later realized that since most of the >>> queries search for a particular date range, an index on DataTimestamp >>> would definitely help. << > > What have seen, but apparently do not realize is that your IDENTITY > column is redundant and not a key at all. It is an attribute of the > hardware and has nothing to do with the data model. The data_timestamp > is the natural key and should be so delcared: > > CREATE TABLE DeviceData > (reading_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP > NOT NULL PRIMARY KEY, > device_id INTEGER NOT NULL > REFERENCES Devices(device_id), > device_reading DECIMAL (18,4) NOT NULL ); > > Names like :"data_value" are vague; isn't what you really have the > device readings in that column? > >>> Once I added the index, query times for newly added data greatly >>> improved. The older data, however, still takes longer then I would >>> like. My question is, once I added the index, is only new data that gets >>> added to the table indexed? Is there any way to optimize the queries for >>> the older data.<< > > If all the data are in the same table, they all are indexed. First > check the data to be sure that you really have unique timestamps. Drop > the redundant, exposed locator IDENTITY column. Drop your indexes. > Add a primary key constraint. > > I think that you have the wrong model of the world. In SQL, there are > DRI constraints which you **must have** if you want to have data > integrity. In SQL Server the uniqueness constraints are enforced by > indexes. Other products do it in other ways (hashing, bit vectors, > etc.) These are called primary indexes. But implement a LOGICAL > concept -- a relational key. > > To improve performance, you can add optional indexes (secondary > indexes) to a table. This is vendor dependent and not part of the SQL > Standard at all. > > You are still thinking about a file system which does not have the > concept of relational keys and whose indexes are all of the same kind. > In a sequentail file, you locate records (which are not rows) by a > physical position number (which newbies fake with IDENTITY). > > It takes at least a year of full time SQL programming, a lot of reading > and a good teacher to get the hang of it. Could be worse; could be APL > or LISP :) > Thank you all for your very helpful comments. I believe in my case, it
makes the most sense to cluster the datetime field. I do have a question about a suggestion Tony made, however. Can someone explain the difference and potential impact of making a clustered index on reading_timestamp, device_id versus having a clustered index on reading_timestamp and a normal index on device_id? Hi Yofnik,
It depends on that type of queries you are performing and how the data is inserted. Remember, the data is physically stored in the order of the clustered index (well, pretty much so anyway, on page it uses slots which mean the physical order on the page may not be the same but for this that does not matter), if you have a lot of devices with a single process inserting one devices data then clustering by deviceid, datetime will reduce contention on the insert pages, whereas clustering by entry datetime first puts everything together. What type of queries are you running? -- Show quoteTony Rogerson SQL Server MVP http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL Server Consultant http://sqlserverfaq.com - free video tutorials "Yofnik" <yof***@comcast.net> wrote in message news:1156875888.222183.312500@i3g2000cwc.googlegroups.com... > Thank you all for your very helpful comments. I believe in my case, it > makes the most sense to cluster the datetime field. > > I do have a question about a suggestion Tony made, however. Can someone > explain the difference and potential impact of making a clustered index > on reading_timestamp, device_id versus having a clustered index on > reading_timestamp and a normal index on device_id? > I am running the following queries on the table:
INSERT: These are executed in a ThreadPool thread by one or more processes. Each thread will get a "batch" of data from a number of different device ids to insert. Select most recent: Finds the last device reading for a particular device id. I was using MAX(DataID) to find this, but if I make the DataTimestamp field the cluster index, I should probably change this to MAX(DataTimestamp) right? Select Range: Returns all data between two dates for analyzing. This query looks like: SELECT DataValue, DataTimestamp, DeviceID FROM DeviceData WHERE DataTimestamp > @0 AND DataTimestamp <= @1 and DeviceID = @3 ORDER BY DataTimestamp Tony Rogerson wrote: Show quote > Hi Yofnik, > > It depends on that type of queries you are performing and how the data is > inserted. > > Remember, the data is physically stored in the order of the clustered index > (well, pretty much so anyway, on page it uses slots which mean the physical > order on the page may not be the same but for this that does not matter), if > you have a lot of devices with a single process inserting one devices data > then clustering by deviceid, datetime will reduce contention on the insert > pages, whereas clustering by entry datetime first puts everything together. > > What type of queries are you running? > > -- > Tony Rogerson > SQL Server MVP > http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL > Server Consultant > http://sqlserverfaq.com - free video tutorials > > > "Yofnik" <yof***@comcast.net> wrote in message > news:1156875888.222183.312500@i3g2000cwc.googlegroups.com... > > Thank you all for your very helpful comments. I believe in my case, it > > makes the most sense to cluster the datetime field. > > > > I do have a question about a suggestion Tony made, however. Can someone > > explain the difference and potential impact of making a clustered index > > on reading_timestamp, device_id versus having a clustered index on > > reading_timestamp and a normal index on device_id? > > Hi Yofnik,
Yes, probs cluster on DAtaTimestamp, DeviceID in that case - don't make it unique though unless you can guarentee that those combinations will give a unique natural key. You probably don't need DataID so get rid, unless you are using the ID within the application, I don't like to use composite keys in the application - I prefer surrogates, in your case the surrogate key is DataID because its easier passing an integer around. Tony. -- Show quoteTony Rogerson SQL Server MVP http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL Server Consultant http://sqlserverfaq.com - free video tutorials "Yofnik" <yof***@comcast.net> wrote in message news:1156878975.812220.112860@h48g2000cwc.googlegroups.com... >I am running the following queries on the table: > > INSERT: These are executed in a ThreadPool thread by one or more > processes. Each thread will get a "batch" of data from a number of > different device ids to insert. > > Select most recent: Finds the last device reading for a particular > device id. I was using MAX(DataID) to find this, but if I make the > DataTimestamp field the cluster index, I should probably change this to > MAX(DataTimestamp) right? > > Select Range: Returns all data between two dates for analyzing. This > query looks like: > SELECT DataValue, DataTimestamp, DeviceID FROM DeviceData WHERE > DataTimestamp > @0 AND DataTimestamp <= @1 and DeviceID = @3 ORDER BY > DataTimestamp > > > Tony Rogerson wrote: >> Hi Yofnik, >> >> It depends on that type of queries you are performing and how the data is >> inserted. >> >> Remember, the data is physically stored in the order of the clustered >> index >> (well, pretty much so anyway, on page it uses slots which mean the >> physical >> order on the page may not be the same but for this that does not matter), >> if >> you have a lot of devices with a single process inserting one devices >> data >> then clustering by deviceid, datetime will reduce contention on the >> insert >> pages, whereas clustering by entry datetime first puts everything >> together. >> >> What type of queries are you running? >> >> -- >> Tony Rogerson >> SQL Server MVP >> http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a >> SQL >> Server Consultant >> http://sqlserverfaq.com - free video tutorials >> >> >> "Yofnik" <yof***@comcast.net> wrote in message >> news:1156875888.222183.312500@i3g2000cwc.googlegroups.com... >> > Thank you all for your very helpful comments. I believe in my case, it >> > makes the most sense to cluster the datetime field. >> > >> > I do have a question about a suggestion Tony made, however. Can someone >> > explain the difference and potential impact of making a clustered index >> > on reading_timestamp, device_id versus having a clustered index on >> > reading_timestamp and a normal index on device_id? >> > > Yofnik,
Our system collects data from devices around the world. With lots of different timezones. If you do this as well, then I would like to recommend using UTC datetimes in the database and in your devices. Then change your client s/w to convert the UTC to local time. Another benefit of using UTC is that there is no daylight saving stuff to cater for. Best regards, Henrik Staun Poulsen Now that I made the index clustered on (DataTimestamp, DeviceID) my
query that returns the most recent data value for a particular device is taking longer. The query I am using looks like this: SELECT DeviceID, DataValue, DataTimestamp FROM DeviceData A WHERE (DataTimestamp = (SELECT MAX(DataTimestamp) FROM DeviceData B WHERE A.DeviceID= B.DeviceID)) I am guessing the performance went down because I moved the cluster from DeviceID to DataTimestamp. The line A.DeviceID=B.DeviceID must be slowing it down. Does anyone see a way to improve this? Do you have a nonclustered index on DeviceID? If not, I would try adding
one. If that doesn't Is it possible to restrict this query by time? For example, if your table has data for the last three years would it work for your business problem to have the query be: SELECT DeviceID, DataValue, DataTimestamp FROM DeviceData A WHERE (DataTimestamp = (SELECT MAX(DataTimestamp) FROM DeviceData B WHERE A.DeviceID= B.DeviceID AND B.DataTimeStamp >= '20060101')) AND DataTimestamp >= '20060101' Of course, that would mean that you would not return a row for a DeviceID which had rows before 01/01/2006, but has no rows this year. The more you can restrict the dates you use, the faster the query should be. Maybe you can let the user enter how far back they want to go, then they will know that if they ask for data since the first of the month that they will get the result much faster than if they ask for everything since the beginning of time. You could help both queries by creating an indexed view, but that is probably a very bad choice for you because you are constantly updating this data. Indexed views generally are most helpful when the data is queried a lot, but UPDATEd or INSERTed very rarely. Finally, since you only get one clustered index per table, you may have to decide which of the two choices results in the best overall performance for your system. Tom Show quote "Yofnik" <yof***@comcast.net> wrote in message news:1156969020.483914.47370@i3g2000cwc.googlegroups.com... > Now that I made the index clustered on (DataTimestamp, DeviceID) my > query that returns the most recent data value for a particular device > is taking longer. The query I am using looks like this: > > SELECT DeviceID, DataValue, DataTimestamp > FROM DeviceData A > WHERE (DataTimestamp = > (SELECT MAX(DataTimestamp) > FROM DeviceData B > WHERE A.DeviceID= B.DeviceID)) > > I am guessing the performance went down because I moved the cluster > from DeviceID to DataTimestamp. The line A.DeviceID=B.DeviceID must be > slowing it down. > > Does anyone see a way to improve this? > Make sure you also have an index on DeviceID, DataTimestamp.
So, to recap. Clustered index on DataTimestamp, DeviceID NonClustered on DeviceID, DataTimestamp You might also be better rewriting the query something like.... SELECT ... FROM ( SELECT DeviceID, MaxDataTimestamp = MAX( DataTimestamp ) FROM DeviceData GROUP BY DeviceID ) AS md INNER JOIN DeviceData dd ON dd.DataTimestamp = md.MaxDataTimestamp AND dd.DeviceId = md.DeviceID -- Show quoteTony Rogerson SQL Server MVP http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL Server Consultant http://sqlserverfaq.com - free video tutorials "Yofnik" <yof***@comcast.net> wrote in message news:1156969020.483914.47370@i3g2000cwc.googlegroups.com... > Now that I made the index clustered on (DataTimestamp, DeviceID) my > query that returns the most recent data value for a particular device > is taking longer. The query I am using looks like this: > > SELECT DeviceID, DataValue, DataTimestamp > FROM DeviceData A > WHERE (DataTimestamp = > (SELECT MAX(DataTimestamp) > FROM DeviceData B > WHERE A.DeviceID= B.DeviceID)) > > I am guessing the performance went down because I moved the cluster > from DeviceID to DataTimestamp. The line A.DeviceID=B.DeviceID must be > slowing it down. > > Does anyone see a way to improve this? > Yofnik wrote:
Show quote > Now that I made the index clustered on (DataTimestamp, DeviceID) my The clustered index was suggested because your original post stated that > query that returns the most recent data value for a particular device > is taking longer. The query I am using looks like this: > > SELECT DeviceID, DataValue, DataTimestamp > FROM DeviceData A > WHERE (DataTimestamp = > (SELECT MAX(DataTimestamp) > FROM DeviceData B > WHERE A.DeviceID= B.DeviceID)) > > I am guessing the performance went down because I moved the cluster > from DeviceID to DataTimestamp. The line A.DeviceID=B.DeviceID must be > slowing it down. > > Does anyone see a way to improve this? > you commonly query using date ranges. This query isn't doing that, so the benefit of the clustered index isn't being realized. |
|||||||||||||||||||||||