Home All Groups Group Topic Archive Search About
Author
29 Aug 2006 3:16 PM
Yofnik
Hello All,
I have a database table that stores the history of a data readings
taken from hardware devices. The hardware device is queried by software
once a minute (or more) and the value stored in the database for
trending an analysis.

The table structure is:
CREATE TABLE DeviceData (
    [DataID] [bigint] IDENTITY (1, 1) NOT NULL ,
    [DeviceID] [int] NOT NULL ,
    [DataValue] [decimal](18, 4) NOT NULL ,
    [DataTimestamp] [datetime] NOT NULL
)

After running for a few weeks, the number of rows in the table is
getting large as expected. I started to notice the queries against this
table are taken longer to run. There is currently a primary key
(DataID) and an index on DeviceID. I later realized that since most of
the queries search for a particular date range, an index on
DataTimestamp would definitely help.

Once I added the index, query times for newly added data greatly
improved. The older data, however, still takes longer then I would
like. My question is, once I added the index, is only new data that
gets added to the table indexed? Is there any way to optimize the
queries for the older data.

Being a software developer, not a DBA, any other adivce is greatly
appeciated.

Thanks.

Author
29 Aug 2006 3:22 PM
Alexander Kuznetsov
> (DataID) and an index on DeviceID. I later realized that since most of
> the queries search for a particular date range, an index on
> DataTimestamp would definitely help.

Yofnik,

try making the index DataTimestamp CLUSTERED - that can make a huge
difference.
Author
29 Aug 2006 3:29 PM
Tracy McKibben
Yofnik wrote:
Show quote
> Hello All,
> I have a database table that stores the history of a data readings
> taken from hardware devices. The hardware device is queried by software
> once a minute (or more) and the value stored in the database for
> trending an analysis.
>
> The table structure is:
> CREATE TABLE DeviceData (
>     [DataID] [bigint] IDENTITY (1, 1) NOT NULL ,
>     [DeviceID] [int] NOT NULL ,
>     [DataValue] [decimal](18, 4) NOT NULL ,
>     [DataTimestamp] [datetime] NOT NULL
> )
>
> After running for a few weeks, the number of rows in the table is
> getting large as expected. I started to notice the queries against this
> table are taken longer to run. There is currently a primary key
> (DataID) and an index on DeviceID. I later realized that since most of
> the queries search for a particular date range, an index on
> DataTimestamp would definitely help.
>
> Once I added the index, query times for newly added data greatly
> improved. The older data, however, still takes longer then I would
> like. My question is, once I added the index, is only new data that
> gets added to the table indexed? Is there any way to optimize the
> queries for the older data.
>
> Being a software developer, not a DBA, any other adivce is greatly
> appeciated.
>
> Thanks.
>

Creating an index will index ALL data in the table, not just new data.
Since most of your queries are against date ranges, you should create a
CLUSTERED index on the DataTimeStamp column.  That will physically sort
the data in date/time order, so that when querying for a specific date
range, the records you want are physically close together.  Inserts of
new rows, assuming they are chronologically later than existing rows,
will take place at the end of the table, helping to keep fragmentation
in check.


--
Tracy McKibben
MCDBA
http://www.realsqlguy.com
Author
29 Aug 2006 3:42 PM
Yofnik
Thank you for the advice. Being a developer not a DBA, I wasn't
positive on the difference between clustered and non-clustered
indecies.

What about the DataID column. I have some queries that use that field.
Since it is the primary key, is that sufficient or can that be improved
as well?

Thanks again.
Author
29 Aug 2006 3:49 PM
Tracy McKibben
Yofnik wrote:
> Thank you for the advice. Being a developer not a DBA, I wasn't
> positive on the difference between clustered and non-clustered
> indecies.
>
> What about the DataID column. I have some queries that use that field.
> Since it is the primary key, is that sufficient or can that be improved
> as well?
>
> Thanks again.
>

Since it's a primary key, it's indexed by default.  Without knowing more
about how it's used, I'd have to say that it's sufficient.


--
Tracy McKibben
MCDBA
http://www.realsqlguy.com
Author
29 Aug 2006 3:56 PM
Yofnik
I guess primary keys are clustered by default? I tried making the index
on DataTimestamp clustered as suggested and got an error saying only
one clustered index can exist.

I suppose I need to analyze my queries and determine which clustered
index would make the most sense. If I am searching by the datetime
field more frequently, would it make sense to make the primary key
(DataID) non-clustered and make the DataTimestamp column clustered? Is
there any other impact in having a non-clustered primary key?
Author
29 Aug 2006 4:12 PM
Tracy McKibben
Yofnik wrote:
> I guess primary keys are clustered by default? I tried making the index
> on DataTimestamp clustered as suggested and got an error saying only
> one clustered index can exist.
>
> I suppose I need to analyze my queries and determine which clustered
> index would make the most sense. If I am searching by the datetime
> field more frequently, would it make sense to make the primary key
> (DataID) non-clustered and make the DataTimestamp column clustered? Is
> there any other impact in having a non-clustered primary key?
>

Primary keys can be clustered or non-clustered, the default depends on
how it was created (T-SQL or point/click).

In your case, it makes much more sense to have the date field clustered,
you will (or should) see a tremendous performance gain by doing so.


--
Tracy McKibben
MCDBA
http://www.realsqlguy.com
Author
29 Aug 2006 4:23 PM
Tibor Karaszi
> Primary keys can be clustered or non-clustered, the default depends on how it was created (T-SQL
> or point/click).

It used to be so that TSQL defaulted the PK to clustered and EM defaulted the PK to non-clustered.
This changed in a SQL Server 2000 service pack, however, so that EM behaved consistently with the
TSQL regarding default. A quick test show that SSMS is also consistent with TSQL default... I.e.,
all these seem to cluster the PK by default.

Show quote
"Tracy McKibben" <tr***@realsqlguy.com> wrote in message news:44F46767.50003@realsqlguy.com...
> Yofnik wrote:
>> I guess primary keys are clustered by default? I tried making the index
>> on DataTimestamp clustered as suggested and got an error saying only
>> one clustered index can exist.
>>
>> I suppose I need to analyze my queries and determine which clustered
>> index would make the most sense. If I am searching by the datetime
>> field more frequently, would it make sense to make the primary key
>> (DataID) non-clustered and make the DataTimestamp column clustered? Is
>> there any other impact in having a non-clustered primary key?
>>
>
> Primary keys can be clustered or non-clustered, the default depends on how it was created (T-SQL
> or point/click).
>
> In your case, it makes much more sense to have the date field clustered, you will (or should) see
> a tremendous performance gain by doing so.
>
>
> --
> Tracy McKibben
> MCDBA
> http://www.realsqlguy.com
Author
29 Aug 2006 4:35 PM
Tom Cooper
Primary keys are clustered by default.  But, it is very often true that you
do not want the primary key index to be your clustered index.  Since you
only get one clustered index per table, you want to choose it carefully.
Some things to consider:

1)  Clustered indexes are wonderful for queries that return a range of
values (like in your case you want all the the rows with a given date or
range of dates) or for a query on a column that is for only one value, but
there are many rows in the table which have that value.  For a query like
yours, making the clustered index on the datetime column can often make
dramatic differences in response time - seeing queries that used to take
minutes to now return in milliseconds is not uncommon.

2) Since the keys of your clustered index are also contained in every other
index on that table, you don't want the clustered index to be too wide since
that will make all your other indexes larger.

3) Since the keys of your clustered index are also contained in every other
index on that table, you don't want to UPDATE the values in the keys of your
clustered index to change very often since every time you do that SQL Server
must not only change the value in the table, but clso the value in every
index.

Since the most important consideration is often 1), primary keys are often
bad choices for the clustered index.  In your case, the primary key (DataID)
fits 2) and 3), but not 1) at all (I'd be willing to bet you have never done
a query like SELECT * FROM DeviceData WHERE DataID BETWEEN 35000 AND 35999).
But DataTimeStamp nicely fits all three conditions.

If at all possible, you want to think about whether or not you want the
primary key index to be clustered before you create the table and load it
with data.  If the table already has been created and you want to change the
primary key to be nonclustered, you need the following steps:

Drop any foreign keys on other tables that reference this table's primary
key
Drop this tables primary key constraint
Add the new clustered index
Add back the primary key constraint (non clustered)
Add back any foreign keys on other tables that reference this table's
primary key

Of course, do this on a test server first, then a QA server, make sure you
have adquate backups in production before starting this process, and
generally follow other "best practices" before making this change.  But I'm
99.99% sure you will find it worth the work.

Tom

Show quote
"Yofnik" <yof***@comcast.net> wrote in message
news:1156866965.346569.28080@75g2000cwc.googlegroups.com...
>I guess primary keys are clustered by default? I tried making the index
> on DataTimestamp clustered as suggested and got an error saying only
> one clustered index can exist.
>
> I suppose I need to analyze my queries and determine which clustered
> index would make the most sense. If I am searching by the datetime
> field more frequently, would it make sense to make the primary key
> (DataID) non-clustered and make the DataTimestamp column clustered? Is
> there any other impact in having a non-clustered primary key?
>
Author
29 Aug 2006 3:57 PM
Conor Cunningham [MS]
We wrote a blog entry on how/when to create indexes that may be useful to
you:
http://blogs.msdn.com/queryoptteam/archive/2006/04/06/570176.aspx

Thanks,

Conor

Show quote
"Tracy McKibben" <tr***@realsqlguy.com> wrote in message
news:44F4621C.8040300@realsqlguy.com...
> Yofnik wrote:
>> Thank you for the advice. Being a developer not a DBA, I wasn't
>> positive on the difference between clustered and non-clustered
>> indecies.
>>
>> What about the DataID column. I have some queries that use that field.
>> Since it is the primary key, is that sufficient or can that be improved
>> as well? Thanks again.
>>
>
> Since it's a primary key, it's indexed by default.  Without knowing more
> about how it's used, I'd have to say that it's sufficient.
>
>
> --
> Tracy McKibben
> MCDBA
> http://www.realsqlguy.com
Author
29 Aug 2006 3:57 PM
David Browne
Show quote
"Tracy McKibben" <tr***@realsqlguy.com> wrote in message
news:44F45D4B.5000404@realsqlguy.com...
> Yofnik wrote:
>> Hello All,
>> I have a database table that stores the history of a data readings
>> taken from hardware devices. The hardware device is queried by software
>> once a minute (or more) and the value stored in the database for
>> trending an analysis.
>>
>> The table structure is:
>> CREATE TABLE DeviceData (
>>     [DataID] [bigint] IDENTITY (1, 1) NOT NULL ,
>>     [DeviceID] [int] NOT NULL ,
>>     [DataValue] [decimal](18, 4) NOT NULL ,
>>     [DataTimestamp] [datetime] NOT NULL
>> )
>>
>> After running for a few weeks, the number of rows in the table is
>> getting large as expected. I started to notice the queries against this
>> table are taken longer to run. There is currently a primary key
>> (DataID) and an index on DeviceID. I later realized that since most of
>> the queries search for a particular date range, an index on
>> DataTimestamp would definitely help.
>>
>> Once I added the index, query times for newly added data greatly
>> improved. The older data, however, still takes longer then I would
>> like. My question is, once I added the index, is only new data that
>> gets added to the table indexed? Is there any way to optimize the
>> queries for the older data.
>>
>> Being a software developer, not a DBA, any other adivce is greatly
>> appeciated. Thanks.
>>
>
> Creating an index will index ALL data in the table, not just new data.
> Since most of your queries are against date ranges, you should create a
> CLUSTERED index on the DataTimeStamp column.  That will physically sort
> the data in date/time order, so that when querying for a specific date
> range, the records you want are physically close together.  Inserts of new
> rows, assuming they are chronologically later than existing rows, will
> take place at the end of the table, helping to keep fragmentation in
> check.
>
>

On SQL 2005, I'd probably just add a nonclustered index on DataTimestamp
with included columns.

create table DeviceData
(
    [DataID] bigint identity (1, 1) NOT NULL ,
    [DeviceID] int NOT NULL ,
    [DataValue] decimal(18, 4) NOT NULL ,
    [DataTimestamp] datetime NOT NULL,
    constraint PK_DeviceData
      primary key clustered (DataID)
)

create index IX_DeviceDataByTimestamp
on DeviceData (DataTimestamp)
include (DeviceID, DataValue)


David
Author
29 Aug 2006 4:13 PM
--CELKO--
Yofnik wrote:
Show quote
> Hello All,
> I have a database table that stores the history of a data readings
> taken from hardware devices. The hardware device is queried by software
> once a minute (or more) and the value stored in the database for
> trending an analysis.
>
> The table structure is:
> CREATE TABLE DeviceData (
>     [DataID] [bigint] IDENTITY (1, 1) NOT NULL ,
>     [DeviceID] [int] NOT NULL ,
>     [DataValue] [decimal](18, 4) NOT NULL ,
>     [DataTimestamp] [datetime] NOT NULL
> )
>
>> After running for a few weeks, the number of rows in the table is getting large as expected. I started to notice the queries against this table are taken longer to run. There is currently a primary key (DataID) and an index on DeviceID. I later realized that since most of the queries search for a particular date range, an index on DataTimestamp would definitely help. <<

What have seen, but apparently do not realize is that your IDENTITY
column is redundant and not a key at all.  It is an attribute of the
hardware and has nothing to do with the data model.  The data_timestamp
is the natural key and should be so delcared:

CREATE TABLE DeviceData
(reading_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        NOT NULL PRIMARY KEY,
device_id INTEGER NOT NULL
   REFERENCES Devices(device_id),
device_reading DECIMAL (18,4) NOT NULL );

Names like :"data_value" are vague; isn't what you really have the
device readings in that column?

>> Once I added the index, query times for newly added data greatly improved. The older data, however, still takes longer then I would  like. My question is, once I added the index, is only new data that gets added to the table indexed? Is there any way to optimize the queries for the older data.<<

If all the data are in the same table, they all are indexed.  First
check the data to be sure that you really have unique timestamps.  Drop
the redundant, exposed locator IDENTITY column.  Drop your indexes.
Add a primary key constraint.

I think that you have the wrong model of the world.  In SQL, there are
DRI constraints which you **must have** if you want to have data
integrity.  In SQL Server the uniqueness constraints are enforced by
indexes.  Other products do it in other ways (hashing, bit vectors,
etc.)  These are called primary indexes.  But implement a LOGICAL
concept -- a relational key.

To improve performance, you can add optional indexes (secondary
indexes) to a table.  This is vendor dependent and not part of the SQL
Standard at all.

You are still thinking about a file system which does not have the
concept of relational keys and whose indexes are all of the same kind.
In a sequentail file, you locate records (which are not rows) by a
physical position number (which newbies fake with IDENTITY).

It takes at least a year of full time SQL programming, a lot of reading
and a good teacher to get the hang of it.  Could be worse; could be APL
or LISP :)
Author
29 Aug 2006 4:22 PM
Alexander Kuznetsov
> CREATE TABLE DeviceData
> (reading_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
>         NOT NULL PRIMARY KEY,
>  device_id INTEGER NOT NULL
>    REFERENCES Devices(device_id),
>  device_reading DECIMAL (18,4) NOT NULL );

this design does not allow for more than one row per distinct
reading_timestamp. Why so?
Author
29 Aug 2006 4:31 PM
Tony Rogerson
> CREATE TABLE DeviceData
> (reading_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
>        NOT NULL PRIMARY KEY,
> device_id INTEGER NOT NULL
>   REFERENCES Devices(device_id),
> device_reading DECIMAL (18,4) NOT NULL );

I'm suprised you are advocating putting a PRIMARY KEY constraint on a data
type that stores APPROXIMATE data.

> What have seen, but apparently do not realize is that your IDENTITY
> column is redundant and not a key at all.  It is an attribute of the
> hardware and has nothing to do with the data model.  The data_timestamp
> is the natural key and should be so delcared:

IDENTITY is NOT an attribute of the hardware at all, in the majority of
cases it is used as a SURROGATE KEY.

It does look like in this instance there is no need for a surrogate key but
to make reading_timestamp a unique key is just folly and asking for trouble.

Cluster on reading_timestamp, device_id perhaps might make sense but not as
a key.

> You are still thinking about a file system which does not have the
> concept of relational keys and whose indexes are all of the same kind.
> In a sequentail file, you locate records (which are not rows) by a
> physical position number (which newbies fake with IDENTITY).

Surrogate key usage - its how we IMPLEMENT a good and well performing
LOGICAL MODEL, something you seem to have little experience actually doing.

> To improve performance, you can add optional indexes (secondary
> indexes) to a table.  This is vendor dependent and not part of the SQL
> Standard at all.

This news group is for MICROSOFT SQL SERVER, your posts should coming with a
warning, something like 'BEWARE, THE CONTENTS OF THIS POST REFER TO A
STANDARD VERSION OF SQL AND MAY NOT BE APPLICABLE TO MICROSOFT SQL SERVER'.

> It takes at least a year of full time SQL programming, a lot of reading
> and a good teacher to get the hang of it.  Could be worse; could be APL
> or LISP :)

It takes at least 3 year of full time developing with mentorship and
training to be a good developer and gain the experience necessary to work
alone..... Stuff you cannot get in the class room....

--
Tony Rogerson
SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL
Server Consultant
http://sqlserverfaq.com - free video tutorials


Show quote
"--CELKO--" <jcelko***@earthlink.net> wrote in message
news:1156868024.628111.250220@m79g2000cwm.googlegroups.com...
> Yofnik wrote:
>> Hello All,
>> I have a database table that stores the history of a data readings
>> taken from hardware devices. The hardware device is queried by software
>> once a minute (or more) and the value stored in the database for
>> trending an analysis.
>>
>> The table structure is:
>> CREATE TABLE DeviceData (
>>     [DataID] [bigint] IDENTITY (1, 1) NOT NULL ,
>>     [DeviceID] [int] NOT NULL ,
>>     [DataValue] [decimal](18, 4) NOT NULL ,
>>     [DataTimestamp] [datetime] NOT NULL
>> )
>>
>>> After running for a few weeks, the number of rows in the table is
>>> getting large as expected. I started to notice the queries against this
>>> table are taken longer to run. There is currently a primary key (DataID)
>>> and an index on DeviceID. I later realized that since most of the
>>> queries search for a particular date range, an index on DataTimestamp
>>> would definitely help. <<
>
> What have seen, but apparently do not realize is that your IDENTITY
> column is redundant and not a key at all.  It is an attribute of the
> hardware and has nothing to do with the data model.  The data_timestamp
> is the natural key and should be so delcared:
>
> CREATE TABLE DeviceData
> (reading_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
>        NOT NULL PRIMARY KEY,
> device_id INTEGER NOT NULL
>   REFERENCES Devices(device_id),
> device_reading DECIMAL (18,4) NOT NULL );
>
> Names like :"data_value" are vague; isn't what you really have the
> device readings in that column?
>
>>> Once I added the index, query times for newly added data greatly
>>> improved. The older data, however, still takes longer then I would
>>> like. My question is, once I added the index, is only new data that gets
>>> added to the table indexed? Is there any way to optimize the queries for
>>> the older data.<<
>
> If all the data are in the same table, they all are indexed.  First
> check the data to be sure that you really have unique timestamps.  Drop
> the redundant, exposed locator IDENTITY column.  Drop your indexes.
> Add a primary key constraint.
>
> I think that you have the wrong model of the world.  In SQL, there are
> DRI constraints which you **must have** if you want to have data
> integrity.  In SQL Server the uniqueness constraints are enforced by
> indexes.  Other products do it in other ways (hashing, bit vectors,
> etc.)  These are called primary indexes.  But implement a LOGICAL
> concept -- a relational key.
>
> To improve performance, you can add optional indexes (secondary
> indexes) to a table.  This is vendor dependent and not part of the SQL
> Standard at all.
>
> You are still thinking about a file system which does not have the
> concept of relational keys and whose indexes are all of the same kind.
> In a sequentail file, you locate records (which are not rows) by a
> physical position number (which newbies fake with IDENTITY).
>
> It takes at least a year of full time SQL programming, a lot of reading
> and a good teacher to get the hang of it.  Could be worse; could be APL
> or LISP :)
>
Author
29 Aug 2006 6:24 PM
Yofnik
Thank you all for your very helpful comments. I believe in my case, it
makes the most sense to cluster the datetime field.

I do have a question about a suggestion Tony made, however. Can someone
explain the difference and potential impact of making a clustered index
on reading_timestamp, device_id versus having a clustered index on
reading_timestamp and a normal index on device_id?
Author
29 Aug 2006 6:53 PM
Tony Rogerson
Hi Yofnik,

It depends on that type of queries you are performing and how the data is
inserted.

Remember, the data is physically stored in the order of the clustered index
(well, pretty much so anyway, on page it uses slots which mean the physical
order on the page may not be the same but for this that does not matter), if
you have a lot of devices with a single process inserting one devices data
then clustering by deviceid, datetime will reduce contention on the insert
pages, whereas clustering by entry datetime first puts everything together.

What type of queries are you running?

--
Tony Rogerson
SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL
Server Consultant
http://sqlserverfaq.com - free video tutorials


Show quote
"Yofnik" <yof***@comcast.net> wrote in message
news:1156875888.222183.312500@i3g2000cwc.googlegroups.com...
> Thank you all for your very helpful comments. I believe in my case, it
> makes the most sense to cluster the datetime field.
>
> I do have a question about a suggestion Tony made, however. Can someone
> explain the difference and potential impact of making a clustered index
> on reading_timestamp, device_id versus having a clustered index on
> reading_timestamp and a normal index on device_id?
>
Author
29 Aug 2006 7:16 PM
Yofnik
I am running the following queries on the table:

INSERT: These are executed in a ThreadPool thread by one or more
processes. Each thread will get a "batch" of data from a number of
different device ids to insert.

Select most recent: Finds the last device reading for a particular
device id. I was using MAX(DataID) to find this, but if I make the
DataTimestamp field the cluster index, I should probably change this to
MAX(DataTimestamp) right?

Select Range: Returns all data between two dates for analyzing. This
query looks like:
SELECT DataValue, DataTimestamp, DeviceID FROM DeviceData WHERE
DataTimestamp > @0 AND DataTimestamp <= @1 and DeviceID = @3 ORDER BY
DataTimestamp


Tony Rogerson wrote:
Show quote
> Hi Yofnik,
>
> It depends on that type of queries you are performing and how the data is
> inserted.
>
> Remember, the data is physically stored in the order of the clustered index
> (well, pretty much so anyway, on page it uses slots which mean the physical
> order on the page may not be the same but for this that does not matter), if
> you have a lot of devices with a single process inserting one devices data
> then clustering by deviceid, datetime will reduce contention on the insert
> pages, whereas clustering by entry datetime first puts everything together.
>
> What type of queries are you running?
>
> --
> Tony Rogerson
> SQL Server MVP
> http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL
> Server Consultant
> http://sqlserverfaq.com - free video tutorials
>
>
> "Yofnik" <yof***@comcast.net> wrote in message
> news:1156875888.222183.312500@i3g2000cwc.googlegroups.com...
> > Thank you all for your very helpful comments. I believe in my case, it
> > makes the most sense to cluster the datetime field.
> >
> > I do have a question about a suggestion Tony made, however. Can someone
> > explain the difference and potential impact of making a clustered index
> > on reading_timestamp, device_id versus having a clustered index on
> > reading_timestamp and a normal index on device_id?
> >
Author
29 Aug 2006 7:27 PM
Tony Rogerson
Hi Yofnik,

Yes, probs cluster on DAtaTimestamp, DeviceID in that case - don't make it
unique though unless you can guarentee that those combinations will give a
unique natural key.

You probably don't need DataID so get rid, unless you are using the ID
within the application, I don't like to use composite keys in the
application - I prefer surrogates, in your case the surrogate key is DataID
because its easier passing an integer around.

Tony.

--
Tony Rogerson
SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL
Server Consultant
http://sqlserverfaq.com - free video tutorials


Show quote
"Yofnik" <yof***@comcast.net> wrote in message
news:1156878975.812220.112860@h48g2000cwc.googlegroups.com...
>I am running the following queries on the table:
>
> INSERT: These are executed in a ThreadPool thread by one or more
> processes. Each thread will get a "batch" of data from a number of
> different device ids to insert.
>
> Select most recent: Finds the last device reading for a particular
> device id. I was using MAX(DataID) to find this, but if I make the
> DataTimestamp field the cluster index, I should probably change this to
> MAX(DataTimestamp) right?
>
> Select Range: Returns all data between two dates for analyzing. This
> query looks like:
> SELECT DataValue, DataTimestamp, DeviceID FROM DeviceData WHERE
> DataTimestamp > @0 AND DataTimestamp <= @1 and DeviceID = @3 ORDER BY
> DataTimestamp
>
>
> Tony Rogerson wrote:
>> Hi Yofnik,
>>
>> It depends on that type of queries you are performing and how the data is
>> inserted.
>>
>> Remember, the data is physically stored in the order of the clustered
>> index
>> (well, pretty much so anyway, on page it uses slots which mean the
>> physical
>> order on the page may not be the same but for this that does not matter),
>> if
>> you have a lot of devices with a single process inserting one devices
>> data
>> then clustering by deviceid, datetime will reduce contention on the
>> insert
>> pages, whereas clustering by entry datetime first puts everything
>> together.
>>
>> What type of queries are you running?
>>
>> --
>> Tony Rogerson
>> SQL Server MVP
>> http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a
>> SQL
>> Server Consultant
>> http://sqlserverfaq.com - free video tutorials
>>
>>
>> "Yofnik" <yof***@comcast.net> wrote in message
>> news:1156875888.222183.312500@i3g2000cwc.googlegroups.com...
>> > Thank you all for your very helpful comments. I believe in my case, it
>> > makes the most sense to cluster the datetime field.
>> >
>> > I do have a question about a suggestion Tony made, however. Can someone
>> > explain the difference and potential impact of making a clustered index
>> > on reading_timestamp, device_id versus having a clustered index on
>> > reading_timestamp and a normal index on device_id?
>> >
>
Author
29 Aug 2006 7:36 PM
Yofnik
Excellent. Thank you.
Author
30 Aug 2006 8:20 AM
Henrik Staun Poulsen
Yofnik,

Our system collects data from devices around the world. With lots of
different timezones.
If you do this as well, then I would like to recommend using UTC
datetimes in the database and in your devices.
Then change your client s/w to convert the UTC to local time.
Another benefit of using UTC is that there is no daylight saving stuff
to cater for.

Best regards,
Henrik Staun Poulsen
Author
30 Aug 2006 8:17 PM
Yofnik
Now that I made the index clustered on (DataTimestamp, DeviceID) my
query that returns the most recent data value for a particular device
is taking longer. The query I am using looks like this:

SELECT     DeviceID, DataValue, DataTimestamp
FROM        DeviceData A
WHERE     (DataTimestamp =
                          (SELECT     MAX(DataTimestamp)
                            FROM          DeviceData B
                            WHERE      A.DeviceID= B.DeviceID))

I am guessing the performance went down because I moved the cluster
from DeviceID to DataTimestamp. The line A.DeviceID=B.DeviceID must be
slowing it down.

Does anyone see a way to improve this?
Author
30 Aug 2006 8:58 PM
Tom Cooper
Do you have a nonclustered index on DeviceID?  If not, I would try adding
one.

If that doesn't Is it possible to restrict this query by time?  For example,
if your table has data for the last three years would it work for your
business problem to have the query be:

SELECT     DeviceID, DataValue, DataTimestamp
FROM        DeviceData A
WHERE     (DataTimestamp =
                          (SELECT     MAX(DataTimestamp)
                            FROM          DeviceData B
                            WHERE      A.DeviceID= B.DeviceID
                             AND B.DataTimeStamp >= '20060101'))
AND DataTimestamp >= '20060101'

Of course, that would mean that you would not return a row for a DeviceID
which had rows before 01/01/2006, but has no rows this year.  The more you
can restrict the dates you use, the faster the query should be.  Maybe you
can let the user enter how far back they want to go, then they will know
that if they ask for data since the first of the month that they will get
the result much faster than if they ask for everything since the beginning
of time.

You could help both queries by creating an indexed view, but that is
probably a very bad choice for you because you are constantly updating this
data.  Indexed views generally are most helpful when the data is queried a
lot, but UPDATEd or INSERTed very rarely.

Finally, since you only get one clustered index per table, you may have to
decide which of the two choices results in the best overall performance for
your system.

Tom

Show quote
"Yofnik" <yof***@comcast.net> wrote in message
news:1156969020.483914.47370@i3g2000cwc.googlegroups.com...
> Now that I made the index clustered on (DataTimestamp, DeviceID) my
> query that returns the most recent data value for a particular device
> is taking longer. The query I am using looks like this:
>
> SELECT     DeviceID, DataValue, DataTimestamp
> FROM        DeviceData A
> WHERE     (DataTimestamp =
>                          (SELECT     MAX(DataTimestamp)
>                            FROM          DeviceData B
>                            WHERE      A.DeviceID= B.DeviceID))
>
> I am guessing the performance went down because I moved the cluster
> from DeviceID to DataTimestamp. The line A.DeviceID=B.DeviceID must be
> slowing it down.
>
> Does anyone see a way to improve this?
>
Author
31 Aug 2006 7:14 AM
Tony Rogerson
Make sure you also have an index on DeviceID, DataTimestamp.

So, to recap.

Clustered index on DataTimestamp, DeviceID
NonClustered on DeviceID, DataTimestamp

You might also be better rewriting the query something like....

SELECT ...
FROM ( SELECT DeviceID, MaxDataTimestamp = MAX( DataTimestamp )
              FROM DeviceData
              GROUP BY DeviceID ) AS md
    INNER JOIN DeviceData dd ON
                dd.DataTimestamp = md.MaxDataTimestamp
        AND dd.DeviceId = md.DeviceID

--
Tony Rogerson
SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL
Server Consultant
http://sqlserverfaq.com - free video tutorials


Show quote
"Yofnik" <yof***@comcast.net> wrote in message
news:1156969020.483914.47370@i3g2000cwc.googlegroups.com...
> Now that I made the index clustered on (DataTimestamp, DeviceID) my
> query that returns the most recent data value for a particular device
> is taking longer. The query I am using looks like this:
>
> SELECT     DeviceID, DataValue, DataTimestamp
> FROM        DeviceData A
> WHERE     (DataTimestamp =
>                          (SELECT     MAX(DataTimestamp)
>                            FROM          DeviceData B
>                            WHERE      A.DeviceID= B.DeviceID))
>
> I am guessing the performance went down because I moved the cluster
> from DeviceID to DataTimestamp. The line A.DeviceID=B.DeviceID must be
> slowing it down.
>
> Does anyone see a way to improve this?
>
Author
31 Aug 2006 1:22 PM
Tracy McKibben
Yofnik wrote:
Show quote
> Now that I made the index clustered on (DataTimestamp, DeviceID) my
> query that returns the most recent data value for a particular device
> is taking longer. The query I am using looks like this:
>
> SELECT     DeviceID, DataValue, DataTimestamp
> FROM        DeviceData A
> WHERE     (DataTimestamp =
>                           (SELECT     MAX(DataTimestamp)
>                             FROM          DeviceData B
>                             WHERE      A.DeviceID= B.DeviceID))
>
> I am guessing the performance went down because I moved the cluster
> from DeviceID to DataTimestamp. The line A.DeviceID=B.DeviceID must be
> slowing it down.
>
> Does anyone see a way to improve this?
>

The clustered index was suggested because your original post stated that
you commonly query using date ranges.  This query isn't doing that, so
the benefit of the clustered index isn't being realized.

--
Tracy McKibben
MCDBA
http://www.realsqlguy.com

AddThis Social Bookmark Button