Home All Groups Group Topic Archive Search About

Why query optimizer does not select the index?

Author
22 Dec 2005 12:32 PM
Vagif Abilov
Hello,

I have a table with a couple of non-clustered indexes. One of them is a
single-column index on a time column. The table has several hundred
thousands records, so when I search for data within certain time range, I
expect the index to be used. However, SQL optimized only selects the index
if the search range is very narrow.

For example:

SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-01-02'

This search would result in just a few records, and optimization plan shows
that the right index (on Timestamp column) is used.

But if I extend time interval from a day to a month:

SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01'

then the query would fetch a couple of thousands records, and query
optimizer falls to table scan.

It even does it when I use ORDER BY clause:

SELECT * FROM MyTable ORDER BY Timestamp

It still insists on using table scan and sorting data afterwards. If however
I modify the query to force it to use the right index:

SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp

then the execution time is reduced by 3 times - from 15 seconds to 5.

My question: when it's so obvious that use of index gives execution
benefits, why does query optimizer still try to use table scan?

Vagif Abilov
Oslo Norway

Author
22 Dec 2005 12:49 PM
Rick Sawtell
Show quote
"Vagif Abilov" <va***@online.no> wrote in message
news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl...
> Hello,
>
> I have a table with a couple of non-clustered indexes. One of them is a
> single-column index on a time column. The table has several hundred
> thousands records, so when I search for data within certain time range, I
> expect the index to be used. However, SQL optimized only selects the index
> if the search range is very narrow.
>
> For example:
>
> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
> '2004-01-02'
>
> This search would result in just a few records, and optimization plan
> shows that the right index (on Timestamp column) is used.
>
> But if I extend time interval from a day to a month:
>
> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
> '2004-02-01'
>
> then the query would fetch a couple of thousands records, and query
> optimizer falls to table scan.
>
> It even does it when I use ORDER BY clause:
>
> SELECT * FROM MyTable ORDER BY Timestamp
>
> It still insists on using table scan and sorting data afterwards. If
> however I modify the query to force it to use the right index:
>
> SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp
>
> then the execution time is reduced by 3 times - from 15 seconds to 5.
>
> My question: when it's so obvious that use of index gives execution
> benefits, why does query optimizer still try to use table scan?
>
> Vagif Abilov
> Oslo Norway
>

A couple of possible reasons here.

1.  If your index statistics are out of date, then the optimizer may be
making it's decisions on bad information.

For larger ranges of values, it may think that using the index will force
many bookmark lookups which can be expensive.  It may think that it is more
expensive to walk the index tree and do all of the bookmark lookups rather
than simply perform a table scan.  It depends on the number of rows returned
and the percentage of the total number of rows according to the statistics.

Update the statistics on the indexes and see what the optimizer does.

2.  Columns frequently searched on ranges (BETWEEN 100 and 200,  > 50, and
so forth) are good candidates for a clustered index.  You may see a big
improvement in speed if you put a clustered index on your TimeStamp column.

Rick Sawtell
MCT, MCSD, MCDBA
Author
22 Dec 2005 1:10 PM
Vagif Abilov
Thanks Rick,

I followed your advice and updated statistics on this table. No luck -
execution time got slightly better, but still not as good as when I force
the query to use the index. And execution plan still shows table scan
followed by the sort. Very strange since both table and index are just
trivial.

Vagif

Show quote
"Rick Sawtell" <Quicken***@msn.com> wrote in message
news:eKKJlYvBGHA.3452@TK2MSFTNGP10.phx.gbl...
>
> "Vagif Abilov" <va***@online.no> wrote in message
> news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl...
>> Hello,
>>
>> I have a table with a couple of non-clustered indexes. One of them is a
>> single-column index on a time column. The table has several hundred
>> thousands records, so when I search for data within certain time range, I
>> expect the index to be used. However, SQL optimized only selects the
>> index if the search range is very narrow.
>>
>> For example:
>>
>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
>> '2004-01-02'
>>
>> This search would result in just a few records, and optimization plan
>> shows that the right index (on Timestamp column) is used.
>>
>> But if I extend time interval from a day to a month:
>>
>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
>> '2004-02-01'
>>
>> then the query would fetch a couple of thousands records, and query
>> optimizer falls to table scan.
>>
>> It even does it when I use ORDER BY clause:
>>
>> SELECT * FROM MyTable ORDER BY Timestamp
>>
>> It still insists on using table scan and sorting data afterwards. If
>> however I modify the query to force it to use the right index:
>>
>> SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp
>>
>> then the execution time is reduced by 3 times - from 15 seconds to 5.
>>
>> My question: when it's so obvious that use of index gives execution
>> benefits, why does query optimizer still try to use table scan?
>>
>> Vagif Abilov
>> Oslo Norway
>>
>
> A couple of possible reasons here.
>
> 1.  If your index statistics are out of date, then the optimizer may be
> making it's decisions on bad information.
>
> For larger ranges of values, it may think that using the index will force
> many bookmark lookups which can be expensive.  It may think that it is
> more expensive to walk the index tree and do all of the bookmark lookups
> rather than simply perform a table scan.  It depends on the number of rows
> returned and the percentage of the total number of rows according to the
> statistics.
>
> Update the statistics on the indexes and see what the optimizer does.
>
> 2.  Columns frequently searched on ranges (BETWEEN 100 and 200,  > 50, and
> so forth) are good candidates for a clustered index.  You may see a big
> improvement in speed if you put a clustered index on your TimeStamp
> column.
>
> Rick Sawtell
> MCT, MCSD, MCDBA
>
>
>
Author
22 Dec 2005 12:50 PM
Uri Dimant
Hi
How is about

SELECT Timestamp, and other columns that have an index
FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01'


Lookup COVERING indexes in the BOL





Show quote
"Vagif Abilov" <va***@online.no> wrote in message
news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl...
> Hello,
>
> I have a table with a couple of non-clustered indexes. One of them is a
> single-column index on a time column. The table has several hundred
> thousands records, so when I search for data within certain time range, I
> expect the index to be used. However, SQL optimized only selects the index
> if the search range is very narrow.
>
> For example:
>
> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
> '2004-01-02'
>
> This search would result in just a few records, and optimization plan
> shows that the right index (on Timestamp column) is used.
>
> But if I extend time interval from a day to a month:
>
> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
> '2004-02-01'
>
> then the query would fetch a couple of thousands records, and query
> optimizer falls to table scan.
>
> It even does it when I use ORDER BY clause:
>
> SELECT * FROM MyTable ORDER BY Timestamp
>
> It still insists on using table scan and sorting data afterwards. If
> however I modify the query to force it to use the right index:
>
> SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp
>
> then the execution time is reduced by 3 times - from 15 seconds to 5.
>
> My question: when it's so obvious that use of index gives execution
> benefits, why does query optimizer still try to use table scan?
>
> Vagif Abilov
> Oslo Norway
>
Author
22 Dec 2005 1:05 PM
Vagif Abilov
First of all, it's really bad if the order of columns in result list affect
the execution plan. It shouldn't.

Second, I made a test and as long as I fetch all the columns I need,
reordering them and placing the indexed column first does not affect the
execution plan. Only if reduce the number of columns to include only indexed
columns, then the execution plan changes to use the index that contains all
selected columns. So it's basically useless.

Vagif


Show quote
"Uri Dimant" <u***@iscar.co.il> wrote in message
news:uQEL3YvBGHA.2912@tk2msftngp13.phx.gbl...
> Hi
> How is about
>
> SELECT Timestamp, and other columns that have an index
> FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01'
>
>
> Lookup COVERING indexes in the BOL
>
>
>
>
>
> "Vagif Abilov" <va***@online.no> wrote in message
> news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl...
>> Hello,
>>
>> I have a table with a couple of non-clustered indexes. One of them is a
>> single-column index on a time column. The table has several hundred
>> thousands records, so when I search for data within certain time range, I
>> expect the index to be used. However, SQL optimized only selects the
>> index if the search range is very narrow.
>>
>> For example:
>>
>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
>> '2004-01-02'
>>
>> This search would result in just a few records, and optimization plan
>> shows that the right index (on Timestamp column) is used.
>>
>> But if I extend time interval from a day to a month:
>>
>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
>> '2004-02-01'
>>
>> then the query would fetch a couple of thousands records, and query
>> optimizer falls to table scan.
>>
>> It even does it when I use ORDER BY clause:
>>
>> SELECT * FROM MyTable ORDER BY Timestamp
>>
>> It still insists on using table scan and sorting data afterwards. If
>> however I modify the query to force it to use the right index:
>>
>> SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp
>>
>> then the execution time is reduced by 3 times - from 15 seconds to 5.
>>
>> My question: when it's so obvious that use of index gives execution
>> benefits, why does query optimizer still try to use table scan?
>>
>> Vagif Abilov
>> Oslo Norway
>>
>
>
Author
22 Dec 2005 1:11 PM
Uri Dimant
> First of all, it's really bad if the order of columns in result list
> affect the execution plan. It shouldn't.

Yep , in the WHERE clause it  may happened . If you have an index on A,B,C
columns so SQL Server keeps statistics only on the first one which may or
may not affect the query's perfomance

Read this article

http://www.sql-server-performance.com/covering_indexes.asp




Show quote
"Vagif Abilov" <va***@online.no> wrote in message
news:Of2YHfvBGHA.1088@tk2msftngp13.phx.gbl...
> First of all, it's really bad if the order of columns in result list
> affect the execution plan. It shouldn't.
>
> Second, I made a test and as long as I fetch all the columns I need,
> reordering them and placing the indexed column first does not affect the
> execution plan. Only if reduce the number of columns to include only
> indexed columns, then the execution plan changes to use the index that
> contains all selected columns. So it's basically useless.
>
> Vagif
>
>
> "Uri Dimant" <u***@iscar.co.il> wrote in message
> news:uQEL3YvBGHA.2912@tk2msftngp13.phx.gbl...
>> Hi
>> How is about
>>
>> SELECT Timestamp, and other columns that have an index
>> FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01'
>>
>>
>> Lookup COVERING indexes in the BOL
>>
>>
>>
>>
>>
>> "Vagif Abilov" <va***@online.no> wrote in message
>> news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl...
>>> Hello,
>>>
>>> I have a table with a couple of non-clustered indexes. One of them is a
>>> single-column index on a time column. The table has several hundred
>>> thousands records, so when I search for data within certain time range,
>>> I expect the index to be used. However, SQL optimized only selects the
>>> index if the search range is very narrow.
>>>
>>> For example:
>>>
>>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
>>> '2004-01-02'
>>>
>>> This search would result in just a few records, and optimization plan
>>> shows that the right index (on Timestamp column) is used.
>>>
>>> But if I extend time interval from a day to a month:
>>>
>>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
>>> '2004-02-01'
>>>
>>> then the query would fetch a couple of thousands records, and query
>>> optimizer falls to table scan.
>>>
>>> It even does it when I use ORDER BY clause:
>>>
>>> SELECT * FROM MyTable ORDER BY Timestamp
>>>
>>> It still insists on using table scan and sorting data afterwards. If
>>> however I modify the query to force it to use the right index:
>>>
>>> SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp
>>>
>>> then the execution time is reduced by 3 times - from 15 seconds to 5.
>>>
>>> My question: when it's so obvious that use of index gives execution
>>> benefits, why does query optimizer still try to use table scan?
>>>
>>> Vagif Abilov
>>> Oslo Norway
>>>
>>
>>
>
>
Author
22 Dec 2005 1:36 PM
Vagif Abilov
Thanks Uri,

I believe my case is simpler because in my test I don't even have WHERE
clause: select ... from MyTable order by Timestamp. What can be simpler? The
idea about covering index is interesting, but again, in my case may be
irrelevant: the indexed column is small (datetime), the fetched data contain
varchar column that is 500 char long. Making a covered index that would
contain this long text would be just a waste of space.

Vagif


Show quote
"Uri Dimant" <u***@iscar.co.il> wrote in message
news:%23eyF6kvBGHA.3604@TK2MSFTNGP09.phx.gbl...
>
>
>
>> First of all, it's really bad if the order of columns in result list
>> affect the execution plan. It shouldn't.
>
> Yep , in the WHERE clause it  may happened . If you have an index on A,B,C
> columns so SQL Server keeps statistics only on the first one which may or
> may not affect the query's perfomance
>
> Read this article
>
> http://www.sql-server-performance.com/covering_indexes.asp
>
>
>
>
> "Vagif Abilov" <va***@online.no> wrote in message
> news:Of2YHfvBGHA.1088@tk2msftngp13.phx.gbl...
>> First of all, it's really bad if the order of columns in result list
>> affect the execution plan. It shouldn't.
>>
>> Second, I made a test and as long as I fetch all the columns I need,
>> reordering them and placing the indexed column first does not affect the
>> execution plan. Only if reduce the number of columns to include only
>> indexed columns, then the execution plan changes to use the index that
>> contains all selected columns. So it's basically useless.
>>
>> Vagif
>>
>>
>> "Uri Dimant" <u***@iscar.co.il> wrote in message
>> news:uQEL3YvBGHA.2912@tk2msftngp13.phx.gbl...
>>> Hi
>>> How is about
>>>
>>> SELECT Timestamp, and other columns that have an index
>>> FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01'
>>>
>>>
>>> Lookup COVERING indexes in the BOL
>>>
>>>
>>>
>>>
>>>
>>> "Vagif Abilov" <va***@online.no> wrote in message
>>> news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl...
>>>> Hello,
>>>>
>>>> I have a table with a couple of non-clustered indexes. One of them is a
>>>> single-column index on a time column. The table has several hundred
>>>> thousands records, so when I search for data within certain time range,
>>>> I expect the index to be used. However, SQL optimized only selects the
>>>> index if the search range is very narrow.
>>>>
>>>> For example:
>>>>
>>>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
>>>> '2004-01-02'
>>>>
>>>> This search would result in just a few records, and optimization plan
>>>> shows that the right index (on Timestamp column) is used.
>>>>
>>>> But if I extend time interval from a day to a month:
>>>>
>>>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
>>>> '2004-02-01'
>>>>
>>>> then the query would fetch a couple of thousands records, and query
>>>> optimizer falls to table scan.
>>>>
>>>> It even does it when I use ORDER BY clause:
>>>>
>>>> SELECT * FROM MyTable ORDER BY Timestamp
>>>>
>>>> It still insists on using table scan and sorting data afterwards. If
>>>> however I modify the query to force it to use the right index:
>>>>
>>>> SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp
>>>>
>>>> then the execution time is reduced by 3 times - from 15 seconds to 5.
>>>>
>>>> My question: when it's so obvious that use of index gives execution
>>>> benefits, why does query optimizer still try to use table scan?
>>>>
>>>> Vagif Abilov
>>>> Oslo Norway
>>>>
>>>
>>>
>>
>>
>
>
Author
22 Dec 2005 1:58 PM
Andrew J. Kelly
Vagif,

Sorry for not following the whole thread but there is a comment I want to
make on this aspect.  When you have no WHERE clause there is no option but
to retrieve all the rows in the table.  When the ORDER BY is not on a
clustered index you only have two choices on how to do that. One is to use a
nonclustered index on the datetime column and then do a Bookmark lookup for
each row to retrieve the rest of the data.  The other option is to read all
the pages in order that they appear in the file and then sort them.  While a
seek on a nonclustered index with a Bookmark lookup is pretty fast and cheap
for a few rows it becomes extremely expensive in terms of cost for large
numbers of rows.  Retrieving the rows via the physical order is very
inexpensive for large amounts of data, especially table scans.  Since the
optimizer is a cost based optimizer (looks at overall resource utilization)
I think you will find the cost is lower for the 2nd option over a bunch of
seeks.  You can certainly see this if you SET STATISTICS IO ON and run each
query.  I bet the seek option will have many more I/O's than the scan.  This
does not mean it will be faster. In some cases it is slower.  It is always a
trade off between time and resources. When it comes to what the user wants
some times the optimizer gets it right and some times it is wrong.  Usually
it is right but there will always be cases when the conditions are such that
it isn't as fast one way vs the other.  The key is always returning the
fewest rows you actually need to ensure the optimizer can do it's best.  In
this case it is up to you to determine if speed or resource utilization is
more important. If you want speed than maybe you can use the hint.  The
other thing to be careful of that I see a lot when people are trying to tune
queries like this is cached plans.  If you run a query that has a BETWEEN
with only a few rows and it chooses a SEEK then the next time you run the
query with a wider range it may reuse that plan and still do a SEEK when a
scan was more appropriate or visa versa.  So keep that in mind when testing
as well.

--
Andrew J. Kelly  SQL MVP


Show quote
"Vagif Abilov" <va***@online.no> wrote in message
news:esqiwwvBGHA.1008@TK2MSFTNGP12.phx.gbl...
> Thanks Uri,
>
> I believe my case is simpler because in my test I don't even have WHERE
> clause: select ... from MyTable order by Timestamp. What can be simpler?
> The idea about covering index is interesting, but again, in my case may be
> irrelevant: the indexed column is small (datetime), the fetched data
> contain varchar column that is 500 char long. Making a covered index that
> would contain this long text would be just a waste of space.
>
> Vagif
>
>
> "Uri Dimant" <u***@iscar.co.il> wrote in message
> news:%23eyF6kvBGHA.3604@TK2MSFTNGP09.phx.gbl...
>>
>>
>>
>>> First of all, it's really bad if the order of columns in result list
>>> affect the execution plan. It shouldn't.
>>
>> Yep , in the WHERE clause it  may happened . If you have an index on
>> A,B,C columns so SQL Server keeps statistics only on the first one which
>> may or may not affect the query's perfomance
>>
>> Read this article
>>
>> http://www.sql-server-performance.com/covering_indexes.asp
>>
>>
>>
>>
>> "Vagif Abilov" <va***@online.no> wrote in message
>> news:Of2YHfvBGHA.1088@tk2msftngp13.phx.gbl...
>>> First of all, it's really bad if the order of columns in result list
>>> affect the execution plan. It shouldn't.
>>>
>>> Second, I made a test and as long as I fetch all the columns I need,
>>> reordering them and placing the indexed column first does not affect the
>>> execution plan. Only if reduce the number of columns to include only
>>> indexed columns, then the execution plan changes to use the index that
>>> contains all selected columns. So it's basically useless.
>>>
>>> Vagif
>>>
>>>
>>> "Uri Dimant" <u***@iscar.co.il> wrote in message
>>> news:uQEL3YvBGHA.2912@tk2msftngp13.phx.gbl...
>>>> Hi
>>>> How is about
>>>>
>>>> SELECT Timestamp, and other columns that have an index
>>>> FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01'
>>>>
>>>>
>>>> Lookup COVERING indexes in the BOL
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> "Vagif Abilov" <va***@online.no> wrote in message
>>>> news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl...
>>>>> Hello,
>>>>>
>>>>> I have a table with a couple of non-clustered indexes. One of them is
>>>>> a single-column index on a time column. The table has several hundred
>>>>> thousands records, so when I search for data within certain time
>>>>> range, I expect the index to be used. However, SQL optimized only
>>>>> selects the index if the search range is very narrow.
>>>>>
>>>>> For example:
>>>>>
>>>>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
>>>>> '2004-01-02'
>>>>>
>>>>> This search would result in just a few records, and optimization plan
>>>>> shows that the right index (on Timestamp column) is used.
>>>>>
>>>>> But if I extend time interval from a day to a month:
>>>>>
>>>>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
>>>>> '2004-02-01'
>>>>>
>>>>> then the query would fetch a couple of thousands records, and query
>>>>> optimizer falls to table scan.
>>>>>
>>>>> It even does it when I use ORDER BY clause:
>>>>>
>>>>> SELECT * FROM MyTable ORDER BY Timestamp
>>>>>
>>>>> It still insists on using table scan and sorting data afterwards. If
>>>>> however I modify the query to force it to use the right index:
>>>>>
>>>>> SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp
>>>>>
>>>>> then the execution time is reduced by 3 times - from 15 seconds to 5.
>>>>>
>>>>> My question: when it's so obvious that use of index gives execution
>>>>> benefits, why does query optimizer still try to use table scan?
>>>>>
>>>>> Vagif Abilov
>>>>> Oslo Norway
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
Author
22 Dec 2005 3:48 PM
Vagif Abilov
Andrew,

Thank you very much for the clarification. You are quite right - just making
the index clustered was enough for SQL server to start using it
appropriately. I think I also understand better the motivation for selecting
query execution plan.

Vagif


Show quote
"Andrew J. Kelly" <sqlmvpnooospam@shadhawk.com> wrote in message
news:uIPlQ$vBGHA.344@TK2MSFTNGP11.phx.gbl...
> Vagif,
>
> Sorry for not following the whole thread but there is a comment I want to
> make on this aspect.  When you have no WHERE clause there is no option but
> to retrieve all the rows in the table.  When the ORDER BY is not on a
> clustered index you only have two choices on how to do that. One is to use
> a nonclustered index on the datetime column and then do a Bookmark lookup
> for each row to retrieve the rest of the data.  The other option is to
> read all the pages in order that they appear in the file and then sort
> them.  While a seek on a nonclustered index with a Bookmark lookup is
> pretty fast and cheap for a few rows it becomes extremely expensive in
> terms of cost for large numbers of rows.  Retrieving the rows via the
> physical order is very inexpensive for large amounts of data, especially
> table scans.  Since the optimizer is a cost based optimizer (looks at
> overall resource utilization) I think you will find the cost is lower for
> the 2nd option over a bunch of seeks.  You can certainly see this if you
> SET STATISTICS IO ON and run each query.  I bet the seek option will have
> many more I/O's than the scan.  This does not mean it will be faster. In
> some cases it is slower.  It is always a trade off between time and
> resources. When it comes to what the user wants some times the optimizer
> gets it right and some times it is wrong.  Usually it is right but there
> will always be cases when the conditions are such that it isn't as fast
> one way vs the other.  The key is always returning the fewest rows you
> actually need to ensure the optimizer can do it's best.  In this case it
> is up to you to determine if speed or resource utilization is more
> important. If you want speed than maybe you can use the hint.  The other
> thing to be careful of that I see a lot when people are trying to tune
> queries like this is cached plans.  If you run a query that has a BETWEEN
> with only a few rows and it chooses a SEEK then the next time you run the
> query with a wider range it may reuse that plan and still do a SEEK when a
> scan was more appropriate or visa versa.  So keep that in mind when
> testing as well.
>
> --
> Andrew J. Kelly  SQL MVP
>
>
> "Vagif Abilov" <va***@online.no> wrote in message
> news:esqiwwvBGHA.1008@TK2MSFTNGP12.phx.gbl...
>> Thanks Uri,
>>
>> I believe my case is simpler because in my test I don't even have WHERE
>> clause: select ... from MyTable order by Timestamp. What can be simpler?
>> The idea about covering index is interesting, but again, in my case may
>> be irrelevant: the indexed column is small (datetime), the fetched data
>> contain varchar column that is 500 char long. Making a covered index that
>> would contain this long text would be just a waste of space.
>>
>> Vagif
>>
>>
>> "Uri Dimant" <u***@iscar.co.il> wrote in message
>> news:%23eyF6kvBGHA.3604@TK2MSFTNGP09.phx.gbl...
>>>
>>>
>>>
>>>> First of all, it's really bad if the order of columns in result list
>>>> affect the execution plan. It shouldn't.
>>>
>>> Yep , in the WHERE clause it  may happened . If you have an index on
>>> A,B,C columns so SQL Server keeps statistics only on the first one which
>>> may or may not affect the query's perfomance
>>>
>>> Read this article
>>>
>>> http://www.sql-server-performance.com/covering_indexes.asp
>>>
>>>
>>>
>>>
>>> "Vagif Abilov" <va***@online.no> wrote in message
>>> news:Of2YHfvBGHA.1088@tk2msftngp13.phx.gbl...
>>>> First of all, it's really bad if the order of columns in result list
>>>> affect the execution plan. It shouldn't.
>>>>
>>>> Second, I made a test and as long as I fetch all the columns I need,
>>>> reordering them and placing the indexed column first does not affect
>>>> the execution plan. Only if reduce the number of columns to include
>>>> only indexed columns, then the execution plan changes to use the index
>>>> that contains all selected columns. So it's basically useless.
>>>>
>>>> Vagif
>>>>
>>>>
>>>> "Uri Dimant" <u***@iscar.co.il> wrote in message
>>>> news:uQEL3YvBGHA.2912@tk2msftngp13.phx.gbl...
>>>>> Hi
>>>>> How is about
>>>>>
>>>>> SELECT Timestamp, and other columns that have an index
>>>>> FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01'
>>>>>
>>>>>
>>>>> Lookup COVERING indexes in the BOL
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> "Vagif Abilov" <va***@online.no> wrote in message
>>>>> news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl...
>>>>>> Hello,
>>>>>>
>>>>>> I have a table with a couple of non-clustered indexes. One of them is
>>>>>> a single-column index on a time column. The table has several hundred
>>>>>> thousands records, so when I search for data within certain time
>>>>>> range, I expect the index to be used. However, SQL optimized only
>>>>>> selects the index if the search range is very narrow.
>>>>>>
>>>>>> For example:
>>>>>>
>>>>>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
>>>>>> '2004-01-02'
>>>>>>
>>>>>> This search would result in just a few records, and optimization plan
>>>>>> shows that the right index (on Timestamp column) is used.
>>>>>>
>>>>>> But if I extend time interval from a day to a month:
>>>>>>
>>>>>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND
>>>>>> '2004-02-01'
>>>>>>
>>>>>> then the query would fetch a couple of thousands records, and query
>>>>>> optimizer falls to table scan.
>>>>>>
>>>>>> It even does it when I use ORDER BY clause:
>>>>>>
>>>>>> SELECT * FROM MyTable ORDER BY Timestamp
>>>>>>
>>>>>> It still insists on using table scan and sorting data afterwards. If
>>>>>> however I modify the query to force it to use the right index:
>>>>>>
>>>>>> SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp
>>>>>>
>>>>>> then the execution time is reduced by 3 times - from 15 seconds to 5.
>>>>>>
>>>>>> My question: when it's so obvious that use of index gives execution
>>>>>> benefits, why does query optimizer still try to use table scan?
>>>>>>
>>>>>> Vagif Abilov
>>>>>> Oslo Norway
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
Author
22 Dec 2005 10:43 PM
Gert-Jan Strik
Vagif,

For SQL Server, a read from disk (a data cache miss) is about 1,000 to
10,000 times slower then a read from memory (a data cache hit). For that
reason, the optimizer tries to minize the number of reads. In addition
to that, the optimizer also knows that random I/O is slower than
sequential I/O, and will therefore prefer sequential reads (such as
those caused by an index scan) over random reads (such as bookmark
lookups).

Because of all this, the optimizer basically creates a query plan that
has the best performance when you have an empty data cache. So if you
restart SQL Server (or clear the data cache), you will find that the
optimizer determines the most efficient query plan.

If you have a hot cache (most relevant data is in memory), then the
optimizer will overestimate the cost of random reads and bookmark
lookups. This is probably what you have been seeing.

If you can change the index on Timestamp to be the clustered index, then
that is a very good option. If this is not an option, and you know for
sure that the required data is in memory all of the time (or most of the
time), then you could consider using index hints.

HTH,
Gert-Jan


Vagif Abilov wrote:
Show quote
>
> Hello,
>
> I have a table with a couple of non-clustered indexes. One of them is a
> single-column index on a time column. The table has several hundred
> thousands records, so when I search for data within certain time range, I
> expect the index to be used. However, SQL optimized only selects the index
> if the search range is very narrow.
>
> For example:
>
> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-01-02'
>
> This search would result in just a few records, and optimization plan shows
> that the right index (on Timestamp column) is used.
>
> But if I extend time interval from a day to a month:
>
> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01'
>
> then the query would fetch a couple of thousands records, and query
> optimizer falls to table scan.
>
> It even does it when I use ORDER BY clause:
>
> SELECT * FROM MyTable ORDER BY Timestamp
>
> It still insists on using table scan and sorting data afterwards. If however
> I modify the query to force it to use the right index:
>
> SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp
>
> then the execution time is reduced by 3 times - from 15 seconds to 5.
>
> My question: when it's so obvious that use of index gives execution
> benefits, why does query optimizer still try to use table scan?
>
> Vagif Abilov
> Oslo Norway

AddThis Social Bookmark Button