|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Why query optimizer does not select the index?I have a table with a couple of non-clustered indexes. One of them is a single-column index on a time column. The table has several hundred thousands records, so when I search for data within certain time range, I expect the index to be used. However, SQL optimized only selects the index if the search range is very narrow. For example: SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-01-02' This search would result in just a few records, and optimization plan shows that the right index (on Timestamp column) is used. But if I extend time interval from a day to a month: SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01' then the query would fetch a couple of thousands records, and query optimizer falls to table scan. It even does it when I use ORDER BY clause: SELECT * FROM MyTable ORDER BY Timestamp It still insists on using table scan and sorting data afterwards. If however I modify the query to force it to use the right index: SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp then the execution time is reduced by 3 times - from 15 seconds to 5. My question: when it's so obvious that use of index gives execution benefits, why does query optimizer still try to use table scan? Vagif Abilov Oslo Norway
Show quote
"Vagif Abilov" <va***@online.no> wrote in message A couple of possible reasons here.news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl... > Hello, > > I have a table with a couple of non-clustered indexes. One of them is a > single-column index on a time column. The table has several hundred > thousands records, so when I search for data within certain time range, I > expect the index to be used. However, SQL optimized only selects the index > if the search range is very narrow. > > For example: > > SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND > '2004-01-02' > > This search would result in just a few records, and optimization plan > shows that the right index (on Timestamp column) is used. > > But if I extend time interval from a day to a month: > > SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND > '2004-02-01' > > then the query would fetch a couple of thousands records, and query > optimizer falls to table scan. > > It even does it when I use ORDER BY clause: > > SELECT * FROM MyTable ORDER BY Timestamp > > It still insists on using table scan and sorting data afterwards. If > however I modify the query to force it to use the right index: > > SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp > > then the execution time is reduced by 3 times - from 15 seconds to 5. > > My question: when it's so obvious that use of index gives execution > benefits, why does query optimizer still try to use table scan? > > Vagif Abilov > Oslo Norway > 1. If your index statistics are out of date, then the optimizer may be making it's decisions on bad information. For larger ranges of values, it may think that using the index will force many bookmark lookups which can be expensive. It may think that it is more expensive to walk the index tree and do all of the bookmark lookups rather than simply perform a table scan. It depends on the number of rows returned and the percentage of the total number of rows according to the statistics. Update the statistics on the indexes and see what the optimizer does. 2. Columns frequently searched on ranges (BETWEEN 100 and 200, > 50, and so forth) are good candidates for a clustered index. You may see a big improvement in speed if you put a clustered index on your TimeStamp column. Rick Sawtell MCT, MCSD, MCDBA Thanks Rick,
I followed your advice and updated statistics on this table. No luck - execution time got slightly better, but still not as good as when I force the query to use the index. And execution plan still shows table scan followed by the sort. Very strange since both table and index are just trivial. Vagif Show quote "Rick Sawtell" <Quicken***@msn.com> wrote in message news:eKKJlYvBGHA.3452@TK2MSFTNGP10.phx.gbl... > > "Vagif Abilov" <va***@online.no> wrote in message > news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl... >> Hello, >> >> I have a table with a couple of non-clustered indexes. One of them is a >> single-column index on a time column. The table has several hundred >> thousands records, so when I search for data within certain time range, I >> expect the index to be used. However, SQL optimized only selects the >> index if the search range is very narrow. >> >> For example: >> >> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND >> '2004-01-02' >> >> This search would result in just a few records, and optimization plan >> shows that the right index (on Timestamp column) is used. >> >> But if I extend time interval from a day to a month: >> >> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND >> '2004-02-01' >> >> then the query would fetch a couple of thousands records, and query >> optimizer falls to table scan. >> >> It even does it when I use ORDER BY clause: >> >> SELECT * FROM MyTable ORDER BY Timestamp >> >> It still insists on using table scan and sorting data afterwards. If >> however I modify the query to force it to use the right index: >> >> SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp >> >> then the execution time is reduced by 3 times - from 15 seconds to 5. >> >> My question: when it's so obvious that use of index gives execution >> benefits, why does query optimizer still try to use table scan? >> >> Vagif Abilov >> Oslo Norway >> > > A couple of possible reasons here. > > 1. If your index statistics are out of date, then the optimizer may be > making it's decisions on bad information. > > For larger ranges of values, it may think that using the index will force > many bookmark lookups which can be expensive. It may think that it is > more expensive to walk the index tree and do all of the bookmark lookups > rather than simply perform a table scan. It depends on the number of rows > returned and the percentage of the total number of rows according to the > statistics. > > Update the statistics on the indexes and see what the optimizer does. > > 2. Columns frequently searched on ranges (BETWEEN 100 and 200, > 50, and > so forth) are good candidates for a clustered index. You may see a big > improvement in speed if you put a clustered index on your TimeStamp > column. > > Rick Sawtell > MCT, MCSD, MCDBA > > > Hi
How is about SELECT Timestamp, and other columns that have an index FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01' Lookup COVERING indexes in the BOL Show quote "Vagif Abilov" <va***@online.no> wrote in message news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl... > Hello, > > I have a table with a couple of non-clustered indexes. One of them is a > single-column index on a time column. The table has several hundred > thousands records, so when I search for data within certain time range, I > expect the index to be used. However, SQL optimized only selects the index > if the search range is very narrow. > > For example: > > SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND > '2004-01-02' > > This search would result in just a few records, and optimization plan > shows that the right index (on Timestamp column) is used. > > But if I extend time interval from a day to a month: > > SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND > '2004-02-01' > > then the query would fetch a couple of thousands records, and query > optimizer falls to table scan. > > It even does it when I use ORDER BY clause: > > SELECT * FROM MyTable ORDER BY Timestamp > > It still insists on using table scan and sorting data afterwards. If > however I modify the query to force it to use the right index: > > SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp > > then the execution time is reduced by 3 times - from 15 seconds to 5. > > My question: when it's so obvious that use of index gives execution > benefits, why does query optimizer still try to use table scan? > > Vagif Abilov > Oslo Norway > First of all, it's really bad if the order of columns in result list affect
the execution plan. It shouldn't. Second, I made a test and as long as I fetch all the columns I need, reordering them and placing the indexed column first does not affect the execution plan. Only if reduce the number of columns to include only indexed columns, then the execution plan changes to use the index that contains all selected columns. So it's basically useless. Vagif Show quote "Uri Dimant" <u***@iscar.co.il> wrote in message news:uQEL3YvBGHA.2912@tk2msftngp13.phx.gbl... > Hi > How is about > > SELECT Timestamp, and other columns that have an index > FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01' > > > Lookup COVERING indexes in the BOL > > > > > > "Vagif Abilov" <va***@online.no> wrote in message > news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl... >> Hello, >> >> I have a table with a couple of non-clustered indexes. One of them is a >> single-column index on a time column. The table has several hundred >> thousands records, so when I search for data within certain time range, I >> expect the index to be used. However, SQL optimized only selects the >> index if the search range is very narrow. >> >> For example: >> >> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND >> '2004-01-02' >> >> This search would result in just a few records, and optimization plan >> shows that the right index (on Timestamp column) is used. >> >> But if I extend time interval from a day to a month: >> >> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND >> '2004-02-01' >> >> then the query would fetch a couple of thousands records, and query >> optimizer falls to table scan. >> >> It even does it when I use ORDER BY clause: >> >> SELECT * FROM MyTable ORDER BY Timestamp >> >> It still insists on using table scan and sorting data afterwards. If >> however I modify the query to force it to use the right index: >> >> SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp >> >> then the execution time is reduced by 3 times - from 15 seconds to 5. >> >> My question: when it's so obvious that use of index gives execution >> benefits, why does query optimizer still try to use table scan? >> >> Vagif Abilov >> Oslo Norway >> > > > First of all, it's really bad if the order of columns in result list Yep , in the WHERE clause it may happened . If you have an index on A,B,C > affect the execution plan. It shouldn't. columns so SQL Server keeps statistics only on the first one which may or may not affect the query's perfomance Read this article http://www.sql-server-performance.com/covering_indexes.asp Show quote "Vagif Abilov" <va***@online.no> wrote in message news:Of2YHfvBGHA.1088@tk2msftngp13.phx.gbl... > First of all, it's really bad if the order of columns in result list > affect the execution plan. It shouldn't. > > Second, I made a test and as long as I fetch all the columns I need, > reordering them and placing the indexed column first does not affect the > execution plan. Only if reduce the number of columns to include only > indexed columns, then the execution plan changes to use the index that > contains all selected columns. So it's basically useless. > > Vagif > > > "Uri Dimant" <u***@iscar.co.il> wrote in message > news:uQEL3YvBGHA.2912@tk2msftngp13.phx.gbl... >> Hi >> How is about >> >> SELECT Timestamp, and other columns that have an index >> FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01' >> >> >> Lookup COVERING indexes in the BOL >> >> >> >> >> >> "Vagif Abilov" <va***@online.no> wrote in message >> news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl... >>> Hello, >>> >>> I have a table with a couple of non-clustered indexes. One of them is a >>> single-column index on a time column. The table has several hundred >>> thousands records, so when I search for data within certain time range, >>> I expect the index to be used. However, SQL optimized only selects the >>> index if the search range is very narrow. >>> >>> For example: >>> >>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND >>> '2004-01-02' >>> >>> This search would result in just a few records, and optimization plan >>> shows that the right index (on Timestamp column) is used. >>> >>> But if I extend time interval from a day to a month: >>> >>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND >>> '2004-02-01' >>> >>> then the query would fetch a couple of thousands records, and query >>> optimizer falls to table scan. >>> >>> It even does it when I use ORDER BY clause: >>> >>> SELECT * FROM MyTable ORDER BY Timestamp >>> >>> It still insists on using table scan and sorting data afterwards. If >>> however I modify the query to force it to use the right index: >>> >>> SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp >>> >>> then the execution time is reduced by 3 times - from 15 seconds to 5. >>> >>> My question: when it's so obvious that use of index gives execution >>> benefits, why does query optimizer still try to use table scan? >>> >>> Vagif Abilov >>> Oslo Norway >>> >> >> > > Thanks Uri,
I believe my case is simpler because in my test I don't even have WHERE clause: select ... from MyTable order by Timestamp. What can be simpler? The idea about covering index is interesting, but again, in my case may be irrelevant: the indexed column is small (datetime), the fetched data contain varchar column that is 500 char long. Making a covered index that would contain this long text would be just a waste of space. Vagif Show quote "Uri Dimant" <u***@iscar.co.il> wrote in message news:%23eyF6kvBGHA.3604@TK2MSFTNGP09.phx.gbl... > > > >> First of all, it's really bad if the order of columns in result list >> affect the execution plan. It shouldn't. > > Yep , in the WHERE clause it may happened . If you have an index on A,B,C > columns so SQL Server keeps statistics only on the first one which may or > may not affect the query's perfomance > > Read this article > > http://www.sql-server-performance.com/covering_indexes.asp > > > > > "Vagif Abilov" <va***@online.no> wrote in message > news:Of2YHfvBGHA.1088@tk2msftngp13.phx.gbl... >> First of all, it's really bad if the order of columns in result list >> affect the execution plan. It shouldn't. >> >> Second, I made a test and as long as I fetch all the columns I need, >> reordering them and placing the indexed column first does not affect the >> execution plan. Only if reduce the number of columns to include only >> indexed columns, then the execution plan changes to use the index that >> contains all selected columns. So it's basically useless. >> >> Vagif >> >> >> "Uri Dimant" <u***@iscar.co.il> wrote in message >> news:uQEL3YvBGHA.2912@tk2msftngp13.phx.gbl... >>> Hi >>> How is about >>> >>> SELECT Timestamp, and other columns that have an index >>> FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01' >>> >>> >>> Lookup COVERING indexes in the BOL >>> >>> >>> >>> >>> >>> "Vagif Abilov" <va***@online.no> wrote in message >>> news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl... >>>> Hello, >>>> >>>> I have a table with a couple of non-clustered indexes. One of them is a >>>> single-column index on a time column. The table has several hundred >>>> thousands records, so when I search for data within certain time range, >>>> I expect the index to be used. However, SQL optimized only selects the >>>> index if the search range is very narrow. >>>> >>>> For example: >>>> >>>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND >>>> '2004-01-02' >>>> >>>> This search would result in just a few records, and optimization plan >>>> shows that the right index (on Timestamp column) is used. >>>> >>>> But if I extend time interval from a day to a month: >>>> >>>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND >>>> '2004-02-01' >>>> >>>> then the query would fetch a couple of thousands records, and query >>>> optimizer falls to table scan. >>>> >>>> It even does it when I use ORDER BY clause: >>>> >>>> SELECT * FROM MyTable ORDER BY Timestamp >>>> >>>> It still insists on using table scan and sorting data afterwards. If >>>> however I modify the query to force it to use the right index: >>>> >>>> SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp >>>> >>>> then the execution time is reduced by 3 times - from 15 seconds to 5. >>>> >>>> My question: when it's so obvious that use of index gives execution >>>> benefits, why does query optimizer still try to use table scan? >>>> >>>> Vagif Abilov >>>> Oslo Norway >>>> >>> >>> >> >> > > Vagif,
Sorry for not following the whole thread but there is a comment I want to make on this aspect. When you have no WHERE clause there is no option but to retrieve all the rows in the table. When the ORDER BY is not on a clustered index you only have two choices on how to do that. One is to use a nonclustered index on the datetime column and then do a Bookmark lookup for each row to retrieve the rest of the data. The other option is to read all the pages in order that they appear in the file and then sort them. While a seek on a nonclustered index with a Bookmark lookup is pretty fast and cheap for a few rows it becomes extremely expensive in terms of cost for large numbers of rows. Retrieving the rows via the physical order is very inexpensive for large amounts of data, especially table scans. Since the optimizer is a cost based optimizer (looks at overall resource utilization) I think you will find the cost is lower for the 2nd option over a bunch of seeks. You can certainly see this if you SET STATISTICS IO ON and run each query. I bet the seek option will have many more I/O's than the scan. This does not mean it will be faster. In some cases it is slower. It is always a trade off between time and resources. When it comes to what the user wants some times the optimizer gets it right and some times it is wrong. Usually it is right but there will always be cases when the conditions are such that it isn't as fast one way vs the other. The key is always returning the fewest rows you actually need to ensure the optimizer can do it's best. In this case it is up to you to determine if speed or resource utilization is more important. If you want speed than maybe you can use the hint. The other thing to be careful of that I see a lot when people are trying to tune queries like this is cached plans. If you run a query that has a BETWEEN with only a few rows and it chooses a SEEK then the next time you run the query with a wider range it may reuse that plan and still do a SEEK when a scan was more appropriate or visa versa. So keep that in mind when testing as well. -- Show quoteAndrew J. Kelly SQL MVP "Vagif Abilov" <va***@online.no> wrote in message news:esqiwwvBGHA.1008@TK2MSFTNGP12.phx.gbl... > Thanks Uri, > > I believe my case is simpler because in my test I don't even have WHERE > clause: select ... from MyTable order by Timestamp. What can be simpler? > The idea about covering index is interesting, but again, in my case may be > irrelevant: the indexed column is small (datetime), the fetched data > contain varchar column that is 500 char long. Making a covered index that > would contain this long text would be just a waste of space. > > Vagif > > > "Uri Dimant" <u***@iscar.co.il> wrote in message > news:%23eyF6kvBGHA.3604@TK2MSFTNGP09.phx.gbl... >> >> >> >>> First of all, it's really bad if the order of columns in result list >>> affect the execution plan. It shouldn't. >> >> Yep , in the WHERE clause it may happened . If you have an index on >> A,B,C columns so SQL Server keeps statistics only on the first one which >> may or may not affect the query's perfomance >> >> Read this article >> >> http://www.sql-server-performance.com/covering_indexes.asp >> >> >> >> >> "Vagif Abilov" <va***@online.no> wrote in message >> news:Of2YHfvBGHA.1088@tk2msftngp13.phx.gbl... >>> First of all, it's really bad if the order of columns in result list >>> affect the execution plan. It shouldn't. >>> >>> Second, I made a test and as long as I fetch all the columns I need, >>> reordering them and placing the indexed column first does not affect the >>> execution plan. Only if reduce the number of columns to include only >>> indexed columns, then the execution plan changes to use the index that >>> contains all selected columns. So it's basically useless. >>> >>> Vagif >>> >>> >>> "Uri Dimant" <u***@iscar.co.il> wrote in message >>> news:uQEL3YvBGHA.2912@tk2msftngp13.phx.gbl... >>>> Hi >>>> How is about >>>> >>>> SELECT Timestamp, and other columns that have an index >>>> FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01' >>>> >>>> >>>> Lookup COVERING indexes in the BOL >>>> >>>> >>>> >>>> >>>> >>>> "Vagif Abilov" <va***@online.no> wrote in message >>>> news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl... >>>>> Hello, >>>>> >>>>> I have a table with a couple of non-clustered indexes. One of them is >>>>> a single-column index on a time column. The table has several hundred >>>>> thousands records, so when I search for data within certain time >>>>> range, I expect the index to be used. However, SQL optimized only >>>>> selects the index if the search range is very narrow. >>>>> >>>>> For example: >>>>> >>>>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND >>>>> '2004-01-02' >>>>> >>>>> This search would result in just a few records, and optimization plan >>>>> shows that the right index (on Timestamp column) is used. >>>>> >>>>> But if I extend time interval from a day to a month: >>>>> >>>>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND >>>>> '2004-02-01' >>>>> >>>>> then the query would fetch a couple of thousands records, and query >>>>> optimizer falls to table scan. >>>>> >>>>> It even does it when I use ORDER BY clause: >>>>> >>>>> SELECT * FROM MyTable ORDER BY Timestamp >>>>> >>>>> It still insists on using table scan and sorting data afterwards. If >>>>> however I modify the query to force it to use the right index: >>>>> >>>>> SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp >>>>> >>>>> then the execution time is reduced by 3 times - from 15 seconds to 5. >>>>> >>>>> My question: when it's so obvious that use of index gives execution >>>>> benefits, why does query optimizer still try to use table scan? >>>>> >>>>> Vagif Abilov >>>>> Oslo Norway >>>>> >>>> >>>> >>> >>> >> >> > > Andrew,
Thank you very much for the clarification. You are quite right - just making the index clustered was enough for SQL server to start using it appropriately. I think I also understand better the motivation for selecting query execution plan. Vagif Show quote "Andrew J. Kelly" <sqlmvpnooospam@shadhawk.com> wrote in message news:uIPlQ$vBGHA.344@TK2MSFTNGP11.phx.gbl... > Vagif, > > Sorry for not following the whole thread but there is a comment I want to > make on this aspect. When you have no WHERE clause there is no option but > to retrieve all the rows in the table. When the ORDER BY is not on a > clustered index you only have two choices on how to do that. One is to use > a nonclustered index on the datetime column and then do a Bookmark lookup > for each row to retrieve the rest of the data. The other option is to > read all the pages in order that they appear in the file and then sort > them. While a seek on a nonclustered index with a Bookmark lookup is > pretty fast and cheap for a few rows it becomes extremely expensive in > terms of cost for large numbers of rows. Retrieving the rows via the > physical order is very inexpensive for large amounts of data, especially > table scans. Since the optimizer is a cost based optimizer (looks at > overall resource utilization) I think you will find the cost is lower for > the 2nd option over a bunch of seeks. You can certainly see this if you > SET STATISTICS IO ON and run each query. I bet the seek option will have > many more I/O's than the scan. This does not mean it will be faster. In > some cases it is slower. It is always a trade off between time and > resources. When it comes to what the user wants some times the optimizer > gets it right and some times it is wrong. Usually it is right but there > will always be cases when the conditions are such that it isn't as fast > one way vs the other. The key is always returning the fewest rows you > actually need to ensure the optimizer can do it's best. In this case it > is up to you to determine if speed or resource utilization is more > important. If you want speed than maybe you can use the hint. The other > thing to be careful of that I see a lot when people are trying to tune > queries like this is cached plans. If you run a query that has a BETWEEN > with only a few rows and it chooses a SEEK then the next time you run the > query with a wider range it may reuse that plan and still do a SEEK when a > scan was more appropriate or visa versa. So keep that in mind when > testing as well. > > -- > Andrew J. Kelly SQL MVP > > > "Vagif Abilov" <va***@online.no> wrote in message > news:esqiwwvBGHA.1008@TK2MSFTNGP12.phx.gbl... >> Thanks Uri, >> >> I believe my case is simpler because in my test I don't even have WHERE >> clause: select ... from MyTable order by Timestamp. What can be simpler? >> The idea about covering index is interesting, but again, in my case may >> be irrelevant: the indexed column is small (datetime), the fetched data >> contain varchar column that is 500 char long. Making a covered index that >> would contain this long text would be just a waste of space. >> >> Vagif >> >> >> "Uri Dimant" <u***@iscar.co.il> wrote in message >> news:%23eyF6kvBGHA.3604@TK2MSFTNGP09.phx.gbl... >>> >>> >>> >>>> First of all, it's really bad if the order of columns in result list >>>> affect the execution plan. It shouldn't. >>> >>> Yep , in the WHERE clause it may happened . If you have an index on >>> A,B,C columns so SQL Server keeps statistics only on the first one which >>> may or may not affect the query's perfomance >>> >>> Read this article >>> >>> http://www.sql-server-performance.com/covering_indexes.asp >>> >>> >>> >>> >>> "Vagif Abilov" <va***@online.no> wrote in message >>> news:Of2YHfvBGHA.1088@tk2msftngp13.phx.gbl... >>>> First of all, it's really bad if the order of columns in result list >>>> affect the execution plan. It shouldn't. >>>> >>>> Second, I made a test and as long as I fetch all the columns I need, >>>> reordering them and placing the indexed column first does not affect >>>> the execution plan. Only if reduce the number of columns to include >>>> only indexed columns, then the execution plan changes to use the index >>>> that contains all selected columns. So it's basically useless. >>>> >>>> Vagif >>>> >>>> >>>> "Uri Dimant" <u***@iscar.co.il> wrote in message >>>> news:uQEL3YvBGHA.2912@tk2msftngp13.phx.gbl... >>>>> Hi >>>>> How is about >>>>> >>>>> SELECT Timestamp, and other columns that have an index >>>>> FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01' >>>>> >>>>> >>>>> Lookup COVERING indexes in the BOL >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> "Vagif Abilov" <va***@online.no> wrote in message >>>>> news:OnSUENvBGHA.516@TK2MSFTNGP15.phx.gbl... >>>>>> Hello, >>>>>> >>>>>> I have a table with a couple of non-clustered indexes. One of them is >>>>>> a single-column index on a time column. The table has several hundred >>>>>> thousands records, so when I search for data within certain time >>>>>> range, I expect the index to be used. However, SQL optimized only >>>>>> selects the index if the search range is very narrow. >>>>>> >>>>>> For example: >>>>>> >>>>>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND >>>>>> '2004-01-02' >>>>>> >>>>>> This search would result in just a few records, and optimization plan >>>>>> shows that the right index (on Timestamp column) is used. >>>>>> >>>>>> But if I extend time interval from a day to a month: >>>>>> >>>>>> SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND >>>>>> '2004-02-01' >>>>>> >>>>>> then the query would fetch a couple of thousands records, and query >>>>>> optimizer falls to table scan. >>>>>> >>>>>> It even does it when I use ORDER BY clause: >>>>>> >>>>>> SELECT * FROM MyTable ORDER BY Timestamp >>>>>> >>>>>> It still insists on using table scan and sorting data afterwards. If >>>>>> however I modify the query to force it to use the right index: >>>>>> >>>>>> SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp >>>>>> >>>>>> then the execution time is reduced by 3 times - from 15 seconds to 5. >>>>>> >>>>>> My question: when it's so obvious that use of index gives execution >>>>>> benefits, why does query optimizer still try to use table scan? >>>>>> >>>>>> Vagif Abilov >>>>>> Oslo Norway >>>>>> >>>>> >>>>> >>>> >>>> >>> >>> >> >> > > Vagif,
For SQL Server, a read from disk (a data cache miss) is about 1,000 to 10,000 times slower then a read from memory (a data cache hit). For that reason, the optimizer tries to minize the number of reads. In addition to that, the optimizer also knows that random I/O is slower than sequential I/O, and will therefore prefer sequential reads (such as those caused by an index scan) over random reads (such as bookmark lookups). Because of all this, the optimizer basically creates a query plan that has the best performance when you have an empty data cache. So if you restart SQL Server (or clear the data cache), you will find that the optimizer determines the most efficient query plan. If you have a hot cache (most relevant data is in memory), then the optimizer will overestimate the cost of random reads and bookmark lookups. This is probably what you have been seeing. If you can change the index on Timestamp to be the clustered index, then that is a very good option. If this is not an option, and you know for sure that the required data is in memory all of the time (or most of the time), then you could consider using index hints. HTH, Gert-Jan Vagif Abilov wrote: Show quote > > Hello, > > I have a table with a couple of non-clustered indexes. One of them is a > single-column index on a time column. The table has several hundred > thousands records, so when I search for data within certain time range, I > expect the index to be used. However, SQL optimized only selects the index > if the search range is very narrow. > > For example: > > SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-01-02' > > This search would result in just a few records, and optimization plan shows > that the right index (on Timestamp column) is used. > > But if I extend time interval from a day to a month: > > SELECT * FROM MyTable where Timestamp BETWEEN '2004-01-01' AND '2004-02-01' > > then the query would fetch a couple of thousands records, and query > optimizer falls to table scan. > > It even does it when I use ORDER BY clause: > > SELECT * FROM MyTable ORDER BY Timestamp > > It still insists on using table scan and sorting data afterwards. If however > I modify the query to force it to use the right index: > > SELECT * FROM MyTable WITH (INDEX(IX_Timestamp)) ORDER BY Timestamp > > then the execution time is reduced by 3 times - from 15 seconds to 5. > > My question: when it's so obvious that use of index gives execution > benefits, why does query optimizer still try to use table scan? > > Vagif Abilov > Oslo Norway |
|||||||||||||||||||||||