|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
How does engine decide whether to scan or seek indexI'm working on an application that uses SQL Server 7 as a back end. Nearly everything in the database references one table with the following columns. generation_id int (primary key) entity_id int effective_from datetime effective_to datetime active tinyint deleted tinyint other text fields.....(e.g. notes) Querying this table can be pretty slow. But in most cases I can squeeze reasonable performance out of my queries. By trial and error I've found I get best performance with a non-clustered index on all the columns above bar the 'other text cols.' However, in a few cases with what seems to be an almost identical query, the execution plan shows the index is being scanned, and this causes the query to run slowly. Can anyone tell me how (in general) the engine decides whether to scan or seek on the index? Is it related to the properties of any other tables you join to? I haven't been able to find a description of this process anywhere - all you seem to get is a description of what the various types of index are, and how things work is relegated to the 'as if by magic' category! TIA Neil From my experience, you will see the optimizer choose a scan of an index
(rather than a seek) whenever the filter in the WHERE condition specifies a range of values using operators like <,>, BETWEEN or LIKE. If your table has a clustered index, and you specify an exact-match filter in your WHERE clause using a column or columns that have a nonclustered index, you will typically see an index seek with a bookmark lookup to the clustered index. If you have two identical queries that execute with different plans, investigate how often statistics are updated on your tables and indexes. -- Show quote"Neil Waldie" wrote: > Hi, > > I'm working on an application that uses SQL Server 7 as a back end. Nearly > everything in the database references one table with the following columns. > > generation_id int (primary key) > entity_id int > effective_from datetime > effective_to datetime > active tinyint > deleted tinyint > other text fields.....(e.g. notes) > > Querying this table can be pretty slow. But in most cases I can squeeze > reasonable performance out of my queries. By trial and error I've found I get > best performance with a non-clustered index on all the columns above bar the > 'other text cols.' However, in a few cases with what seems to be an almost > identical query, the execution plan shows the index is being scanned, and > this causes the query to run slowly. > > Can anyone tell me how (in general) the engine decides whether to scan or > seek on the index? Is it related to the properties of any other tables you > join to? I haven't been able to find a description of this process anywhere - > all you seem to get is a description of what the various types of index are, > and how things work is relegated to the 'as if by magic' category! > > TIA > > Neil Mark Williams wrote:
> From my experience, you will see the optimizer choose a scan of an I'm not sure what you're at here. Two identical queries is actually one> index (rather than a seek) whenever the filter in the WHERE condition > specifies a range of values using operators like <,>, BETWEEN or LIKE. > > If your table has a clustered index, and you specify an exact-match > filter in your WHERE clause using a column or columns that have a > nonclustered index, you will typically see an index seek with a > bookmark lookup to the clustered index. > > If you have two identical queries that execute with different plans, > investigate how often statistics are updated on your tables and > indexes. query. If it executes with different plans over time then statistics could be the reason or change in data distribution. If two queries are just similar (meaning, same conditions but differnt values for conditions) the optimizer will often choose a scan if statistics indicate that a majority of rows will be returned for this criterium. Kind regards robert Robert,
I thought that might trip me up! You can never shortcut these things.... What I meant by 'almost identical' was that the database consists of several tables that are linked to my problem table by the generation id. Most of the queries I'm using are of the same form (TableX INNER JOIN Generation on TableX.g_id = Generation.g_id) but using different tables (TableX). The Generation table they ALL join too was someone else's bright idea which I have to implement and it is this sort of query that causes me problems. 99% of them use an index seek in execution, 1 % use a scan. This is why I raised the possibility that the decision to do that may be related to something about the table you join to. You've pointed out that 'the optimizer will often choose a scan if statistics indicate that a majority of rows will be returned for this criterium.' The criteria for searching will always be the same - is the search date between the effective dates and is the row active and not deleted. But maybe the number of rows returned by the join on the primary key affects this decision too? I'll have a look and see if this only happens where TableX contains a large number of rows that join to the Generation table. As Mark pointed out, I didn't specify my WHERE clause, so it makes it hard for you to know anything about the criteria for searching! Thanks for your help - you've given me something to think about. I'll post a note if I find anything worth reporting. Neil Show quote "Robert Klemme" wrote: > Mark Williams wrote: > > From my experience, you will see the optimizer choose a scan of an > > index (rather than a seek) whenever the filter in the WHERE condition > > specifies a range of values using operators like <,>, BETWEEN or LIKE. > > > > If your table has a clustered index, and you specify an exact-match > > filter in your WHERE clause using a column or columns that have a > > nonclustered index, you will typically see an index seek with a > > bookmark lookup to the clustered index. > > > > If you have two identical queries that execute with different plans, > > investigate how often statistics are updated on your tables and > > indexes. > > I'm not sure what you're at here. Two identical queries is actually one > query. If it executes with different plans over time then statistics > could be the reason or change in data distribution. > > If two queries are just similar (meaning, same conditions but differnt > values for conditions) the optimizer will often choose a scan if > statistics indicate that a majority of rows will be returned for this > criterium. > > Kind regards > > robert > > Neil Waldie wrote:
Show quote > Robert, That's exactly what I meant. If the optimizer figures that 90% of rows> > I thought that might trip me up! You can never shortcut these > things.... > > What I meant by 'almost identical' was that the database consists of > several tables that are linked to my problem table by the generation > id. Most of the queries I'm using are of the same form (TableX INNER > JOIN Generation on TableX.g_id = Generation.g_id) but using different > tables (TableX). The Generation table they ALL join too was someone > else's bright idea which I have to implement and it is this sort of > query that causes me problems. 99% of them use an index seek in > execution, 1 % use a scan. > > This is why I raised the possibility that the decision to do that may > be related to something about the table you join to. You've pointed > out that 'the optimizer will often choose a scan if statistics > indicate that a majority of rows will be returned for this > criterium.' The criteria for searching will always be the same - is > the search date between the effective dates and is the row active and > not deleted. But maybe the number of rows returned by the join on the > primary key affects this decision too? I'll have a look and see if > this only happens where TableX contains a large number of rows that > join to the Generation table. from you fact table are returned by the join he might as well go about scanning. Then again, if tables are joint the decision situation becomes more complex: two tables with statistics, join condition etc. > As Mark pointed out, I didn't specify my WHERE clause, so it makes it True. :-)> hard for you to know anything about the criteria for searching! > Thanks for your help - you've given me something to think about. I'll I'm glad I could help. Good luck!> post a note if I find anything worth reporting. robert Neil,
Could you provide the order of they columns in your index key? Also it seems from your description above that there is no predicates on entity_id. Is this the case? Regards Awny Al-Omari Awny,
You're correct - in this type of query entity_id is unneccessary as the generation id is unique and the join will be between the entity table and the generation table, ensuring I only get the entities I want. The entity_id is, in truth, rarely of any relevance. I probably should have lumped it in the 'other columns' section. The index is currently defined as generation_id, effective_from, effective_to, active, deleted but I have tried several other combinations, often having several indices defined at the same time in the hope that one of them would provide more joy than the others. I can't say I've tried every possible combination, but certainly the first three columns have been re-arranged to have a bash at being first! Neil Show quote "awny***@gmail.com" wrote: > Neil, > > Could you provide the order of they columns in your index key? > Also it seems from your description above that there is no predicates > on entity_id. Is this the case? > > Regards > Awny Al-Omari > > Neil,
It could be that the optimizer is choosing a Hash/Merge Join with index scan against a Nested Loop Join with index seek on generation as inner. There are several factors in optimizer decision. Most important are - The number of rows retrieved from TableX i.e. the table size assuming no local predicates on the table. - The size of the generation table - The constants used in your queries and in particular the date value used against 'effective_from'. This determine their predicates selectivities. It would help if you can post a sample query with showplan. Also what are roughly the table sizes? Regards Awny Al-Omari Mmm. All my queries have to do the same thing - Join another entity table to
the generation data in the table to get the currently active version. So you're correct in guessing that I use BETWEEN in the where clause - I should have given more info. Unfortunately, they all use BETWEEN, so that is not the key to my problem - In the vast majority of cases I still get a seek, not a scan. Re: statistics, I've been forcing an update in my testing using UPDATE STATISTICS table WITH FULLSCAN. I've been doing this between tests and changing index definitions as I thought this would ensure a level playing field. I'd be interested to know what you think about that - is it the best thing to do when testing? It only takes a second, so if it does no harm, I'll keep doing it! Thanks for your comments. Neil Show quote "Mark Williams" wrote: > From my experience, you will see the optimizer choose a scan of an index > (rather than a seek) whenever the filter in the WHERE condition specifies a > range of values using operators like <,>, BETWEEN or LIKE. > > If your table has a clustered index, and you specify an exact-match filter > in your WHERE clause using a column or columns that have a nonclustered > index, you will typically see an index seek with a bookmark lookup to the > clustered index. > > If you have two identical queries that execute with different plans, > investigate how often statistics are updated on your tables and indexes. > > -- > > "Neil Waldie" wrote: > > > Hi, > > > > I'm working on an application that uses SQL Server 7 as a back end. Nearly > > everything in the database references one table with the following columns. > > > > generation_id int (primary key) > > entity_id int > > effective_from datetime > > effective_to datetime > > active tinyint > > deleted tinyint > > other text fields.....(e.g. notes) > > > > Querying this table can be pretty slow. But in most cases I can squeeze > > reasonable performance out of my queries. By trial and error I've found I get > > best performance with a non-clustered index on all the columns above bar the > > 'other text cols.' However, in a few cases with what seems to be an almost > > identical query, the execution plan shows the index is being scanned, and > > this causes the query to run slowly. > > > > Can anyone tell me how (in general) the engine decides whether to scan or > > seek on the index? Is it related to the properties of any other tables you > > join to? I haven't been able to find a description of this process anywhere - > > all you seem to get is a description of what the various types of index are, > > and how things work is relegated to the 'as if by magic' category! > > > > TIA > > > > Neil |
|||||||||||||||||||||||