|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
SQL Server index low cardinality columnshave read several post and web sites on indexing, but I am still not clear on if I should index columns that only contain a few distinct values. (btw... I did not design this table, but I do have to live with it :) I have a table with 30 million rows. It is a wide table with maybe 30 columns. The table has about 15 indexes on it. The clustered index is on a VARCHAR(25) field that is 99% unique. Assume I am doing a select and the indexed field is used in the where clause, group by, order by, or join. My question is, will it help if I index a VARCAHR(25) field that only has 10 distinct values (<0.1% unique) ? Does it make any difference if the low cardinality field is a TINYINT as opposed to a VACHAR(25)? Is there any way an index can actually slow down a select? Any advice would be appreciated. Thanks Dave > will it help if I index a VARCAHR(25) field that only not too likely, except for index covering situations> has 10 distinct values (<0.1% unique) ? > Is there any way an index can actually slow down a select? yes sure especially when there is data skew and the statistics arestale Dave wrote:[snip]
> Does it make any difference if the low cardinality field is a TINYINT Not likely. When using an index seek and bookmark lookups, the high read> as opposed to a VACHAR(25)? count is usually with the bookmark lookups, and they will most likely determine whether this scenario is fast or slow. The reads related to the index seek part can be affected by the unused space of the index and the average size of an index entry. For a tinyint, this will always be 1 byte plus overhead. For a varchar(25) this will be 2 bytes + actual size + overhead. This means that on average, there will be fewer varchar(25) index entries on one page, which means more reads are necessary to satisfy the query. Gert-Jan On 2 Dec 2005 14:29:01 -0800, "Dave" <daveg***@gmail.com> wrote: Me too.>I am trying to understand when to index a column and when not to. I think "try it" is the operative phrase. :) It will at least tell the optimizer what is going on, but I'm not>Assume I am doing a select and the indexed field is used in the where >clause, group by, order by, or join. > >My question is, will it help if I index a VARCAHR(25) field that only >has 10 distinct values (<0.1% unique) ? entirely clear what the optimizer does in various cases when it does not know. Might help complex query that would return relatively few rows, avoiding a bookmark. MIGHT. >Does it make any difference if the low cardinality field is a TINYINT If it actually uses the index, then sure, it will make a difference.>as opposed to a VACHAR(25)? It may help if the optimizer wants to do a hash match join, but I'm unclear as to just when that is. Will it ever happen with a low cardinality field? Dunno. >Is there any way an index can actually slow down a select? I've seen queries run slower after adding an index. Should beimpossible, one would think, but sometimes the optimizer can suffer from "too much information" and miss the good plans. J. OK, I had to check something.
On 2 Dec 2005 14:29:01 -0800, "Dave" <daveg***@gmail.com> wrote: I was going to say before, I'm surprised others didn't jump all over>(btw... I did not design this table, but I do have to live with it :) >I have a table with 30 million rows. It is a wide table with maybe 30 >columns. The table has about 15 indexes on it. The clustered index is >on a VARCHAR(25) field that is 99% unique. this! In SQLServer, if you have a clustered key, even a non-unique one, then all other indexes run through it, even unique ones. So, if you have a not-quite-unique clustered index, then SQLServer has to *make* it unique under the covers by jamming a GUID (I think) onto it. So, if you have even a tinyint secondary index, it's going to have a varchar(25) plus a GUID(16) added to it ... .... if I have this all straight. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_da2_75mb.asp J.
Show quote
"jxstern" <jxst***@nowhere.xyz> wrote in message The uniqifier is 4 bytes (basically an int), but good point.news:snk9p11rqlshqsc8e1u3ov9kjcjhmsfvih@4ax.com... > OK, I had to check something. > > On 2 Dec 2005 14:29:01 -0800, "Dave" <daveg***@gmail.com> wrote: >>(btw... I did not design this table, but I do have to live with it :) >>I have a table with 30 million rows. It is a wide table with maybe 30 >>columns. The table has about 15 indexes on it. The clustered index is >>on a VARCHAR(25) field that is 99% unique. > > I was going to say before, I'm surprised others didn't jump all over > this! In SQLServer, if you have a clustered key, even a non-unique > one, then all other indexes run through it, even unique ones. > > So, if you have a not-quite-unique clustered index, then SQLServer has > to *make* it unique under the covers by jamming a GUID (I think) onto > it. So, if you have even a tinyint secondary index, it's going to > have a varchar(25) plus a GUID(16) added to it ... > > ... if I have this all straight. > > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_da2_75mb.asp > > David On Mon, 5 Dec 2005 18:29:58 -0600, "David Browne" <davidbaxterbrowne
no potted m***@hotmail.com> wrote: >> have a varchar(25) plus a GUID(16) added to it ... Look again!>> >> ... if I have this all straight. >> >> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_da2_75mb.asp >> >The uniqifier is 4 bytes (basically an int), but good point. An identity may be an int - or other numeric type. The uniqueidentifier is another thing entirely, and I believe it is what SQLServer jams onto the key to make it unique. J. I believe David is correct in that it is 4 bytes. It is definitely not a
guid. -- Show quoteAndrew J. Kelly SQL MVP "jxstern" <jxst***@nowhere.xyz> wrote in message news:vjs9p19s515ohbkkn5e76uir086188jvo5@4ax.com... > On Mon, 5 Dec 2005 18:29:58 -0600, "David Browne" <davidbaxterbrowne > no potted m***@hotmail.com> wrote: >>> have a varchar(25) plus a GUID(16) added to it ... >>> >>> ... if I have this all straight. >>> >>> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_da2_75mb.asp >>> >>The uniqifier is 4 bytes (basically an int), but good point. > > Look again! > > An identity may be an int - or other numeric type. The > uniqueidentifier is another thing entirely, and I believe it is what > SQLServer jams onto the key to make it unique. > > J. > >
Show quote
"jxstern" <jxst***@nowhere.xyz> wrote in message It jams something in. It's not documented what. Its uniqifier not a news:vjs9p19s515ohbkkn5e76uir086188jvo5@4ax.com... > On Mon, 5 Dec 2005 18:29:58 -0600, "David Browne" <davidbaxterbrowne > no potted m***@hotmail.com> wrote: >>> have a varchar(25) plus a GUID(16) added to it ... >>> >>> ... if I have this all straight. >>> >>> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_da2_75mb.asp >>> >>The uniqifier is 4 bytes (basically an int), but good point. > > Look again! > > An identity may be an int - or other numeric type. The > uniqueidentifier is another thing entirely, and I believe it is what > SQLServer jams onto the key to make it unique. > uniqueidentifier, and I believe it happens to be 4 bytes long. David On Mon, 5 Dec 2005 22:00:50 -0600, "David Browne" <davidbaxterbrowne
no potted m***@hotmail.com> wrote: >>>The uniqifier is 4 bytes (basically an int), but good point. 4 bytes would be better, I'll bow to the consensus, I recalled seeing>> >> Look again! >> >> An identity may be an int - or other numeric type. The >> uniqueidentifier is another thing entirely, and I believe it is what >> SQLServer jams onto the key to make it unique. >> > >It jams something in. It's not documented what. Its uniqifier not a >uniqueidentifier, and I believe it happens to be 4 bytes long. this discussed recently and I may have just misread the "uniqifier" term. J. From the SQL Server 2005 BOL topic Clustered Index Design Guidelines:
If the clustered index is not created with the UNIQUE property, the Database Engine automatically adds a 4-byte uniqueifier column to the table. When it is required, the Database Engine automatically adds a uniqueifier value to a row to make each key unique. This column and its values are used internally and cannot be seen or accessed by users. -- Show quoteGail Erickson [MS] SQL Server Documentation Team This posting is provided "AS IS" with no warranties, and confers no rights "jxstern" <jxst***@nowhere.xyz> wrote in message news:l4lbp15qf8bjofet2go41c6ibj7ifaggeu@4ax.com... > On Mon, 5 Dec 2005 22:00:50 -0600, "David Browne" <davidbaxterbrowne > no potted m***@hotmail.com> wrote: >>>>The uniqifier is 4 bytes (basically an int), but good point. >>> >>> Look again! >>> >>> An identity may be an int - or other numeric type. The >>> uniqueidentifier is another thing entirely, and I believe it is what >>> SQLServer jams onto the key to make it unique. >>> >> >>It jams something in. It's not documented what. Its uniqifier not a >>uniqueidentifier, and I believe it happens to be 4 bytes long. > > 4 bytes would be better, I'll bow to the consensus, I recalled seeing > this discussed recently and I may have just misread the "uniqifier" > term. > > J. > |
|||||||||||||||||||||||