Home All Groups Group Topic Archive Search About

SQL Server index low cardinality columns

Author
2 Dec 2005 10:29 PM
Dave
I am trying to understand when to index a column and when not to.  I
have read several post and web sites on indexing, but I am still not
clear on if I should index columns that only contain a few distinct
values.

(btw... I did not design this table, but I do have to live with it :)
I have a table with 30 million rows.  It is a wide table with maybe 30
columns.  The table has about 15 indexes on it.  The clustered index is
on a VARCHAR(25) field that is 99% unique.

Assume I am doing a select and the indexed field is used in the where
clause, group by, order by, or join.

My question is, will it help if I index a VARCAHR(25) field that only
has 10 distinct values (<0.1% unique) ?

Does it make any difference if the low cardinality field is a TINYINT
as opposed to a VACHAR(25)?

Is there any way an index can actually slow down a select?

Any advice would be appreciated.

Thanks

Dave

Author
2 Dec 2005 10:59 PM
Alexander Kuznetsov
> will it help if I index a VARCAHR(25) field that only
> has 10 distinct values (<0.1% unique) ?

not too likely, except for index covering situations

> Is there any way an index can actually slow down a select?

yes sure especially when there is data skew and the statistics are
stale
Author
4 Dec 2005 8:17 PM
Gert-Jan Strik
Dave wrote:[snip]
> Does it make any difference if the low cardinality field is a TINYINT
> as opposed to a VACHAR(25)?

Not likely. When using an index seek and bookmark lookups, the high read
count is usually with the bookmark lookups, and they will most likely
determine whether this scenario is fast or slow.

The reads related to the index seek part can be affected by the unused
space of the index and the average size of an index entry. For a
tinyint, this will always be 1 byte plus overhead. For a varchar(25)
this will be 2 bytes + actual size + overhead. This means that on
average, there will be fewer varchar(25) index entries on one page,
which means more reads are necessary to satisfy the query.

Gert-Jan
Author
5 Dec 2005 6:58 PM
jxstern
On 2 Dec 2005 14:29:01 -0800, "Dave" <daveg***@gmail.com> wrote:
>I am trying to understand when to index a column and when not to.

Me too.

I think "try it" is the operative phrase.

:)

>Assume I am doing a select and the indexed field is used in the where
>clause, group by, order by, or join.
>
>My question is, will it help if I index a VARCAHR(25) field that only
>has 10 distinct values (<0.1% unique) ?

It will at least tell the optimizer what is going on, but I'm not
entirely clear what the optimizer does in various cases when it does
not know.  Might help complex query that would return relatively few
rows, avoiding a bookmark.  MIGHT.

>Does it make any difference if the low cardinality field is a TINYINT
>as opposed to a VACHAR(25)?

If it actually uses the index, then sure, it will make a difference.

It may help if the optimizer wants to do a hash match join, but I'm
unclear as to just when that is.  Will it ever happen with a low
cardinality field?  Dunno.

>Is there any way an index can actually slow down a select?

I've seen queries run slower after adding an index.  Should be
impossible, one would think, but sometimes the optimizer can suffer
from "too much information" and miss the good plans.

J.
Author
5 Dec 2005 11:57 PM
jxstern
OK, I had to check something.

On 2 Dec 2005 14:29:01 -0800, "Dave" <daveg***@gmail.com> wrote:
>(btw... I did not design this table, but I do have to live with it :)
>I have a table with 30 million rows.  It is a wide table with maybe 30
>columns.  The table has about 15 indexes on it.  The clustered index is
>on a VARCHAR(25) field that is 99% unique.

I was going to say before, I'm surprised others didn't jump all over
this! In SQLServer, if you have a clustered key, even a non-unique
one, then all other indexes run through it, even unique ones.

So, if you have a not-quite-unique clustered index, then SQLServer has
to *make* it unique under the covers by jamming a GUID (I think) onto
it.  So, if you have even a tinyint secondary index, it's going to
have a varchar(25) plus a GUID(16) added to it ...

.... if I have this all straight.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_da2_75mb.asp


J.
Author
6 Dec 2005 12:29 AM
David Browne
Show quote
"jxstern" <jxst***@nowhere.xyz> wrote in message
news:snk9p11rqlshqsc8e1u3ov9kjcjhmsfvih@4ax.com...
> OK, I had to check something.
>
> On 2 Dec 2005 14:29:01 -0800, "Dave" <daveg***@gmail.com> wrote:
>>(btw... I did not design this table, but I do have to live with it :)
>>I have a table with 30 million rows.  It is a wide table with maybe 30
>>columns.  The table has about 15 indexes on it.  The clustered index is
>>on a VARCHAR(25) field that is 99% unique.
>
> I was going to say before, I'm surprised others didn't jump all over
> this! In SQLServer, if you have a clustered key, even a non-unique
> one, then all other indexes run through it, even unique ones.
>
> So, if you have a not-quite-unique clustered index, then SQLServer has
> to *make* it unique under the covers by jamming a GUID (I think) onto
> it.  So, if you have even a tinyint secondary index, it's going to
> have a varchar(25) plus a GUID(16) added to it ...
>
> ... if I have this all straight.
>
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_da2_75mb.asp
>
>

The uniqifier is 4 bytes (basically an int), but good point.

David
Author
6 Dec 2005 2:09 AM
jxstern
On Mon, 5 Dec 2005 18:29:58 -0600, "David Browne" <davidbaxterbrowne
no potted m***@hotmail.com> wrote:
>> have a varchar(25) plus a GUID(16) added to it ...
>>
>> ... if I have this all straight.
>>
>> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_da2_75mb.asp
>>
>The uniqifier is 4 bytes (basically an int), but good point.

Look again!

An identity may be an int - or other numeric type.  The
uniqueidentifier is another thing entirely, and I believe it is what
SQLServer jams onto the key to make it unique.

J.
Author
6 Dec 2005 3:49 AM
Andrew J. Kelly
I believe David is correct in that it is 4 bytes.  It is definitely not a
guid.


--
Andrew J. Kelly  SQL MVP


Show quote
"jxstern" <jxst***@nowhere.xyz> wrote in message
news:vjs9p19s515ohbkkn5e76uir086188jvo5@4ax.com...
> On Mon, 5 Dec 2005 18:29:58 -0600, "David Browne" <davidbaxterbrowne
> no potted m***@hotmail.com> wrote:
>>> have a varchar(25) plus a GUID(16) added to it ...
>>>
>>> ... if I have this all straight.
>>>
>>> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_da2_75mb.asp
>>>
>>The uniqifier is 4 bytes (basically an int), but good point.
>
> Look again!
>
> An identity may be an int - or other numeric type.  The
> uniqueidentifier is another thing entirely, and I believe it is what
> SQLServer jams onto the key to make it unique.
>
> J.
>
>
Author
6 Dec 2005 4:00 AM
David Browne
Show quote
"jxstern" <jxst***@nowhere.xyz> wrote in message
news:vjs9p19s515ohbkkn5e76uir086188jvo5@4ax.com...
> On Mon, 5 Dec 2005 18:29:58 -0600, "David Browne" <davidbaxterbrowne
> no potted m***@hotmail.com> wrote:
>>> have a varchar(25) plus a GUID(16) added to it ...
>>>
>>> ... if I have this all straight.
>>>
>>> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_da2_75mb.asp
>>>
>>The uniqifier is 4 bytes (basically an int), but good point.
>
> Look again!
>
> An identity may be an int - or other numeric type.  The
> uniqueidentifier is another thing entirely, and I believe it is what
> SQLServer jams onto the key to make it unique.
>

It jams something in.  It's not documented what.  Its uniqifier not a
uniqueidentifier, and I believe it happens to be 4 bytes long.

David
Author
6 Dec 2005 6:14 PM
jxstern
On Mon, 5 Dec 2005 22:00:50 -0600, "David Browne" <davidbaxterbrowne
no potted m***@hotmail.com> wrote:
>>>The uniqifier is 4 bytes (basically an int), but good point.
>>
>> Look again!
>>
>> An identity may be an int - or other numeric type.  The
>> uniqueidentifier is another thing entirely, and I believe it is what
>> SQLServer jams onto the key to make it unique.
>>
>
>It jams something in.  It's not documented what.  Its uniqifier not a
>uniqueidentifier, and I believe it happens to be 4 bytes long.

4 bytes would be better, I'll bow to the consensus, I recalled seeing
this discussed recently and I may have just misread the "uniqifier"
term.

J.
Author
6 Dec 2005 6:30 PM
Gail Erickson [MS]
From the SQL Server 2005 BOL topic Clustered Index Design Guidelines:
If the clustered index is not created with the UNIQUE property, the Database
Engine automatically adds a 4-byte uniqueifier column to the table. When it
is required, the Database Engine automatically adds a uniqueifier value to a
row to make each key unique. This column and its values are used internally
and cannot be seen or accessed by users.

--
Gail Erickson [MS]
SQL Server Documentation Team
This posting is provided "AS IS" with no warranties, and confers no rights

Show quote
"jxstern" <jxst***@nowhere.xyz> wrote in message
news:l4lbp15qf8bjofet2go41c6ibj7ifaggeu@4ax.com...
> On Mon, 5 Dec 2005 22:00:50 -0600, "David Browne" <davidbaxterbrowne
> no potted m***@hotmail.com> wrote:
>>>>The uniqifier is 4 bytes (basically an int), but good point.
>>>
>>> Look again!
>>>
>>> An identity may be an int - or other numeric type.  The
>>> uniqueidentifier is another thing entirely, and I believe it is what
>>> SQLServer jams onto the key to make it unique.
>>>
>>
>>It jams something in.  It's not documented what.  Its uniqifier not a
>>uniqueidentifier, and I believe it happens to be 4 bytes long.
>
> 4 bytes would be better, I'll bow to the consensus, I recalled seeing
> this discussed recently and I may have just misread the "uniqifier"
> term.
>
> J.
>

AddThis Social Bookmark Button