|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
clustered indexIf you look at share point portal server database you will see, that primary key clustered index is composed of one unique identifier column and 2 columns of nvarchar(255) data type. Isn't that too long clustered index? I thought that clustered indexes should be as narrow as possible? For example, identity column is perfect for that. So, I would create identity clustered index and put other columns in nonclustered index if needed. Can someone explain me why is so extensive clustered index there? Second question: I always thought that every table should have primary key, but I found a lot that there are tables without it. Why? I found a table, which had one unique index, 3 not unique indexes and no one of them was clustered. So table without primary key and without any clustered index. What could be the benefit of that? What is the difference if you have primary key (which by default creates unique clustered index on columns of PK) or just unique clustered index without primary key? I think that this is pretty the same. Thank you for some explanation about that, Simon I wouldn't use the SharePoint database as an example of how to best do
things in your db. First off I don't agree with some of their techniques and you always have to understand what they are using the indexes for and how they use the data. This is different for each application. Usually wide clustered indexes are discouraged but that does not mean they can not be used effectively in some cases. And in general every table should have a clustered index but there will always be exceptions. The point is to not look at someother application as an example of what you should do without understanding why they did it. Try to understand what a clustered index is best for and apply that to your schema the way that makes the most sense. This might help: http://www.sql-server-performance.com/gv_clustered_indexes.asp As for PK vs. unique you should always have a PK constraint defined. The main difference physically between the two is that a PK can not have any nulls where as a Unique can have one. -- Show quoteAndrew J. Kelly SQL MVP "simon" <simon.zu***@iware.si> wrote in message news:fJ_Sf.996$oj5.378456@news.siol.net... >I have question about clustered index. > > If you look at share point portal server database you will see, that > primary key clustered index is composed of one unique identifier column > and 2 columns of nvarchar(255) data type. > > Isn't that too long clustered index? > > I thought that clustered indexes should be as narrow as possible? For > example, identity column is perfect for that. > > So, I would create identity clustered index and put other columns in > nonclustered index if needed. > > Can someone explain me why is so extensive clustered index there? > > Second question: > > I always thought that every table should have primary key, but I found a > lot that there are tables without it. > Why? > > I found a table, which had one unique index, 3 not unique indexes and no > one of them was clustered. > > So table without primary key and without any clustered index. What could > be the benefit of that? > > What is the difference if you have primary key (which by default creates > unique clustered index on columns of PK) or just unique clustered index > without primary key? > > I think that this is pretty the same. > > Thank you for some explanation about that, > Simon > If there's anyone that knows differently, please feel free to correct
me. DDL: CREATE TABLE Transactions ( TransactionClass NVARCHAR(255) , TransactionDate DATETIME NOT NULL , ...<more columns> ) CREATE CLUSTERED INDEX IXC_TransactionClass_TransactionDate ON Transactions(TransactionClass, TransactionDate) In certain cases, it may be beneficial to have a wide clustered index. Lets say you have a table with billions of rows. Each row represents data specific to a transaction. (notice that I'm carefully avoiding using the term "record" here as whether the data included in the row constitutes a complete record is irrelevant for the example). Continuing, lets say there are hundreds of thousands of distinct values in the TransactionClass column and you frequently look up data based on this column. And rarely, if ever, research based on data in any other column, it may be beneficial to cluster based on the TransactionClass column. (other readers, please correct me here if I'm wrong) The width of your clustered index only becomes a factor when you have other nonclustered indexes defined, as the clustered index is included as the row pointer in each row of your nonclustered index. In short, if you have the following schema: CREATE TABLE t ( a DATATYPE_A , b DATATYPE_B , c DATATYPE_C , d DATATYPE_D ) If you cluster on a and DATATYPE_A is long, but you have no nonclustered indexes defined, there is little implication. However, when you add a nonclustered index, say on b and c, your index can be thought of as a table with the following schema: CREATE TABLE index_t_b_c ( b DATATYPE_B , c DATATYPE_C , a DATATYPE_A ) as [a] must be included with your nonclustered index. The implications here are as follows: >From my understanding, when SQL Server fetches data from disk, it fetches one 64KB extent at a time. When your rows are narrower, thefetch will return more rows than if they're wider. If your clustered key is 90 bytes and your index row is 100 bytes (lets forget about other data overhead for the time being), then 90% of your index row is the clustered key. If your clustered key is 4 bytes (INT), then only 4/14, or about 30% of your index is your clustered key, meaning that with each physical fetch from disk, seven times as many index entries may be fetched, dramatically increasing performance. When nonclustered indexes don't exist on your table, this doesn't matter. -Alan |
|||||||||||||||||||||||