|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Design: multiple columns for primary keyHi,
A very basic question: Imagine I have a table with multiple fields, some strings, some integers etc. None of these fields are unique on their own, but all fields together is unique per record and thus can be used as the primary key. However, referring to a record based on all these columns is quite cumbersome (and not efficient, or does indexing take care of this nearly entirely compared to a single identity field?), so how is this situation best solved in practice. Do I use an ugly IDENTITY column just for 'convenience' or continue to use all fields, resulting in huge queries that lose legibility, or is there another way? Lisa Lisa Pearlson wrote:
> Do I use an ugly IDENTITY column just for 'convenience' or continue I would use "an ugle IDENTITY column" without doubt! Maybe others have> to use all fields, resulting in huge queries that lose legibility, or > is there another way? different opinions. It'll make your life a less easier. Otherwise if you want to refer to that row in a foreign key you need to include all the columns of the key which isn't really doable. -- Kind regards, Stijn Verrept. Lisa Pearlson (no@spam.plz) writes:
Show quote > A very basic question: It doesn't have to be IDENTITY, you can roll your own as well. But judging> Imagine I have a table with multiple fields, some strings, some integers > etc. > None of these fields are unique on their own, but all fields together is > unique per record and thus can be used as the primary key. > > However, referring to a record based on all these columns is quite > cumbersome (and not efficient, or does indexing take care of this nearly > entirely compared to a single identity field?), so how is this situation > best solved in practice. > > Do I use an ugly IDENTITY column just for 'convenience' or continue to use > all fields, resulting in huge queries that lose legibility, or is there > another way? from the shallow description you give, it appears that this could be a solution. But you should add UNIQUE constraint on the other columns as well to ensure their uniqueness. I like to point out that adding a surrogate key does not always make things simpler. I had a table with a four-column key, and then I needed to add a sub-table with two more keys. Since a six-column key sounded too much, I added a surrogate key to the main table. Years later I had reason to write code to maintain these tables. Turned out that the surrogate key made this a whole lot more complex. So the next time I had revise those tables, I removed the surrogate key. (I was also able to remove one the columns in the four-column key, and one of the keys in the sub-table.) -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx Using the terms "field" and "record" will bring down the wrath of Celko!
I use IDENTITY for a number of reasons, and none of them are for convenience: (1) An IDENTITY value is immutable. (2) Cascading updates are not required to maintain integrity. (3) IDENTITY reduces redundancy. (4) Joins generally perform better on IDENTITY values. (5) IDENTITY reduces lock contention. There are other reasons, and I could expound further on the merits of each of the above. It is not unusual in a logical database design to have tables with compound primary keys; however, when it comes time to implement the design, surrogates--including IDENTITY--make a lot more sense. One more thing: if you do use an IDENTITY PRIMARY KEY, be sure to create a UNIQUE constraint or index on the combination of columns that are together unique per row. Alternate keys should be enforced by the database with a UNIQUE constraint. Show quote "Lisa Pearlson" <no@spam.plz> wrote in message news:%23hOXsSz9FHA.3928@TK2MSFTNGP11.phx.gbl... > Hi, > > A very basic question: > Imagine I have a table with multiple fields, some strings, some integers > etc. > None of these fields are unique on their own, but all fields together is > unique per record and thus can be used as the primary key. > > However, referring to a record based on all these columns is quite > cumbersome (and not efficient, or does indexing take care of this nearly > entirely compared to a single identity field?), so how is this situation > best solved in practice. > > Do I use an ugly IDENTITY column just for 'convenience' or continue to use > all fields, resulting in huge queries that lose legibility, or is there > another way? > > Lisa > I understand 3, 4 and take 5 for granted (some internal DBMS matter I
presume), but could you elaborate a bit on 1 and 2? 1) immutable means you can't do UPDATE MyTable SET identcol=123 WHERE identcol=456 ? 2) What does cascading have to do with it? (as I understand cascading, it's like triggers where change in one record triggers changes in other tables/records?) Show quote "Brian Selzer" <br***@selzer-software.com> wrote in message news:OIHDWpz9FHA.3168@TK2MSFTNGP10.phx.gbl... > Using the terms "field" and "record" will bring down the wrath of Celko! > > I use IDENTITY for a number of reasons, and none of them are for > convenience: > > (1) An IDENTITY value is immutable. > (2) Cascading updates are not required to maintain integrity. > (3) IDENTITY reduces redundancy. > (4) Joins generally perform better on IDENTITY values. > (5) IDENTITY reduces lock contention. > > There are other reasons, and I could expound further on the merits of each > of the above. > > It is not unusual in a logical database design to have tables with > compound primary keys; however, when it comes time to implement the > design, surrogates--including IDENTITY--make a lot more sense. > > One more thing: if you do use an IDENTITY PRIMARY KEY, be sure to create a > UNIQUE constraint or index on the combination of columns that are together > unique per row. Alternate keys should be enforced by the database with a > UNIQUE constraint. > > > "Lisa Pearlson" <no@spam.plz> wrote in message > news:%23hOXsSz9FHA.3928@TK2MSFTNGP11.phx.gbl... >> Hi, >> >> A very basic question: >> Imagine I have a table with multiple fields, some strings, some integers >> etc. >> None of these fields are unique on their own, but all fields together is >> unique per record and thus can be used as the primary key. >> >> However, referring to a record based on all these columns is quite >> cumbersome (and not efficient, or does indexing take care of this nearly >> entirely compared to a single identity field?), so how is this situation >> best solved in practice. >> >> Do I use an ugly IDENTITY column just for 'convenience' or continue to >> use all fields, resulting in huge queries that lose legibility, or is >> there another way? >> >> Lisa >> > > Lisa Pearlson (no@spam.plz) writes:
> I understand 3, 4 and take 5 for granted (some internal DBMS matter I Actually, I think of the reasons that Brian listed, only 5 is really > presume), but could you elaborate a bit on 1 and 2? > 1) immutable means you can't do UPDATE MyTable SET identcol=123 WHERE > identcol=456 ? > 2) What does cascading have to do with it? (as I understand cascading, > it's like triggers where change in one record triggers changes in other > tables/records?) applicable to IDENTITY columns, although 1 has a touch to it: if you have an IDENTITY property on the key, you know that the value cannot be updated, not even by mistake. Points 2-4 applies to surrogate keys in general, no matter if they have IDENTITY or not. I guess what Brian means with cascading is that if you use entirely natural keys these can change. Say that you get the idea to use the stock symbol as the key for financial instruments. Then the company changes the name, and gets a new symbol. With the symbol as key, you have to update all tables where the symbol appears. With a surrogate key, there is no need to. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx Immutable primary keys yield many benefits.
A mutable primary key will make your code less stable. It's possible in a concurrent environment to read a row with a particular primary key value and while you're working on it for another transaction or transactions to make changes so that when you initiate your update, it appears that the row no longer exists, or even worse, that the row exists even though it's really another row. Here's a simple example: Assume that you're tracking a part as it moves from machine to machine on an assembly line. Assume also that only one part can be manipulated by one machine at one time. The mutable primary key is this example is {PartNo, Location}. Now, you read the row with key {'45G', 'PRESS01'} prior to some manipulation. While you're working on the row, the part is moved from 'PRESS01' to 'FORMER01', so it now appears to you that the row no longer exists. If another part '45G' is moved onto 'PRESS01' prior to your update, then it will appear to you that the row still exists even though it's a different part, and you may erroneously update the row for the wrong part. To counter this, you must either lock and hold the row when you read it (not a very attractive prospect because it will severely reduce concurrency and will preclude the use of disconnected datasets, message queues, etc.), or write a ton of code on the client end to detect the change--which may not always be possible or practical. Note that this problem increases in complexity when there are related tables, because it's possible for a row to look the same on the primary key table, but to actually refer to a different row with a different set of related rows. With an IDENTITY primary key, as each part is placed in production a new row is added and a new IDENTITY value is generated. Because that value cannot change, when you go to perform your update, you can determine not only that the part has moved, but exactly where it is now. In addition, the problem with related tables cannot occur, because the related rows refer to a value that cannot change. Another problem lies with UPDATE triggers that are used for auditing or to implement transition constraints. SQL Server update triggers have two pseudotables, deleted and inserted, which contain the old and new values for each row that was updated. When an update affects more than one row, there is no supported mechanism to determine which row in the inserted pseudotable corresponds to each row in the deleted pseudotable. (Oracle has a FOR EACH ROW trigger, which I've been begging Microsoft to implement.) IDENTITY solves this problem because since the key cannot change, you can join the deleted and inseted pseudotables on the IDENTITY column and determine exactly what happened to each row. Cascading updates cause more problems than they're worth. The most common form of avoidable deadlock is caused by multiple transactions obtaining and holding locks on rows in more than one table in a different order. To combat this, you must make sure that you obtain locks on tables in the same order in every procedure, function, trigger, and batch. Cascading updates throw a wrench into this. There is no way to determine with any degree of certainty the order in which related rows will be locked when there is a cascading heirarchy present. This makes it much more difficult--if not impossible--to determine a locking order that will eliminate avoidable deadlocks. Another problem with cascading updates is that whenever a change is made, the rowversion (timestamp) on each affected row is updated. Do you really want to indicate that a Sales Order has been changed when only the salesperson's employee number has been changed? It makes sense to indicate a change when the Sales Order is assigned to another salesperson, but in this case the change is cosmetic, not material, and in my opinion should not occur. This brings up another problem: triggers on every affected table throughout the cascading heirarchy also fire. It's much more difficult to determine whether they fired for simply a cosmetic change, or if there is a material change that should be validated against business rules. All of these problems occur because a database that uses natural primary keys is riddled with redundancy. For this reason among others, I advocate the use of surrogate keys and in particular, IDENTITY at the physical level. Show quote "Lisa Pearlson" <no@spam.plz> wrote in message news:%23nPAi%2309FHA.3884@TK2MSFTNGP10.phx.gbl... >I understand 3, 4 and take 5 for granted (some internal DBMS matter I >presume), but could you elaborate a bit on 1 and 2? > 1) immutable means you can't do UPDATE MyTable SET identcol=123 WHERE > identcol=456 ? > 2) What does cascading have to do with it? (as I understand cascading, > it's like triggers where change in one record triggers changes in other > tables/records?) > > "Brian Selzer" <br***@selzer-software.com> wrote in message > news:OIHDWpz9FHA.3168@TK2MSFTNGP10.phx.gbl... >> Using the terms "field" and "record" will bring down the wrath of Celko! >> >> I use IDENTITY for a number of reasons, and none of them are for >> convenience: >> >> (1) An IDENTITY value is immutable. >> (2) Cascading updates are not required to maintain integrity. >> (3) IDENTITY reduces redundancy. >> (4) Joins generally perform better on IDENTITY values. >> (5) IDENTITY reduces lock contention. >> >> There are other reasons, and I could expound further on the merits of >> each of the above. >> >> It is not unusual in a logical database design to have tables with >> compound primary keys; however, when it comes time to implement the >> design, surrogates--including IDENTITY--make a lot more sense. >> >> One more thing: if you do use an IDENTITY PRIMARY KEY, be sure to create >> a UNIQUE constraint or index on the combination of columns that are >> together unique per row. Alternate keys should be enforced by the >> database with a UNIQUE constraint. >> >> >> "Lisa Pearlson" <no@spam.plz> wrote in message >> news:%23hOXsSz9FHA.3928@TK2MSFTNGP11.phx.gbl... >>> Hi, >>> >>> A very basic question: >>> Imagine I have a table with multiple fields, some strings, some integers >>> etc. >>> None of these fields are unique on their own, but all fields together is >>> unique per record and thus can be used as the primary key. >>> >>> However, referring to a record based on all these columns is quite >>> cumbersome (and not efficient, or does indexing take care of this nearly >>> entirely compared to a single identity field?), so how is this situation >>> best solved in practice. >>> >>> Do I use an ugly IDENTITY column just for 'convenience' or continue to >>> use all fields, resulting in huge queries that lose legibility, or is >>> there another way? >>> >>> Lisa >>> >> >> > > >> Let's get back to the basics of an RDBMS. Rows are not records; fields are not columns; tables are not files. If (a,b,c) is a key in the data model, then you have to make it a keyin the schema. Well, you want things screwed. Ignoring that BY DEFINITION, the proprietary IDENTITY is not a data type, not a relational and not verifiable, it is redundant in the face of a real key. How do you guarantee that you have the IDENTITY and the real key in synch? Most "ID-iots" do not bother with a real key (they mimic a sequential file and pointer chains instead) and get redundant rows when someone posts the same data multiple times. Answer: it is impossible and therefore data integrity is impossible. I am just starting to do some SOX consulting work; I will flunk your database for this. There was some discussion of this at CA a few weeks ago. A good RDBMS will handle the access for you, so that you do not have to drop down to that level. --CELKO-- (jcelko***@earthlink.net) writes:
> Ignoring that BY DEFINITION, the proprietary IDENTITY is not a data New readers should note that this is Joe Celko's private definition, and> type, not a relational and not verifiable, it is redundant in the face > of a real key. thus nothing to bother about. I could say that by definition Joe Celko is always wrong, but that would not be a very strong argument. A much stronger argument is that far too many of his posts consists of inaccurate standard rants, and where the main rationale for the post is to insult the person who asked. > How do you guarantee that you have the IDENTITY and the real key in In Lisa's case, it appears to be simple: she should have a UNIQUE> synch? cosntraints on the real key. However, there are plentyful of cases where there is no real key, or where any "real key" is far beyond being practically usable. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx So, DB2, Oracle, Sybase and Microsoft SQL Server are not 'good RDBMS'? Seems
the market disagrees. The whole point about using a surrogate key with the IDENTITY property is so that you gain better performance, reduce complexity for backups, security etc... as well as being able to change your natural key without having to build a horrendous transaction of multiple update statements that will lock up your tables while its being done - usually people cluster on the primary key (natural key). People only get redundant rows if they forget to add constraints to their table, the surrogate key definition would be,.. create table individual ( id int not null identity constraint sk_individual unique clustered, last_name varchar(50) not null, first_name varchar(50) not null, dob int not null, constraint pk_individual primary key nonclustered( last_name, first_name, dob ) ) Mind you, we both know that the natural key on individual isn't the above, in fact their isn't an easy one - consider a user group like my own, do you want me to ask people for the NI (social security) number ? I wouldn't get many members, I'd probably not even get my registration under the Data Protection act here in the UK passed either. Well fud - go out and get some bloody experience on real systems please! Show quote "--CELKO--" <jcelko***@earthlink.net> wrote in message news:1133661996.695550.257580@o13g2000cwo.googlegroups.com... >>> Let's get back to the basics of an RDBMS. Rows are not records; fields >>> are not columns; tables are not files. > > If (a,b,c) is a key in the data model, then you have to make it a key > in the schema. Well, you want things screwed. > > Ignoring that BY DEFINITION, the proprietary IDENTITY is not a data > type, not a relational and not verifiable, it is redundant in the face > of a real key. > > How do you guarantee that you have the IDENTITY and the real key in > synch? Most "ID-iots" do not bother with a real key (they mimic a > sequential file and pointer chains instead) and get redundant rows when > someone posts the same data multiple times. > > Answer: it is impossible and therefore data integrity is impossible. I > am just starting to do some SOX consulting work; I will flunk your > database for this. There was some discussion of this at CA a few weeks > ago. > > A good RDBMS will handle the access for you, so that you do not have to > drop down to that level. > According to C. J. Date in An Introduction to Database Systems, Seventh
Edition: Surrogate keys are keys in the usual relational sense but have the following specific properties: - They always involve exactly one attribute. - Their values serve solely as surrogates (hence the name) for the entities they stand for. In other words, such values serve merely to represent the fact that the corresponding entities exists-- they carry no additional information or meaning whatsoever. - When a new entity is inserted into the database, it is given a surrogate key value that has never been used before and will never be used again, even if the entity in question is subsequently deleted. Ideally, surrogate key values would be system-generated, but whether they are system- or user-generated has nothing to do with the basic ideas of surrogate keys as such. I believe that columns with the IDENTITY property meet this criteria. Although the actual integer value may be the same in more than one table, every column in which they are used should also participate in a declaritave referential integrity constraint, which serves to make the value unique throughout the database because a DRI constraint includes the table to which it belongs or refers in its definition. Even E. F. Codd embraced the concept of surrogates in a paper he wrote in 1979: Extending the Database Relational Model to Capture More Meaning, which, by the way, you've misinterpreted in some of your rants. Show quote "--CELKO--" <jcelko***@earthlink.net> wrote in message news:1133661996.695550.257580@o13g2000cwo.googlegroups.com... >>> Let's get back to the basics of an RDBMS. Rows are not records; fields >>> are not columns; tables are not files. > > If (a,b,c) is a key in the data model, then you have to make it a key > in the schema. Well, you want things screwed. > > Ignoring that BY DEFINITION, the proprietary IDENTITY is not a data > type, not a relational and not verifiable, it is redundant in the face > of a real key. > > How do you guarantee that you have the IDENTITY and the real key in > synch? Most "ID-iots" do not bother with a real key (they mimic a > sequential file and pointer chains instead) and get redundant rows when > someone posts the same data multiple times. > > Answer: it is impossible and therefore data integrity is impossible. I > am just starting to do some SOX consulting work; I will flunk your > database for this. There was some discussion of this at CA a few weeks > ago. > > A good RDBMS will handle the access for you, so that you do not have to > drop down to that level. >
Other interesting topics
|
|||||||||||||||||||||||