|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
auto number?Hi, I define a field as auto number field, usually how people deal with if
data is growing near 2147483647? Thanks. 1) Change the data type to something bigger.
2) Put some stuff in an archives. 3) Re-use previous numbers. And try not toscrew up your data warehouse. 4) If this is declared as a key, then re-think your whole design and come back with a relational schema that does not use sequential numbering for exposed physical locators. Thanks CELKO,
if I declare the key as GUID string, compare to identity number, the query will a little bit slow? Show quote "--CELKO--" <jcelko***@earthlink.net> wrote in message news:1115237750.148780.316040@f14g2000cwb.googlegroups.com... > 1) Change the data type to something bigger. > 2) Put some stuff in an archives. > 3) Re-use previous numbers. And try not toscrew up your data > warehouse. > 4) If this is declared as a key, then re-think your whole design and > come back with a relational schema that does not use sequential > numbering for exposed physical locators. > >> if I declare the key as GUID string, compare to identity number, the query will a little bit slow? <<Again, I think you are missing the basics here. Ask yourself what is the logical meaning of a GUID in the data model? Where does it occur in the reality you are trying to capture? The answer is that it lives only in the hardware that stores the data. By definition, it cannot be a key. Number (4) was the irght answer; you do not have a relational design and you are fishing for kludges to patch it up. > in the reality you are trying to capture? The answer is that it [a guid] I don't buy this statement. Clearly Guids can be generated on machines other > lives > only in the hardware that stores the data. than the database server on which they are stored. So what exactly do you mean by it "lives" only in the hardware that stores the data? Thomas >> Clearly Guids can be generated on machines other than the database server on which they are stored. So what exactly do you mean by it"lives" only in the hardware that stores the data? << Go to the reality of the model. Try to verify it as an identifier. That is, given an entity, can anyone tell you the GUID by looking at the entity? I can read the VIN off of a car, the UPC code from a retail package, etc. but the GUID is not an attribute of the entity. The best I can do is validate its format; I used to be able to get some hardware location data, but that is gone now. By definition, a key is a subset of attributes, so these exposed phsyical locators cannot be keys in an RDBMS. .. >>> Clearly Guids can be generated on machines other than the database If the Guid is stamped on the car you can do the same.> server on which they are stored. So what exactly do you mean by it > "lives" only in the hardware that stores the data? << > > Go to the reality of the model. Try to verify it as an identifier. > That is, given an entity, can anyone tell you the GUID by looking at > the entity? I can read the VIN off of a car, > UPC code from a If the Guid is stamped on the package or encoded as a bar code you can do the > retail package, etc. same. > but the GUID is not an attribute of the entity. It is if the database defines it to be. Just as the system defines a UPC or a VIN to be part of the entity's attributes. > The best I can do is validate its format; I used to be able to get some How the number was generated is irrelevant. You are assuming that a key must > hardware location data, but that is gone now. contain cosmic meaning about the entity. Remember, for that to work, the user must **understand** how to decypher the cosmic meaning from the key for it to be useful. You are further assuming that the Guid is designed to have cosmic meaning and it is not. It is simply a series of alpha numeric characters than uniquely identifies the entity. > By definition, a key is A key *can* be a subset of attributes *or* simply a means to uniquely identify > a subset of attributes, so these exposed phsyical locators cannot be > keys in an RDBMS. an entity (meaning it can be both). In the scenarios you mentioned above, the "key" was simply a bunch of gobbledegook that was stamped on something physical that a user could write down, go to the computer system and look up. Whether the user was able to discern any cosmic meaning from gobbledegook is secondary to its purpose. Thomas >> If the GUID is stamped on the car you can do the same. ...If the GUID is stamped on the package or encoded as a bar code you can do thesame. << Know any company that is doing that? What industry standards group enforces these GUIDs to allow me to place orders and interface with someone else in the retail industry? What information does the GUID give me about the entity? The manufacturer? product line? Anything? >> How the number was generated is irrelevant. << I want validation and verfication, there the generation rules are VERYirrelevant. >>You are assuming that a key must contain cosmic meaning about the entity. <<When you think that you can pull an identifier out of the air, totally unrelated to the entity and by decree in your local village make it an identifier, then you are dealing with "Cosmic Meaning" and mystical powers. Do you also beleive that God put a 17 digit hebrew number on the bottom of everything in creation and all we have to do is find it? >> key *can* be a subset of attributes *or* simply a means to uniquely identify an entity (meaning it can be both). <<Gee. I guess poor old Dr. Codd was wrong. He said a key HAD TO BE subset of attributes which uniquely identifies an entity. You keep confusing exposed physical locators or system surrogates (GUID, IDENTITY, ROWID, pointer chains, hash tables, bit vecrtors, B-Tree indexdes, et al) with a relational key. >>> If the GUID is stamped on the car you can do the same. ...If the Firstly, there is a specification before IEEE for a standard Guid (called a > GUID is stamped on the package or encoded as a bar code you can do the > same. << > > Know any company that is doing that? What industry standards group > enforces these GUIDs to allow me to place orders and interface with > someone else in the retail industry? Uniform Resource Identifier) implementation. I'm not sure of its status. Secondly, your argument sounds like a feature. If this feature is specifically requested, then I'd agree with you. But 99% of the PKs in an enterprise system do not have this requirement. They (users) simply want a unique way of identifying a given entity. An important aspect to understand is *how* they want to locate or get information about an entity. Do they want to be able to look a key value and know something about that entity or is it more likely that they'll plug the value into a computer system and have the computer system tell them far more information about the entity than could possibly be learned by a few meager bits of data buried in a key? > What information does the GUID Again, why does it need to? Outside customers and most internal users won't have > give me about the entity? The manufacturer? product line? Anything? the foggiest idea what cosmic meaning is buried in the key value. What does the value 1234567 mean to you? There very well may be cosmic meaning buried in that key that all of three people in the universe might understand and use. However, the cost of integrating cosmic meaning into a key value is not insignificant. Further, think of a simpler problem: customers. There is no international standard way of identifying a customer and you can't use their driver's license or SSN numbers for privacy reasons. The customers scenario, IMO is more common than the VIN scenario where you have a massive monolithic organization setting the key specification in stone. >>> How the number was generated is irrelevant. << If that feature is specifically desired, then I'd agree. However, where we > > I want validation and verfication, there the generation rules are VERY > irrelevant. clearly disagree is how often this is actually required. My argument is that this sort of check digit type validation is far less common than you clearly think. >>>You are assuming that a key must contain cosmic meaning about the Actually the ten commandments DO have an identifier. They are numbered! The > entity. << > > When you think that you can pull an identifier out of the air, totally > unrelated to the entity and by decree in your local village make it an > identifier, then you are dealing with "Cosmic Meaning" and mystical > powers. Do you also beleive that God put a 17 digit hebrew number on > the bottom of everything in creation and all we have to do is find it? number, by itself tells you nothing about the commandment unless you know the rest of the commandment. If you know nothing of the ten commandments and an evangelist tells you to obey the sixth commandment, you won't have any idea what that means without looking it up. Besides, the Hebrews lead us to believe that the commandments were created in order. Thus, "God" must have used the equivalent of an identity column. ;-> There are thousands of examples of key values for which no one has any clue as to the cosmic meaning inherit to them if there is any. As I said, not all keys need cosmic meaning. Forcing all keys to have cosmic meaning ignores how the key will actually be used. If users want the ability to look at a key and know something about the entity, then clearly having meaning in the key is important. However, if, as I suspect that no key can provide all the information that a user desires and that instead they will lean heavily towards simply plugging the key into a computer system to get more information, then creating a key with meaning is frar more costly in development time than it is worth it. >>> key *can* be a subset of attributes *or* simply a means to uniquely Firstly, Codd is not infallible. Secondly, at the time Codd made his assumptions > identify an entity (meaning it can be both). << > > Gee. I guess poor old Dr. Codd was wrong. He said a key HAD TO BE > subset of attributes which uniquely identifies an entity. You keep > confusing exposed physical locators or system surrogates (GUID, > IDENTITY, ROWID, pointer chains, hash tables, bit vecrtors, B-Tree > indexdes, et al) with a relational key. computer systems were not nearly sophisticated enough nor fast enough to keep up with users. Thus, having meaningful keys improved overall performance because users could, at times, get information from a key and avoid having to query the system itself. However, as systems have gotten far more sophisticated and far more powerful, this is less true. Most keys cannot possible contain enough information to be worthwhile. It is faster today to simply plug the value into a computer system. I'm not suggesting *all* keys should be surrogate or "exposed locators" as you call them. However, I am suggesting that not *all* keys need be natural. Thomas >> However, if, as I suspect that no key can provide all the information that a user desires and that instead they will lean heavilytowards simply plugging the key into a computer system to get more information, then creating a key with meaning is frar more costly in development time than it is worth << A key is supposed to identify an entity, not provide ALL the data about it. I do not expect my VIN number to include my gas mileage. What a user wants in a good key: Uniqueness: The first property is that key be unique. This is the most basic property it can have since without uniqueness it cannot be a key by definition. Uniqueness is necessary, but not sufficient. Uniqueness has a context. An identifier can be unique in the local database, in the enterprise across databases or unique universally. We would prefer the last of those three options. We can often get universal uniqueness with industry standard codes such the VIN numbers. We can get enterprise uniqueness with things like telephone extensions and email addresses. An identifier that is unique only in a single database is workable, but pretty much useless because it will lack the other desired properties. Stability: The second property we want is stability or invariance. The first kind of stability is within the schema itself and this applies to both key and non-key columns. The same data element should have the same representation wherever it appears in the schema. It should not be CHAR(n) in one place and INTEGER in another. The same basic set of constraints should apply to it. That is, if we use the VIN number as an identifier, then we can constraint it to be only for vehicles form Ford Motors; we cannot change the format of the VIN in one table and not in all others. The next kind of stability is over time. You do not want keys changing frequently or in unpredictable ways. Contrary to a popular myth, this does not mean that keys cannot ever change. As the scope of their context grows, they should be able to change. On 2005 January 01, the United States added one more digit to the UPC bar codes used in the retail industry. The reason was globalization and erosion of American industrial domination. The global bar code standard will be the European Article Number (EAN) Code. The American Universal Product Code (UPC) turned 30 years old in 2004 and was never so universal after all. The EAN was set up in 1977 and uses 13 digits, while the UPC has 12 digits of which you see 10 broken into two groups of five digits on a label. The Uniform Code Council, which sets the standards in North America, has the details for the conversion worked out. The conversion is stable because of this central, trusted authority. More than five billion bar-coded products are scanned every day on Earth. It has made data mining in retail possible and saved millions of man-hours. For the neo-Luddite programmers who think that changing a key is going to kill you, let me quote John Metzger, chief information officer of A&P. The grocery chain had 630 stores in 2004 and the grocery industry works 1% to 3% profit margins -- the smallest margins of any industry that is not taking a loss. A&P has handled the new barcode problem as part of a modernization of its technology systems. "It is important," Mr. Metzger said, "but it is not a shut-the-company-down kind of issue." Familiarity: It helps if the users know something about the data. This is not quite the same as validation, but it is related. Validation can tell you if the code is properly formed via some process; familiarity can tell you if it "feels right" because you know something about the context. Thus, ICD codes for disease would confuse a patient, but not a medical record clerk. Validation: Can you look at the data value and tell that it is wrong, without using an external source? For example, I know that '2004-02-30' is not a valid date because no such day exists on the Common Era calendar. Check digits and fixed format codes are one way of obtaining this. Verifiability: How do I verify a key? This also comes in context and in levels of trust. When I cash a check at the supermarket, they are willing to believe that the photo on the driver's license I present is really me, no matter how ugly it is. Or rather, they used to believe it was I; Kroger's grocery store chain is putting an ink-less fingerprinting system in place, just like many banks have done. When I get a passport, I need a birth certificate and fingerprinting. There is a little less trust here. When I get a security clearance, I also need to be investigated. There is a lot less trust. A key without a verification method has no data integrity and will lead the accumulation of bad data. Simplicity: A key should be as simple as possible, but no simpler. People, reports and other systems will use the keys. One man's simple is another man's complex. For an example of a horribly complex code that is in common international usage, look up the IBAN (International Standard Bank Number). But it is minimal in context. >> Firstly, Codd is not infallible.<< Heresy! :)>> Secondly, at the time Codd made his assumptions computer systems were not nearly sophisticated enough nor fast enough to keep up withusers. Thus, having meaningful keys improved overall performance because users could, at times, get information from a key and avoid having to query the system itself. << Dead wrong. In the early days, the file systems and early DBMS used pointers, row numbers, record numbering (identity) and other hardware level things to locate data because performance and storage was a huge issue. A longer meaningful key between two data sets would have slowed down the systems. Get out a book on IDMS, IMS, Total, etc. and look at their enternals. >> Most keys cannot possible contain enough information to be worthwhile. <<I'll remember that when I register my next car with a VIN, buy a book with an ISBN or make a bar-coded retail purchase. All of those transactions completed without a GUID, IDENTITY or Mystical Hebrew Number. <snip>
> A key is supposed to identify an entity, not provide ALL the data about So, where do you draw the line. Suppose 50% of the users want the color encoded > it. I do not expect my VIN number to include my gas mileage. What a > user wants in a good key: in a PK and 50% of the users want the name and address encoded in the PK. Trying make this type of prediction is tantamount to voodoo. Far simpler to have a number that they use to look up the value in their computer system. > Uniqueness: Obivously<snip> Show quote > Stability: This is where you and vehementally disagree. Having a key with cosmic data > > The second property we want is stability or invariance. The first kind > of stability is within the schema itself and this applies to both key > and non-key columns. The same data element should have the same > representation wherever it appears in the schema. It should not be > CHAR(n) in one place and INTEGER in another. The same basic set of > constraints should apply to it. That is, if we use the VIN number as > an identifier, then we can constraint it to be only for vehicles form > Ford Motors; we cannot change the format of the VIN in one table and > not in all others. > > The next kind of stability is over time. You do not want keys changing > frequently or in unpredictable ways. Contrary to a popular myth, this > does not mean that keys cannot ever change. As the scope of their > context grows, they should be able to change. > > On 2005 January 01, the United States added one more digit to the UPC > bar codes used in the retail industry. The reason was globalization > and erosion of American industrial domination. The global bar code > standard will be the European Article Number (EAN) Code. The American > Universal Product Code (UPC) turned 30 years old in 2004 and was never > so universal after all. > > The EAN was set up in 1977 and uses 13 digits, while the UPC has 12 > digits of which you see 10 broken into two groups of five digits on a > label. The Uniform Code Council, which sets the standards in North > America, has the details for the conversion worked out. The conversion > is stable because of this central, trusted authority. More than five > billion bar-coded products are scanned every day on Earth. It has made > data mining in retail possible and saved millions of man-hours. > > For the neo-Luddite programmers who think that changing a key is going > to kill you, let me quote John Metzger, chief information officer of > A&P. The grocery chain had 630 stores in 2004 and the grocery industry > works 1% to 3% profit margins -- the smallest margins of any industry > that is not taking a loss. A&P has handled the new barcode problem as > part of a modernization of its technology systems. "It is important," > Mr. Metzger said, "but it is not a shut-the-company-down kind of > issue." > embedded in the value is far from stable. If all keys are strings the change is lessened but still painful. Far more painful than it should. Further, it does not account for the attribute change scenario. Suppose we have a key value of SMITH01 and the person's name changes. We are left with three possibilites: 1. Leave the key and accept it is wrong. 2. Change the key and everywhere in the system and deal with the fact that printed versions of the key will no longer be traceable. 3. Deprecate the key and create a new and have to deal with lost histories. All three solutions are painful. Here's something else to consider, you keep talking about keys where the schema is set down by a standards board. In most database systems I have seen over the past ten years or so, a majority of entities cannot be tied to this type of key. Customers are good example. You have no standards body that designates a key that you can use. You can't use driver's licenses and you can't use SSN for security and privacy reasons. There many other examples. Yet another problem with this concept is that the information must be readily available at time of entry. For example, let's suppose you have a table of Companies and let's also suppose that companies are being entered hundreds every day. The cost for users to lookup a Dun and Bradstreet number is too costly and may not even be known nor relevant to the users. Thus, related to stability is that all of the constituent parts have to be knowable at time of entry. > Familiarity: How many people, I wonder that work at UPS, fully know *and use* the schema by > > It helps if the users know something about the data. This is not quite > the same as validation, but it is related. Validation can tell you if > the code is properly formed via some process; familiarity can tell you > if it "feels right" because you know something about the context. > Thus, ICD codes for disease would confuse a patient, but not a medical > record clerk. which tracking numbers are assembled? How many users at Card Services know how transaction numbers are assembled? I'll bet the number in both cases is extraordinarily small. People do not look at key values in a vacuum. They have the context with which the key is associated (e.g. a *customer's* number, a *postal code* value etc.) That is enough to know what and where to look up additional information. Your ICD codes is another example of a key designated by a large monolithic body. You can depend on a certain degree of stability with this type of key. However, there are many other types of keys in the universe that are not designated by a standards body. > Validation: Suppose I showed you this number 20050511. Is that a date? Maybe. Maybe not. > > Can you look at the data value and tell that it is wrong, without using > an external source? For example, I know that '2004-02-30' is not a > valid date because no such day exists on the Common Era calendar. > Check digits and fixed format codes are one way of obtaining this. Without *context* the key *in and of itself* is useless. Furthermore, this is feature that can be added later. If specifically requested up front, then I can understand integrating this into your key schema. However most keys do not require this type of validation. Show quote > Verifiability: In what context will this sort of verifiability be used? There are other ways to > > How do I verify a key? This also comes in context and in levels of > trust. When I cash a check at the supermarket, they are willing to > believe that the photo on the driver's license I present is really me, > no matter how ugly it is. Or rather, they used to believe it was I; > Kroger's grocery store chain is putting an ink-less fingerprinting > system in place, just like many banks have done. > > When I get a passport, I need a birth certificate and fingerprinting. > There is a little less trust here. When I get a security clearance, I > also need to be investigated. There is a lot less trust. > > A key without a verification method has no data integrity and will lead > the accumulation of bad data. "verify" the data than to integrate a bunch of meaning in the key. How do you verify that Customer 12345 is Bob Jones? You would look up customer 12345 and if the result was Bob Jones, you would know that someone gave you an invalid key. Not every key requires DOD levels of trust. Some keys need this type of verification. I would argue that most keys do not. > Simplicity: <snip>What could be simpler than a number? ;-> >>> Secondly, at the time Codd made his assumptions computer systems Did Codd envision that users would be innundated with thousands of keys (with > were not nearly sophisticated enough nor fast enough to keep up with > users. Thus, having meaningful keys improved overall performance > because users could, at times, get information from a key and avoid > having to query the system itself. << > > Dead wrong. In the early days, the file systems and early DBMS used > pointers, row numbers, record numbering (identity) and other hardware > level things to locate data because performance and storage was a huge > issue. A longer meaningful key between two data sets would have slowed > down the systems. Get out a book on IDMS, IMS, Total, etc. and look at > their enternals. context)? Did he envision that most users have no desire to learn 80 bazillion key schemas? Did he envision the ease with which users would enter these keys using copy-paste and bar codes? Did he envision millions of database systems with millions more tables across the globe that track abstract concepts for which there is no standards body and no use outside of the enterprise? Doubtful. Codd was using mathematical theory to solve specific design problems given the context of a paper-ful world. I agree longer keys are less desirable than shorter keys. However, there are many examples of long, "natural" keys (e.g. part numbers, credit card numbers). >>> Most keys cannot possible contain enough information to be Without looking it up, pull out a VIN and tell me what the 10th digit > worthwhile. << > > I'll remember that when I register my next car with a VIN, buy a book > with an ISBN or make a bar-coded retail purchase. All of those > transactions completed without a GUID, IDENTITY or Mystical Hebrew > Number. represents. I would bet that 90% or more of the population have no idea. Besides, once again the VIN is a key whose schema is set by standards body. What about the thousands of other abstract entities where there is no such key or such a key is not available at time of entry? By your recokening, you might as well encode all entity information into the key. After all, someone may need it one day. Thomas I have to point something out here ...
> This is where you and I vehementally disagree. This is not a YOU vs CELKO. This is a YOU vs {every standards organization from MISMO to ACORD}. It's like arguing that, despite all the troubles everyone has faced using it, asbestos is, in fact, a safe product to use just because YOU have never had a problem with it. And since the Asbestos studies are a few years old, things have changed since then (better resperators, etc), so there's really no need to go with anything else. >> This is where you and I vehementally disagree. Correct> This is not a YOU vs CELKO. > This is a YOU vs {every standards organization Incorrect.> from MISMO to ACORD}. Looking beyond your strawman abestos argument, let's make sure we understand the context of my statement. I made this comment in reference to the stability of keys that have meaning buried within them and are NOT defined by a standards board. I would argue that any key made of attribute values is by definition not stable. This is not me against Celko nor any standards organization. Clearly, if a standards body declares a schema for a key it has a degree of stability. However, as I said, this is NOT the case for the vast majority of keys that are required for systems. They do not have the luxury of a standards body to mandate the make up of a key. I have personally worked on projects where I had to correct problems created by the original designer who was as arrogant as you and Mr. Celko in regards to the "stability" of their magic key The original designer could not get their head around the possibility that certain information required by the key would not be available at time of entry or that the attributes that make up the key might change or that the attributes that make up the key would not be unique in some circumstances or that a mistake in data entry might occur. This is not the theory of gravity here. This is simple observation. If you have a key made of attribute values and one or more of those attribute values change you have three choices: 1. Leave it and accept that the data is wrong. (e.g. the customer SMITH01 changes their last name from Smith to Jones. Users now have to accept that some keys are accurate and some keys "look" accurate but are not and this breaks the usefulness of having magic meaning as users will always need to lookup the values to find the accurate version of the information). 2. Change the key value everywhere in the system. This means that any permanent stamp of the original key value will have no equivalent value in your system. (e.g. When you look up SMITH01 from an invoice which has now changed to JONES01, you won't find anything) 3. Deprecate the key and issue a new one. This is what banks do with account numbers. However, now you have to deal with the fact that the histories are misplaced. You have to either combine the histories or tie the histories of the two instances together. Again, this is similar to how banks handle the situation. This is much more effort and now imagine that this is a possibility on any entity in your design and you come to realize what a cost in development time and testing this kind of design creates. As I said, but that you failed to grasp, keys that have cosmic meaning buried within them do have a purpose. However, not every key in every system (nor even a majority IMO) that is not mimicing something dictated by a standards board requires this type key and in fact would be a mistake to do so and the cost of getting the key makeup wrong can be significant. Further, even if there is a standards body definition for a key in question, it does not always make sense to use it (e.g. the use of SSN outside of a government organization) Thomas > > This is a YOU vs {every standards organization Okay ... but, in other posts you have specifically argued against ISBN, UPC, > > from MISMO to ACORD}. > > I made this comment in reference to the stability of > keys that have meaning buried within them and are NOT defined by a standards > board. I would argue that any key made of attribute values is by definition not > stable. VIN, PH#, etc. as being no longer necessary in today's world because things have changed so much since their inception. > However, as I said, this is NOT the case for the vast majority of keys that are Don't be so sure about that; industries standardize things of important to > required for systems. They do not have the luxury of a standards body to mandate > the make up of a key. them. Take a look at (man-made) things around you, do some research, and you'll find that there is a standards body or classification for just about every item you find. It's ironic that, with how incredibly easy it is to do reasearch these days, there are so many system designers who don't even know such standards exist, let alone spend a few hours researching them. They just dive in and make a "tblCar" with "intCarId" (or "guidCarId"). > (e.g. the customer SMITH01 changes their last name from Smith to Jones Good designers would never design such a key; that's as stupid as using a GUID for an account number. Unchangable verifiable physical attributes are great key candidates but rarely can one be found on a customer, unless you're lucky enough to serve the physical location (elec. company). That's where system-generated verifiable (i.e., with checksums, etc) keys come into play. But since you also disagree with the rest of the world on verifiability, there's no point in going over that again. <snip>
>> I made this comment in reference to the stability of That's an exaggeration. I have argued that they are not necessary for every >> keys that have meaning buried within them and are NOT defined by a standards >> board. I would argue that any key made of attribute values is by definition >> not >> stable. > > Okay ... but, in other posts you have specifically argued against ISBN, UPC, > VIN, PH#, etc. as being no longer necessary in today's world because things > have changed so much since their inception. situation. Clearly, for the library of Congress, ISBN numbers were important but for customer numbers and some types of account numbers they are not. >> However, as I said, this is NOT the case for the vast majority of keys that Off the top of my head I can think of two: Films and Customers. I can certainly >> are >> required for systems. They do not have the luxury of a standards body to >> mandate >> the make up of a key. > > Don't be so sure about that; industries standardize things of important to > them. Take a look at (man-made) things around you, do some research, and > you'll find that there is a standards body or classification for just about > every item you find. come up with scheme for both of those. However, there is no standards body that designates an encoding for movies for example. Movies are "uniquely" identified in the loosest way by the title, year of production, producer, director etc. It is very possible (even probable) for a movie title to change just before release. Further, movie titles differ from one country to the next even though it is the same movie. Each labs has its own encoding scheme for films. In the case of customers, we have the problem of dealing with timliness of information. The customer may not have a driver's license nor government identification at time of entry. I fully agree that there are many standardized schemes for many keys. However, I have found that because of abstraction of the real world as well as timeliness of the information (which affects completeness) that standardized keys cannot always be used. If there is a standards body designating a given key *and* it is available at time of entry, then I agree that it should be used. World currenncies are a good example. However, it's that last part that makes the use of standardized keys, at times, untenable. <snip> >> (e.g. the customer SMITH01 changes their last name from Smith to Jones Alex, I'd agree that SMITH 01 is a bad key. Yet, the Northwind database (granted > > Good designers would never design such a key; that's as stupid as using a > GUID for an account number. Unchangable verifiable physical attributes are > great key candidates but rarely can one be found on a customer, unless you're > lucky enough to serve the physical location (elec. company). That's where > system-generated verifiable (i.e., with checksums, etc) keys come into play. > But since you also disagree with the rest of the world on verifiability, > there's no point in going over that again. not an ideal example of db design) uses a related scheme based on customer information that would produce the same problems although perhaps not in the same degree. Actually, my stance on verifiability is in line with most development shops: don't add features that were not specifically requested. That means evaluating on an entity by entity basis whether a given key schema requires the need for visual verification of some sort and if so what type. Not every project needs to meet DOD standards. Not every key needs this level of sophistication. Thomas >> They (users) simply want a unique way of identifying a given entity. An important aspect to understand is *how* they wantto locate or get information about an entity. Do they want to be able to look a key value and know something about that entity or is it more likely that they'll plug the value into a computer system and have the computer system tell them far more information about the entity than could possibly be learned by a few meager bits of data buried in a key? << How many people do you know that like to write out a full GUID in a query? How many of them find it fart easier than a familar encoding, like a phone number? A little checklist of desirable properties for a key is good way to do a design inspection. Uniqueness: The first property is that key be unique. This is the most basic property it can have since without uniqueness it cannot be a key by definition. Uniqueness is necessary, but not sufficient. Uniqueness has a context. An identifier can be unique in the local database, in the enterprise across databases or unique universally. We would prefer the last of those three options. We can often get universal uniqueness with industry standard codes such the VIN numbers. We can get enterprise uniqueness with things like telephone extensions and email addresses. An identifier that is unique only in a single database is workable, but pretty much useless because it will lack the other desired properties. Stability: The second property we want is stability or invariance. The first kind of stability is within the schema itself and this applies to both key and non-key columns. The same data element should have the same representation wherever it appears in the schema. It should not be CHAR(n) in one place and INTEGER in another. The same basic set of constraints should apply to it. That is, if we use the VIN number as an identifier, then we can constraint it to be only for vehicles form Ford Motors; we cannot change the format of the VIN in one table and not in all others. The next kind of stability is over time. You do not want keys changing frequently or in unpredictable ways. Contrary to a popular myth, this does not mean that keys cannot ever change. As the scope of their context grows, they should be able to change in a uniform manner for everyone at the same time. On 2005 January 01, the United States added one more digit to the UPC bar codes used in the retail industry. The reason was globalization and erosion of American industrial domination. The global bar code standard will be the European Article Number (EAN) Code. The American Universal Product Code (UPC) turned 30 years old in 2004 and was never so universal after all. For the neo-Luddite programmers who think that changing a key is going to kill you, let me quote John Metzger, chief information officer of A&P. The grocery chain had 630 stores in 2004 and the grocery industry works 1% to 3% profit margins -- the smallest margins of any industry that is not taking a loss. A&P has handled the new barcode problem as part of a modernization of its technology systems. "It is important," Mr. Metzger said, "but it is not a shut-the-company-down kind of issue." Familiarity: It helps if the users know something about the data. This is not quite the same as validation, but it is related. Validation can tell you if the code is properly formed via some process; familiarity can tell you if it "feels right" because you know something about the context. Thus, ICD codes for disease would confuse a patient, but not a medical record clerk. Validation: Can you look at the data value and tell that it is wrong, without using an external source? For example, I know that '2004-02-30' is not a valid date because no such day exists on the Common Era calendar. Check digits and fixed format codes are one way of obtaining this. Verifiability: How do I verify a key? This also comes in context and in levels of trust. When I cash a check at the supermarket, they are willing to believe that the photo on the driver's license I present is really me, no matter how ugly it is. Or rather, they used to believe it was I; Kroger's grocery store chain is putting an ink-less fingerprinting system in place, just like many banks have done. When I get a passport, I need a birth certificate and fingerprinting. There is a little less trust here. When I get a security clearance, I also need to be investigated. There is a lot less trust. A key without a verification method has no data integrity and will lead the accumulation of bad data. Simplicity: A key should be as simple as possible, but no simpler. People, reports and other systems will use the keys. Long, complex keys are more subject to error. One man's simple is another man's complex. For an example of a horribly complex code that is in common international usage, look up the IBAN (International Standard Bank Number). >> at the time Codd made his assumptions computer systems were not nearly sophisticated enough nor fast enough to keep up with users.Thus, having meaningful keys improved overall performance because users could, at times, get information from a key and avoid having to query the system itself. However, as systems have gotten far more sophisticated and far more powerful, this is less true. << That is just dead wrong. RDBMS was considered impractical in Codd's day! Read the history. One of the complaints was that the longer relational keys could not keep up with the pointer chains and indexing used at the time. Look at the internals of IDMS, IMS, Total, et al in the 1970's. The file systems and early DBMS systems used record numbering, pointer chains, indexes, hash tables and all kinds of physical locators instead of meaningful keys. All the linkages for traversals are done by the machine, not in the data model. Increased word size, processor speeds and paralellism are what made RDBMS possible. >> Most keys cannot possible contain enough information to be worthwhile. It is faster today to simply plug the value into a computersystem. << I'll remember that when I use my VIN to register my car, an ISBN to buy a book and try to complete a retail transaction with a standard bar code invovled. Somehow, even without being able to compute my gas mileage from my VIN, I am able to conduct business. <snip>
> How many people do you know that like to write out a full GUID in a They don't know they're typing in a guid. They're just typing in a bunch > query? How many of them find it fart easier than a familar encoding, > like a phone number? A little checklist of desirable properties for a > key is good way to do a design inspection. characters. I've seen part numbers and credit card transaction numbers that rival guids in length. Phone numbers present a good example. There was obviously a method behind the madness of the original phone number designations (I understand the particulars so no need to elaborate). But now that meaning is lost because of cell phones. I lived for three years in the Bay Area in CA with a Southern CA area code on my cell phone number. That "meaning" is now lost for the better IMO. What people really want from the number is not cosmic meaning, but stability. Most people would prefer a number that never changes over one that tells them where their neighbors are located. In fact, I would prefer that my number provide *no* indication of location but that is obviously is not going to happen soon. <snip> (Answered in a previous post) Thomas GUID is too long. Most of the time no problem generating unique numbers.
Morever, just spend some a few time to re-generate another new GUID if another GUID already exists in the database, although these might/may not happen. asteroid collider Show quote "Thomas Coleman" wrote: > <snip> > > How many people do you know that like to write out a full GUID in a > > query? How many of them find it fart easier than a familar encoding, > > like a phone number? A little checklist of desirable properties for a > > key is good way to do a design inspection. > > They don't know they're typing in a guid. They're just typing in a bunch > characters. I've seen part numbers and credit card transaction numbers that > rival guids in length. > > Phone numbers present a good example. There was obviously a method behind the > madness of the original phone number designations (I understand the particulars > so no need to elaborate). But now that meaning is lost because of cell phones. I > lived for three years in the Bay Area in CA with a Southern CA area code on my > cell phone number. That "meaning" is now lost for the better IMO. What people > really want from the number is not cosmic meaning, but stability. Most people > would prefer a number that never changes over one that tells them where their > neighbors are located. In fact, I would prefer that my number provide *no* > indication of location but that is obviously is not going to happen soon. > > <snip> (Answered in a previous post) > > > > Thomas > > > I use SQL Server "decimal" data type of "precision 14".
It can reach max, "99,999,999,999,999" thats 99 trillion and can fit into a (precision 15) "double" data type on a C++ client application. Show quote "Thomas Coleman" wrote: > >>> Clearly Guids can be generated on machines other than the database > > server on which they are stored. So what exactly do you mean by it > > "lives" only in the hardware that stores the data? << > > > > Go to the reality of the model. Try to verify it as an identifier. > > That is, given an entity, can anyone tell you the GUID by looking at > > the entity? I can read the VIN off of a car, > > If the Guid is stamped on the car you can do the same. > > > UPC code from a > > retail package, etc. > > If the Guid is stamped on the package or encoded as a bar code you can do the > same. > > > but the GUID is not an attribute of the entity. > > It is if the database defines it to be. Just as the system defines a UPC or a > VIN to be part of the entity's attributes. > > > The best I can do is validate its format; I used to be able to get some > > hardware location data, but that is gone now. > > How the number was generated is irrelevant. You are assuming that a key must > contain cosmic meaning about the entity. Remember, for that to work, the user > must **understand** how to decypher the cosmic meaning from the key for it to be > useful. You are further assuming that the Guid is designed to have cosmic > meaning and it is not. It is simply a series of alpha numeric characters than > uniquely identifies the entity. > > > By definition, a key is > > a subset of attributes, so these exposed phsyical locators cannot be > > keys in an RDBMS. > > A key *can* be a subset of attributes *or* simply a means to uniquely identify > an entity (meaning it can be both). In the scenarios you mentioned above, the > "key" was simply a bunch of gobbledegook that was stamped on something physical > that a user could write down, go to the computer system and look up. Whether the > user was able to discern any cosmic meaning from gobbledegook is secondary to > its purpose. > > > Thomas > > > I was thinking... and have just found a flaw. If each number in these 99
trillion numbers takes up only 1 byte of storage, I will need about 233 harddisks each capacity of 400 GB and a new shelf. trillionaire Show quote "trillionaire" wrote: > I use SQL Server "decimal" data type of "precision 14". > It can reach max, "99,999,999,999,999" thats 99 trillion and > can fit into a (precision 15) "double" data type on a C++ client application. > > > "Thomas Coleman" wrote: > > > >>> Clearly Guids can be generated on machines other than the database > > > server on which they are stored. So what exactly do you mean by it > > > "lives" only in the hardware that stores the data? << > > > > > > Go to the reality of the model. Try to verify it as an identifier. > > > That is, given an entity, can anyone tell you the GUID by looking at > > > the entity? I can read the VIN off of a car, > > > > If the Guid is stamped on the car you can do the same. > > > > > UPC code from a > > > retail package, etc. > > > > If the Guid is stamped on the package or encoded as a bar code you can do the > > same. > > > > > but the GUID is not an attribute of the entity. > > > > It is if the database defines it to be. Just as the system defines a UPC or a > > VIN to be part of the entity's attributes. > > > > > The best I can do is validate its format; I used to be able to get some > > > hardware location data, but that is gone now. > > > > How the number was generated is irrelevant. You are assuming that a key must > > contain cosmic meaning about the entity. Remember, for that to work, the user > > must **understand** how to decypher the cosmic meaning from the key for it to be > > useful. You are further assuming that the Guid is designed to have cosmic > > meaning and it is not. It is simply a series of alpha numeric characters than > > uniquely identifies the entity. > > > > > By definition, a key is > > > a subset of attributes, so these exposed phsyical locators cannot be > > > keys in an RDBMS. > > > > A key *can* be a subset of attributes *or* simply a means to uniquely identify > > an entity (meaning it can be both). In the scenarios you mentioned above, the > > "key" was simply a bunch of gobbledegook that was stamped on something physical > > that a user could write down, go to the computer system and look up. Whether the > > user was able to discern any cosmic meaning from gobbledegook is secondary to > > its purpose. > > > > > > Thomas > > > > > > I'm not sure I understand your. At precision 14, a decimal column will take up 9
bytes per row. If you mean that by using a data type this sufficiently large you can allow for more rows than can be stored with modern equipment, that's true. That would be true of a BigInt too. The problem is not storage capacity. The (only IMO) downside to identity columns is that you have to make a round trip to generate them. Granted, that can be mitigated through stored procedures. Capacity is not really Joe's argument against identities and Guids. His arguments are related to general design issues. Thomas The other way would be to add another column "Time" of data type datetime.
Primary key is Time Column + Identity Column. When Identity reached maximum, reset it back to 0. Then there is no need to generate unique id. zillionaire Show quote "Thomas Coleman" wrote: > I'm not sure I understand your. At precision 14, a decimal column will take up 9 > bytes per row. If you mean that by using a data type this sufficiently large you > can allow for more rows than can be stored with modern equipment, that's true. > That would be true of a BigInt too. The problem is not storage capacity. The > (only IMO) downside to identity columns is that you have to make a round trip to > generate them. Granted, that can be mitigated through stored procedures. > > Capacity is not really Joe's argument against identities and Guids. His > arguments are related to general design issues. > > > > Thomas > > > A Guid has a couple of advantages over an identity column. The primary advantage
is that it can be generated outside of the database system. That means you save a round trip to the database in order to generate a new key value. Another advantage of their somewhat universally unique nature is that it is easy to make certain types of database changes. For example, suppose you decide to combine two entities into a single table. With identity values, you will likely have to resequence many of the key values which means resequencing their child values. With guids, this is not necessary. You can combine the values into the table and not worry about resequencing or have to worry about child relationships. You can simply recreate the DRI to the new table and be done. Another advantage with using them as primary keys is that you can designate them as the RowGuid for replication purposes and save the automatic addition of a Guid by SQL Server's replication setup process. I'm not suggesting that Guids do not have their problems. The first is their size. At 16 bytes, they're on the large side. A second problem is their readability although as surrogate keys (ones hidden from the user) they are fine. That said, if a key is going to be 18 to 20 characters or more anyway, I don't see that you lose much by going with a guid. Another issue is the fact that "pure" guids are of course somewhat random and unique. That makes it difficult to index against them. However, there are some solutions to sacrafice some uniqueness probability for better indexing (google Nillison's COMB algorithm). Thomas Show quote "trillionaire" <trilliona***@discussions.microsoft.com> wrote in message news:A65428C1-989D-4925-BE2F-852112258B3E@microsoft.com... > The other way would be to add another column "Time" of data type datetime. > Primary key is Time Column + Identity Column. When Identity reached maximum, > reset it back to 0. Then there is no need to generate unique id. > > zillionaire > > > > "Thomas Coleman" wrote: > >> I'm not sure I understand your. At precision 14, a decimal column will take >> up 9 >> bytes per row. If you mean that by using a data type this sufficiently large >> you >> can allow for more rows than can be stored with modern equipment, that's >> true. >> That would be true of a BigInt too. The problem is not storage capacity. The >> (only IMO) downside to identity columns is that you have to make a round trip >> to >> generate them. Granted, that can be mitigated through stored procedures. >> >> Capacity is not really Joe's argument against identities and Guids. His >> arguments are related to general design issues. >> >> >> >> Thomas >> >> >> |
|||||||||||||||||||||||