|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Use of NULL and DEFAULTRULE: Provide Data or Go Meta SCOPE: Designing SQL Tables, UDFs and Stored Procedures MEANING: A good database design will restrict the occurrence of NULL. An overabundance of NULLs in a database suggests, among other things, that the schema is not properly normalized. It may also be difficult to determine what NULL actually means in a give context, e.g., data that is needed but missing, data that is not necessary, etc. This uncertainty is not good. We are essentially corrupting our data in the process of storing it. I believe there is one special case where NULL is valid. This is the case where the absence of valid data requires that we defer to the meta-data to determine what action should be taken. If we specify that a column can accept NULL values, then we are saying that it is ACCEPTABLE to defer to the meta-data for a situation in which the data is not provided (for whatever reason). If we specify that a column cannot accept NULL values, then we are saying that it is NOT ACCEPTABLE to defer to the meta-data for a situation in which the data is not provided (for whatever reason). By capturing the knowledge that it is necessary to defer to the meta-data, we are leaving it up to the meta-data to tell us how to handle the situation. The response stipulated by the meta-data is then free to be changed---without changing the data. This decouples changes in the meta-data response from the data itself. We should never use DEFAULT. We should use NULL instead. If we use DEFAULT to plug in valid data, then we have lost the knowledge that we should be deferring to the meta-data. We have corrupted that data by erasing the fact that it was not provided (for whatever reason). Whe have further corrupted the data by plugging in a value which may not have been valid in the past and/or may not be valid in the future. This rule of thumb attempts to avoid confustion between data and meta-data. Feel free to comment (or to ignore me completely). If you have seen this suggested before, in SQL or any other context, please let me know. Thanks, Gerard I think your rule of thumb sucks.
A NULL indicates the absence of a value. It does not indicate WHY the value does not exist--nor should it: if you need that information, add another column. Whether or not a column can contain NULLs is a multiplicity constraint--nothing more. It doesn't make any sense to ascribe additional meaning beyond that. It doesn't make any sense to "defer" to metadata once a row has been inserted into the database. I don't understand what you mean by "what action should be taken." It appears that you want to query the current state of the metadata in order to replace NULLs in a result set. This makes no sense for existing rows. DEFAULTs are good things. They specify the initial state of an entity. If you need to record which values were supplied, then you should add code to the stored procedure to record that information. Your rule of thum blurs the definition of metadata, thus increasing confusion. Show quote "Gerard Marshall Vignes" <Gerard Marshall Vig***@discussions.microsoft.com> wrote in message news:FA3008AF-93B8-48DC-8CAE-0B198BF9C12C@microsoft.com... >I have a rule of thumb which I would like to get opinions about. > > RULE: Provide Data or Go Meta > > SCOPE: Designing SQL Tables, UDFs and Stored Procedures > > MEANING: > > A good database design will restrict the occurrence of NULL. An > overabundance of NULLs in a database suggests, among other things, that > the > schema is not properly normalized. It may also be difficult to determine > what > NULL actually means in a give context, e.g., data that is needed but > missing, > data that is not necessary, etc. This uncertainty is not good. We are > essentially corrupting our data in the process of storing it. > > I believe there is one special case where NULL is valid. This is the case > where the absence of valid data requires that we defer to the meta-data to > determine what action should be taken. > > If we specify that a column can accept NULL values, then we are saying > that > it is ACCEPTABLE to defer to the meta-data for a situation in which the > data > is not provided (for whatever reason). > > If we specify that a column cannot accept NULL values, then we are saying > that it is NOT ACCEPTABLE to defer to the meta-data for a situation in > which > the data is not provided (for whatever reason). > > By capturing the knowledge that it is necessary to defer to the meta-data, > we are leaving it up to the meta-data to tell us how to handle the > situation. > The response stipulated by the meta-data is then free to be > changed---without > changing the data. This decouples changes in the meta-data response from > the > data itself. > > We should never use DEFAULT. We should use NULL instead. If we use DEFAULT > to plug in valid data, then we have lost the knowledge that we should be > deferring to the meta-data. We have corrupted that data by erasing the > fact > that it was not provided (for whatever reason). Whe have further corrupted > the data by plugging in a value which may not have been valid in the past > and/or may not be valid in the future. > > This rule of thumb attempts to avoid confustion between data and > meta-data. > > Feel free to comment (or to ignore me completely). > > If you have seen this suggested before, in SQL or any other context, > please > let me know. > > Thanks, > > Gerard > Thanks for your response, Brian!
Your comment about Multiplicity is interesting. NULLs are limited to the case of 0 or 1. An additional table with a link via a foreign key is able to handle a wider range 0 to N. It is possible to build fewer tables, with more fields and use NULLs to handle optional fields. That is poor design. It is better to create new tables for optional fields and restrict the use of NULLs. In fact, it is possible to design a schema that does not use NULLs at all. Such a schema may be more complex (more tables, more joins), but it will also be more extensible. You are missing the point about missing data. The objective is not to explain why the data is missing. The point is to avoid hiding the fact that the data is missing. If a field must be provided (NOT NULL) and it is not available, then the row will not be inserted. There is no confusion here. If a field is "optional," i.e., can be NULL, then there is an opportunity for confusion. Without deferring to meta-data (data about the data), we do not know what a NULL value means. Assuming the meaning of NULL [and assuming that meaning of NULL is universal] leads to confusion. This can give unexpected results. By deferring to the meta-data (data about the data) for a non-ambiguous response (action), we eliminate this confusion. A response could be to (1) ignore the field or (2) use some default value for the field. It makes more sense to let the meta-data determine what NULL means than to let each person or program come up with its own interpretation. Ignoring a field is an action. Using a default value in place of the field is an action. The point is to not read anything into the NULL that is not stipulated in the meta-data. DEFAULTs are poor technique. They hide the fact that no data was provided, and thus they corrupt the data. If there is a default, it belongs in the meta-data and not in the data. All things being equal, the simpler solution is more desirable. It is possible to shadow fields with other fields that describe necessary omissions. It is better to design a schema that naturally accommodates incomplete or otherwise imperfect data. Such a schema will not require guesswork or workaround code. Assuming stored procedures or triggers or any other specific programming technique is a mistake. This is design, and design should be flexible. Gerard Show quote "Brian Selzer" wrote: > I think your rule of thumb sucks. > > A NULL indicates the absence of a value. It does not indicate WHY the value > does not exist--nor should it: if you need that information, add another > column. > > Whether or not a column can contain NULLs is a multiplicity > constraint--nothing more. It doesn't make any sense to ascribe additional > meaning beyond that. > > It doesn't make any sense to "defer" to metadata once a row has been > inserted into the database. > > I don't understand what you mean by "what action should be taken." It > appears that you want to query the current state of the metadata in order to > replace NULLs in a result set. This makes no sense for existing rows. > > DEFAULTs are good things. They specify the initial state of an entity. > > If you need to record which values were supplied, then you should add code > to the stored procedure to record that information. > > Your rule of thum blurs the definition of metadata, thus increasing > confusion. > > > "Gerard Marshall Vignes" <Gerard Marshall Vig***@discussions.microsoft.com> > wrote in message news:FA3008AF-93B8-48DC-8CAE-0B198BF9C12C@microsoft.com... > >I have a rule of thumb which I would like to get opinions about. > > > > RULE: Provide Data or Go Meta > > > > SCOPE: Designing SQL Tables, UDFs and Stored Procedures > > > > MEANING: > > > > A good database design will restrict the occurrence of NULL. An > > overabundance of NULLs in a database suggests, among other things, that > > the > > schema is not properly normalized. It may also be difficult to determine > > what > > NULL actually means in a give context, e.g., data that is needed but > > missing, > > data that is not necessary, etc. This uncertainty is not good. We are > > essentially corrupting our data in the process of storing it. > > > > I believe there is one special case where NULL is valid. This is the case > > where the absence of valid data requires that we defer to the meta-data to > > determine what action should be taken. > > > > If we specify that a column can accept NULL values, then we are saying > > that > > it is ACCEPTABLE to defer to the meta-data for a situation in which the > > data > > is not provided (for whatever reason). > > > > If we specify that a column cannot accept NULL values, then we are saying > > that it is NOT ACCEPTABLE to defer to the meta-data for a situation in > > which > > the data is not provided (for whatever reason). > > > > By capturing the knowledge that it is necessary to defer to the meta-data, > > we are leaving it up to the meta-data to tell us how to handle the > > situation. > > The response stipulated by the meta-data is then free to be > > changed---without > > changing the data. This decouples changes in the meta-data response from > > the > > data itself. > > > > We should never use DEFAULT. We should use NULL instead. If we use DEFAULT > > to plug in valid data, then we have lost the knowledge that we should be > > deferring to the meta-data. We have corrupted that data by erasing the > > fact > > that it was not provided (for whatever reason). Whe have further corrupted > > the data by plugging in a value which may not have been valid in the past > > and/or may not be valid in the future. > > > > This rule of thumb attempts to avoid confustion between data and > > meta-data. > > > > Feel free to comment (or to ignore me completely). > > > > If you have seen this suggested before, in SQL or any other context, > > please > > let me know. > > > > Thanks, > > > > Gerard > > > > > For the record: I wasn't advocating the use of NULLs. I believe that they
have no place in base relations: "NULL value" is an oxymor0n. They can, however, be useful in relational expressions to simplify queries. > DEFAULTs are poor technique. They hide the fact that no data was provided, Hello McFly! DEFAULTs ARE a part of the metadata. What doesn't make sense > and thus they corrupt the data. If there is a default, it belongs in the > meta-data and not in the data. is "deferring" to the metadata once a row has been inserted. For example, consider two types of time stamps in a temporal database. One identifies the time that an external event occurred; the other identifies the time that information arrived in the database. For the first type, the information MUST be provided by the application; for the second type, DEFAULT(GETDATE()) is best. It doesn't make sense for the user to specify it during an INSERT because blocking may cause an interval to exist between the time that the INSERT is issued and the time that the information becomes part of the database, and it also makes no sense for the time stamp to be absent--in fact, it is usually imperative that it exist in order to extract the state of the database at a specific point-in-time. Your technique of "deferring" to the metadata for subsequent queries would cause vital information to be lost--that is, the time that the information arrived in the database. I reject your assertion that data is in any way corrupted by using DEFAULTs. Your argument that information is lost is unconvincing. Again, if you need to record the fact that a value was not provided, then explicitly record that fact in another column. The meaning of NULL IS universal. Interpretation of that meaning in context is outside the scope of the database. It is the purview of the application developer or of the user. All that is important from the database perspective is that no value exists. Again, if you need to record why it doesn't exist, then add another column and record it. Stored procedures and triggers are part of the database. Triggers are necessary to enforce some constraints that cannot be enforced declaratively--for example, transition constraints. Stored procedures provide a means to embed business logic in the database where it belongs. (Business rules are essentially database constraints.) Sometimes triggers are contraindicated for performance or logistical reasons. Stored procedures are the logical alternative. Show quote "Gerard Marshall Vignes" <GerardMarshallVig***@discussions.microsoft.com> wrote in message news:07461B4D-0703-4428-8795-4D21C180C531@microsoft.com... > Thanks for your response, Brian! > > Your comment about Multiplicity is interesting. NULLs are limited to the > case of 0 or 1. An additional table with a link via a foreign key is able > to > handle a wider range 0 to N. > > It is possible to build fewer tables, with more fields and use NULLs to > handle optional fields. That is poor design. It is better to create new > tables for optional fields and restrict the use of NULLs. In fact, it is > possible to design a schema that does not use NULLs at all. Such a schema > may > be more complex (more tables, more joins), but it will also be more > extensible. > > You are missing the point about missing data. The objective is not to > explain why the data is missing. The point is to avoid hiding the fact > that > the data is missing. > > If a field must be provided (NOT NULL) and it is not available, then the > row > will not be inserted. There is no confusion here. > > If a field is "optional," i.e., can be NULL, then there is an opportunity > for confusion. Without deferring to meta-data (data about the data), we do > not know what a NULL value means. Assuming the meaning of NULL [and > assuming > that meaning of NULL is universal] leads to confusion. This can give > unexpected results. > > By deferring to the meta-data (data about the data) for a non-ambiguous > response (action), we eliminate this confusion. A response could be to (1) > ignore the field or (2) use some default value for the field. It makes > more > sense to let the meta-data determine what NULL means than to let each > person > or program come up with its own interpretation. > > Ignoring a field is an action. Using a default value in place of the field > is an action. The point is to not read anything into the NULL that is not > stipulated in the meta-data. > > DEFAULTs are poor technique. They hide the fact that no data was provided, > and thus they corrupt the data. If there is a default, it belongs in the > meta-data and not in the data. > > All things being equal, the simpler solution is more desirable. It is > possible to shadow fields with other fields that describe necessary > omissions. It is better to design a schema that naturally accommodates > incomplete or otherwise imperfect data. Such a schema will not require > guesswork or workaround code. > > Assuming stored procedures or triggers or any other specific programming > technique is a mistake. This is design, and design should be flexible. > > Gerard > > "Brian Selzer" wrote: > >> I think your rule of thumb sucks. >> >> A NULL indicates the absence of a value. It does not indicate WHY the >> value >> does not exist--nor should it: if you need that information, add another >> column. >> >> Whether or not a column can contain NULLs is a multiplicity >> constraint--nothing more. It doesn't make any sense to ascribe >> additional >> meaning beyond that. >> >> It doesn't make any sense to "defer" to metadata once a row has been >> inserted into the database. >> >> I don't understand what you mean by "what action should be taken." It >> appears that you want to query the current state of the metadata in order >> to >> replace NULLs in a result set. This makes no sense for existing rows. >> >> DEFAULTs are good things. They specify the initial state of an entity. >> >> If you need to record which values were supplied, then you should add >> code >> to the stored procedure to record that information. >> >> Your rule of thum blurs the definition of metadata, thus increasing >> confusion. >> >> >> "Gerard Marshall Vignes" <Gerard Marshall >> Vig***@discussions.microsoft.com> >> wrote in message >> news:FA3008AF-93B8-48DC-8CAE-0B198BF9C12C@microsoft.com... >> >I have a rule of thumb which I would like to get opinions about. >> > >> > RULE: Provide Data or Go Meta >> > >> > SCOPE: Designing SQL Tables, UDFs and Stored Procedures >> > >> > MEANING: >> > >> > A good database design will restrict the occurrence of NULL. An >> > overabundance of NULLs in a database suggests, among other things, that >> > the >> > schema is not properly normalized. It may also be difficult to >> > determine >> > what >> > NULL actually means in a give context, e.g., data that is needed but >> > missing, >> > data that is not necessary, etc. This uncertainty is not good. We are >> > essentially corrupting our data in the process of storing it. >> > >> > I believe there is one special case where NULL is valid. This is the >> > case >> > where the absence of valid data requires that we defer to the meta-data >> > to >> > determine what action should be taken. >> > >> > If we specify that a column can accept NULL values, then we are saying >> > that >> > it is ACCEPTABLE to defer to the meta-data for a situation in which the >> > data >> > is not provided (for whatever reason). >> > >> > If we specify that a column cannot accept NULL values, then we are >> > saying >> > that it is NOT ACCEPTABLE to defer to the meta-data for a situation in >> > which >> > the data is not provided (for whatever reason). >> > >> > By capturing the knowledge that it is necessary to defer to the >> > meta-data, >> > we are leaving it up to the meta-data to tell us how to handle the >> > situation. >> > The response stipulated by the meta-data is then free to be >> > changed---without >> > changing the data. This decouples changes in the meta-data response >> > from >> > the >> > data itself. >> > >> > We should never use DEFAULT. We should use NULL instead. If we use >> > DEFAULT >> > to plug in valid data, then we have lost the knowledge that we should >> > be >> > deferring to the meta-data. We have corrupted that data by erasing the >> > fact >> > that it was not provided (for whatever reason). Whe have further >> > corrupted >> > the data by plugging in a value which may not have been valid in the >> > past >> > and/or may not be valid in the future. >> > >> > This rule of thumb attempts to avoid confustion between data and >> > meta-data. >> > >> > Feel free to comment (or to ignore me completely). >> > >> > If you have seen this suggested before, in SQL or any other context, >> > please >> > let me know. >> > >> > Thanks, >> > >> > Gerard >> > >> >> >> Thanks again Brian!
I agree with you, Brian, and with Scott. There are times when DEFAULT is good technique. Logging and other activities that support forensics are good examples. I was being a purist and taking my so-called "rule-of-thumb" too far. KEY POINT: Since our response to missing data may change over time, we need to retain the knowledge that the data was missing. This is the one thing that NULL does well. I advocate using the meta-data to stipulate the response to a NULL (missing data). Meta-data that remains meta-data is good meta-data. Meta-data that gets copied into data and erases the fact that no data was provided is not good meta-data. Allowing missing data without stipulating how to respond to missing data invites confusion. Is the record usable? Should it be retained? Your point about Stored Procedures and Triggers is good. I would add to that User Defined Functions, DTS Jobs (which can run VBScript & COM!) and new technologies that Microsoft is adding to SQL Server. I usually decide to implement logic in the database for performance reasons. Database servers are usually very powerful and can grind out code rapidly. Reducing the amount of data moving between the database and its clients reduces network traffic. This downside is that this produces systems that are highly non-portable. That is an entirely new thread, and probably one more interesting than this one :-) Thanks Again, Gerard Show quote "Brian Selzer" wrote: > For the record: I wasn't advocating the use of NULLs. I believe that they > have no place in base relations: "NULL value" is an oxymor0n. They can, > however, be useful in relational expressions to simplify queries. > > > DEFAULTs are poor technique. They hide the fact that no data was provided, > > and thus they corrupt the data. If there is a default, it belongs in the > > meta-data and not in the data. > > Hello McFly! DEFAULTs ARE a part of the metadata. What doesn't make sense > is "deferring" to the metadata once a row has been inserted. For example, > consider two types of time stamps in a temporal database. One identifies > the time that an external event occurred; the other identifies the time that > information arrived in the database. For the first type, the information > MUST be provided by the application; for the second type, DEFAULT(GETDATE()) > is best. It doesn't make sense for the user to specify it during an INSERT > because blocking may cause an interval to exist between the time that the > INSERT is issued and the time that the information becomes part of the > database, and it also makes no sense for the time stamp to be absent--in > fact, it is usually imperative that it exist in order to extract the state > of the database at a specific point-in-time. Your technique of "deferring" > to the metadata for subsequent queries would cause vital information to be > lost--that is, the time that the information arrived in the database. > > I reject your assertion that data is in any way corrupted by using DEFAULTs. > Your argument that information is lost is unconvincing. Again, if you need > to record the fact that a value was not provided, then explicitly record > that fact in another column. > > The meaning of NULL IS universal. Interpretation of that meaning in context > is outside the scope of the database. It is the purview of the application > developer or of the user. All that is important from the database > perspective is that no value exists. Again, if you need to record why it > doesn't exist, then add another column and record it. > > Stored procedures and triggers are part of the database. Triggers are > necessary to enforce some constraints that cannot be enforced > declaratively--for example, transition constraints. Stored procedures > provide a means to embed business logic in the database where it belongs. > (Business rules are essentially database constraints.) Sometimes triggers > are contraindicated for performance or logistical reasons. Stored > procedures are the logical alternative. > > "Gerard Marshall Vignes" <GerardMarshallVig***@discussions.microsoft.com> > wrote in message news:07461B4D-0703-4428-8795-4D21C180C531@microsoft.com... > > Thanks for your response, Brian! > > > > Your comment about Multiplicity is interesting. NULLs are limited to the > > case of 0 or 1. An additional table with a link via a foreign key is able > > to > > handle a wider range 0 to N. > > > > It is possible to build fewer tables, with more fields and use NULLs to > > handle optional fields. That is poor design. It is better to create new > > tables for optional fields and restrict the use of NULLs. In fact, it is > > possible to design a schema that does not use NULLs at all. Such a schema > > may > > be more complex (more tables, more joins), but it will also be more > > extensible. > > > > You are missing the point about missing data. The objective is not to > > explain why the data is missing. The point is to avoid hiding the fact > > that > > the data is missing. > > > > If a field must be provided (NOT NULL) and it is not available, then the > > row > > will not be inserted. There is no confusion here. > > > > If a field is "optional," i.e., can be NULL, then there is an opportunity > > for confusion. Without deferring to meta-data (data about the data), we do > > not know what a NULL value means. Assuming the meaning of NULL [and > > assuming > > that meaning of NULL is universal] leads to confusion. This can give > > unexpected results. > > > > By deferring to the meta-data (data about the data) for a non-ambiguous > > response (action), we eliminate this confusion. A response could be to (1) > > ignore the field or (2) use some default value for the field. It makes > > more > > sense to let the meta-data determine what NULL means than to let each > > person > > or program come up with its own interpretation. > > > > Ignoring a field is an action. Using a default value in place of the field > > is an action. The point is to not read anything into the NULL that is not > > stipulated in the meta-data. > > > > DEFAULTs are poor technique. They hide the fact that no data was provided, > > and thus they corrupt the data. If there is a default, it belongs in the > > meta-data and not in the data. > > > > All things being equal, the simpler solution is more desirable. It is > > possible to shadow fields with other fields that describe necessary > > omissions. It is better to design a schema that naturally accommodates > > incomplete or otherwise imperfect data. Such a schema will not require > > guesswork or workaround code. > > > > Assuming stored procedures or triggers or any other specific programming > > technique is a mistake. This is design, and design should be flexible. > > > > Gerard > > > > "Brian Selzer" wrote: > > > >> I think your rule of thumb sucks. > >> > >> A NULL indicates the absence of a value. It does not indicate WHY the > >> value > >> does not exist--nor should it: if you need that information, add another > >> column. > >> > >> Whether or not a column can contain NULLs is a multiplicity > >> constraint--nothing more. It doesn't make any sense to ascribe > >> additional > >> meaning beyond that. > >> > >> It doesn't make any sense to "defer" to metadata once a row has been > >> inserted into the database. > >> > >> I don't understand what you mean by "what action should be taken." It > >> appears that you want to query the current state of the metadata in order > >> to > >> replace NULLs in a result set. This makes no sense for existing rows. > >> > >> DEFAULTs are good things. They specify the initial state of an entity. > >> > >> If you need to record which values were supplied, then you should add > >> code > >> to the stored procedure to record that information. > >> > >> Your rule of thum blurs the definition of metadata, thus increasing > >> confusion. > >> > >> > >> "Gerard Marshall Vignes" <Gerard Marshall > >> Vig***@discussions.microsoft.com> > >> wrote in message > >> news:FA3008AF-93B8-48DC-8CAE-0B198BF9C12C@microsoft.com... > >> >I have a rule of thumb which I would like to get opinions about. > >> > > >> > RULE: Provide Data or Go Meta > >> > > >> > SCOPE: Designing SQL Tables, UDFs and Stored Procedures > >> > > >> > MEANING: > >> > > >> > A good database design will restrict the occurrence of NULL. An > >> > overabundance of NULLs in a database suggests, among other things, that > >> > the > >> > schema is not properly normalized. It may also be difficult to > >> > determine > >> > what > >> > NULL actually means in a give context, e.g., data that is needed but > >> > missing, > >> > data that is not necessary, etc. This uncertainty is not good. We are > >> > essentially corrupting our data in the process of storing it. > >> > > >> > I believe there is one special case where NULL is valid. This is the > >> > case > >> > where the absence of valid data requires that we defer to the meta-data > >> > to > >> > determine what action should be taken. > >> > > >> > If we specify that a column can accept NULL values, then we are saying > >> > that > >> > it is ACCEPTABLE to defer to the meta-data for a situation in which the > >> > data > >> > is not provided (for whatever reason). > >> > > >> > If we specify that a column cannot accept NULL values, then we are > >> > saying > >> > that it is NOT ACCEPTABLE to defer to the meta-data for a situation in > >> > which > >> > the data is not provided (for whatever reason). > >> > > >> > By capturing the knowledge that it is necessary to defer to the > >> > meta-data, > >> > we are leaving it up to the meta-data to tell us how to handle the > >> > situation. > >> > The response stipulated by the meta-data is then free to be > >> > changed---without > >> > changing the data. This decouples changes in the meta-data response > >> > from > >> > the > >> > data itself. > >> > > >> > We should never use DEFAULT. We should use NULL instead. If we use > >> > DEFAULT > >> > to plug in valid data, then we have lost the knowledge that we should > >> > be > >> > deferring to the meta-data. We have corrupted that data by erasing the > >> > fact > >> > that it was not provided (for whatever reason). Whe have further > >> > corrupted > >> > the data by plugging in a value which may not have been valid in the > >> > past > >> > and/or may not be valid in the future. > >> > > >> > This rule of thumb attempts to avoid confustion between data and > >> > meta-data. > >> > > >> > Feel free to comment (or to ignore me completely). > >> > > >> > If you have seen this suggested before, in SQL or any other context, > >> > please > >> > let me know. > >> > > >> > Thanks, > >> > > >> > Gerard > >> > > >> > >> > >> > > > > KEY POINT: Since our response to missing data may change over time, we The problem is that the metadata may also change over time, so you lose > need > to retain the knowledge that the data was missing. This is the one thing > that > NULL does well. I advocate using the meta-data to stipulate the response > to a > NULL (missing data). information either way. If the metadata changes today, and queries rely on it, then a query that was run yesterday will return different results than a query that ran today. One of Codd's rules stipulates that a nondestructive schema change shouldn't break existing applications. I would consider an application that returns different results on different days given the same set of data broken. Show quote "Gerard Marshall Vignes" <GerardMarshallVig***@discussions.microsoft.com> wrote in message news:8B80941F-0668-4EB2-B5E8-55EFFB117A16@microsoft.com... > Thanks again Brian! > > I agree with you, Brian, and with Scott. There are times when DEFAULT is > good technique. Logging and other activities that support forensics are > good > examples. I was being a purist and taking my so-called "rule-of-thumb" too > far. > > KEY POINT: Since our response to missing data may change over time, we > need > to retain the knowledge that the data was missing. This is the one thing > that > NULL does well. I advocate using the meta-data to stipulate the response > to a > NULL (missing data). > > Meta-data that remains meta-data is good meta-data. Meta-data that gets > copied into data and erases the fact that no data was provided is not good > meta-data. > > Allowing missing data without stipulating how to respond to missing data > invites confusion. Is the record usable? Should it be retained? > > Your point about Stored Procedures and Triggers is good. I would add to > that > User Defined Functions, DTS Jobs (which can run VBScript & COM!) and new > technologies that Microsoft is adding to SQL Server. > > I usually decide to implement logic in the database for performance > reasons. > Database servers are usually very powerful and can grind out code rapidly. > Reducing the amount of data moving between the database and its clients > reduces network traffic. This downside is that this produces systems that > are > highly non-portable. > > That is an entirely new thread, and probably one more interesting than > this > one :-) > > Thanks Again, > > Gerard > > > "Brian Selzer" wrote: > >> For the record: I wasn't advocating the use of NULLs. I believe that >> they >> have no place in base relations: "NULL value" is an oxymor0n. They can, >> however, be useful in relational expressions to simplify queries. >> >> > DEFAULTs are poor technique. They hide the fact that no data was >> > provided, >> > and thus they corrupt the data. If there is a default, it belongs in >> > the >> > meta-data and not in the data. >> >> Hello McFly! DEFAULTs ARE a part of the metadata. What doesn't make >> sense >> is "deferring" to the metadata once a row has been inserted. For >> example, >> consider two types of time stamps in a temporal database. One identifies >> the time that an external event occurred; the other identifies the time >> that >> information arrived in the database. For the first type, the information >> MUST be provided by the application; for the second type, >> DEFAULT(GETDATE()) >> is best. It doesn't make sense for the user to specify it during an >> INSERT >> because blocking may cause an interval to exist between the time that the >> INSERT is issued and the time that the information becomes part of the >> database, and it also makes no sense for the time stamp to be absent--in >> fact, it is usually imperative that it exist in order to extract the >> state >> of the database at a specific point-in-time. Your technique of >> "deferring" >> to the metadata for subsequent queries would cause vital information to >> be >> lost--that is, the time that the information arrived in the database. >> >> I reject your assertion that data is in any way corrupted by using >> DEFAULTs. >> Your argument that information is lost is unconvincing. Again, if you >> need >> to record the fact that a value was not provided, then explicitly record >> that fact in another column. >> >> The meaning of NULL IS universal. Interpretation of that meaning in >> context >> is outside the scope of the database. It is the purview of the >> application >> developer or of the user. All that is important from the database >> perspective is that no value exists. Again, if you need to record why it >> doesn't exist, then add another column and record it. >> >> Stored procedures and triggers are part of the database. Triggers are >> necessary to enforce some constraints that cannot be enforced >> declaratively--for example, transition constraints. Stored procedures >> provide a means to embed business logic in the database where it belongs. >> (Business rules are essentially database constraints.) Sometimes >> triggers >> are contraindicated for performance or logistical reasons. Stored >> procedures are the logical alternative. >> >> "Gerard Marshall Vignes" <GerardMarshallVig***@discussions.microsoft.com> >> wrote in message >> news:07461B4D-0703-4428-8795-4D21C180C531@microsoft.com... >> > Thanks for your response, Brian! >> > >> > Your comment about Multiplicity is interesting. NULLs are limited to >> > the >> > case of 0 or 1. An additional table with a link via a foreign key is >> > able >> > to >> > handle a wider range 0 to N. >> > >> > It is possible to build fewer tables, with more fields and use NULLs to >> > handle optional fields. That is poor design. It is better to create new >> > tables for optional fields and restrict the use of NULLs. In fact, it >> > is >> > possible to design a schema that does not use NULLs at all. Such a >> > schema >> > may >> > be more complex (more tables, more joins), but it will also be more >> > extensible. >> > >> > You are missing the point about missing data. The objective is not to >> > explain why the data is missing. The point is to avoid hiding the fact >> > that >> > the data is missing. >> > >> > If a field must be provided (NOT NULL) and it is not available, then >> > the >> > row >> > will not be inserted. There is no confusion here. >> > >> > If a field is "optional," i.e., can be NULL, then there is an >> > opportunity >> > for confusion. Without deferring to meta-data (data about the data), we >> > do >> > not know what a NULL value means. Assuming the meaning of NULL [and >> > assuming >> > that meaning of NULL is universal] leads to confusion. This can give >> > unexpected results. >> > >> > By deferring to the meta-data (data about the data) for a non-ambiguous >> > response (action), we eliminate this confusion. A response could be to >> > (1) >> > ignore the field or (2) use some default value for the field. It makes >> > more >> > sense to let the meta-data determine what NULL means than to let each >> > person >> > or program come up with its own interpretation. >> > >> > Ignoring a field is an action. Using a default value in place of the >> > field >> > is an action. The point is to not read anything into the NULL that is >> > not >> > stipulated in the meta-data. >> > >> > DEFAULTs are poor technique. They hide the fact that no data was >> > provided, >> > and thus they corrupt the data. If there is a default, it belongs in >> > the >> > meta-data and not in the data. >> > >> > All things being equal, the simpler solution is more desirable. It is >> > possible to shadow fields with other fields that describe necessary >> > omissions. It is better to design a schema that naturally accommodates >> > incomplete or otherwise imperfect data. Such a schema will not require >> > guesswork or workaround code. >> > >> > Assuming stored procedures or triggers or any other specific >> > programming >> > technique is a mistake. This is design, and design should be flexible. >> > >> > Gerard >> > >> > "Brian Selzer" wrote: >> > >> >> I think your rule of thumb sucks. >> >> >> >> A NULL indicates the absence of a value. It does not indicate WHY the >> >> value >> >> does not exist--nor should it: if you need that information, add >> >> another >> >> column. >> >> >> >> Whether or not a column can contain NULLs is a multiplicity >> >> constraint--nothing more. It doesn't make any sense to ascribe >> >> additional >> >> meaning beyond that. >> >> >> >> It doesn't make any sense to "defer" to metadata once a row has been >> >> inserted into the database. >> >> >> >> I don't understand what you mean by "what action should be taken." It >> >> appears that you want to query the current state of the metadata in >> >> order >> >> to >> >> replace NULLs in a result set. This makes no sense for existing rows. >> >> >> >> DEFAULTs are good things. They specify the initial state of an >> >> entity. >> >> >> >> If you need to record which values were supplied, then you should add >> >> code >> >> to the stored procedure to record that information. >> >> >> >> Your rule of thum blurs the definition of metadata, thus increasing >> >> confusion. >> >> >> >> >> >> "Gerard Marshall Vignes" <Gerard Marshall >> >> Vig***@discussions.microsoft.com> >> >> wrote in message >> >> news:FA3008AF-93B8-48DC-8CAE-0B198BF9C12C@microsoft.com... >> >> >I have a rule of thumb which I would like to get opinions about. >> >> > >> >> > RULE: Provide Data or Go Meta >> >> > >> >> > SCOPE: Designing SQL Tables, UDFs and Stored Procedures >> >> > >> >> > MEANING: >> >> > >> >> > A good database design will restrict the occurrence of NULL. An >> >> > overabundance of NULLs in a database suggests, among other things, >> >> > that >> >> > the >> >> > schema is not properly normalized. It may also be difficult to >> >> > determine >> >> > what >> >> > NULL actually means in a give context, e.g., data that is needed but >> >> > missing, >> >> > data that is not necessary, etc. This uncertainty is not good. We >> >> > are >> >> > essentially corrupting our data in the process of storing it. >> >> > >> >> > I believe there is one special case where NULL is valid. This is the >> >> > case >> >> > where the absence of valid data requires that we defer to the >> >> > meta-data >> >> > to >> >> > determine what action should be taken. >> >> > >> >> > If we specify that a column can accept NULL values, then we are >> >> > saying >> >> > that >> >> > it is ACCEPTABLE to defer to the meta-data for a situation in which >> >> > the >> >> > data >> >> > is not provided (for whatever reason). >> >> > >> >> > If we specify that a column cannot accept NULL values, then we are >> >> > saying >> >> > that it is NOT ACCEPTABLE to defer to the meta-data for a situation >> >> > in >> >> > which >> >> > the data is not provided (for whatever reason). >> >> > >> >> > By capturing the knowledge that it is necessary to defer to the >> >> > meta-data, >> >> > we are leaving it up to the meta-data to tell us how to handle the >> >> > situation. >> >> > The response stipulated by the meta-data is then free to be >> >> > changed---without >> >> > changing the data. This decouples changes in the meta-data response >> >> > from >> >> > the >> >> > data itself. >> >> > >> >> > We should never use DEFAULT. We should use NULL instead. If we use >> >> > DEFAULT >> >> > to plug in valid data, then we have lost the knowledge that we >> >> > should >> >> > be >> >> > deferring to the meta-data. We have corrupted that data by erasing >> >> > the >> >> > fact >> >> > that it was not provided (for whatever reason). Whe have further >> >> > corrupted >> >> > the data by plugging in a value which may not have been valid in the >> >> > past >> >> > and/or may not be valid in the future. >> >> > >> >> > This rule of thumb attempts to avoid confustion between data and >> >> > meta-data. >> >> > >> >> > Feel free to comment (or to ignore me completely). >> >> > >> >> > If you have seen this suggested before, in SQL or any other context, >> >> > please >> >> > let me know. >> >> > >> >> > Thanks, >> >> > >> >> > Gerard >> >> > >> >> >> >> >> >> >> >> >> Gerard is no longer participating in this discussion.
For further reference, please contact: Name: John Doe Phone: 999-999-9999 Address: 123 Primrose Lane, Anytown USA 99999-9999 Thank You :-) Show quote "Brian Selzer" wrote: > > KEY POINT: Since our response to missing data may change over time, we > > need > > to retain the knowledge that the data was missing. This is the one thing > > that > > NULL does well. I advocate using the meta-data to stipulate the response > > to a > > NULL (missing data). > > The problem is that the metadata may also change over time, so you lose > information either way. If the metadata changes today, and queries rely on > it, then a query that was run yesterday will return different results than a > query that ran today. One of Codd's rules stipulates that a nondestructive > schema change shouldn't break existing applications. I would consider an > application that returns different results on different days given the same > set of data broken. > > > "Gerard Marshall Vignes" <GerardMarshallVig***@discussions.microsoft.com> > wrote in message news:8B80941F-0668-4EB2-B5E8-55EFFB117A16@microsoft.com... > > Thanks again Brian! > > > > I agree with you, Brian, and with Scott. There are times when DEFAULT is > > good technique. Logging and other activities that support forensics are > > good > > examples. I was being a purist and taking my so-called "rule-of-thumb" too > > far. > > > > KEY POINT: Since our response to missing data may change over time, we > > need > > to retain the knowledge that the data was missing. This is the one thing > > that > > NULL does well. I advocate using the meta-data to stipulate the response > > to a > > NULL (missing data). > > > > Meta-data that remains meta-data is good meta-data. Meta-data that gets > > copied into data and erases the fact that no data was provided is not good > > meta-data. > > > > Allowing missing data without stipulating how to respond to missing data > > invites confusion. Is the record usable? Should it be retained? > > > > Your point about Stored Procedures and Triggers is good. I would add to > > that > > User Defined Functions, DTS Jobs (which can run VBScript & COM!) and new > > technologies that Microsoft is adding to SQL Server. > > > > I usually decide to implement logic in the database for performance > > reasons. > > Database servers are usually very powerful and can grind out code rapidly. > > Reducing the amount of data moving between the database and its clients > > reduces network traffic. This downside is that this produces systems that > > are > > highly non-portable. > > > > That is an entirely new thread, and probably one more interesting than > > this > > one :-) > > > > Thanks Again, > > > > Gerard > > > > > > "Brian Selzer" wrote: > > > >> For the record: I wasn't advocating the use of NULLs. I believe that > >> they > >> have no place in base relations: "NULL value" is an oxymor0n. They can, > >> however, be useful in relational expressions to simplify queries. > >> > >> > DEFAULTs are poor technique. They hide the fact that no data was > >> > provided, > >> > and thus they corrupt the data. If there is a default, it belongs in > >> > the > >> > meta-data and not in the data. > >> > >> Hello McFly! DEFAULTs ARE a part of the metadata. What doesn't make > >> sense > >> is "deferring" to the metadata once a row has been inserted. For > >> example, > >> consider two types of time stamps in a temporal database. One identifies > >> the time that an external event occurred; the other identifies the time > >> that > >> information arrived in the database. For the first type, the information > >> MUST be provided by the application; for the second type, > >> DEFAULT(GETDATE()) > >> is best. It doesn't make sense for the user to specify it during an > >> INSERT > >> because blocking may cause an interval to exist between the time that the > >> INSERT is issued and the time that the information becomes part of the > >> database, and it also makes no sense for the time stamp to be absent--in > >> fact, it is usually imperative that it exist in order to extract the > >> state > >> of the database at a specific point-in-time. Your technique of > >> "deferring" > >> to the metadata for subsequent queries would cause vital information to > >> be > >> lost--that is, the time that the information arrived in the database. > >> > >> I reject your assertion that data is in any way corrupted by using > >> DEFAULTs. > >> Your argument that information is lost is unconvincing. Again, if you > >> need > >> to record the fact that a value was not provided, then explicitly record > >> that fact in another column. > >> > >> The meaning of NULL IS universal. Interpretation of that meaning in > >> context > >> is outside the scope of the database. It is the purview of the > >> application > >> developer or of the user. All that is important from the database > >> perspective is that no value exists. Again, if you need to record why it > >> doesn't exist, then add another column and record it. > >> > >> Stored procedures and triggers are part of the database. Triggers are > >> necessary to enforce some constraints that cannot be enforced > >> declaratively--for example, transition constraints. Stored procedures > >> provide a means to embed business logic in the database where it belongs. > >> (Business rules are essentially database constraints.) Sometimes > >> triggers > >> are contraindicated for performance or logistical reasons. Stored > >> procedures are the logical alternative. > >> > >> "Gerard Marshall Vignes" <GerardMarshallVig***@discussions.microsoft.com> > >> wrote in message > >> news:07461B4D-0703-4428-8795-4D21C180C531@microsoft.com... > >> > Thanks for your response, Brian! > >> > > >> > Your comment about Multiplicity is interesting. NULLs are limited to > >> > the > >> > case of 0 or 1. An additional table with a link via a foreign key is > >> > able > >> > to > >> > handle a wider range 0 to N. > >> > > >> > It is possible to build fewer tables, with more fields and use NULLs to > >> > handle optional fields. That is poor design. It is better to create new > >> > tables for optional fields and restrict the use of NULLs. In fact, it > >> > is > >> > possible to design a schema that does not use NULLs at all. Such a > >> > schema > >> > may > >> > be more complex (more tables, more joins), but it will also be more > >> > extensible. > >> > > >> > You are missing the point about missing data. The objective is not to > >> > explain why the data is missing. The point is to avoid hiding the fact > >> > that > >> > the data is missing. > >> > > >> > If a field must be provided (NOT NULL) and it is not available, then > >> > the > >> > row > >> > will not be inserted. There is no confusion here. > >> > > >> > If a field is "optional," i.e., can be NULL, then there is an > >> > opportunity > >> > for confusion. Without deferring to meta-data (data about the data), we > >> > do > >> > not know what a NULL value means. Assuming the meaning of NULL [and > >> > assuming > >> > that meaning of NULL is universal] leads to confusion. This can give > >> > unexpected results. > >> > > >> > By deferring to the meta-data (data about the data) for a non-ambiguous > >> > response (action), we eliminate this confusion. A response could be to > >> > (1) > >> > ignore the field or (2) use some default value for the field. It makes > >> > more > >> > sense to let the meta-data determine what NULL means than to let each > >> > person > >> > or program come up with its own interpretation. > >> > > >> > Ignoring a field is an action. Using a default value in place of the > >> > field > >> > is an action. The point is to not read anything into the NULL that is > >> > not > >> > stipulated in the meta-data. > >> > > >> > DEFAULTs are poor technique. They hide the fact that no data was > >> > provided, > >> > and thus they corrupt the data. If there is a default, it belongs in > >> > the > >> > meta-data and not in the data. > >> > > >> > All things being equal, the simpler solution is more desirable. It is > >> > possible to shadow fields with other fields that describe necessary > >> > omissions. It is better to design a schema that naturally accommodates > >> > incomplete or otherwise imperfect data. Such a schema will not require > >> > guesswork or workaround code. > >> > > >> > Assuming stored procedures or triggers or any other specific > >> > programming > >> > technique is a mistake. This is design, and design should be flexible. > >> > > >> > Gerard > >> > > >> > "Brian Selzer" wrote: > >> > > >> >> I think your rule of thumb sucks. > >> >> > >> >> A NULL indicates the absence of a value. It does not indicate WHY the > >> >> value > >> >> does not exist--nor should it: if you need that information, add > >> >> another > >> >> column. > >> >> > >> >> Whether or not a column can contain NULLs is a multiplicity > >> >> constraint--nothing more. It doesn't make any sense to ascribe > >> >> additional > >> >> meaning beyond that. > >> >> > >> >> It doesn't make any sense to "defer" to metadata once a row has been > >> >> inserted into the database. > >> >> > >> >> I don't understand what you mean by "what action should be taken." It > >> >> appears that you want to query the current state of the metadata in > >> >> order > >> >> to > >> >> replace NULLs in a result set. This makes no sense for existing rows. > >> >> > >> >> DEFAULTs are good things. They specify the initial state of an > >> >> entity. > >> >> > >> >> If you need to record which values were supplied, then you should add > >> >> code > >> >> to the stored procedure to record that information. > >> >> > >> >> Your rule of thum blurs the definition of metadata, thus increasing > >> >> confusion. > >> >> > >> >> > >> >> "Gerard Marshall Vignes" <Gerard Marshall > >> >> Vig***@discussions.microsoft.com> > >> >> wrote in message > >> >> news:FA3008AF-93B8-48DC-8CAE-0B198BF9C12C@microsoft.com... > >> >> >I have a rule of thumb which I would like to get opinions about. > >> >> > > >> >> > RULE: Provide Data or Go Meta > >> >> > > >> >> > SCOPE: Designing SQL Tables, UDFs and Stored Procedures > >> >> > > >> >> > MEANING: > >> >> > > >> >> > A good database design will restrict the occurrence of NULL. An > >> >> > overabundance of NULLs in a database suggests, among other things, > >> >> > that > >> >> > the > >> >> > schema is not properly normalized. It may also be difficult to > >> >> > determine > >> >> > what > >> >> > NULL actually means in a give context, e.g., data that is needed but > >> >> > missing, > >> >> > data that is not necessary, etc. This uncertainty is not good. We > >> >> > are > >> >> > essentially corrupting our data in the process of storing it. > >> >> > > >> >> > I believe there is one special case where NULL is valid. This is the > >> >> > case > >> >> > where the absence of valid data requires that we defer to the > >> >> > meta-data > >> >> > to > >> >> > determine what action should be taken. > >> >> > > >> >> > If we specify that a column can accept NULL values, then we are > >> >> > saying > >> >> > that > >> >> > it is ACCEPTABLE to defer to the meta-data for a situation in which > >> >> > the > >> >> > data > >> >> > is not provided (for whatever reason). > >> >> > > >> >> > If we specify that a column cannot accept NULL values, then we are > >> >> > saying > >> >> > that it is NOT ACCEPTABLE to defer to the meta-data for a situation > >> >> > in > >> >> > which > >> >> > the data is not provided (for whatever reason). > >> >> > > >> >> > By capturing the knowledge that it is necessary to defer to the > >> >> > meta-data, > >> >> > we are leaving it up to the meta-data to tell us how to handle the > >> >> > situation. > >> >> > The response stipulated by the meta-data is then free to be > >> >> > changed---without > >> >> > changing the data. This decouples changes in the meta-data response > >> >> > from > >> >> > the > >> >> > data itself. > >> >> > > >> >> > We should never use DEFAULT. We should use NULL instead. If we use > >> >> > DEFAULT > >> >> > to plug in valid data, then we have lost the knowledge that we > >> >> > should > >> >> > be > >> >> > deferring to the meta-data. We have corrupted that data by erasing > >> >> > the > >> >> > fact > >> >> > that it was not provided (for whatever reason). Whe have further > A good database design will restrict the occurrence of NULL. An I agree with this rule.> overabundance of NULLs in a database suggests, among other things, that > the > schema is not properly normalized. It may also be difficult to determine > what > We should never use DEFAULT. We should use NULL instead. If we use DEFAULT I don't completely agree with this rule. One popular use for defaults is > to plug in valid data, then we have lost the knowledge that we should be auditing (to track the identity of the creator and datetime of creation). However, a gui-based system will often require that a "default" be visible (and changeable) to the user, thus obviating the need for the corresponding default within the schema. In general, I think a "rule of thumb" should avoid absolutes ("never"). Thanks Scott!
My mistake. I agree that absolutes should be avoided. Logging activities are an example of where DEFAULT is good. Thank you for pointing this out. I was a purist. Gerard Show quote "Scott Morris" wrote: > > A good database design will restrict the occurrence of NULL. An > > overabundance of NULLs in a database suggests, among other things, that > > the > > schema is not properly normalized. It may also be difficult to determine > > what > > I agree with this rule. > > > We should never use DEFAULT. We should use NULL instead. If we use DEFAULT > > to plug in valid data, then we have lost the knowledge that we should be > > I don't completely agree with this rule. One popular use for defaults is > auditing (to track the identity of the creator and datetime of creation). > However, a gui-based system will often require that a "default" be visible > (and changeable) to the user, thus obviating the need for the corresponding > default within the schema. In general, I think a "rule of thumb" should > avoid absolutes ("never"). > > > |
|||||||||||||||||||||||