|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Best practice BIT or SET('no','yes') ?I know it's quite common to use BIT fields for boolean values. CREATE TABLE tblTest ( door BIT DEFAULT 0, accept BIT DEFAULT 0 ) instead of: CREATE TABLE tblTest ( door SET('close','open') DEFAULT 'close', accept SET('no','yes') DEFAULT 'no' ) (By the way, I use MSSQL and MySQL, not sure if I'm using the right datatypes for MSSQL) I reckon the first uses less storage space, but the meaning of the values in the latter is more unanimous. So what's best practice? I'm tempted to use the latter, but I almost always see the first used everywhere. I'm definitely interested in what CELKO has to say about this. Lisa My opinion, FWIW
-Comarison operators with bit datatypes will be faster -You don't have to worry about differences in collations when comparing -More efficient storage on disk -You could always add the unanimous-ness (which I'm positive is not a word!) in a select that extracts the data out of these fields like select case when door = 0 then 'closed' else 'open' end from tblTest Proper T-SQL syntax for the character-based table would be CREATE TABLE tblTest ( door varchar(5) CHECK (door IN ('close','open')) DEFAULT 'close', accept varchar(5) CHECK (accept IN ('no','yes')) DEFAULT 'no' ) -- Show quote"Lisa Pearlson" wrote: > Hi, > > I know it's quite common to use BIT fields for boolean values. > > CREATE TABLE tblTest ( > door BIT DEFAULT 0, > accept BIT DEFAULT 0 > ) > > instead of: > > CREATE TABLE tblTest ( > door SET('close','open') DEFAULT 'close', > accept SET('no','yes') DEFAULT 'no' > ) > > (By the way, I use MSSQL and MySQL, not sure if I'm using the right > datatypes for MSSQL) > > I reckon the first uses less storage space, but the meaning of the values in > the latter is more unanimous. > So what's best practice? I'm tempted to use the latter, but I almost always > see the first used everywhere. > > I'm definitely interested in what CELKO has to say about this. > > Lisa > > > Here we use the latter one cuz too many different programmers working off
the same database and each programmer has their own programs that they are responsible for. I prefer to use the first one to save size. Show quote "Lisa Pearlson" <no@spam.plz> wrote in message news:O1u6wZDKGHA.3984@TK2MSFTNGP14.phx.gbl... > Hi, > > I know it's quite common to use BIT fields for boolean values. > > CREATE TABLE tblTest ( > door BIT DEFAULT 0, > accept BIT DEFAULT 0 > ) > > instead of: > > CREATE TABLE tblTest ( > door SET('close','open') DEFAULT 'close', > accept SET('no','yes') DEFAULT 'no' > ) > > (By the way, I use MSSQL and MySQL, not sure if I'm using the right > datatypes for MSSQL) > > I reckon the first uses less storage space, but the meaning of the values > in the latter is more unanimous. > So what's best practice? I'm tempted to use the latter, but I almost > always see the first used everywhere. > > I'm definitely interested in what CELKO has to say about this. > > Lisa > Lisa Pearlson wrote:
Show quote > Hi, My main objection to BIT is that its behaviour is too strange and> > I know it's quite common to use BIT fields for boolean values. > > CREATE TABLE tblTest ( > door BIT DEFAULT 0, > accept BIT DEFAULT 0 > ) > > instead of: > > CREATE TABLE tblTest ( > door SET('close','open') DEFAULT 'close', > accept SET('no','yes') DEFAULT 'no' > ) > > (By the way, I use MSSQL and MySQL, not sure if I'm using the right > datatypes for MSSQL) > > I reckon the first uses less storage space, but the meaning of the values in > the latter is more unanimous. > So what's best practice? I'm tempted to use the latter, but I almost always > see the first used everywhere. > > I'm definitely interested in what CELKO has to say about this. > > Lisa counter-intuitive. For example, can you predict or explain the results of the following INSERT statement and the SELECT statement? CREATE TABLE T (x BIT NOT NULL); INSERT INTO T VALUES (2); SELECT MAX(x) FROM T; In most cases BIT's saving on storage is probably modest. I prefer concise, readable status codes (CHAR(1) for example) and when I need to add a third status I don't need to change the datatype I just change a CHECK constraint. -- David Portas, SQL Server MVP Whenever possible please post enough code to reproduce your problem. Including CREATE TABLE and INSERT statements usually helps. State what version of SQL Server you are using and specify the content of any error messages. SQL Server Books Online: http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx -- Like David, I usually use a 1 character flag. In my experience, when I
initially think I have a binary status, it is really a multi-varied flag. In other words, as soon as I code to tell if the door is open or closed, someone else wants to know if it's locked. Payson Lisa Pearlson wrote: Show quote > Hi, > > I know it's quite common to use BIT fields for boolean values. > > CREATE TABLE tblTest ( > door BIT DEFAULT 0, > accept BIT DEFAULT 0 > ) > > instead of: > > CREATE TABLE tblTest ( > door SET('close','open') DEFAULT 'close', > accept SET('no','yes') DEFAULT 'no' > ) > > (By the way, I use MSSQL and MySQL, not sure if I'm using the right > datatypes for MSSQL) > > I reckon the first uses less storage space, but the meaning of the values in > the latter is more unanimous. > So what's best practice? I'm tempted to use the latter, but I almost always > see the first used everywhere. > > I'm definitely interested in what CELKO has to say about this. > > Lisa > In other words, as soon as I code to tell if the door is open or I love the analogy. :-)> closed, someone else wants to know if it's locked. -- Show quoteTibor Karaszi, SQL Server MVP http://www.karaszi.com/sqlserver/default.asp http://www.solidqualitylearning.com/ Blog: http://solidqualitylearning.com/blogs/tibor/ "Payson" <payso***@hotmail.com> wrote in message news:1138914380.158598.148590@f14g2000cwb.googlegroups.com... > Like David, I usually use a 1 character flag. In my experience, when I > initially think I have a binary status, it is really a multi-varied > flag. In other words, as soon as I code to tell if the door is open or > closed, someone else wants to know if it's locked. > > Payson > > Lisa Pearlson wrote: >> Hi, >> >> I know it's quite common to use BIT fields for boolean values. >> >> CREATE TABLE tblTest ( >> door BIT DEFAULT 0, >> accept BIT DEFAULT 0 >> ) >> >> instead of: >> >> CREATE TABLE tblTest ( >> door SET('close','open') DEFAULT 'close', >> accept SET('no','yes') DEFAULT 'no' >> ) >> >> (By the way, I use MSSQL and MySQL, not sure if I'm using the right >> datatypes for MSSQL) >> >> I reckon the first uses less storage space, but the meaning of the values in >> the latter is more unanimous. >> So what's best practice? I'm tempted to use the latter, but I almost always >> see the first used everywhere. >> >> I'm definitely interested in what CELKO has to say about this. >> >> Lisa > >> In other words, as soon as I code to tell if the door is open or closed, someone else wants to know if it's locked. I love the analogy. :-) <Me, too!! I think I will steal it in the next edition of one of my books! Hey, Imitation is the sincerest form of flattery; Plagiarism is the sincerest form of imitation! Lisa--
Great question. One way of clearing this up is by naming your columns in a way where it won't be ambiguous. If you know that a Door only has two states, Open and Closed, the meaning will be clear with a BIT datatype if you use a convention like "IsDoorOpen". That way, it's obvious what "1" and "0" mean. If it's possible that a door will have more than 2 states in the future, use a single character {[O]pen, [C]losed, [A]jar} to represent the state of the door. In that case, you might call your column DoorStatus, or DoorState... Likewise, you can name your "Accept" column "IsAccepted" and clarify the meaning there. At my company, we use "Is" to prefix just about all of the boolean values that we put in our code for just this reason. HTH -Dave Markle Show quote "Lisa Pearlson" wrote: > Hi, > > I know it's quite common to use BIT fields for boolean values. > > CREATE TABLE tblTest ( > door BIT DEFAULT 0, > accept BIT DEFAULT 0 > ) > > instead of: > > CREATE TABLE tblTest ( > door SET('close','open') DEFAULT 'close', > accept SET('no','yes') DEFAULT 'no' > ) > > (By the way, I use MSSQL and MySQL, not sure if I'm using the right > datatypes for MSSQL) > > I reckon the first uses less storage space, but the meaning of the values in > the latter is more unanimous. > So what's best practice? I'm tempted to use the latter, but I almost always > see the first used everywhere. > > I'm definitely interested in what CELKO has to say about this. > > Lisa > > > Stop writing code as if you were an assembly language programmer in
1957. Machine level things like a BIT or BYTE datatype have no place in a high level language like SQL. SQL is a high level language; it is abstract and defined without regard to PHYSICAL implementation. This basic principle of data modeling is called data abstraction. Bits and Bytes are the <i>lowest<i> units of hardware-specific, physical implementation you can get. Are you on a high-end or low-end machine? Does the machine have 8, 16, 32, 64, or 128 bit words? Twos complement or ones complement math? Hey, the standards allow decimal machines, so bits do not exist at all!! What about NULLs? To be an SQL datatype, you have to have NULLs, so what is a NULL bit? By definition a bit, is on or off and has no NULL. What does the implementation of the host languages do with bits? Did you know that +1, +0, -0 and -1 are all used for BOOLEANs, but not consistently (look at C# and VB from the same vendor)? That means <i>all<i> the host languages -- present, future and not-yet-defined. Surely, no good programmer would ever write non-portable code by getting to such a low level as bit fiddling!! There are two situations in practice. Either the bits are individual attributes or they are used as a vector to represent a single attribute. In the case of a single attribute, the encoding is limited to two values, which do not port to host languages or other SQLs, cannot be easily understood by an end user, and which cannot be expanded. Use CHAR(1) which will move. In the second case what some Newbies, who are still thinking in terms of second and third generation programming languages or even punch cards, do is build a vector for a series of "yes/no" status codes, failing to see the status vector as a single attribute. Did you ever play the children's game "20 Questions" when you were young? Bingo!! Imagine you have six components for a loan approval, so you allocate bits in your second generation model of the world. You have 64 possible vectors, but only 5 of them are valid (i.e. you cannot be rejected for bankruptcy and still have good credit). For your data integrity, you can: 1) Ignore the problem. This is actually what <i>most<i> newbies do. 2) Write elaborate CHECK() constraints with user defined functions or proprietary bit level library functions that cannot port and that run like cold glue. Now we add a 7-th condition to the vector -- which end does it go on? Why? How did you get it in the right place on all the possible hardware that it will ever use? Did all the code that references a bit in a word by its position do it right after the change? You need to sit down and think about how to design an encoding of the data that is high level, general enough to expand, abstract and portable. For example, is that loan approval a hierarchical code? concatenation code? vector code? etc? Did you provide codes for unknown, missing and N/A values? It is not easy to design such things! Get a copy of SQL PROGRAMMING STYLE and look at the chapters on design encoding schemes. An excellent post Celko. I wish more of your posts were like this.
i.e. You focused more on the solution and the reasons behind it than insulting folks who are doing it wrong. When you make posts like this one we can all learn a little bit, without being offended in the process. Show quote "--CELKO--" <jcelko***@earthlink.net> wrote in message news:1139090746.005907.99750@g14g2000cwa.googlegroups.com... > Stop writing code as if you were an assembly language programmer in > 1957. > > Machine level things like a BIT or BYTE datatype have no place in a > high level language like SQL. SQL is a high level language; it is > abstract and defined without regard to PHYSICAL implementation. This > basic principle of data modeling is called data abstraction. > > Bits and Bytes are the <i>lowest<i> units of hardware-specific, > physical implementation you can get. Are you on a high-end or low-end > machine? Does the machine have 8, 16, 32, 64, or 128 bit words? Twos > complement or ones complement math? Hey, the standards allow decimal > machines, so bits do not exist at all!! What about NULLs? To be an > SQL datatype, you have to have NULLs, so what is a NULL bit? By > definition a bit, is on or off and has no NULL. > > What does the implementation of the host languages do with bits? Did > you know that +1, +0, -0 and -1 are all used for BOOLEANs, but not > consistently (look at C# and VB from the same vendor)? That means > <i>all<i> the host languages -- present, future and not-yet-defined. > Surely, no good programmer would ever write non-portable code by > getting to such a low level as bit fiddling!! > > There are two situations in practice. Either the bits are individual > attributes or they are used as a vector to represent a single > attribute. In the case of a single attribute, the encoding is limited > to two values, which do not port to host languages or other SQLs, > cannot be easily understood by an end user, and which cannot be > expanded. Use CHAR(1) which will move. > > In the second case what some Newbies, who are still thinking in terms > of second and third generation programming languages or even punch > cards, do is build a vector for a series of "yes/no" status codes, > failing to see the status vector as a single attribute. Did you ever > play the children's game "20 Questions" when you were young? Bingo!! > > Imagine you have six components for a loan approval, so you allocate > bits in your second generation model of the world. You have 64 possible > vectors, but only 5 of them are valid (i.e. you cannot be rejected for > bankruptcy and still have good credit). For your data integrity, you > can: > > 1) Ignore the problem. This is actually what <i>most<i> newbies do. > > 2) Write elaborate CHECK() constraints with user defined functions or > proprietary bit level library functions that cannot port and that run > like cold glue. > > Now we add a 7-th condition to the vector -- which end does it go on? > Why? How did you get it in the right place on all the possible > hardware that it will ever use? Did all the code that references a bit > in a word by its position do it right after the change? > > You need to sit down and think about how to design an encoding of the > data that is high level, general enough to expand, abstract and > portable. For example, is that loan approval a hierarchical code? > concatenation code? vector code? etc? Did you provide codes for > unknown, missing and N/A values? It is not easy to design such things! > > Get a copy of SQL PROGRAMMING STYLE and look at the chapters on design > encoding schemes. > >> An excellent post Celko. I wish more of your posts were like this. << They are! That was one of my standard "cut & paste" replies.Lisa has not done her homework, I think she can be saved. She has not panicked when I hit her with my "Zen Stick", so she gets longer answers. And if she gives me something useful for a book, she gets free copy. Not sure if this was directed at me, but I said I'm tempted to use SET
instead of BIT, which seems more common. However, I meant ENUM instead of SET, and I guess neither even exists in MS-SQL unlike MySQL. So the recommendation is to cryptically encode the meaning of a value into a CHAR(1). That's very dissapointing for an RDBMS I think. So if I have a field that can only have the values 'black' or 'blue', I have to make the field CHAR(1) and use 'b' for one and some other character for the other color, and leave people guessing what they stand for. And then to enforce that only these 2 colors can be entered and no other character, I have to add check constraints? Seems like an ENUM data type like in MySQL would've been a much suitable solution. Lisa Show quote "--CELKO--" <jcelko***@earthlink.net> wrote in message news:1139090746.005907.99750@g14g2000cwa.googlegroups.com... > Stop writing code as if you were an assembly language programmer in > 1957. > > Machine level things like a BIT or BYTE datatype have no place in a > high level language like SQL. SQL is a high level language; it is > abstract and defined without regard to PHYSICAL implementation. This > basic principle of data modeling is called data abstraction. > > Bits and Bytes are the <i>lowest<i> units of hardware-specific, > physical implementation you can get. Are you on a high-end or low-end > machine? Does the machine have 8, 16, 32, 64, or 128 bit words? Twos > complement or ones complement math? Hey, the standards allow decimal > machines, so bits do not exist at all!! What about NULLs? To be an > SQL datatype, you have to have NULLs, so what is a NULL bit? By > definition a bit, is on or off and has no NULL. > > What does the implementation of the host languages do with bits? Did > you know that +1, +0, -0 and -1 are all used for BOOLEANs, but not > consistently (look at C# and VB from the same vendor)? That means > <i>all<i> the host languages -- present, future and not-yet-defined. > Surely, no good programmer would ever write non-portable code by > getting to such a low level as bit fiddling!! > > There are two situations in practice. Either the bits are individual > attributes or they are used as a vector to represent a single > attribute. In the case of a single attribute, the encoding is limited > to two values, which do not port to host languages or other SQLs, > cannot be easily understood by an end user, and which cannot be > expanded. Use CHAR(1) which will move. > > In the second case what some Newbies, who are still thinking in terms > of second and third generation programming languages or even punch > cards, do is build a vector for a series of "yes/no" status codes, > failing to see the status vector as a single attribute. Did you ever > play the children's game "20 Questions" when you were young? Bingo!! > > Imagine you have six components for a loan approval, so you allocate > bits in your second generation model of the world. You have 64 possible > vectors, but only 5 of them are valid (i.e. you cannot be rejected for > bankruptcy and still have good credit). For your data integrity, you > can: > > 1) Ignore the problem. This is actually what <i>most<i> newbies do. > > 2) Write elaborate CHECK() constraints with user defined functions or > proprietary bit level library functions that cannot port and that run > like cold glue. > > Now we add a 7-th condition to the vector -- which end does it go on? > Why? How did you get it in the right place on all the possible > hardware that it will ever use? Did all the code that references a bit > in a word by its position do it right after the change? > > You need to sit down and think about how to design an encoding of the > data that is high level, general enough to expand, abstract and > portable. For example, is that loan approval a hierarchical code? > concatenation code? vector code? etc? Did you provide codes for > unknown, missing and N/A values? It is not easy to design such things! > > Get a copy of SQL PROGRAMMING STYLE and look at the chapters on design > encoding schemes. > >> So if I have a field [sic] that can only have the values 'black' or 'blue', I have to make the field CHAR(1) and use 'b' for one and some other character for the other color, and leave people guessing what they stand for.And then to enforce that only these 2 colors can be entered and no other character, I have to add check constraints? << Columns are not fields; here is one of many reason.This is a bad example. Colors have a lot of standard encoding schemes (Land Color number, Pantone system, RGB, etc.). This is the DOMAIN concept which is not part of a field model. I would hold up the Standard like a Bible and make your column into your industry standard system. Now restrict it with a CHECK() constraint for now. When you allow "bloody red", "serious infection green", etc. just change the constraint to expand the domain. >> Seems like an ENUM data type like in MySQL would've been a much suitable solution. <<No, just use a CHECK() in SQL. And never have a column with mixed data types int he RM. Well, when I was thinking what the advantages are of an ENUM datatype,
efficient storage by reference (eg ENUM('red', 'blue')), I realized the most obvious: I create a table Colors, with these 2 colors, and then use foreign key constraint in another table. I'm guessing in essence that's what an ENUM does. Same could be done with "Door states" with rows like "open", "closed", "locked" etc. Maybe it's a bit extra work, but it's better modelling than encoding the meaning of a value in a single char with check constraints. using CHAR(1) over BIT may be more portable, but in my opinion still not good database design. That's still too much like machine code, but instead of boolean, you use char. Creating an extra table might be less 'practicle' but wouldn't you agree it's more in agreement with good RDBMS modelling principles than CHAR(1) ? Lisa --CELKO-- schreef: Show quote > >> So if I have a field [sic] that can only have the values 'black' or 'blue', I have to make the field CHAR(1) and use 'b' for one and some other character for the other color, and leave people guessing what they stand for.And then to enforce that only these 2 colors can be entered and no other character, I have to add check constraints? << > > Columns are not fields; here is one of many reason. > > This is a bad example. Colors have a lot of standard encoding schemes > (Land Color number, Pantone system, RGB, etc.). This is the DOMAIN > concept which is not part of a field model. > > I would hold up the Standard like a Bible and make your column into > your industry standard system. Now restrict it with a CHECK() > constraint for now. When you allow "bloody red", "serious infection > green", etc. just change the constraint to expand the domain. > > >> Seems like an ENUM data type like in MySQL would've been a much suitable > solution. << > > No, just use a CHECK() in SQL. And never have a column with mixed data > types int he RM. |
|||||||||||||||||||||||