Home All Groups Group Topic Archive Search About

Best practice BIT or SET('no','yes') ?

Author
2 Feb 2006 8:23 PM
Lisa Pearlson
Hi,

I know it's quite common to use BIT fields for boolean values.

CREATE TABLE tblTest (
door    BIT DEFAULT 0,
accept BIT DEFAULT 0
)

instead of:

CREATE TABLE tblTest (
door    SET('close','open') DEFAULT 'close',
accept SET('no','yes') DEFAULT 'no'
)

(By the way, I use MSSQL and MySQL, not sure if I'm using the right
datatypes for MSSQL)

I reckon the first uses less storage space, but the meaning of the values in
the latter is more unanimous.
So what's best practice? I'm tempted to use the latter, but I almost always
see the first used everywhere.

I'm definitely interested in what CELKO has to say about this.

Lisa

Author
2 Feb 2006 8:50 PM
Mark Williams
My opinion, FWIW

-Comarison operators with bit datatypes will be faster
-You don't have to worry about differences in collations when comparing
-More efficient storage on disk
-You could always add the unanimous-ness (which I'm positive is not a word!)
in a select that extracts the data out of these fields like

select case when door = 0 then 'closed' else 'open' end from tblTest

Proper T-SQL syntax for the character-based table would be

CREATE TABLE tblTest (
door varchar(5) CHECK (door IN ('close','open')) DEFAULT 'close',
accept varchar(5) CHECK (accept IN ('no','yes')) DEFAULT 'no'
)
--

Show quote
"Lisa Pearlson" wrote:

> Hi,
>
> I know it's quite common to use BIT fields for boolean values.
>
> CREATE TABLE tblTest (
> door    BIT DEFAULT 0,
> accept BIT DEFAULT 0
> )
>
> instead of:
>
> CREATE TABLE tblTest (
> door    SET('close','open') DEFAULT 'close',
> accept SET('no','yes') DEFAULT 'no'
> )
>
> (By the way, I use MSSQL and MySQL, not sure if I'm using the right
> datatypes for MSSQL)
>
> I reckon the first uses less storage space, but the meaning of the values in
> the latter is more unanimous.
> So what's best practice? I'm tempted to use the latter, but I almost always
> see the first used everywhere.
>
> I'm definitely interested in what CELKO has to say about this.
>
> Lisa
>
>
>
Author
2 Feb 2006 8:50 PM
Grant
Here we use the latter one cuz too many different programmers working off
the same database and each programmer has their own programs that they are
responsible for.





I prefer to use the first one to save size.



Show quote
"Lisa Pearlson" <no@spam.plz> wrote in message
news:O1u6wZDKGHA.3984@TK2MSFTNGP14.phx.gbl...
> Hi,
>
> I know it's quite common to use BIT fields for boolean values.
>
> CREATE TABLE tblTest (
> door    BIT DEFAULT 0,
> accept BIT DEFAULT 0
> )
>
> instead of:
>
> CREATE TABLE tblTest (
> door    SET('close','open') DEFAULT 'close',
> accept SET('no','yes') DEFAULT 'no'
> )
>
> (By the way, I use MSSQL and MySQL, not sure if I'm using the right
> datatypes for MSSQL)
>
> I reckon the first uses less storage space, but the meaning of the values
> in the latter is more unanimous.
> So what's best practice? I'm tempted to use the latter, but I almost
> always see the first used everywhere.
>
> I'm definitely interested in what CELKO has to say about this.
>
> Lisa
>
Author
2 Feb 2006 8:56 PM
David Portas
Lisa Pearlson wrote:

Show quote
> Hi,
>
> I know it's quite common to use BIT fields for boolean values.
>
> CREATE TABLE tblTest (
> door    BIT DEFAULT 0,
> accept BIT DEFAULT 0
> )
>
> instead of:
>
> CREATE TABLE tblTest (
> door    SET('close','open') DEFAULT 'close',
> accept SET('no','yes') DEFAULT 'no'
> )
>
> (By the way, I use MSSQL and MySQL, not sure if I'm using the right
> datatypes for MSSQL)
>
> I reckon the first uses less storage space, but the meaning of the values in
> the latter is more unanimous.
> So what's best practice? I'm tempted to use the latter, but I almost always
> see the first used everywhere.
>
> I'm definitely interested in what CELKO has to say about this.
>
> Lisa

My main objection to BIT is that its behaviour is too strange and
counter-intuitive. For example, can you predict or explain the results
of the following INSERT statement and the SELECT statement?

CREATE TABLE T (x BIT NOT NULL);

INSERT INTO T VALUES (2);
SELECT MAX(x) FROM T;

In most cases BIT's saving on storage is probably modest. I prefer
concise, readable status codes (CHAR(1) for example) and when I need to
add a third status I don't need to change the datatype I just change a
CHECK constraint.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--
Author
2 Feb 2006 9:06 PM
Payson
Like David, I usually use a 1 character flag.  In my experience, when I
initially think I have  a binary status, it is really a multi-varied
flag.  In other words, as soon as I code to tell if the door is open or
closed, someone else wants to know if it's locked.

Payson

Lisa Pearlson wrote:
Show quote
> Hi,
>
> I know it's quite common to use BIT fields for boolean values.
>
> CREATE TABLE tblTest (
> door    BIT DEFAULT 0,
> accept BIT DEFAULT 0
> )
>
> instead of:
>
> CREATE TABLE tblTest (
> door    SET('close','open') DEFAULT 'close',
> accept SET('no','yes') DEFAULT 'no'
> )
>
> (By the way, I use MSSQL and MySQL, not sure if I'm using the right
> datatypes for MSSQL)
>
> I reckon the first uses less storage space, but the meaning of the values in
> the latter is more unanimous.
> So what's best practice? I'm tempted to use the latter, but I almost always
> see the first used everywhere.
>
> I'm definitely interested in what CELKO has to say about this.
>
> Lisa
Author
3 Feb 2006 8:22 AM
Tibor Karaszi
> In other words, as soon as I code to tell if the door is open or
> closed, someone else wants to know if it's locked.


I love the analogy. :-)

Show quote
"Payson" <payso***@hotmail.com> wrote in message
news:1138914380.158598.148590@f14g2000cwb.googlegroups.com...
> Like David, I usually use a 1 character flag.  In my experience, when I
> initially think I have  a binary status, it is really a multi-varied
> flag.  In other words, as soon as I code to tell if the door is open or
> closed, someone else wants to know if it's locked.
>
> Payson
>
> Lisa Pearlson wrote:
>> Hi,
>>
>> I know it's quite common to use BIT fields for boolean values.
>>
>> CREATE TABLE tblTest (
>> door    BIT DEFAULT 0,
>> accept BIT DEFAULT 0
>> )
>>
>> instead of:
>>
>> CREATE TABLE tblTest (
>> door    SET('close','open') DEFAULT 'close',
>> accept SET('no','yes') DEFAULT 'no'
>> )
>>
>> (By the way, I use MSSQL and MySQL, not sure if I'm using the right
>> datatypes for MSSQL)
>>
>> I reckon the first uses less storage space, but the meaning of the values in
>> the latter is more unanimous.
>> So what's best practice? I'm tempted to use the latter, but I almost always
>> see the first used everywhere.
>>
>> I'm definitely interested in what CELKO has to say about this.
>>
>> Lisa
>
Author
4 Feb 2006 9:56 PM
--CELKO--
>> In other words, as soon as I code to tell if the door is open or  closed, someone else wants to know if it's locked.
I love the analogy. :-)  <

Me, too!!  I think I will steal it in the next edition of one of my
books! Hey, Imitation is the sincerest form of flattery;  Plagiarism is
the sincerest form of imitation!
Author
4 Feb 2006 8:08 PM
Dave Markle
Lisa--

Great question.  One way of clearing this up is by naming your columns in a
way where it won't be ambiguous.  If you know that a Door only has two
states, Open and Closed, the meaning will be clear with a BIT datatype if you
use a convention like "IsDoorOpen".  That way, it's obvious what "1" and "0"
mean.  If it's possible that a door will have more than 2 states in the
future, use a single character {[O]pen, [C]losed, [A]jar} to represent the
state of the door.  In that case, you might call your column DoorStatus, or
DoorState...  Likewise, you can name your "Accept" column "IsAccepted" and
clarify the meaning there.  At my company, we use "Is" to prefix just about
all of the boolean values that we put in our code for just this reason.

HTH
-Dave Markle

Show quote
"Lisa Pearlson" wrote:

> Hi,
>
> I know it's quite common to use BIT fields for boolean values.
>
> CREATE TABLE tblTest (
> door    BIT DEFAULT 0,
> accept BIT DEFAULT 0
> )
>
> instead of:
>
> CREATE TABLE tblTest (
> door    SET('close','open') DEFAULT 'close',
> accept SET('no','yes') DEFAULT 'no'
> )
>
> (By the way, I use MSSQL and MySQL, not sure if I'm using the right
> datatypes for MSSQL)
>
> I reckon the first uses less storage space, but the meaning of the values in
> the latter is more unanimous.
> So what's best practice? I'm tempted to use the latter, but I almost always
> see the first used everywhere.
>
> I'm definitely interested in what CELKO has to say about this.
>
> Lisa
>
>
>
Author
4 Feb 2006 10:05 PM
--CELKO--
Stop writing code as if you were an assembly language programmer in
1957.

Machine level things like a BIT or BYTE datatype have no place in a
high level language like SQL.  SQL is a high level language; it is
abstract and defined without regard to PHYSICAL implementation.  This
basic principle of data modeling is called data abstraction.

Bits and Bytes are the <i>lowest<i> units of hardware-specific,
physical implementation you can get.  Are you on a high-end or low-end
machine?  Does the machine have 8, 16, 32, 64, or 128 bit words? Twos
complement or ones complement math?  Hey, the standards allow decimal
machines, so bits do not exist at all!!  What about NULLs?  To be an
SQL datatype, you have to have NULLs, so what is a NULL bit?  By
definition a bit, is on or off and has no NULL.

What does the implementation of the host languages do with bits?  Did
you know that +1, +0, -0 and -1 are all used for BOOLEANs, but not
consistently (look at C# and VB from the same vendor)?  That means
<i>all<i> the host languages -- present, future and not-yet-defined.
Surely, no good programmer would ever write non-portable code by
getting to such a low level as bit fiddling!!

There are two situations in practice.  Either the bits are individual
attributes or they are used as a vector to represent a single
attribute.  In the case of a single attribute, the encoding is limited
to two values, which do not port to host languages or other SQLs,
cannot be easily understood by an end user, and which cannot be
expanded.  Use CHAR(1) which will move.

In the second case what some Newbies, who are still thinking in terms
of second and third generation programming languages or even punch
cards, do is build a vector for a series of "yes/no" status codes,
failing to see the status vector as a single attribute.  Did you ever
play the children's game "20 Questions" when you were young?  Bingo!!

Imagine you have six components for a loan approval, so you allocate
bits in your second generation model of the world. You have 64 possible
vectors, but only 5 of them are valid (i.e. you cannot be rejected for
bankruptcy and still have good credit).  For your data integrity, you
can:

1) Ignore the problem.  This is actually what <i>most<i> newbies do.

2) Write elaborate CHECK() constraints with user defined functions or
proprietary bit level library functions that cannot port and that run
like cold glue.

Now we add a 7-th condition to the vector -- which end does it go on?
Why?  How did you get it in the right place on all the possible
hardware that it will ever use?  Did all the code that references a bit
in a word by its position do it right after the change?

You need to sit down and think about how to design an encoding of the
data that is high level, general enough to expand, abstract and
portable.  For example, is that loan approval a hierarchical code?
concatenation code? vector code? etc?  Did you provide codes for
unknown, missing and N/A values? It is not easy to design such things!

Get a copy of SQL PROGRAMMING STYLE and look at the chapters on design
encoding schemes.
Author
6 Feb 2006 3:16 PM
Jim Underwood
An excellent post Celko.  I wish more of your posts were like this.

i.e. You focused more on the solution and the reasons behind it than
insulting folks who are doing it wrong.

When you make posts like this one we can all learn a little bit, without
being offended in the process.


Show quote
"--CELKO--" <jcelko***@earthlink.net> wrote in message
news:1139090746.005907.99750@g14g2000cwa.googlegroups.com...
> Stop writing code as if you were an assembly language programmer in
> 1957.
>
> Machine level things like a BIT or BYTE datatype have no place in a
> high level language like SQL.  SQL is a high level language; it is
> abstract and defined without regard to PHYSICAL implementation.  This
> basic principle of data modeling is called data abstraction.
>
> Bits and Bytes are the <i>lowest<i> units of hardware-specific,
> physical implementation you can get.  Are you on a high-end or low-end
> machine?  Does the machine have 8, 16, 32, 64, or 128 bit words? Twos
> complement or ones complement math?  Hey, the standards allow decimal
> machines, so bits do not exist at all!!  What about NULLs?  To be an
> SQL datatype, you have to have NULLs, so what is a NULL bit?  By
> definition a bit, is on or off and has no NULL.
>
> What does the implementation of the host languages do with bits?  Did
> you know that +1, +0, -0 and -1 are all used for BOOLEANs, but not
> consistently (look at C# and VB from the same vendor)?  That means
> <i>all<i> the host languages -- present, future and not-yet-defined.
> Surely, no good programmer would ever write non-portable code by
> getting to such a low level as bit fiddling!!
>
> There are two situations in practice.  Either the bits are individual
> attributes or they are used as a vector to represent a single
> attribute.  In the case of a single attribute, the encoding is limited
> to two values, which do not port to host languages or other SQLs,
> cannot be easily understood by an end user, and which cannot be
> expanded.  Use CHAR(1) which will move.
>
> In the second case what some Newbies, who are still thinking in terms
> of second and third generation programming languages or even punch
> cards, do is build a vector for a series of "yes/no" status codes,
> failing to see the status vector as a single attribute.  Did you ever
> play the children's game "20 Questions" when you were young?  Bingo!!
>
> Imagine you have six components for a loan approval, so you allocate
> bits in your second generation model of the world. You have 64 possible
> vectors, but only 5 of them are valid (i.e. you cannot be rejected for
> bankruptcy and still have good credit).  For your data integrity, you
> can:
>
> 1) Ignore the problem.  This is actually what <i>most<i> newbies do.
>
> 2) Write elaborate CHECK() constraints with user defined functions or
> proprietary bit level library functions that cannot port and that run
> like cold glue.
>
> Now we add a 7-th condition to the vector -- which end does it go on?
> Why?  How did you get it in the right place on all the possible
> hardware that it will ever use?  Did all the code that references a bit
> in a word by its position do it right after the change?
>
> You need to sit down and think about how to design an encoding of the
> data that is high level, general enough to expand, abstract and
> portable.  For example, is that loan approval a hierarchical code?
> concatenation code? vector code? etc?  Did you provide codes for
> unknown, missing and N/A values? It is not easy to design such things!
>
> Get a copy of SQL PROGRAMMING STYLE and look at the chapters on design
> encoding schemes.
>
Author
7 Feb 2006 1:03 AM
--CELKO--
>> An excellent post Celko.  I wish more of your posts were like this. <<

They are!  That was one of my standard "cut & paste" replies.

Lisa has not done her homework, I think she can be saved. She has not
panicked when I hit her with my "Zen Stick", so she gets longer
answers.  And if she gives me something useful for a book, she gets
free copy.
Author
6 Feb 2006 9:38 PM
Lisa Pearlson
Not sure if this was directed at me, but I said I'm tempted to use SET
instead of BIT, which seems more common.
However, I meant ENUM instead of SET, and I guess neither even exists in
MS-SQL unlike MySQL.

So the recommendation is to cryptically encode the meaning of a value into a
CHAR(1). That's very dissapointing for an RDBMS I think.

So if I have a field that can only have the values 'black' or 'blue', I have
to make the field CHAR(1) and use 'b' for one and some other character for
the other color, and leave people guessing what they stand for.
And then to enforce that only these 2 colors can be entered and no other
character, I have to add check constraints?

Seems like an ENUM data type like in MySQL would've been a much suitable
solution.

Lisa


Show quote
"--CELKO--" <jcelko***@earthlink.net> wrote in message
news:1139090746.005907.99750@g14g2000cwa.googlegroups.com...
> Stop writing code as if you were an assembly language programmer in
> 1957.
>
> Machine level things like a BIT or BYTE datatype have no place in a
> high level language like SQL.  SQL is a high level language; it is
> abstract and defined without regard to PHYSICAL implementation.  This
> basic principle of data modeling is called data abstraction.
>
> Bits and Bytes are the <i>lowest<i> units of hardware-specific,
> physical implementation you can get.  Are you on a high-end or low-end
> machine?  Does the machine have 8, 16, 32, 64, or 128 bit words? Twos
> complement or ones complement math?  Hey, the standards allow decimal
> machines, so bits do not exist at all!!  What about NULLs?  To be an
> SQL datatype, you have to have NULLs, so what is a NULL bit?  By
> definition a bit, is on or off and has no NULL.
>
> What does the implementation of the host languages do with bits?  Did
> you know that +1, +0, -0 and -1 are all used for BOOLEANs, but not
> consistently (look at C# and VB from the same vendor)?  That means
> <i>all<i> the host languages -- present, future and not-yet-defined.
> Surely, no good programmer would ever write non-portable code by
> getting to such a low level as bit fiddling!!
>
> There are two situations in practice.  Either the bits are individual
> attributes or they are used as a vector to represent a single
> attribute.  In the case of a single attribute, the encoding is limited
> to two values, which do not port to host languages or other SQLs,
> cannot be easily understood by an end user, and which cannot be
> expanded.  Use CHAR(1) which will move.
>
> In the second case what some Newbies, who are still thinking in terms
> of second and third generation programming languages or even punch
> cards, do is build a vector for a series of "yes/no" status codes,
> failing to see the status vector as a single attribute.  Did you ever
> play the children's game "20 Questions" when you were young?  Bingo!!
>
> Imagine you have six components for a loan approval, so you allocate
> bits in your second generation model of the world. You have 64 possible
> vectors, but only 5 of them are valid (i.e. you cannot be rejected for
> bankruptcy and still have good credit).  For your data integrity, you
> can:
>
> 1) Ignore the problem.  This is actually what <i>most<i> newbies do.
>
> 2) Write elaborate CHECK() constraints with user defined functions or
> proprietary bit level library functions that cannot port and that run
> like cold glue.
>
> Now we add a 7-th condition to the vector -- which end does it go on?
> Why?  How did you get it in the right place on all the possible
> hardware that it will ever use?  Did all the code that references a bit
> in a word by its position do it right after the change?
>
> You need to sit down and think about how to design an encoding of the
> data that is high level, general enough to expand, abstract and
> portable.  For example, is that loan approval a hierarchical code?
> concatenation code? vector code? etc?  Did you provide codes for
> unknown, missing and N/A values? It is not easy to design such things!
>
> Get a copy of SQL PROGRAMMING STYLE and look at the chapters on design
> encoding schemes.
>
Author
7 Feb 2006 12:51 AM
--CELKO--
>> So if I have a field [sic] that can only have the values 'black' or 'blue', I have to make the field CHAR(1) and use 'b' for one and some other character for the other color, and leave people guessing what they stand for.And then to enforce that only these 2 colors can be entered and no other character, I have to add check constraints? <<

Columns are not fields; here is one of many reason.

This is a bad example. Colors have a lot of standard encoding schemes
(Land Color number, Pantone system, RGB, etc.).  This is the DOMAIN
concept which is not part of a field model.

I would hold up the Standard like a Bible and make your column into
your industry standard system.  Now restrict it with a CHECK()
constraint for now.  When you allow "bloody red", "serious infection
green", etc.  just change the constraint to expand the domain.

>> Seems like an ENUM data type like in MySQL would've been a much suitable
solution. <<

No, just use a CHECK() in SQL. And never have a column with mixed data
types int he RM.
Author
7 Feb 2006 10:54 AM
Lisa Pearlson
Well, when I was thinking what the advantages are of an ENUM datatype,
efficient storage by reference (eg ENUM('red', 'blue')), I realized the
most obvious:

I create a table Colors, with these 2 colors, and then use foreign key
constraint in another table. I'm guessing in essence that's what an
ENUM does.

Same could be done with "Door states" with rows like "open", "closed",
"locked" etc.
Maybe it's a bit extra work, but it's better modelling than encoding
the meaning of a value in a single char with check constraints.

using CHAR(1) over BIT may be more portable, but in my opinion still
not good database design. That's still too much like machine code, but
instead of boolean, you use char.

Creating an extra table might be less 'practicle' but wouldn't you
agree it's more in agreement with good RDBMS modelling principles than
CHAR(1) ?

Lisa


--CELKO-- schreef:

Show quote
> >> So if I have a field [sic] that can only have the values 'black' or 'blue', I have to make the field CHAR(1) and use 'b' for one and some other character for the other color, and leave people guessing what they stand for.And then to enforce that only these 2 colors can be entered and no other character, I have to add check constraints? <<
>
> Columns are not fields; here is one of many reason.
>
> This is a bad example. Colors have a lot of standard encoding schemes
> (Land Color number, Pantone system, RGB, etc.).  This is the DOMAIN
> concept which is not part of a field model.
>
> I would hold up the Standard like a Bible and make your column into
> your industry standard system.  Now restrict it with a CHECK()
> constraint for now.  When you allow "bloody red", "serious infection
> green", etc.  just change the constraint to expand the domain.
>
> >> Seems like an ENUM data type like in MySQL would've been a much suitable
> solution. <<
>
> No, just use a CHECK() in SQL. And never have a column with mixed data
> types int he RM.
Author
7 Feb 2006 10:58 AM
Lisa Pearlson
practicle => practical

AddThis Social Bookmark Button