Home All Groups Group Topic Archive Search About

Table design question on LARGE table

Author
28 Dec 2005 8:50 PM
Michael
Hi Everyone,
I have several tables in a current DB app I'm working on that contain over
120 fields. I'm looking for suggestions to cut down the number of fields that
I need. I know that most fields will not be affected by my redesign, but I
currently have several fields that could have mulitple options. For example:
Attention   (field name)
Good
Variable
Other
Mood   (Field Name)
Normal
Hesitant
Tense
Pressured
Soft
ect
As you could see, the first field would normally only have one option
selected, but the next field could have more than one option selected. This
group of fields is one of the fields under redesign. I have about 15 of these
option fields. One of the design options I was looking into was possible
having a bit respresent each option for the fields. For example,
Mood   (Field Name)
Normal   (Checked)
Hesitant  (Not Checked)
Tense     (Not Checked)
Pressured  (Checked)
Soft        (Not Checked)
This would be stored in the database in the field Mood like so
10010
What do you think. What are some other good design options for something
like this. The reason I'm going about this is to try and conserve space and
time. If I break out all the options for the 15 fields I will end up with at
least 60-80 more fields. And this is alot when you add them to the existing
190 fields for this one table. I'm already hit a SQL Server limit on one
table and had to split the table up. And I'm trying to keep it simple but
effient. Thanks for any help or suggestions.
Michael

Author
28 Dec 2005 9:00 PM
David Portas
Michael wrote:
Show quote
> Hi Everyone,
> I have several tables in a current DB app I'm working on that contain over
> 120 fields. I'm looking for suggestions to cut down the number of fields that
> I need. I know that most fields will not be affected by my redesign, but I
> currently have several fields that could have mulitple options. For example:
> Attention   (field name)
> Good
> Variable
> Other
> Mood   (Field Name)
> Normal
> Hesitant
> Tense
> Pressured
> Soft
> ect
> As you could see, the first field would normally only have one option
> selected, but the next field could have more than one option selected. This
> group of fields is one of the fields under redesign. I have about 15 of these
> option fields. One of the design options I was looking into was possible
> having a bit respresent each option for the fields. For example,
> Mood   (Field Name)
> Normal   (Checked)
> Hesitant  (Not Checked)
> Tense     (Not Checked)
> Pressured  (Checked)
> Soft        (Not Checked)
> This would be stored in the database in the field Mood like so
> 10010
> What do you think. What are some other good design options for something
> like this. The reason I'm going about this is to try and conserve space and
> time. If I break out all the options for the 15 fields I will end up with at
> least 60-80 more fields. And this is alot when you add them to the existing
> 190 fields for this one table. I'm already hit a SQL Server limit on one
> table and had to split the table up. And I'm trying to keep it simple but
> effient. Thanks for any help or suggestions.
> Michael

Quote: "the next field could have more than one option selected". Do
you mean you have more than one value in this single column? In that
case I strongly recommend you decompose that data into a separate
table. I can't give much more advice just on the basis of what you
posted. It might have been easier if you'd just posted a CREATE TABLE
statement.

Are you familiar with database design principles such as normalization
and the normal forms? Those rules should be your guide and you can find
them in any textbook. Bit-mapping columns is one of the worst possible
design mistakes but if you don't know the basics of design this isn't
really the place to learn.

--
David Portas
SQL Server MVP
--
Author
28 Dec 2005 11:21 PM
Chris2
"Michael" <Mich***@discussions.microsoft.com> wrote in message
news:2713CDA2-FC62-438D-8754-0299A7666105@microsoft.com...
> Hi Everyone,
> I have several tables in a current DB app I'm working on that
contain over
> 120 fields. I'm looking for suggestions to cut down the number of
fields that
> I need. I know that most fields will not be affected by my
redesign, but I
> currently have several fields that could have mulitple options.
For example:
> Attention   (field name)
> Good
> Variable
> Other
> Mood   (Field Name)
> Normal
> Hesitant
> Tense
> Pressured
> Soft
> ect

Michael,

It looks like you need to move information about "Mood" and
"Attention" into separate tables.  Perhaps:

Note: I made *many* assumptions about tables and column names.  They
are just examples.

CREATE TABLE Patients
(PatientID          INTEGER
,FName              VARCHAR(72)
,LName              VARCHAR(72)
,CONSTRAINT pk_Patients
PRIMARY KEY (PatientID)
)

CREATE TABLE MoodTypes
(MoodTypeID         INTEGER
,MoodName           VARCHAR(48)
,MoodDescription    VARCHAR(8000)
,CONSTRAINT pk_MoodTypes
PRIMARY KEY (MoodTypeID)
)

CREATE TABLE Moods
(PatientID          INTEGER
,MoodTypeID         INTEGER
,MoodDate           INTEGER
,CONSTRAINT pk_Moods
PRIMARY KEY (PatientID
             ,MoodTypeID
             ,MoodDate)
,CONSTRAINT fk_Moods_Patients_PatientID
FOREIGN KEY (PatientID)
REFERENCES Patients (PatientID)
,CONSTRAINT fk_Moods_MoodTypes_MoodTypeID
FOREIGN KEY (MoodTypeID)
REFERENCES MoodTypes (MoodTypeID)
)

CREATE TABLE AttentionTypes
(AttentionTypeID    INTEGER
,AttentionName      VARCHAR(48)
,AttentionDesc      VARCHAR(8000)
,CONSTRAINT pk_AttentionTypes
PRIMARY KEY (AttentionTypeID)
)

CREATE TABLE Attention
(PatientID          INTEGER
,AttentionTypeID    INTEGER
,AttentionDate      DATETIME
,CONSTRAINT pk_Attention
PRIMARY KEY (PatientID
             ,AttentionTypeID
             ,AttentionDate)
,CONSTRAINT fk_Attention_Patients_PatientID
FOREIGN KEY (PatientID)
REFERENCES Patients (PatientID)
,CONSTRAINT fk_Attention_AttentionTypes_AttentionTypeID
FOREIGN KEY (AttentionTypeID)
REFERENCES AttentionTypes (AttentionTypeID)
)

/*
DROP TABLE Attention
DROP TABLE AttentionTypes
DROP TABLE Moods
DROP TABLE MoodTypes
DROP TABLE Patients
*/


> As you could see, the first field would normally only have one
option
> selected, but the next field could have more than one option
selected.

This issue is solved by the above schema.


> This
> group of fields is one of the fields under redesign. I have about
15 of these
> option fields. One of the design options I was looking into was
possible
> having a bit respresent each option for the fields.

If at all possible, this choice should be avoided.  It will cause
many problems (especially with attempts to use "aggregation" like
SUM, COUNT, etc.).


> For example,
> Mood   (Field Name)
> Normal   (Checked)
> Hesitant  (Not Checked)
> Tense     (Not Checked)
> Pressured  (Checked)
> Soft        (Not Checked)
> This would be stored in the database in the field Mood like so
> 10010
> What do you think.

This would be storing multiple values in one column, and collapsing
multiple meanings into one column.  This goes against "1st Normal
Form" (see the links below), and will cause many problems.

Also, are you really planning on using bitwise operators to
manipulate the ones and zeroes of a byte in a column?  Extracting
information from this will be very difficult (in comparison to doing
so with the example schema above).


> What are some other good design options for something
> like this.

See above.


> The reason I'm going about this is to try and conserve space and
> time.

Processing speed?  Any CPU clock cycles saved from manipulating a
smaller data value will be more than lost by the all the hoops that
will be jumped through to extract the data.

You will conserve storage space, no doubt.  However, unless you are
programming for thin clients, PDAs, or low-bandwith wireless
connections, I cannot conceive of a reason to do so.  Gigabytes of
storage are fairly cheap at the moment, and proceeding to fall as
the years go by.  (Soon enough our hard drives will be rated in
Terrabytes . . .)


> If I break out all the options for the 15 fields I will end up
with at
> least 60-80 more fields. And this is alot when you add them to the
existing
> 190 fields for this one table. I'm already hit a SQL Server limit
on one
> table and had to split the table up. And I'm trying to keep it
simple but
> effient. Thanks for any help or suggestions.
> Michael
>

You are welcome.  I have also included some links on Database
Normalization below.


Database Normalization:

---------------------------

Googling on Database Normalization will provide many good articles.
Most of them are written for other database professionals (the
actual written rules of Database Normalization will throw you for a
loop if you don't know exactly what they are talking about).

Database Normalization is a proces of decomposition, of breaking
down information until you get down the basic elements that describe
entities (the people, places, or things in your business), and
making sure that the information kept about an entity (i.e. "table")
*really* belongs to it, and not to something else (or somewhere
else).

---------------------------

Database Normalization:

Basics:

About.com
http://databases.about.com/od/specificproducts/a/normalization.htm


Intermediate:

MySQL's website:
http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html


Advanced:

University of Texas:
(I quite like this whole site, since it has a handy menu on the
right
describing many important aspects of database normalization and
modeling.)
http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html


Sincerely,

Chris O.
Author
29 Dec 2005 1:26 PM
Michael
Hi Guys,
Thanks for the help and suggestions. I think I will go about it in the ways
you all where suggesting. I think using a lookup table for the value may work
best. Thanks again, now I have to get back to this layout. Happy Holidays.
Michael


Show quote
"Chris2" wrote:

>
> "Michael" <Mich***@discussions.microsoft.com> wrote in message
> news:2713CDA2-FC62-438D-8754-0299A7666105@microsoft.com...
> > Hi Everyone,
> > I have several tables in a current DB app I'm working on that
> contain over
> > 120 fields. I'm looking for suggestions to cut down the number of
> fields that
> > I need. I know that most fields will not be affected by my
> redesign, but I
> > currently have several fields that could have mulitple options.
> For example:
> > Attention   (field name)
> > Good
> > Variable
> > Other
> > Mood   (Field Name)
> > Normal
> > Hesitant
> > Tense
> > Pressured
> > Soft
> > ect
>
> Michael,
>
> It looks like you need to move information about "Mood" and
> "Attention" into separate tables.  Perhaps:
>
> Note: I made *many* assumptions about tables and column names.  They
> are just examples.
>
> CREATE TABLE Patients
> (PatientID          INTEGER
> ,FName              VARCHAR(72)
> ,LName              VARCHAR(72)
> ,CONSTRAINT pk_Patients
>  PRIMARY KEY (PatientID)
> )
>
> CREATE TABLE MoodTypes
> (MoodTypeID         INTEGER
> ,MoodName           VARCHAR(48)
> ,MoodDescription    VARCHAR(8000)
> ,CONSTRAINT pk_MoodTypes
>  PRIMARY KEY (MoodTypeID)
> )
>
> CREATE TABLE Moods
> (PatientID          INTEGER
> ,MoodTypeID         INTEGER
> ,MoodDate           INTEGER
> ,CONSTRAINT pk_Moods
>  PRIMARY KEY (PatientID
>              ,MoodTypeID
>              ,MoodDate)
> ,CONSTRAINT fk_Moods_Patients_PatientID
>  FOREIGN KEY (PatientID)
>  REFERENCES Patients (PatientID)
> ,CONSTRAINT fk_Moods_MoodTypes_MoodTypeID
>  FOREIGN KEY (MoodTypeID)
>  REFERENCES MoodTypes (MoodTypeID)
> )
>
> CREATE TABLE AttentionTypes
> (AttentionTypeID    INTEGER
> ,AttentionName      VARCHAR(48)
> ,AttentionDesc      VARCHAR(8000)
> ,CONSTRAINT pk_AttentionTypes
>  PRIMARY KEY (AttentionTypeID)
> )
>
> CREATE TABLE Attention
> (PatientID          INTEGER
> ,AttentionTypeID    INTEGER
> ,AttentionDate      DATETIME
> ,CONSTRAINT pk_Attention
>  PRIMARY KEY (PatientID
>              ,AttentionTypeID
>              ,AttentionDate)
> ,CONSTRAINT fk_Attention_Patients_PatientID
>  FOREIGN KEY (PatientID)
>  REFERENCES Patients (PatientID)
> ,CONSTRAINT fk_Attention_AttentionTypes_AttentionTypeID
>  FOREIGN KEY (AttentionTypeID)
>  REFERENCES AttentionTypes (AttentionTypeID)
> )
>
> /*
> DROP TABLE Attention
> DROP TABLE AttentionTypes
> DROP TABLE Moods
> DROP TABLE MoodTypes
> DROP TABLE Patients
> */
>
>
> > As you could see, the first field would normally only have one
> option
> > selected, but the next field could have more than one option
> selected.
>
> This issue is solved by the above schema.
>
>
> > This
> > group of fields is one of the fields under redesign. I have about
> 15 of these
> > option fields. One of the design options I was looking into was
> possible
> > having a bit respresent each option for the fields.
>
> If at all possible, this choice should be avoided.  It will cause
> many problems (especially with attempts to use "aggregation" like
> SUM, COUNT, etc.).
>
>
> > For example,
> > Mood   (Field Name)
> > Normal   (Checked)
> > Hesitant  (Not Checked)
> > Tense     (Not Checked)
> > Pressured  (Checked)
> > Soft        (Not Checked)
> > This would be stored in the database in the field Mood like so
> > 10010
> > What do you think.
>
> This would be storing multiple values in one column, and collapsing
> multiple meanings into one column.  This goes against "1st Normal
> Form" (see the links below), and will cause many problems.
>
> Also, are you really planning on using bitwise operators to
> manipulate the ones and zeroes of a byte in a column?  Extracting
> information from this will be very difficult (in comparison to doing
> so with the example schema above).
>
>
> > What are some other good design options for something
> > like this.
>
> See above.
>
>
> > The reason I'm going about this is to try and conserve space and
> > time.
>
> Processing speed?  Any CPU clock cycles saved from manipulating a
> smaller data value will be more than lost by the all the hoops that
> will be jumped through to extract the data.
>
> You will conserve storage space, no doubt.  However, unless you are
> programming for thin clients, PDAs, or low-bandwith wireless
> connections, I cannot conceive of a reason to do so.  Gigabytes of
> storage are fairly cheap at the moment, and proceeding to fall as
> the years go by.  (Soon enough our hard drives will be rated in
> Terrabytes . . .)
>
>
> > If I break out all the options for the 15 fields I will end up
> with at
> > least 60-80 more fields. And this is alot when you add them to the
> existing
> > 190 fields for this one table. I'm already hit a SQL Server limit
> on one
> > table and had to split the table up. And I'm trying to keep it
> simple but
> > effient. Thanks for any help or suggestions.
> > Michael
> >
>
> You are welcome.  I have also included some links on Database
> Normalization below.
>
>
> Database Normalization:
>
> ---------------------------
>
> Googling on Database Normalization will provide many good articles.
> Most of them are written for other database professionals (the
> actual written rules of Database Normalization will throw you for a
> loop if you don't know exactly what they are talking about).
>
> Database Normalization is a proces of decomposition, of breaking
> down information until you get down the basic elements that describe
> entities (the people, places, or things in your business), and
> making sure that the information kept about an entity (i.e. "table")
> *really* belongs to it, and not to something else (or somewhere
> else).
>
> ---------------------------
>
> Database Normalization:
>
> Basics:
>
> About.com
> http://databases.about.com/od/specificproducts/a/normalization.htm
>
>
> Intermediate:
>
> MySQL's website:
> http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html
>
>
> Advanced:
>
> University of Texas:
> (I quite like this whole site, since it has a handy menu on the
> right
> describing many important aspects of database normalization and
> modeling.)
> http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html
>
>
> Sincerely,
>
> Chris O.
>
>
>
Author
31 Dec 2005 8:32 PM
Joe from WI
If you doing a new design, normalization is the way to go.

I would also highly recommend the use of surrogate keys (for example,
PatientPK int identity primary key clustered).  When referenced in other
tables, PatientFK.  Why?  It becomes self-documenting as you can clearly see
what the primary key is.  And in related tables, what the foreign keys are. 
I've scene other suffixes used such as "RowId", "Key", etc.

Whatever you do, avoid the use of "ID" on keys.  If you use surrogate keys,
what do you do when you have PatientID the surrogate and PatientID the
identification number assigned by an outside person or process?  Some would
argue that this identification number is unique and should be the primary
key.  I couldn't disagree more.  Why?  Because that identification number
will change.  It ALWAYS does.  Sooner or later, numbers get reassigned due to
merger, new software, new business rules, etc.  Then, you have a huge mess on
your hands.  When you are relating data, ALWAYS, ALWAYS, ALWAYS use your own
key values.

Just my two cents,
Joe
Author
2 Jan 2006 3:18 AM
--CELKO--
You will need to research the proper sizes and codes for all of these
attributes. The post office uses a CHAR(35) line for a name on an
address label.  There are standard codes for shrinks to use, so you
have those for all of this, right?  Etc. You make up your own codes
only when research proves that you need to -- never start without the
reserch.

>> One of the design options I was looking into was possible having a bit respresent each option for the fields [sic]. <<

Dear God, NO!  You have never some much as looked at a book on
RDBMS,have you?  When you stop calling columns and attributes fields,
you will begin to get the concepts right. Names with postfixes like
"type_id" is obviously absurd -- how can an attribute be both an
identifer for an entity and a value of an attribute??  Again, this is
basic data modeling.

What is the unit of work?  Observations about patients!  Each set of
observations occurs in time, so we need a timestamp on it. Then we have
all those attributes we record in that observation.


CREATE TABLE PatientObservations
(patient_nbr INTEGER NOT NULL
report_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
PRIMARY KEY (patient_nbr, report_date),
last_name VARCHAR(20) NOT NULL,
first_name VARCHAR(20) NOT NULL,
mood_code INTEGER DEFAULT 0 NOT NULL -- default to unknown code
    REFERENCES Moods(mood_code)
    ON UPDATE CASCADE
    ON DELETE CASCADE,
attention_code INTEGER NOT NULL
    REFERENCES Attention(attention_code)
    ON UPDATE CASCADE
    ON DELETE CASCADE,
etc.);
Author
2 Jan 2006 5:35 PM
Chris2
"--CELKO--" <jcelko***@earthlink.net> wrote in message
news:1136171911.126387.23080@f14g2000cwb.googlegroups.com...

<snip>

Show quote
>
> CREATE TABLE PatientObservations
> (patient_nbr INTEGER NOT NULL
>  report_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
>  PRIMARY KEY (patient_nbr, report_date),
>  last_name VARCHAR(20) NOT NULL,
>  first_name VARCHAR(20) NOT NULL,
>  mood_code INTEGER DEFAULT 0 NOT NULL -- default to unknown code
>     REFERENCES Moods(mood_code)
>     ON UPDATE CASCADE
>     ON DELETE CASCADE,
>  attention_code INTEGER NOT NULL
>     REFERENCES Attention(attention_code)
>     ON UPDATE CASCADE
>     ON DELETE CASCADE,
>  etc.);
>

--CELKO--,

Michael's original requirements were for one attention code per
observation, and multiple mood codes per observation.  This is not
fulfilled by the above example.

Keeping a patient's name elements in the above table does not seem
like an optimum choice.  That data belongs in the Patients table.


Sincerely,

Chris O.
Author
3 Jan 2006 1:29 AM
--CELKO--
>> Keeping a patient's name elements in the above table does not seem like an optimum choice.  That data belongs in the Patients table. <<

Agreed! My Bad!

AddThis Social Bookmark Button