|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Table design question on LARGE tableI have several tables in a current DB app I'm working on that contain over 120 fields. I'm looking for suggestions to cut down the number of fields that I need. I know that most fields will not be affected by my redesign, but I currently have several fields that could have mulitple options. For example: Attention (field name) Good Variable Other Mood (Field Name) Normal Hesitant Tense Pressured Soft ect As you could see, the first field would normally only have one option selected, but the next field could have more than one option selected. This group of fields is one of the fields under redesign. I have about 15 of these option fields. One of the design options I was looking into was possible having a bit respresent each option for the fields. For example, Mood (Field Name) Normal (Checked) Hesitant (Not Checked) Tense (Not Checked) Pressured (Checked) Soft (Not Checked) This would be stored in the database in the field Mood like so 10010 What do you think. What are some other good design options for something like this. The reason I'm going about this is to try and conserve space and time. If I break out all the options for the 15 fields I will end up with at least 60-80 more fields. And this is alot when you add them to the existing 190 fields for this one table. I'm already hit a SQL Server limit on one table and had to split the table up. And I'm trying to keep it simple but effient. Thanks for any help or suggestions. Michael Michael wrote:
Show quote > Hi Everyone, Quote: "the next field could have more than one option selected". Do> I have several tables in a current DB app I'm working on that contain over > 120 fields. I'm looking for suggestions to cut down the number of fields that > I need. I know that most fields will not be affected by my redesign, but I > currently have several fields that could have mulitple options. For example: > Attention (field name) > Good > Variable > Other > Mood (Field Name) > Normal > Hesitant > Tense > Pressured > Soft > ect > As you could see, the first field would normally only have one option > selected, but the next field could have more than one option selected. This > group of fields is one of the fields under redesign. I have about 15 of these > option fields. One of the design options I was looking into was possible > having a bit respresent each option for the fields. For example, > Mood (Field Name) > Normal (Checked) > Hesitant (Not Checked) > Tense (Not Checked) > Pressured (Checked) > Soft (Not Checked) > This would be stored in the database in the field Mood like so > 10010 > What do you think. What are some other good design options for something > like this. The reason I'm going about this is to try and conserve space and > time. If I break out all the options for the 15 fields I will end up with at > least 60-80 more fields. And this is alot when you add them to the existing > 190 fields for this one table. I'm already hit a SQL Server limit on one > table and had to split the table up. And I'm trying to keep it simple but > effient. Thanks for any help or suggestions. > Michael you mean you have more than one value in this single column? In that case I strongly recommend you decompose that data into a separate table. I can't give much more advice just on the basis of what you posted. It might have been easier if you'd just posted a CREATE TABLE statement. Are you familiar with database design principles such as normalization and the normal forms? Those rules should be your guide and you can find them in any textbook. Bit-mapping columns is one of the worst possible design mistakes but if you don't know the basics of design this isn't really the place to learn. -- David Portas SQL Server MVP -- "Michael" <Mich***@discussions.microsoft.com> wrote in message redesign, but Inews:2713CDA2-FC62-438D-8754-0299A7666105@microsoft.com... > Hi Everyone, > I have several tables in a current DB app I'm working on that contain over > 120 fields. I'm looking for suggestions to cut down the number of fields that > I need. I know that most fields will not be affected by my > currently have several fields that could have mulitple options. Michael,For example: > Attention (field name) > Good > Variable > Other > Mood (Field Name) > Normal > Hesitant > Tense > Pressured > Soft > ect It looks like you need to move information about "Mood" and "Attention" into separate tables. Perhaps: Note: I made *many* assumptions about tables and column names. They are just examples. CREATE TABLE Patients (PatientID INTEGER ,FName VARCHAR(72) ,LName VARCHAR(72) ,CONSTRAINT pk_Patients PRIMARY KEY (PatientID) ) CREATE TABLE MoodTypes (MoodTypeID INTEGER ,MoodName VARCHAR(48) ,MoodDescription VARCHAR(8000) ,CONSTRAINT pk_MoodTypes PRIMARY KEY (MoodTypeID) ) CREATE TABLE Moods (PatientID INTEGER ,MoodTypeID INTEGER ,MoodDate INTEGER ,CONSTRAINT pk_Moods PRIMARY KEY (PatientID ,MoodTypeID ,MoodDate) ,CONSTRAINT fk_Moods_Patients_PatientID FOREIGN KEY (PatientID) REFERENCES Patients (PatientID) ,CONSTRAINT fk_Moods_MoodTypes_MoodTypeID FOREIGN KEY (MoodTypeID) REFERENCES MoodTypes (MoodTypeID) ) CREATE TABLE AttentionTypes (AttentionTypeID INTEGER ,AttentionName VARCHAR(48) ,AttentionDesc VARCHAR(8000) ,CONSTRAINT pk_AttentionTypes PRIMARY KEY (AttentionTypeID) ) CREATE TABLE Attention (PatientID INTEGER ,AttentionTypeID INTEGER ,AttentionDate DATETIME ,CONSTRAINT pk_Attention PRIMARY KEY (PatientID ,AttentionTypeID ,AttentionDate) ,CONSTRAINT fk_Attention_Patients_PatientID FOREIGN KEY (PatientID) REFERENCES Patients (PatientID) ,CONSTRAINT fk_Attention_AttentionTypes_AttentionTypeID FOREIGN KEY (AttentionTypeID) REFERENCES AttentionTypes (AttentionTypeID) ) /* DROP TABLE Attention DROP TABLE AttentionTypes DROP TABLE Moods DROP TABLE MoodTypes DROP TABLE Patients */ > As you could see, the first field would normally only have one This issue is solved by the above schema.option > selected, but the next field could have more than one option selected. > This 15 of these> group of fields is one of the fields under redesign. I have about > option fields. One of the design options I was looking into was If at all possible, this choice should be avoided. It will causepossible > having a bit respresent each option for the fields. many problems (especially with attempts to use "aggregation" like SUM, COUNT, etc.). > For example, This would be storing multiple values in one column, and collapsing> Mood (Field Name) > Normal (Checked) > Hesitant (Not Checked) > Tense (Not Checked) > Pressured (Checked) > Soft (Not Checked) > This would be stored in the database in the field Mood like so > 10010 > What do you think. multiple meanings into one column. This goes against "1st Normal Form" (see the links below), and will cause many problems. Also, are you really planning on using bitwise operators to manipulate the ones and zeroes of a byte in a column? Extracting information from this will be very difficult (in comparison to doing so with the example schema above). > What are some other good design options for something See above.> like this. > The reason I'm going about this is to try and conserve space and Processing speed? Any CPU clock cycles saved from manipulating a> time. smaller data value will be more than lost by the all the hoops that will be jumped through to extract the data. You will conserve storage space, no doubt. However, unless you are programming for thin clients, PDAs, or low-bandwith wireless connections, I cannot conceive of a reason to do so. Gigabytes of storage are fairly cheap at the moment, and proceeding to fall as the years go by. (Soon enough our hard drives will be rated in Terrabytes . . .) > If I break out all the options for the 15 fields I will end up You are welcome. I have also included some links on Databasewith at > least 60-80 more fields. And this is alot when you add them to the existing > 190 fields for this one table. I'm already hit a SQL Server limit on one > table and had to split the table up. And I'm trying to keep it simple but > effient. Thanks for any help or suggestions. > Michael > Normalization below. Database Normalization: --------------------------- Googling on Database Normalization will provide many good articles. Most of them are written for other database professionals (the actual written rules of Database Normalization will throw you for a loop if you don't know exactly what they are talking about). Database Normalization is a proces of decomposition, of breaking down information until you get down the basic elements that describe entities (the people, places, or things in your business), and making sure that the information kept about an entity (i.e. "table") *really* belongs to it, and not to something else (or somewhere else). --------------------------- Database Normalization: Basics: About.com http://databases.about.com/od/specificproducts/a/normalization.htm Intermediate: MySQL's website: http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html Advanced: University of Texas: (I quite like this whole site, since it has a handy menu on the right describing many important aspects of database normalization and modeling.) http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html Sincerely, Chris O. Hi Guys,
Thanks for the help and suggestions. I think I will go about it in the ways you all where suggesting. I think using a lookup table for the value may work best. Thanks again, now I have to get back to this layout. Happy Holidays. Michael Show quote "Chris2" wrote: > > "Michael" <Mich***@discussions.microsoft.com> wrote in message > news:2713CDA2-FC62-438D-8754-0299A7666105@microsoft.com... > > Hi Everyone, > > I have several tables in a current DB app I'm working on that > contain over > > 120 fields. I'm looking for suggestions to cut down the number of > fields that > > I need. I know that most fields will not be affected by my > redesign, but I > > currently have several fields that could have mulitple options. > For example: > > Attention (field name) > > Good > > Variable > > Other > > Mood (Field Name) > > Normal > > Hesitant > > Tense > > Pressured > > Soft > > ect > > Michael, > > It looks like you need to move information about "Mood" and > "Attention" into separate tables. Perhaps: > > Note: I made *many* assumptions about tables and column names. They > are just examples. > > CREATE TABLE Patients > (PatientID INTEGER > ,FName VARCHAR(72) > ,LName VARCHAR(72) > ,CONSTRAINT pk_Patients > PRIMARY KEY (PatientID) > ) > > CREATE TABLE MoodTypes > (MoodTypeID INTEGER > ,MoodName VARCHAR(48) > ,MoodDescription VARCHAR(8000) > ,CONSTRAINT pk_MoodTypes > PRIMARY KEY (MoodTypeID) > ) > > CREATE TABLE Moods > (PatientID INTEGER > ,MoodTypeID INTEGER > ,MoodDate INTEGER > ,CONSTRAINT pk_Moods > PRIMARY KEY (PatientID > ,MoodTypeID > ,MoodDate) > ,CONSTRAINT fk_Moods_Patients_PatientID > FOREIGN KEY (PatientID) > REFERENCES Patients (PatientID) > ,CONSTRAINT fk_Moods_MoodTypes_MoodTypeID > FOREIGN KEY (MoodTypeID) > REFERENCES MoodTypes (MoodTypeID) > ) > > CREATE TABLE AttentionTypes > (AttentionTypeID INTEGER > ,AttentionName VARCHAR(48) > ,AttentionDesc VARCHAR(8000) > ,CONSTRAINT pk_AttentionTypes > PRIMARY KEY (AttentionTypeID) > ) > > CREATE TABLE Attention > (PatientID INTEGER > ,AttentionTypeID INTEGER > ,AttentionDate DATETIME > ,CONSTRAINT pk_Attention > PRIMARY KEY (PatientID > ,AttentionTypeID > ,AttentionDate) > ,CONSTRAINT fk_Attention_Patients_PatientID > FOREIGN KEY (PatientID) > REFERENCES Patients (PatientID) > ,CONSTRAINT fk_Attention_AttentionTypes_AttentionTypeID > FOREIGN KEY (AttentionTypeID) > REFERENCES AttentionTypes (AttentionTypeID) > ) > > /* > DROP TABLE Attention > DROP TABLE AttentionTypes > DROP TABLE Moods > DROP TABLE MoodTypes > DROP TABLE Patients > */ > > > > As you could see, the first field would normally only have one > option > > selected, but the next field could have more than one option > selected. > > This issue is solved by the above schema. > > > > This > > group of fields is one of the fields under redesign. I have about > 15 of these > > option fields. One of the design options I was looking into was > possible > > having a bit respresent each option for the fields. > > If at all possible, this choice should be avoided. It will cause > many problems (especially with attempts to use "aggregation" like > SUM, COUNT, etc.). > > > > For example, > > Mood (Field Name) > > Normal (Checked) > > Hesitant (Not Checked) > > Tense (Not Checked) > > Pressured (Checked) > > Soft (Not Checked) > > This would be stored in the database in the field Mood like so > > 10010 > > What do you think. > > This would be storing multiple values in one column, and collapsing > multiple meanings into one column. This goes against "1st Normal > Form" (see the links below), and will cause many problems. > > Also, are you really planning on using bitwise operators to > manipulate the ones and zeroes of a byte in a column? Extracting > information from this will be very difficult (in comparison to doing > so with the example schema above). > > > > What are some other good design options for something > > like this. > > See above. > > > > The reason I'm going about this is to try and conserve space and > > time. > > Processing speed? Any CPU clock cycles saved from manipulating a > smaller data value will be more than lost by the all the hoops that > will be jumped through to extract the data. > > You will conserve storage space, no doubt. However, unless you are > programming for thin clients, PDAs, or low-bandwith wireless > connections, I cannot conceive of a reason to do so. Gigabytes of > storage are fairly cheap at the moment, and proceeding to fall as > the years go by. (Soon enough our hard drives will be rated in > Terrabytes . . .) > > > > If I break out all the options for the 15 fields I will end up > with at > > least 60-80 more fields. And this is alot when you add them to the > existing > > 190 fields for this one table. I'm already hit a SQL Server limit > on one > > table and had to split the table up. And I'm trying to keep it > simple but > > effient. Thanks for any help or suggestions. > > Michael > > > > You are welcome. I have also included some links on Database > Normalization below. > > > Database Normalization: > > --------------------------- > > Googling on Database Normalization will provide many good articles. > Most of them are written for other database professionals (the > actual written rules of Database Normalization will throw you for a > loop if you don't know exactly what they are talking about). > > Database Normalization is a proces of decomposition, of breaking > down information until you get down the basic elements that describe > entities (the people, places, or things in your business), and > making sure that the information kept about an entity (i.e. "table") > *really* belongs to it, and not to something else (or somewhere > else). > > --------------------------- > > Database Normalization: > > Basics: > > About.com > http://databases.about.com/od/specificproducts/a/normalization.htm > > > Intermediate: > > MySQL's website: > http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html > > > Advanced: > > University of Texas: > (I quite like this whole site, since it has a handy menu on the > right > describing many important aspects of database normalization and > modeling.) > http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html > > > Sincerely, > > Chris O. > > > If you doing a new design, normalization is the way to go.
I would also highly recommend the use of surrogate keys (for example, PatientPK int identity primary key clustered). When referenced in other tables, PatientFK. Why? It becomes self-documenting as you can clearly see what the primary key is. And in related tables, what the foreign keys are. I've scene other suffixes used such as "RowId", "Key", etc. Whatever you do, avoid the use of "ID" on keys. If you use surrogate keys, what do you do when you have PatientID the surrogate and PatientID the identification number assigned by an outside person or process? Some would argue that this identification number is unique and should be the primary key. I couldn't disagree more. Why? Because that identification number will change. It ALWAYS does. Sooner or later, numbers get reassigned due to merger, new software, new business rules, etc. Then, you have a huge mess on your hands. When you are relating data, ALWAYS, ALWAYS, ALWAYS use your own key values. Just my two cents, Joe You will need to research the proper sizes and codes for all of these
attributes. The post office uses a CHAR(35) line for a name on an address label. There are standard codes for shrinks to use, so you have those for all of this, right? Etc. You make up your own codes only when research proves that you need to -- never start without the reserch. >> One of the design options I was looking into was possible having a bit respresent each option for the fields [sic]. << Dear God, NO! You have never some much as looked at a book onRDBMS,have you? When you stop calling columns and attributes fields, you will begin to get the concepts right. Names with postfixes like "type_id" is obviously absurd -- how can an attribute be both an identifer for an entity and a value of an attribute?? Again, this is basic data modeling. What is the unit of work? Observations about patients! Each set of observations occurs in time, so we need a timestamp on it. Then we have all those attributes we record in that observation. CREATE TABLE PatientObservations (patient_nbr INTEGER NOT NULL report_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL, PRIMARY KEY (patient_nbr, report_date), last_name VARCHAR(20) NOT NULL, first_name VARCHAR(20) NOT NULL, mood_code INTEGER DEFAULT 0 NOT NULL -- default to unknown code REFERENCES Moods(mood_code) ON UPDATE CASCADE ON DELETE CASCADE, attention_code INTEGER NOT NULL REFERENCES Attention(attention_code) ON UPDATE CASCADE ON DELETE CASCADE, etc.); "--CELKO--" <jcelko***@earthlink.net> wrote in message <snip>news:1136171911.126387.23080@f14g2000cwb.googlegroups.com... Show quote > --CELKO--,> CREATE TABLE PatientObservations > (patient_nbr INTEGER NOT NULL > report_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL, > PRIMARY KEY (patient_nbr, report_date), > last_name VARCHAR(20) NOT NULL, > first_name VARCHAR(20) NOT NULL, > mood_code INTEGER DEFAULT 0 NOT NULL -- default to unknown code > REFERENCES Moods(mood_code) > ON UPDATE CASCADE > ON DELETE CASCADE, > attention_code INTEGER NOT NULL > REFERENCES Attention(attention_code) > ON UPDATE CASCADE > ON DELETE CASCADE, > etc.); > Michael's original requirements were for one attention code per observation, and multiple mood codes per observation. This is not fulfilled by the above example. Keeping a patient's name elements in the above table does not seem like an optimum choice. That data belongs in the Patients table. Sincerely, Chris O. |
|||||||||||||||||||||||