|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Table design questionlength of this post, but I'm somewhat at a loss here. I am creating an items table which will contain records for many different types of products. Each product has a number of common attributes (cost, retail, etc.), but each type of product also has attributes specific to itself. As an example, think of a coffee shop. It sells coffee by the cup, so we would want to keep track of size, and ingredients. The shop also sells bulk coffee, coffee makers, pastries, CD's, and internet access. Each of these requires a different set of attributes that I want to track. The simple solution would be to create an items table with all the necessary columns. I see two drawbacks to doing this, however. The first is aesthetic - it just strikes me as wrong to have a table with 100 columns. The second drawback is that if we ever add a new type of product, and it is likely that we will, I would have to add new columns to the table, and I would probably have to do a fair amount of programming on the front end to allow users to enter data and to sort/select on the new fields. Another possible solution that I have been considering is to create a child table, in which each attribute for an item represents a row in the table. So if my main item table looks like the following: Table - Item1 (ItemNo, ItemType, Description, Cost, Retail, etc.) I would have a child table, Item2 as follows: Table - Item2 (ItemNo, Attribute, Value) For each item, I could have any number of attributes (size, color, etc.) By designing the database this way, I eliminate the two issues I have with a simple 1 level item table. It really doesn't matter how many attributes an item has, nor is it a problem to add new types of items and new attributes. It would likely be easier to implement this type of structure on the front end. I could have an item maintenance form in which the user selects the item, and its attributes are displayed in a grid. The problem I foresee has to do with sorting, selecting, and displaying the attributes and their values. In a single level table, it is easy to select records for a given set of criteria such as column1 = x, column2 = y, and column3 = z. In the two level structure, we can not do this. In order to retrieve records, I will have join my parent table with the child (attribute) table. SELECT field1, field2, etc. FROM Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo WHERE Item2.Attribute = 'MyAttribute' AND Item2.Value = 'MyValue' This would allow me to select records where, using my coffee shop as an example, the country of origin is Columbia. However, if I try to select based on two criteria, this will not work. Since each Item2 record can have only one attribute, the following query will not return any records: SELECT field1, field2, etc. FROM Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo WHERE ( Item2.Attribute = 'Origin' AND Item2.Value = 'Columbia' ) AND ( Item2.Attribute = 'Decaf' AND Item2.Value = 'True' ) An 'OR' condition instead of the 'AND' condition would return records, but it would return all coffee from Columbia, not just decaf, and all decaf regardless of origin. I have been playing around with the PIVOT function, and Common Table Elements. I can pivot the Item2 attributes into CTE, then join the results with my Item1 table. WITH MyPivot(it1ItemNo, Origin, Decaf) AS ( SELECT it1ItemNo, [Origin] AS Origin, [Decaf] AS Decaf FROM it2Item2 PIVOT ( MIN(Item2.Value) FOR Attribute IN (Origin, Decaf) ) AS MyPivot ) SELECT i.*, p.Origin, p.Decaf FROM Item1 i JOIN MyPivot p ON i.ItemNo = p.ItemNo WHERE p.Origin = 'Columbia' AND p.Decaf = 'True' This gives me the records I want, but it seems like an awful lot of work to get there. Again, apologies for the length of this post, but here at last are my questions: 1. Which would be considered (in the world of good SQL design) the better way to create my tables? Is there another way that I have not considered? 2. If the two level design is the better way, how do I sort and select based on the attributes in the child table? Is there a way to do this other than using the Pivot and CTE method I used above? 3. If I used the CTE method (or something similar), what type of resources would it take. I'm not looking at a huge database or a tremendous number of users, but I wouldn't want the server to have to run a monster query every time a user asks for data. Thanks in advance for any help you can give me. Best regards, Scott Lichtenberg Scott,
It is called an "entity attribute value" approach, and it is usually a disaster at most several months after the rubber hits the road - the performance is horrible. Google up "entity attribute value" and see for yourself. Alexander Kuznetsov wrote:
> Scott, So why is it usually a disaster? If you have nicely indexed tables then> > It is called an "entity attribute value" approach, and it is usually a > disaster at most several months after the rubber hits the road - the > performance is horrible. > Google up "entity attribute value" and see for yourself. retrieving the relevant attributes for a particular item or group of items should be nice and efficient. I suspect the problem comes when people try to use this approach, but then try to create views and tables to represent the final table within SQL stored procedures. I've used this before with no performance issues, but I didn't try to use SQL to build back up the list of attributes into columns, instead I pulled the objects and their raw datatables back into my c#, then only joined them as they were bound to the display. Because you would rarely want to display more than 40 or so items on a screen there was no performance issue. Obviously this might not be relevant to the OP depending on the platform, but personally I liked the model, I think it just needs to be used with the right tools. Cheers Will How about having a base item table (containing the item number and
attributes common to all) and a set of tables for each product type, containing the item number (key) and only those attributes for that product type? For example, the base table has: BaseTable(ItemNo CommonAtttribute1 CommonAttribute2) and product type 1 would have: Type1Table(ItemNo, ProductType1Attribute1, ProductType1Attribute2) and product type 2 would have Type2Table(ItemNo, ProductType2Attribute1, ProductType2Attribute2) The product-specific tables would simply have no rows for items of a different type. Queries are simple joins - you don't even have to store the product type, since an inner join with a product type table would exclude items not of the correct type. Where you need to include multiple products in a query you can use outer joins which would return nulls in unmatched columns. The downside would be that you cannot dynamically add attributes - all the attributes are 'wired-in' to the schema. Show quoteHide quote "Scott Lichtenberg" <x**@zzz.com> wrote in message news:%23tdSD2khGHA.1244@TK2MSFTNGP04.phx.gbl... > > I have a question regarding database design. Apologies in advance for the > length of this post, but I'm somewhat at a loss here. > > > I am creating an items table which will contain records for many different > types of products. Each product has a number of common attributes (cost, > retail, etc.), but each type of product also has attributes specific to > itself. As an example, think of a coffee shop. It sells coffee by the > cup, > so we would want to keep track of size, and ingredients. The shop also > sells bulk coffee, coffee makers, pastries, CD's, and internet access. > Each > of these requires a different set of attributes that I want to track. > > The simple solution would be to create an items table with all the > necessary > columns. I see two drawbacks to doing this, however. The first is > aesthetic - it just strikes me as wrong to have a table with 100 columns. > The second drawback is that if we ever add a new type of product, and it > is > likely that we will, I would have to add new columns to the table, and I > would probably have to do a fair amount of programming on the front end to > allow users to enter data and to sort/select on the new fields. > > > Another possible solution that I have been considering is to create a > child > table, in which each attribute for an item represents a row in the table. > So if my main item table looks like the following: > > Table - Item1 (ItemNo, ItemType, Description, Cost, Retail, etc.) > > I would have a child table, Item2 as follows: > > Table - Item2 (ItemNo, Attribute, Value) > > For each item, I could have any number of attributes (size, color, etc.) > > By designing the database this way, I eliminate the two issues I have with > a > simple 1 level item table. It really doesn't matter how many attributes > an > item has, nor is it a problem to add new types of items and new > attributes. > It would likely be easier to implement this type of structure on the front > end. I could have an item maintenance form in which the user selects the > item, and its attributes are displayed in a grid. > > The problem I foresee has to do with sorting, selecting, and displaying > the > attributes and their values. In a single level table, it is easy to > select > records for a given set of criteria such as column1 = x, column2 = y, and > column3 = z. In the two level structure, we can not do this. In order to > retrieve records, I will have join my parent table with the child > (attribute) table. > > SELECT field1, field2, etc. > FROM Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo > WHERE Item2.Attribute = 'MyAttribute' > AND Item2.Value = 'MyValue' > > > This would allow me to select records where, using my coffee shop as an > example, the country of origin is Columbia. However, if I try to select > based on two criteria, this will not work. Since each Item2 record can > have > only one attribute, the following query will not return any records: > > SELECT field1, field2, etc. > FROM Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo > WHERE ( Item2.Attribute = 'Origin' AND Item2.Value = > 'Columbia' ) > AND ( Item2.Attribute = 'Decaf' AND Item2.Value = > 'True' ) > > An 'OR' condition instead of the 'AND' condition would return records, but > it would return all coffee from Columbia, not just decaf, and all decaf > regardless of origin. > > I have been playing around with the PIVOT function, and Common Table > Elements. I can pivot the Item2 attributes into CTE, then join > the results with my Item1 table. > > WITH MyPivot(it1ItemNo, Origin, Decaf) > AS > ( > SELECT it1ItemNo, [Origin] AS Origin, [Decaf] AS Decaf > FROM it2Item2 > PIVOT ( MIN(Item2.Value) FOR Attribute IN (Origin, Decaf) ) > AS MyPivot > ) > > SELECT i.*, p.Origin, p.Decaf > FROM Item1 i > JOIN MyPivot p > ON i.ItemNo = p.ItemNo > WHERE p.Origin = 'Columbia' AND p.Decaf = 'True' > > > This gives me the records I want, but it seems like an awful lot of work > to > get there. Again, apologies for the length of this post, but here at last > are my questions: > > 1. Which would be considered (in the world of good SQL design) the better > way to create my tables? Is there another way that I have not considered? > > 2. If the two level design is the better way, how do I sort and select > based on the attributes in the child table? Is there a way to do this > other > than using the Pivot and CTE method I used above? > > 3. If I used the CTE method (or something similar), what type of > resources > would it take. I'm not looking at a huge database or a tremendous number > of > users, but I wouldn't want the server to have to run a monster query every > time a user asks for data. > > > Thanks in advance for any help you can give me. > > Best regards, > Scott Lichtenberg > > "Graham Morris" <Graywing@newsgroup.nospam> wrote in message I would consider this "downside" the main argument for this approach, rathernews:eXvEtJlhGHA.4080@TK2MSFTNGP03.phx.gbl... > > The downside would be that you cannot dynamically add attributes - all the > attributes are 'wired-in' to the schema. > than against it. Since you have a seperate table for each product type, this may not really be a problem. If you get a new product type, you create a new table, since it does not fit into any of the existing tables. If you need to add an attribute to an existing product type you add a column to that particular table, which is what you would do in any database. Do a search Search on these
The Relational Database Model Normalisation of data The Entity-Relationship Model and heres my 2 cents on structure of tables - hope it makes sense Tables Items (item_no, item_description, is_active(1 or 0 to track), latest_action(date of adds/updates)) Item_type (item_type_no, item_no, type, type_description, is_active, latest_action) Item_attributes (item_att_no, item_type_no or item_no, is_active, latest_action, origin, etc) Item_Costs (item_no, cost, is_active, latest_action, etc) Item_Sales (item_no, sale, etc)........ Item_Origin (item_org_no, item_att_no, origin_description, etc) in other words, try to keep all tables lean - for efficiency and relate them properly so the queries will run efficiently (yes joins are ok if tables are correctly set) and you will not have to do alot of maintanance (new tables etc) on the database and no pivots for data, etc. When you add an item, its just another ROW. When you change the cost, its just another ROW. When you de-activate a product (no longer sell) update is_active to 0 When you add a new origin add another ROW. When you change and attribute -- add new attribute and deactivate old attribute. All items, types, attributes, costs, sale_prices should just be adding another row or updateing existing row. But you may want to track the differneces in prices (add new with is_Active = 1, and update old by is_active = 0) Update the date to track when you made the change...... Queries select I.item_description, IT.type_description, IA.origin, etc. FROM Items I (nolock) INNER JOIN Item_type IT (nolock) ON I.item_no = IT.item_no INNER JOIN Item_attributes IA (nolock) ON I.item_no = IA.item_no WHERE I.item_no = 1 -- this could be single coffee AND IT.item_type_no = 1 -- this could be non-decaf AND IA.origin = 1 --this could be Columbian AND is_active = 1 -- only active product coffee / non-decaf / Columbian select I.item_description, IT.type_description, etc. FROM Items I (nolock) INNER JOIN Item_type IT (nolock) ON I.item_no = IT.item_no INNER JOIN Item_attributes IA (nolock) ON I.item_no = IA.item_no WHERE I.item_no = 11 -- this could be internet access rental AND IT.item_type_no = 11 -- this could be by the minute AND is_active = 1 -- only active product (with latest pricing) Internet access / minute rate Hope this gives you some better ideas. Show quoteHide quote "Jim Underwood" wrote: > > "Graham Morris" <Graywing@newsgroup.nospam> wrote in message > news:eXvEtJlhGHA.4080@TK2MSFTNGP03.phx.gbl... > > > > The downside would be that you cannot dynamically add attributes - all the > > attributes are 'wired-in' to the schema. > > > I would consider this "downside" the main argument for this approach, rather > than against it. > > Since you have a seperate table for each product type, this may not really > be a problem. If you get a new product type, you create a new table, since > it does not fit into any of the existing tables. If you need to add an > attribute to an existing product type you add a column to that particular > table, which is what you would do in any database. > > > Scott,
in order to query two or more attributes you'll need a copy of the attributes table for each attribute you want to query, eg: SELECT field1, field2, etc. FROM main_table m INNER JOIN Items o ON m.ItemNo = o.ItemNo INNER JOIN Items t ON m.ItemNo = t.ItemNo AND o.ItemNo = t.ItemNo WHERE o.Attribute = 'Origin' AND o.Value = 'Columbia' AND t.Attribute = 'Decaf' AND t.Value = 'True' Not to say I'm recommending this method, but that's how you would query it. Damien Show quoteHide quote "Scott Lichtenberg" wrote: > > I have a question regarding database design. Apologies in advance for the > length of this post, but I'm somewhat at a loss here. > > > I am creating an items table which will contain records for many different > types of products. Each product has a number of common attributes (cost, > retail, etc.), but each type of product also has attributes specific to > itself. As an example, think of a coffee shop. It sells coffee by the cup, > so we would want to keep track of size, and ingredients. The shop also > sells bulk coffee, coffee makers, pastries, CD's, and internet access. Each > of these requires a different set of attributes that I want to track. > > The simple solution would be to create an items table with all the necessary > columns. I see two drawbacks to doing this, however. The first is > aesthetic - it just strikes me as wrong to have a table with 100 columns. > The second drawback is that if we ever add a new type of product, and it is > likely that we will, I would have to add new columns to the table, and I > would probably have to do a fair amount of programming on the front end to > allow users to enter data and to sort/select on the new fields. > > > Another possible solution that I have been considering is to create a child > table, in which each attribute for an item represents a row in the table. > So if my main item table looks like the following: > > Table - Item1 (ItemNo, ItemType, Description, Cost, Retail, etc.) > > I would have a child table, Item2 as follows: > > Table - Item2 (ItemNo, Attribute, Value) > > For each item, I could have any number of attributes (size, color, etc.) > > By designing the database this way, I eliminate the two issues I have with a > simple 1 level item table. It really doesn't matter how many attributes an > item has, nor is it a problem to add new types of items and new attributes. > It would likely be easier to implement this type of structure on the front > end. I could have an item maintenance form in which the user selects the > item, and its attributes are displayed in a grid. > > The problem I foresee has to do with sorting, selecting, and displaying the > attributes and their values. In a single level table, it is easy to select > records for a given set of criteria such as column1 = x, column2 = y, and > column3 = z. In the two level structure, we can not do this. In order to > retrieve records, I will have join my parent table with the child > (attribute) table. > > SELECT field1, field2, etc. > FROM Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo > WHERE Item2.Attribute = 'MyAttribute' > AND Item2.Value = 'MyValue' > > > This would allow me to select records where, using my coffee shop as an > example, the country of origin is Columbia. However, if I try to select > based on two criteria, this will not work. Since each Item2 record can have > only one attribute, the following query will not return any records: > > SELECT field1, field2, etc. > FROM Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo > WHERE ( Item2.Attribute = 'Origin' AND Item2.Value = > 'Columbia' ) > AND ( Item2.Attribute = 'Decaf' AND Item2.Value = > 'True' ) > > An 'OR' condition instead of the 'AND' condition would return records, but > it would return all coffee from Columbia, not just decaf, and all decaf > regardless of origin. > > I have been playing around with the PIVOT function, and Common Table > Elements. I can pivot the Item2 attributes into CTE, then join > the results with my Item1 table. > > WITH MyPivot(it1ItemNo, Origin, Decaf) > AS > ( > SELECT it1ItemNo, [Origin] AS Origin, [Decaf] AS Decaf > FROM it2Item2 > PIVOT ( MIN(Item2.Value) FOR Attribute IN (Origin, Decaf) ) > AS MyPivot > ) > > SELECT i.*, p.Origin, p.Decaf > FROM Item1 i > JOIN MyPivot p > ON i.ItemNo = p.ItemNo > WHERE p.Origin = 'Columbia' AND p.Decaf = 'True' > > > This gives me the records I want, but it seems like an awful lot of work to > get there. Again, apologies for the length of this post, but here at last > are my questions: > > 1. Which would be considered (in the world of good SQL design) the better > way to create my tables? Is there another way that I have not considered? > > 2. If the two level design is the better way, how do I sort and select > based on the attributes in the child table? Is there a way to do this other > than using the Pivot and CTE method I used above? > > 3. If I used the CTE method (or something similar), what type of resources > would it take. I'm not looking at a huge database or a tremendous number of > users, but I wouldn't want the server to have to run a monster query every > time a user asks for data. > > > Thanks in advance for any help you can give me. > > Best regards, > Scott Lichtenberg > > > An Alternative way to do the searching if you do choose to go with the
attribute value approach which might be a bit more generic could be to select an item and a count of how many of the criteria it satisfies, then filter that to only have the maximum number. Not sure how it would perform, but it would save having to join on the table for each parameter, and would allow optional parameters. One reason you may be forced down the attribute value approach is if the user requires the ability to customise the products themselves (which is why I've had to follow this route in the past). So far I haven't seen anyones solution that accomodates this. Cheers Will The code might look something like this (though I've not tested this). CREATE PROC SearchProducts @Att1 varchar(50), @Val1 varchar(50) = null, @Att2 varchar(50), @Val2 varchar(50) = null AS DECLARE @NumArgs int SET @NumArgs = CASE WHEN @Val1 IS NULL THEN 0 ELSE 1 END + CASE WHEN @Val2 IS NULL THEN 0 ELSE 1 END SELECT Fields FROM Main_Table as t INNER JOIN ( SELECT m.ItemID, COUNT(DISTINCT m.Attribute) as ArgsSat FROM Items as m WHERE (@Val1 is null OR (m.Attribute = @Att1 AND m.Value = @Val1)) OR (@Val2 is null or (m.Attribute = @Att2 AND m.Value = @Val2)) GROUP BY m.ItemID ) as Filter on Filter.ItemID = t.ItemID AND @NumArgs = Filter.ArgsSat Damien wrote: Show quoteHide quote > Scott, > > in order to query two or more attributes you'll need a copy of the > attributes table for each attribute you want to query, eg: > > SELECT field1, field2, etc. > FROM main_table m > INNER JOIN Items o ON m.ItemNo = o.ItemNo > INNER JOIN Items t ON m.ItemNo = t.ItemNo > AND o.ItemNo = t.ItemNo > WHERE o.Attribute = 'Origin' > AND o.Value = 'Columbia' > AND t.Attribute = 'Decaf' > AND t.Value = 'True' > > Not to say I'm recommending this method, but that's how you would query it. > > Damien > > > "Scott Lichtenberg" wrote: > > > > > I have a question regarding database design. Apologies in advance for the > > length of this post, but I'm somewhat at a loss here. > > > > > > I am creating an items table which will contain records for many different > > types of products. Each product has a number of common attributes (cost, > > retail, etc.), but each type of product also has attributes specific to > > itself. As an example, think of a coffee shop. It sells coffee by the cup, > > so we would want to keep track of size, and ingredients. The shop also > > sells bulk coffee, coffee makers, pastries, CD's, and internet access. Each > > of these requires a different set of attributes that I want to track. > > > > The simple solution would be to create an items table with all the necessary > > columns. I see two drawbacks to doing this, however. The first is > > aesthetic - it just strikes me as wrong to have a table with 100 columns. > > The second drawback is that if we ever add a new type of product, and it is > > likely that we will, I would have to add new columns to the table, and I > > would probably have to do a fair amount of programming on the front end to > > allow users to enter data and to sort/select on the new fields. > > > > > > Another possible solution that I have been considering is to create a child > > table, in which each attribute for an item represents a row in the table. > > So if my main item table looks like the following: > > > > Table - Item1 (ItemNo, ItemType, Description, Cost, Retail, etc.) > > > > I would have a child table, Item2 as follows: > > > > Table - Item2 (ItemNo, Attribute, Value) > > > > For each item, I could have any number of attributes (size, color, etc.) > > > > By designing the database this way, I eliminate the two issues I have with a > > simple 1 level item table. It really doesn't matter how many attributes an > > item has, nor is it a problem to add new types of items and new attributes. > > It would likely be easier to implement this type of structure on the front > > end. I could have an item maintenance form in which the user selects the > > item, and its attributes are displayed in a grid. > > > > The problem I foresee has to do with sorting, selecting, and displaying the > > attributes and their values. In a single level table, it is easy to select > > records for a given set of criteria such as column1 = x, column2 = y, and > > column3 = z. In the two level structure, we can not do this. In order to > > retrieve records, I will have join my parent table with the child > > (attribute) table. > > > > SELECT field1, field2, etc. > > FROM Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo > > WHERE Item2.Attribute = 'MyAttribute' > > AND Item2.Value = 'MyValue' > > > > > > This would allow me to select records where, using my coffee shop as an > > example, the country of origin is Columbia. However, if I try to select > > based on two criteria, this will not work. Since each Item2 record can have > > only one attribute, the following query will not return any records: > > > > SELECT field1, field2, etc. > > FROM Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo > > WHERE ( Item2.Attribute = 'Origin' AND Item2.Value = > > 'Columbia' ) > > AND ( Item2.Attribute = 'Decaf' AND Item2.Value = > > 'True' ) > > > > An 'OR' condition instead of the 'AND' condition would return records, but > > it would return all coffee from Columbia, not just decaf, and all decaf > > regardless of origin. > > > > I have been playing around with the PIVOT function, and Common Table > > Elements. I can pivot the Item2 attributes into CTE, then join > > the results with my Item1 table. > > > > WITH MyPivot(it1ItemNo, Origin, Decaf) > > AS > > ( > > SELECT it1ItemNo, [Origin] AS Origin, [Decaf] AS Decaf > > FROM it2Item2 > > PIVOT ( MIN(Item2.Value) FOR Attribute IN (Origin, Decaf) ) > > AS MyPivot > > ) > > > > SELECT i.*, p.Origin, p.Decaf > > FROM Item1 i > > JOIN MyPivot p > > ON i.ItemNo = p.ItemNo > > WHERE p.Origin = 'Columbia' AND p.Decaf = 'True' > > > > > > This gives me the records I want, but it seems like an awful lot of work to > > get there. Again, apologies for the length of this post, but here at last > > are my questions: > > > > 1. Which would be considered (in the world of good SQL design) the better > > way to create my tables? Is there another way that I have not considered? > > > > 2. If the two level design is the better way, how do I sort and select > > based on the attributes in the child table? Is there a way to do this other > > than using the Pivot and CTE method I used above? > > > > 3. If I used the CTE method (or something similar), what type of resources > > would it take. I'm not looking at a huge database or a tremendous number of > > users, but I wouldn't want the server to have to run a monster query every > > time a user asks for data. > > > > > > Thanks in advance for any help you can give me. > > > > Best regards, > > Scott Lichtenberg > > > > > > Actually, just realised that the optional paramaters screw that
suggestion up, but without them it should work: WHERE (m.Attribute = @Att1 AND m.Value = @Val1) OR (m.Attribute = @Att2 AND m.Value = @Val2) Will wrote: Show quoteHide quote > An Alternative way to do the searching if you do choose to go with the > attribute value approach which might be a bit more generic could be to > select an item and a count of how many of the criteria it satisfies, > then filter that to only have the maximum number. Not sure how it would > perform, but it would save having to join on the table for each > parameter, and would allow optional parameters. > > One reason you may be forced down the attribute value approach is if > the user requires the ability to customise the products themselves > (which is why I've had to follow this route in the past). So far I > haven't seen anyones solution that accomodates this. > > Cheers > Will > > The code might look something like this (though I've not tested this). > > CREATE PROC SearchProducts > @Att1 varchar(50), > @Val1 varchar(50) = null, > @Att2 varchar(50), > @Val2 varchar(50) = null > AS > > DECLARE @NumArgs int > SET @NumArgs = CASE WHEN @Val1 IS NULL THEN 0 ELSE 1 END + > CASE WHEN @Val2 IS NULL THEN 0 ELSE 1 END > SELECT Fields > FROM Main_Table as t > INNER JOIN > ( > SELECT m.ItemID, COUNT(DISTINCT m.Attribute) as ArgsSat > FROM Items as m > WHERE > (@Val1 is null OR (m.Attribute = @Att1 AND m.Value = @Val1)) > OR > (@Val2 is null or (m.Attribute = @Att2 AND m.Value = @Val2)) > > GROUP BY m.ItemID > ) as Filter on Filter.ItemID = t.ItemID AND @NumArgs = Filter.ArgsSat > > Damien wrote: > > Scott, > > > > in order to query two or more attributes you'll need a copy of the > > attributes table for each attribute you want to query, eg: > > > > SELECT field1, field2, etc. > > FROM main_table m > > INNER JOIN Items o ON m.ItemNo = o.ItemNo > > INNER JOIN Items t ON m.ItemNo = t.ItemNo > > AND o.ItemNo = t.ItemNo > > WHERE o.Attribute = 'Origin' > > AND o.Value = 'Columbia' > > AND t.Attribute = 'Decaf' > > AND t.Value = 'True' > > > > Not to say I'm recommending this method, but that's how you would query it. > > > > Damien > > > > > > "Scott Lichtenberg" wrote: > > > > > > > > I have a question regarding database design. Apologies in advance for the > > > length of this post, but I'm somewhat at a loss here. > > > > > > > > > I am creating an items table which will contain records for many different > > > types of products. Each product has a number of common attributes (cost, > > > retail, etc.), but each type of product also has attributes specific to > > > itself. As an example, think of a coffee shop. It sells coffee by the cup, > > > so we would want to keep track of size, and ingredients. The shop also > > > sells bulk coffee, coffee makers, pastries, CD's, and internet access. Each > > > of these requires a different set of attributes that I want to track. > > > > > > The simple solution would be to create an items table with all the necessary > > > columns. I see two drawbacks to doing this, however. The first is > > > aesthetic - it just strikes me as wrong to have a table with 100 columns. > > > The second drawback is that if we ever add a new type of product, and it is > > > likely that we will, I would have to add new columns to the table, and I > > > would probably have to do a fair amount of programming on the front end to > > > allow users to enter data and to sort/select on the new fields. > > > > > > > > > Another possible solution that I have been considering is to create a child > > > table, in which each attribute for an item represents a row in the table. > > > So if my main item table looks like the following: > > > > > > Table - Item1 (ItemNo, ItemType, Description, Cost, Retail, etc.) > > > > > > I would have a child table, Item2 as follows: > > > > > > Table - Item2 (ItemNo, Attribute, Value) > > > > > > For each item, I could have any number of attributes (size, color, etc.) > > > > > > By designing the database this way, I eliminate the two issues I have with a > > > simple 1 level item table. It really doesn't matter how many attributes an > > > item has, nor is it a problem to add new types of items and new attributes. > > > It would likely be easier to implement this type of structure on the front > > > end. I could have an item maintenance form in which the user selects the > > > item, and its attributes are displayed in a grid. > > > > > > The problem I foresee has to do with sorting, selecting, and displaying the > > > attributes and their values. In a single level table, it is easy to select > > > records for a given set of criteria such as column1 = x, column2 = y, and > > > column3 = z. In the two level structure, we can not do this. In order to > > > retrieve records, I will have join my parent table with the child > > > (attribute) table. > > > > > > SELECT field1, field2, etc. > > > FROM Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo > > > WHERE Item2.Attribute = 'MyAttribute' > > > AND Item2.Value = 'MyValue' > > > > > > > > > This would allow me to select records where, using my coffee shop as an > > > example, the country of origin is Columbia. However, if I try to select > > > based on two criteria, this will not work. Since each Item2 record can have > > > only one attribute, the following query will not return any records: > > > > > > SELECT field1, field2, etc. > > > FROM Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo > > > WHERE ( Item2.Attribute = 'Origin' AND Item2.Value = > > > 'Columbia' ) > > > AND ( Item2.Attribute = 'Decaf' AND Item2.Value = > > > 'True' ) > > > > > > An 'OR' condition instead of the 'AND' condition would return records, but > > > it would return all coffee from Columbia, not just decaf, and all decaf > > > regardless of origin. > > > > > > I have been playing around with the PIVOT function, and Common Table > > > Elements. I can pivot the Item2 attributes into CTE, then join > > > the results with my Item1 table. > > > > > > WITH MyPivot(it1ItemNo, Origin, Decaf) > > > AS > > > ( > > > SELECT it1ItemNo, [Origin] AS Origin, [Decaf] AS Decaf > > > FROM it2Item2 > > > PIVOT ( MIN(Item2.Value) FOR Attribute IN (Origin, Decaf) ) > > > AS MyPivot > > > ) > > > > > > SELECT i.*, p.Origin, p.Decaf > > > FROM Item1 i > > > JOIN MyPivot p > > > ON i.ItemNo = p.ItemNo > > > WHERE p.Origin = 'Columbia' AND p.Decaf = 'True' > > > > > > > > > This gives me the records I want, but it seems like an awful lot of work to > > > get there. Again, apologies for the length of this post, but here at last > > > are my questions: > > > > > > 1. Which would be considered (in the world of good SQL design) the better > > > way to create my tables? Is there another way that I have not considered? > > > > > > 2. If the two level design is the better way, how do I sort and select > > > based on the attributes in the child table? Is there a way to do this other > > > than using the Pivot and CTE method I used above? > > > > > > 3. If I used the CTE method (or something similar), what type of resources > > > would it take. I'm not looking at a huge database or a tremendous number of > > > users, but I wouldn't want the server to have to run a monster query every > > > time a user asks for data. > > > > > > > > > Thanks in advance for any help you can give me. > > > > > > Best regards, > > > Scott Lichtenberg > > > > > > > > > Scott Lichtenberg wrote:
> I have a question regarding database design. Apologies in advance for the A problem I have faced many times. The issue is relating "inheritance" > length of this post, but I'm somewhat at a loss here. > > > I am creating an items table which will contain records for many different > types of products. Each product has a number of common attributes (cost, > retail, etc.), but each type of product also has attributes specific to > itself. (which is a clearly useful OO design principle) to relational design (which is a "proven" mathematical way of representing data). I even though I had figured out the idealistically "correct" solution at one point out of the various options, but each time I come back to this type of problem, I think it over again for the specific situation at hand. Having the attributes (name and value) as rows is not a solution that I ever really consider. You lose domain integrity... eg, your cost attribute is surely positive decimal, your description is varchar, quantity in stock is positive integer, last ordered date is datetime. But if you store them all in a generic "key->value" table, you can no longer enforce these domain constraints other than by coding a rather lengthy trigger, and updating that trigger each time a new attribute is added. You also run into problems if you want to relate (in an ad hoc manner) different types of items on some shared attributes. You can make your trigger shorter by also including a column defining the "data type" for the value, and basing your trigger rules off that column... but this seems completely counter to the whole idea of a relational (or quasi relational) database. (Having said the above, I must admit that every database I have built usually ends up with a table of system parameters, with a key->value pair. But this is reserved for storing things like decimal rounding, directory path information, arguments to external programs used by the database, etc. It is a table of database constants and is never treated as a "relation" for the purpose of projection etc, it's just global storage) What we have to keep in mind is, WHY are we using a "relational" database to store our data at all? Surely it is so we can perform "relational" operations on it? If we take that as a given, then we have to find some balance between the flexibility of "object" based storage, and the power of sql based queries. There are three solutions that I have tried. The first is the "very wide table". Lots of nullable columns, lots of null values. I have only used this once as part of a messaging system, where physically remote systems needed a way to pass information back and forth, and the class of message required different parameters. The ease of transferring data between identically structured tables overrode most other concerns. But once again, this was not really treated as a relation, it's more like the global storage table I discussed above, but a staging area for temporary data that is still easily accessible to sql. (Think LPARAM, WPARAM in windows messaging, but with a much bigger param count). Again, I would not use this type of structure for "real" data. The second and third design patterns both have "names", in that they are common solutions, although I always forget what those names are. First I will just describe them, and then I will go over the pros and cons I have encountered, and the work involved, which anyone using each pattern would be sure to encounter. Solution 2 is to have a seperate table for every type of item. Each table has all attributes that belong to each item. So to take your coffee shop analogy, the "cup of coffee" products have a table containing name, cost, cup used. The "bulk coffee" products have a table containing name, cost, supplier, inventory. The internet access products have name, cost, bandwidth allocation, subscription duration. As you can see, each product is likely to have attributes in common with other products (name and cost are ubiquitous), but those attributes are still stored by each table for the particular product represented by that table. Solution 3 is to have a "base class" table containing shared attributes, and further tables which extend (inherit from?) this base table for each specific item. Excuse my shorthand and off the cuff naming Base table: Products[productId (PK), name, cost] Extended: CupOfCoffeeProducts[ProductId (PK,FK), cup used] Extended: BulkProducts[ProductId (PK,FK), supplier, Inventory] Extended: InternetProducts[ProductID (PK,FK), bandwidth, duration] etc. Now, using this data is where it gets really tricky. Each of solution 2 and 3 can be made to appear as the other, and as the "very wide table" solution, through the use of views, using lots of unions and/or lots of left joins. But this can be a VERY LARGE amount of work to set up, and can be expensive in terms of performance as well. Why would you want to do something like this anyway? Well, the most obvious scenario (and one obviously derived from object oriented design) is that you might sometimes want to treat all products as "the same" in some way. Eg, how many "products" do we sell, regardless of type? How many did we sell last month? How much revenue did we get from those sales? Or an even more illustrative scenario, what if you can place an order, where each line item can be one or more of any kind of product? For a situation like this, the base/extended structure of solution 3 is obviously advantageous. You can create an orders table Orders[OrderNbr (PK), Customer] and an ordered items table... OrderedItems[OrderNbr (PK,FK), ProductID (FK), Quantity] And everything looks nicely keyed and safe. This is all well and good, and you can easily display those ordered items for invoicing or whatever, using a fairly simple query on base attributes select O.OrderNbr, O.Customer, P.Name, P.Cost, I.Quantity from Orders O join OrderedItems I on I.orderNbr = o.orderNbr join Products P on I.productId = P.productId Easy peasy, right? But wait.. we have a problem. There is nothing in our database that prevents an entry in, say, BulkProducts from referencing the same ProductId as an entry in InternetProducts. That's no good. OK, so let's add a "TypeCode" somewhere. We will have a table of valid types, and add the typeCode column to every product table, base or derived, eg.. Products[ProductId, TypeCode, Cost, Name] PK = ProductId, TypeCode, FK = TypeCode BulkProducts[ProductId, TypeCode, ...] PK = ProductId, TypeCode FK = ProductId, TypeCode Constraint = TypeCode = 'BULK' (NOTE!!!!, NOT seperate FKs on ProductId into the Products table, and TypeCode into TypeCodes table) etc. We've solved the "bad pointer" problem, but might have violated some normal form, since really ProductId itself is unique and a valid PK in the base table. The TypeCode doesn't seem to belong in the base table at all. In OO terms, this smells a bit like RTTI. Of course, you can make solution 2 look like solution 3 through the use of views and unions: select ProductId, cost, Name from InternetProducts union select ProductId, cost, Name from BulkProducts .... The obvious problem here is that we now need a way of making sure the productID's in each table are distinct from productID's in any other table. You could look for a natural PK, but those aren't always available, and the datatypes might be different for different products, which would screw up your view somewhat. You could use GUIDs, but I wouldn't recommend that either, largely for performance reasons. And you have the further problem that your, say, OrderedItems table can no longer have a nice, DDL basead FK into any table on ProductID. Hmmmm. And remember, you can just as easily make solution 3 look like solution 2, by creating a view for each subclass of product that joins the extended table back to the base table... create view BulkProductsEx as select p.name, p.cost, bp.inventory (...) from products p join bulkproducts bp on bp.productid = p.productid .... and so on for each subclass. So far, solution 3 (with the TypeCode column included for integrity checks) seems to have the most advantages in terms of declarative referential integrity, it seems to be quite flexible in terms of queries, and the drawbacks seem to be the most "academic". So does solution 2 (completely seperate tables) have any advantages over solution 3 (base table plus extended attribute tables)? One advantage is performance, but this is very minor. Using solution 2 (seperate tables) You can get all attributes of any product without any joins. But you ARE going to index your FK's, RIGHT? The cost of a single, indexed, keyed join is not worth considering IMO. Another advantage is that it makes your DML a bit easier. For inserts you don't have to insert into the Product table first, then get your scope_identity (if using identity columns), and then insert into the subclass table. This becomes a very real issue if you ever want to insert, update or delete more than one row at a time. The only "good" solution I have found for multiple inserts, for instance, is to create another set of staging tables to store what you are going to insert which itself generaetes key values, then perform your inserts into both the base and subclass table using the previously generated keys from the staging area. If you persist the staging tables (but delete the data so it doesn't just take up space), and use, say, an identity column, the problem of making sure that primary key stays unique isn't too big a deal. But something about it irks me a little. Still, this IS the solution I used the most recent time I encountered this problem while using solution 3 (base + extended tables). Something to note about this solution is that the staging tables look EXACTLY like the structure for solution 2. We are using solution 2 as a "pass through" for our DML, so that we get the data retrieval advantages of solution 3. And that really highlights the overall pattern. Solution 2 is the right solution for your DML. Solution 3 is *probably* the best solution for your retrieval. The overall best solution means writing code for both solution 2 AND solution 3, and using both of them together in those different situations. Unfortunately, this can be a lot of work in terms of the number of views, procedures, tables, keys, etc, that you need to write. The final problem that you are going to get in terms of retrieval is when you encounter something that requires you to answer the question "what are all of the attributes of productId x?". Only solution 1 (very wide table) can answer this easily. In solution 2 (seperate tables) you have to write code to figure out which table to query (you can't just union them all together, since the subclasses by definition do not contain shared attributes that can be unioned). In solution 3 you can either do the same, or you can left join your base table with every subclass. Oh, and keep one further possibility in mind. What do you do when you have multiple levels of inheritance, or "could" have, and are using solution 3? For instance: Base: Products[id, type, name, cost] Base AND extended: InventoryProducts[id, type, quantity] Extended: OTCInventoryProducts[id, type, ...something] Extended: BulkInventoryProducts[id, type, supplier] Base AND extended: SubscriptionProducts[id, type, duration] Extended: InternetSubscriptionProducts[id, type, bandwidth] Extended: MagazineSubscriptionProducts[id, type, publication] Or do you nix the base + extended tables, and put all extended attributes into the leaves (so that, eg, InternetSubscriptionProducts contains the duration column, and so does MagazineSubscriptionProducts)? I can see no "right" answer to this question. It depends on how you are going to use the data. But all of the pros and cons (DDL, DRI, DML) of the original argument become compounded). I am always interested to hear about any opinions on this topic, or other solutions people have come up with. Also, as an academic (not recommended!) answer to your question on using
entity attribute values, and multiple attribute value queries... Products[productId (PK), /*... */] ProductAttributes[ProductId {FK}, Attribute, Value] (PK productId, attribute) select p.* from Products p where exists( select * from ProductAttributes where productId = p.productId and Attribute = @firstAttribute and Value = @firstAttributeValue) and exists( select * from ProductAttributes where productId = p.productId and Attribute = @secondAttribute and Value = @secondAttributeValue) .... and so on for n attributes. I think this gets what you want (products that match all values for n given attributes)
All tables under partitioned union view get locked - how to reduce?
different database connection in clr stored proc SQL to show the number of distinct rows in tbl linked to another Update based on some combinations Invalid Column Name in SELECT select statement Swedish collation, 'V' = 'W' rule and different behaviour for VARCHAR and NVARCHAR querying a text file, waitfor issue? Import queries Getting @@ROWCOUNT from BULK INSERT with BATCHSIZE set |
|||||||||||||||||||||||