Home All Groups Group Topic Archive Search About
Author
2 Jun 2006 2:04 PM
Scott Lichtenberg
I have a question regarding database design.  Apologies in advance for the
length of this post, but I'm somewhat at a loss here.


I am creating an items table which will contain records for many different
types of products.  Each product has a number of common attributes (cost,
retail, etc.), but each type of product also has attributes specific to
itself.  As an example, think of a coffee shop.  It sells coffee by the cup,
so we would want to keep track of size, and ingredients.  The shop also
sells bulk coffee, coffee makers, pastries, CD's, and internet access.  Each
of these requires a different set of attributes that I want to track.

The simple solution would be to create an items table with all the necessary
columns.  I see two drawbacks to doing this, however.  The first is
aesthetic - it just strikes me as wrong to have a table with 100 columns.
The second drawback is that if we ever add a new type of product, and it is
likely that we will, I would have to add new columns to the table, and I
would probably have to do a fair amount of programming on the front end to
allow users to enter data and to sort/select on the new fields.


Another possible solution that I have been considering is to create a child
table, in which each attribute for an item represents a row in the table.
So if my main item table looks like the following:

        Table -  Item1  (ItemNo, ItemType, Description, Cost, Retail, etc.)

I would have a child table, Item2 as follows:

        Table -  Item2  (ItemNo, Attribute, Value)

For each item, I could have any number of attributes (size, color, etc.)

By designing the database this way, I eliminate the two issues I have with a
simple 1 level item table.  It really doesn't matter how many attributes an
item has, nor is it a problem to add new types of items and new attributes.
It would likely be easier to implement this type of structure on the front
end.  I could have an item maintenance form in which the user selects the
item, and its attributes are displayed in a grid.

The problem I foresee has to do with sorting, selecting, and displaying the
attributes and their values.  In a single level table, it is easy to select
records for a given set of criteria such as column1 =  x, column2 = y, and
column3 = z.  In the two level structure, we can not do this.  In order to
retrieve records, I will have join my parent table with the child
(attribute) table.

        SELECT  field1, field2, etc.
        FROM    Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo
        WHERE    Item2.Attribute = 'MyAttribute'
        AND         Item2.Value = 'MyValue'


This would allow me to select records where, using my coffee shop as an
example, the country of origin is Columbia.  However, if I try to select
based on two criteria, this will not work.  Since each Item2 record can have
only one attribute, the following query will not return any records:

        SELECT    field1, field2, etc.
        FROM    Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo
        WHERE   ( Item2.Attribute = 'Origin'    AND         Item2.Value =
'Columbia' )
        AND        ( Item2.Attribute = 'Decaf'     AND        Item2.Value =
'True' )

An 'OR' condition instead of the 'AND' condition would return records, but
it would return all coffee from Columbia, not just decaf, and all decaf
regardless of origin.

I have been playing around with the PIVOT function, and Common Table
Elements.  I can pivot the Item2 attributes into CTE, then join
the results with my Item1 table.

WITH MyPivot(it1ItemNo, Origin, Decaf)
AS
(
SELECT it1ItemNo, [Origin] AS Origin, [Decaf] AS Decaf
FROM it2Item2
PIVOT ( MIN(Item2.Value) FOR Attribute IN (Origin, Decaf) )
AS MyPivot
)

SELECT i.*, p.Origin, p.Decaf
FROM Item1 i
JOIN MyPivot p
ON i.ItemNo = p.ItemNo
WHERE  p.Origin = 'Columbia' AND p.Decaf = 'True'


This gives me the records I want, but it seems like an awful lot of work to
get there.  Again, apologies for the length of this post, but here at last
are my questions:

1.  Which would be considered (in the world of good SQL design) the better
way to create my tables?  Is there another way that I have not considered?

2.  If the two level design is the better way, how do I sort and select
based on the attributes in the child table?  Is there a way to do this other
than using the Pivot and CTE method I used above?

3.  If I used the CTE method (or something similar), what type of resources
would it take.  I'm not looking at a huge database or a tremendous number of
users, but I wouldn't want the server to have to run a monster query every
time a user asks for data.


Thanks in advance for any help you can give me.

Best regards,
Scott Lichtenberg

Author
2 Jun 2006 2:17 PM
Alexander Kuznetsov
Scott,

It is called an "entity attribute value" approach, and it is usually a
disaster at most several months after the rubber hits the road - the
performance is horrible.
Google up "entity attribute value" and see for yourself.
Are all your drivers up to date? click for free checkup

Author
2 Jun 2006 4:19 PM
Will
Alexander Kuznetsov wrote:
> Scott,
>
> It is called an "entity attribute value" approach, and it is usually a
> disaster at most several months after the rubber hits the road - the
> performance is horrible.
> Google up "entity attribute value" and see for yourself.

So why is it usually a disaster? If you have nicely indexed tables then
retrieving the relevant attributes for a particular item or group of
items should be nice and efficient. I suspect the problem comes when
people try to use this approach, but then try to create views and
tables to represent the final table within SQL stored procedures.

I've used this before with no performance issues, but I didn't try to
use SQL to build back up the list of attributes into columns, instead I
pulled the objects and their raw datatables back into my c#, then only
joined them as they were bound to the display. Because you would rarely
want to display more than 40 or so items on a screen there was no
performance issue.

Obviously this might not be relevant to the OP depending on the
platform, but personally I liked the model, I think it just needs to be
used with the right tools.

Cheers
Will
Author
2 Jun 2006 2:39 PM
Graham Morris
How about having a base item table (containing the item number and
attributes common to all) and a set of tables for each product type,
containing the item number (key) and only those attributes for that product
type?

For example, the base table has:

BaseTable(ItemNo CommonAtttribute1 CommonAttribute2)

and product type 1 would have:

Type1Table(ItemNo, ProductType1Attribute1, ProductType1Attribute2)

and product type 2 would have

Type2Table(ItemNo, ProductType2Attribute1, ProductType2Attribute2)

The product-specific tables would simply have no rows for items of a
different type.  Queries are simple joins - you don't even have to store the
product type, since an inner join with a product type table would exclude
items not of the correct type.  Where you need to include multiple products
in a query you can use outer joins which would return nulls in unmatched
columns.

The downside would be that you cannot dynamically add attributes - all the
attributes are 'wired-in' to the schema.


Show quoteHide quote
"Scott Lichtenberg" <x**@zzz.com> wrote in message
news:%23tdSD2khGHA.1244@TK2MSFTNGP04.phx.gbl...
>
> I have a question regarding database design.  Apologies in advance for the
> length of this post, but I'm somewhat at a loss here.
>
>
> I am creating an items table which will contain records for many different
> types of products.  Each product has a number of common attributes (cost,
> retail, etc.), but each type of product also has attributes specific to
> itself.  As an example, think of a coffee shop.  It sells coffee by the
> cup,
> so we would want to keep track of size, and ingredients.  The shop also
> sells bulk coffee, coffee makers, pastries, CD's, and internet access.
> Each
> of these requires a different set of attributes that I want to track.
>
> The simple solution would be to create an items table with all the
> necessary
> columns.  I see two drawbacks to doing this, however.  The first is
> aesthetic - it just strikes me as wrong to have a table with 100 columns.
> The second drawback is that if we ever add a new type of product, and it
> is
> likely that we will, I would have to add new columns to the table, and I
> would probably have to do a fair amount of programming on the front end to
> allow users to enter data and to sort/select on the new fields.
>
>
> Another possible solution that I have been considering is to create a
> child
> table, in which each attribute for an item represents a row in the table.
> So if my main item table looks like the following:
>
>        Table -  Item1  (ItemNo, ItemType, Description, Cost, Retail, etc.)
>
> I would have a child table, Item2 as follows:
>
>        Table -  Item2  (ItemNo, Attribute, Value)
>
> For each item, I could have any number of attributes (size, color, etc.)
>
> By designing the database this way, I eliminate the two issues I have with
> a
> simple 1 level item table.  It really doesn't matter how many attributes
> an
> item has, nor is it a problem to add new types of items and new
> attributes.
> It would likely be easier to implement this type of structure on the front
> end.  I could have an item maintenance form in which the user selects the
> item, and its attributes are displayed in a grid.
>
> The problem I foresee has to do with sorting, selecting, and displaying
> the
> attributes and their values.  In a single level table, it is easy to
> select
> records for a given set of criteria such as column1 =  x, column2 = y, and
> column3 = z.  In the two level structure, we can not do this.  In order to
> retrieve records, I will have join my parent table with the child
> (attribute) table.
>
>        SELECT  field1, field2, etc.
>        FROM    Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo
>        WHERE    Item2.Attribute = 'MyAttribute'
>        AND         Item2.Value = 'MyValue'
>
>
> This would allow me to select records where, using my coffee shop as an
> example, the country of origin is Columbia.  However, if I try to select
> based on two criteria, this will not work.  Since each Item2 record can
> have
> only one attribute, the following query will not return any records:
>
>        SELECT    field1, field2, etc.
>        FROM    Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo
>        WHERE   ( Item2.Attribute = 'Origin'    AND         Item2.Value =
> 'Columbia' )
>        AND        ( Item2.Attribute = 'Decaf'     AND        Item2.Value =
> 'True' )
>
> An 'OR' condition instead of the 'AND' condition would return records, but
> it would return all coffee from Columbia, not just decaf, and all decaf
> regardless of origin.
>
> I have been playing around with the PIVOT function, and Common Table
> Elements.  I can pivot the Item2 attributes into CTE, then join
> the results with my Item1 table.
>
> WITH MyPivot(it1ItemNo, Origin, Decaf)
> AS
> (
> SELECT it1ItemNo, [Origin] AS Origin, [Decaf] AS Decaf
> FROM it2Item2
> PIVOT ( MIN(Item2.Value) FOR Attribute IN (Origin, Decaf) )
> AS MyPivot
> )
>
> SELECT i.*, p.Origin, p.Decaf
> FROM Item1 i
> JOIN MyPivot p
> ON i.ItemNo = p.ItemNo
> WHERE  p.Origin = 'Columbia' AND p.Decaf = 'True'
>
>
> This gives me the records I want, but it seems like an awful lot of work
> to
> get there.  Again, apologies for the length of this post, but here at last
> are my questions:
>
> 1.  Which would be considered (in the world of good SQL design) the better
> way to create my tables?  Is there another way that I have not considered?
>
> 2.  If the two level design is the better way, how do I sort and select
> based on the attributes in the child table?  Is there a way to do this
> other
> than using the Pivot and CTE method I used above?
>
> 3.  If I used the CTE method (or something similar), what type of
> resources
> would it take.  I'm not looking at a huge database or a tremendous number
> of
> users, but I wouldn't want the server to have to run a monster query every
> time a user asks for data.
>
>
> Thanks in advance for any help you can give me.
>
> Best regards,
> Scott Lichtenberg
>
>
Author
2 Jun 2006 3:00 PM
Jim Underwood
"Graham Morris" <Graywing@newsgroup.nospam> wrote in message
news:eXvEtJlhGHA.4080@TK2MSFTNGP03.phx.gbl...
>
> The downside would be that you cannot dynamically add attributes - all the
> attributes are 'wired-in' to the schema.
>
I would consider this "downside" the main argument for this approach, rather
than against it.

Since you have a seperate table for each product type, this may not really
be a problem.  If you get a new product type, you create a new table, since
it does not fit into any of the existing tables.  If you need to add an
attribute to an existing product type you add a column to that particular
table, which is what you would do in any database.
Author
2 Jun 2006 3:14 PM
MattB
Do a search Search on these
The Relational Database Model
Normalisation of data
The Entity-Relationship Model


and heres my 2 cents on structure of tables - hope it makes sense
Tables

Items (item_no, item_description, is_active(1 or 0 to track), 
latest_action(date of adds/updates))
Item_type (item_type_no, item_no, type, type_description, is_active,
latest_action)
Item_attributes (item_att_no, item_type_no or item_no, is_active,
latest_action, origin, etc)
Item_Costs (item_no, cost, is_active, latest_action, etc)
Item_Sales (item_no, sale, etc)........
Item_Origin (item_org_no, item_att_no, origin_description, etc)


in other words, try to keep all tables lean - for efficiency
and relate them properly so the queries will run
efficiently (yes joins are ok if tables are correctly set)
and you will not have to do alot of maintanance (new tables etc) on the
database
and no pivots for data, etc.


When you add an item, its just another ROW.
When you change the cost, its just another ROW.
When you de-activate a product (no longer sell) update is_active to 0
When you add a new origin add another ROW.
When you change and attribute -- add new attribute and deactivate old
attribute.

All items, types, attributes, costs, sale_prices should just be adding
another row or updateing existing row.  But you may want to track the
differneces in prices (add new with is_Active = 1, and update old by
is_active = 0)
Update the date to track when you made the change......


Queries
select I.item_description, IT.type_description, IA.origin, etc.
FROM    Items I (nolock)
INNER JOIN Item_type IT (nolock)
ON I.item_no = IT.item_no
INNER JOIN Item_attributes IA (nolock)
ON I.item_no = IA.item_no
WHERE I.item_no = 1 -- this could be single coffee
AND IT.item_type_no = 1 -- this could be non-decaf
AND IA.origin = 1 --this could be Columbian
AND is_active = 1 -- only active product

coffee / non-decaf / Columbian

select I.item_description, IT.type_description, etc.
FROM    Items I (nolock)
INNER JOIN Item_type IT (nolock)
ON I.item_no = IT.item_no
INNER JOIN Item_attributes IA (nolock)
ON I.item_no = IA.item_no
WHERE I.item_no = 11 -- this could be internet access rental
AND IT.item_type_no = 11 -- this could be by the minute
AND is_active = 1 -- only active product (with latest pricing)

Internet access / minute rate

Hope this gives you some better ideas.

Show quoteHide quote
"Jim Underwood" wrote:

>
> "Graham Morris" <Graywing@newsgroup.nospam> wrote in message
> news:eXvEtJlhGHA.4080@TK2MSFTNGP03.phx.gbl...
> >
> > The downside would be that you cannot dynamically add attributes - all the
> > attributes are 'wired-in' to the schema.
> >
> I would consider this "downside" the main argument for this approach, rather
> than against it.
>
> Since you have a seperate table for each product type, this may not really
> be a problem.  If you get a new product type, you create a new table, since
> it does not fit into any of the existing tables.  If you need to add an
> attribute to an existing product type you add a column to that particular
> table, which is what you would do in any database.
>
>
>
Author
2 Jun 2006 3:18 PM
Damien
Scott,

in order to query two or more attributes you'll need a copy of the
attributes table for each attribute you want to query, eg:

SELECT field1, field2, etc.
FROM main_table m
    INNER JOIN Items o ON m.ItemNo = o.ItemNo
    INNER JOIN Items t ON m.ItemNo = t.ItemNo
        AND o.ItemNo = t.ItemNo
WHERE o.Attribute = 'Origin'   
    AND o.Value = 'Columbia'
    AND t.Attribute = 'Decaf'
    AND t.Value = 'True'

Not to say I'm recommending this method, but that's how you would query it.

Damien


Show quoteHide quote
"Scott Lichtenberg" wrote:

>
> I have a question regarding database design.  Apologies in advance for the
> length of this post, but I'm somewhat at a loss here.
>
>
> I am creating an items table which will contain records for many different
> types of products.  Each product has a number of common attributes (cost,
> retail, etc.), but each type of product also has attributes specific to
> itself.  As an example, think of a coffee shop.  It sells coffee by the cup,
> so we would want to keep track of size, and ingredients.  The shop also
> sells bulk coffee, coffee makers, pastries, CD's, and internet access.  Each
> of these requires a different set of attributes that I want to track.
>
> The simple solution would be to create an items table with all the necessary
> columns.  I see two drawbacks to doing this, however.  The first is
> aesthetic - it just strikes me as wrong to have a table with 100 columns.
> The second drawback is that if we ever add a new type of product, and it is
> likely that we will, I would have to add new columns to the table, and I
> would probably have to do a fair amount of programming on the front end to
> allow users to enter data and to sort/select on the new fields.
>
>
> Another possible solution that I have been considering is to create a child
> table, in which each attribute for an item represents a row in the table.
> So if my main item table looks like the following:
>
>         Table -  Item1  (ItemNo, ItemType, Description, Cost, Retail, etc.)
>
> I would have a child table, Item2 as follows:
>
>         Table -  Item2  (ItemNo, Attribute, Value)
>
> For each item, I could have any number of attributes (size, color, etc.)
>
> By designing the database this way, I eliminate the two issues I have with a
> simple 1 level item table.  It really doesn't matter how many attributes an
> item has, nor is it a problem to add new types of items and new attributes.
> It would likely be easier to implement this type of structure on the front
> end.  I could have an item maintenance form in which the user selects the
> item, and its attributes are displayed in a grid.
>
> The problem I foresee has to do with sorting, selecting, and displaying the
> attributes and their values.  In a single level table, it is easy to select
> records for a given set of criteria such as column1 =  x, column2 = y, and
> column3 = z.  In the two level structure, we can not do this.  In order to
> retrieve records, I will have join my parent table with the child
> (attribute) table.
>
>         SELECT  field1, field2, etc.
>         FROM    Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo
>         WHERE    Item2.Attribute = 'MyAttribute'
>         AND         Item2.Value = 'MyValue'
>
>
> This would allow me to select records where, using my coffee shop as an
> example, the country of origin is Columbia.  However, if I try to select
> based on two criteria, this will not work.  Since each Item2 record can have
> only one attribute, the following query will not return any records:
>
>         SELECT    field1, field2, etc.
>         FROM    Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo
>         WHERE   ( Item2.Attribute = 'Origin'    AND         Item2.Value =
> 'Columbia' )
>         AND        ( Item2.Attribute = 'Decaf'     AND        Item2.Value =
> 'True' )
>
> An 'OR' condition instead of the 'AND' condition would return records, but
> it would return all coffee from Columbia, not just decaf, and all decaf
> regardless of origin.
>
> I have been playing around with the PIVOT function, and Common Table
> Elements.  I can pivot the Item2 attributes into CTE, then join
> the results with my Item1 table.
>
> WITH MyPivot(it1ItemNo, Origin, Decaf)
> AS
> (
> SELECT it1ItemNo, [Origin] AS Origin, [Decaf] AS Decaf
> FROM it2Item2
> PIVOT ( MIN(Item2.Value) FOR Attribute IN (Origin, Decaf) )
> AS MyPivot
> )
>
> SELECT i.*, p.Origin, p.Decaf
> FROM Item1 i
> JOIN MyPivot p
> ON i.ItemNo = p.ItemNo
> WHERE  p.Origin = 'Columbia' AND p.Decaf = 'True'
>
>
> This gives me the records I want, but it seems like an awful lot of work to
> get there.  Again, apologies for the length of this post, but here at last
> are my questions:
>
> 1.  Which would be considered (in the world of good SQL design) the better
> way to create my tables?  Is there another way that I have not considered?
>
> 2.  If the two level design is the better way, how do I sort and select
> based on the attributes in the child table?  Is there a way to do this other
> than using the Pivot and CTE method I used above?
>
> 3.  If I used the CTE method (or something similar), what type of resources
> would it take.  I'm not looking at a huge database or a tremendous number of
> users, but I wouldn't want the server to have to run a monster query every
> time a user asks for data.
>
>
> Thanks in advance for any help you can give me.
>
> Best regards,
> Scott Lichtenberg
>
>
>
Author
3 Jun 2006 8:52 AM
Will
An Alternative way to do the searching if you do choose to go with the
attribute value approach which might be a bit more generic could be to
select an item and a count of how many of the criteria it satisfies,
then filter that to only have the maximum number. Not sure how it would
perform, but it would save having to join on the table for each
parameter, and would allow optional parameters.

One reason you may be forced down the attribute value approach is if
the user requires the ability to customise the products themselves
(which is why I've had to follow this route in the past). So far I
haven't seen anyones solution that accomodates this.

Cheers
Will

The code might look something like this (though I've not tested this).

CREATE PROC SearchProducts
@Att1 varchar(50),
@Val1 varchar(50) = null,
@Att2 varchar(50),
@Val2 varchar(50) = null
AS

DECLARE @NumArgs int
SET @NumArgs = CASE WHEN @Val1 IS NULL THEN 0 ELSE 1 END +
        CASE WHEN @Val2 IS NULL THEN 0 ELSE 1 END
SELECT Fields
FROM Main_Table as t
INNER JOIN
(
    SELECT m.ItemID, COUNT(DISTINCT m.Attribute) as ArgsSat
    FROM Items as m
    WHERE
        (@Val1 is null OR (m.Attribute = @Att1 AND m.Value = @Val1))
        OR
        (@Val2 is null or (m.Attribute = @Att2 AND m.Value = @Val2))

    GROUP BY m.ItemID
) as Filter on Filter.ItemID = t.ItemID AND @NumArgs = Filter.ArgsSat

Damien wrote:
Show quoteHide quote
> Scott,
>
> in order to query two or more attributes you'll need a copy of the
> attributes table for each attribute you want to query, eg:
>
> SELECT field1, field2, etc.
> FROM main_table m
>     INNER JOIN Items o ON m.ItemNo = o.ItemNo
>     INNER JOIN Items t ON m.ItemNo = t.ItemNo
>         AND o.ItemNo = t.ItemNo
> WHERE o.Attribute = 'Origin'
>     AND o.Value = 'Columbia'
>     AND t.Attribute = 'Decaf'
>     AND t.Value = 'True'
>
> Not to say I'm recommending this method, but that's how you would query it.
>
> Damien
>
>
> "Scott Lichtenberg" wrote:
>
> >
> > I have a question regarding database design.  Apologies in advance for the
> > length of this post, but I'm somewhat at a loss here.
> >
> >
> > I am creating an items table which will contain records for many different
> > types of products.  Each product has a number of common attributes (cost,
> > retail, etc.), but each type of product also has attributes specific to
> > itself.  As an example, think of a coffee shop.  It sells coffee by the cup,
> > so we would want to keep track of size, and ingredients.  The shop also
> > sells bulk coffee, coffee makers, pastries, CD's, and internet access.  Each
> > of these requires a different set of attributes that I want to track.
> >
> > The simple solution would be to create an items table with all the necessary
> > columns.  I see two drawbacks to doing this, however.  The first is
> > aesthetic - it just strikes me as wrong to have a table with 100 columns.
> > The second drawback is that if we ever add a new type of product, and it is
> > likely that we will, I would have to add new columns to the table, and I
> > would probably have to do a fair amount of programming on the front end to
> > allow users to enter data and to sort/select on the new fields.
> >
> >
> > Another possible solution that I have been considering is to create a child
> > table, in which each attribute for an item represents a row in the table.
> > So if my main item table looks like the following:
> >
> >         Table -  Item1  (ItemNo, ItemType, Description, Cost, Retail, etc.)
> >
> > I would have a child table, Item2 as follows:
> >
> >         Table -  Item2  (ItemNo, Attribute, Value)
> >
> > For each item, I could have any number of attributes (size, color, etc.)
> >
> > By designing the database this way, I eliminate the two issues I have with a
> > simple 1 level item table.  It really doesn't matter how many attributes an
> > item has, nor is it a problem to add new types of items and new attributes.
> > It would likely be easier to implement this type of structure on the front
> > end.  I could have an item maintenance form in which the user selects the
> > item, and its attributes are displayed in a grid.
> >
> > The problem I foresee has to do with sorting, selecting, and displaying the
> > attributes and their values.  In a single level table, it is easy to select
> > records for a given set of criteria such as column1 =  x, column2 = y, and
> > column3 = z.  In the two level structure, we can not do this.  In order to
> > retrieve records, I will have join my parent table with the child
> > (attribute) table.
> >
> >         SELECT  field1, field2, etc.
> >         FROM    Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo
> >         WHERE    Item2.Attribute = 'MyAttribute'
> >         AND         Item2.Value = 'MyValue'
> >
> >
> > This would allow me to select records where, using my coffee shop as an
> > example, the country of origin is Columbia.  However, if I try to select
> > based on two criteria, this will not work.  Since each Item2 record can have
> > only one attribute, the following query will not return any records:
> >
> >         SELECT    field1, field2, etc.
> >         FROM    Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo
> >         WHERE   ( Item2.Attribute = 'Origin'    AND         Item2.Value =
> > 'Columbia' )
> >         AND        ( Item2.Attribute = 'Decaf'     AND        Item2.Value =
> > 'True' )
> >
> > An 'OR' condition instead of the 'AND' condition would return records, but
> > it would return all coffee from Columbia, not just decaf, and all decaf
> > regardless of origin.
> >
> > I have been playing around with the PIVOT function, and Common Table
> > Elements.  I can pivot the Item2 attributes into CTE, then join
> > the results with my Item1 table.
> >
> > WITH MyPivot(it1ItemNo, Origin, Decaf)
> > AS
> > (
> > SELECT it1ItemNo, [Origin] AS Origin, [Decaf] AS Decaf
> > FROM it2Item2
> > PIVOT ( MIN(Item2.Value) FOR Attribute IN (Origin, Decaf) )
> > AS MyPivot
> > )
> >
> > SELECT i.*, p.Origin, p.Decaf
> > FROM Item1 i
> > JOIN MyPivot p
> > ON i.ItemNo = p.ItemNo
> > WHERE  p.Origin = 'Columbia' AND p.Decaf = 'True'
> >
> >
> > This gives me the records I want, but it seems like an awful lot of work to
> > get there.  Again, apologies for the length of this post, but here at last
> > are my questions:
> >
> > 1.  Which would be considered (in the world of good SQL design) the better
> > way to create my tables?  Is there another way that I have not considered?
> >
> > 2.  If the two level design is the better way, how do I sort and select
> > based on the attributes in the child table?  Is there a way to do this other
> > than using the Pivot and CTE method I used above?
> >
> > 3.  If I used the CTE method (or something similar), what type of resources
> > would it take.  I'm not looking at a huge database or a tremendous number of
> > users, but I wouldn't want the server to have to run a monster query every
> > time a user asks for data.
> >
> >
> > Thanks in advance for any help you can give me.
> >
> > Best regards,
> > Scott Lichtenberg
> >
> >
> >
Author
3 Jun 2006 8:56 AM
Will
Actually, just realised that the optional paramaters screw that
suggestion up, but without them it should work:

WHERE
(m.Attribute = @Att1 AND m.Value = @Val1)
OR
(m.Attribute = @Att2 AND m.Value = @Val2)


Will wrote:
Show quoteHide quote
> An Alternative way to do the searching if you do choose to go with the
> attribute value approach which might be a bit more generic could be to
> select an item and a count of how many of the criteria it satisfies,
> then filter that to only have the maximum number. Not sure how it would
> perform, but it would save having to join on the table for each
> parameter, and would allow optional parameters.
>
> One reason you may be forced down the attribute value approach is if
> the user requires the ability to customise the products themselves
> (which is why I've had to follow this route in the past). So far I
> haven't seen anyones solution that accomodates this.
>
> Cheers
> Will
>
> The code might look something like this (though I've not tested this).
>
> CREATE PROC SearchProducts
> @Att1 varchar(50),
> @Val1 varchar(50) = null,
> @Att2 varchar(50),
> @Val2 varchar(50) = null
> AS
>
> DECLARE @NumArgs int
> SET @NumArgs = CASE WHEN @Val1 IS NULL THEN 0 ELSE 1 END +
>         CASE WHEN @Val2 IS NULL THEN 0 ELSE 1 END
> SELECT Fields
> FROM Main_Table as t
> INNER JOIN
> (
>     SELECT m.ItemID, COUNT(DISTINCT m.Attribute) as ArgsSat
>     FROM Items as m
>     WHERE
>         (@Val1 is null OR (m.Attribute = @Att1 AND m.Value = @Val1))
>         OR
>         (@Val2 is null or (m.Attribute = @Att2 AND m.Value = @Val2))
>
>     GROUP BY m.ItemID
> ) as Filter on Filter.ItemID = t.ItemID AND @NumArgs = Filter.ArgsSat
>
> Damien wrote:
> > Scott,
> >
> > in order to query two or more attributes you'll need a copy of the
> > attributes table for each attribute you want to query, eg:
> >
> > SELECT field1, field2, etc.
> > FROM main_table m
> >     INNER JOIN Items o ON m.ItemNo = o.ItemNo
> >     INNER JOIN Items t ON m.ItemNo = t.ItemNo
> >         AND o.ItemNo = t.ItemNo
> > WHERE o.Attribute = 'Origin'
> >     AND o.Value = 'Columbia'
> >     AND t.Attribute = 'Decaf'
> >     AND t.Value = 'True'
> >
> > Not to say I'm recommending this method, but that's how you would query it.
> >
> > Damien
> >
> >
> > "Scott Lichtenberg" wrote:
> >
> > >
> > > I have a question regarding database design.  Apologies in advance for the
> > > length of this post, but I'm somewhat at a loss here.
> > >
> > >
> > > I am creating an items table which will contain records for many different
> > > types of products.  Each product has a number of common attributes (cost,
> > > retail, etc.), but each type of product also has attributes specific to
> > > itself.  As an example, think of a coffee shop.  It sells coffee by the cup,
> > > so we would want to keep track of size, and ingredients.  The shop also
> > > sells bulk coffee, coffee makers, pastries, CD's, and internet access.  Each
> > > of these requires a different set of attributes that I want to track.
> > >
> > > The simple solution would be to create an items table with all the necessary
> > > columns.  I see two drawbacks to doing this, however.  The first is
> > > aesthetic - it just strikes me as wrong to have a table with 100 columns.
> > > The second drawback is that if we ever add a new type of product, and it is
> > > likely that we will, I would have to add new columns to the table, and I
> > > would probably have to do a fair amount of programming on the front end to
> > > allow users to enter data and to sort/select on the new fields.
> > >
> > >
> > > Another possible solution that I have been considering is to create a child
> > > table, in which each attribute for an item represents a row in the table.
> > > So if my main item table looks like the following:
> > >
> > >         Table -  Item1  (ItemNo, ItemType, Description, Cost, Retail, etc.)
> > >
> > > I would have a child table, Item2 as follows:
> > >
> > >         Table -  Item2  (ItemNo, Attribute, Value)
> > >
> > > For each item, I could have any number of attributes (size, color, etc.)
> > >
> > > By designing the database this way, I eliminate the two issues I have with a
> > > simple 1 level item table.  It really doesn't matter how many attributes an
> > > item has, nor is it a problem to add new types of items and new attributes.
> > > It would likely be easier to implement this type of structure on the front
> > > end.  I could have an item maintenance form in which the user selects the
> > > item, and its attributes are displayed in a grid.
> > >
> > > The problem I foresee has to do with sorting, selecting, and displaying the
> > > attributes and their values.  In a single level table, it is easy to select
> > > records for a given set of criteria such as column1 =  x, column2 = y, and
> > > column3 = z.  In the two level structure, we can not do this.  In order to
> > > retrieve records, I will have join my parent table with the child
> > > (attribute) table.
> > >
> > >         SELECT  field1, field2, etc.
> > >         FROM    Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo
> > >         WHERE    Item2.Attribute = 'MyAttribute'
> > >         AND         Item2.Value = 'MyValue'
> > >
> > >
> > > This would allow me to select records where, using my coffee shop as an
> > > example, the country of origin is Columbia.  However, if I try to select
> > > based on two criteria, this will not work.  Since each Item2 record can have
> > > only one attribute, the following query will not return any records:
> > >
> > >         SELECT    field1, field2, etc.
> > >         FROM    Item1 INNER JOIN Item2 ON Item1.ItemNo = Item2.ItemNo
> > >         WHERE   ( Item2.Attribute = 'Origin'    AND         Item2.Value =
> > > 'Columbia' )
> > >         AND        ( Item2.Attribute = 'Decaf'     AND        Item2.Value =
> > > 'True' )
> > >
> > > An 'OR' condition instead of the 'AND' condition would return records, but
> > > it would return all coffee from Columbia, not just decaf, and all decaf
> > > regardless of origin.
> > >
> > > I have been playing around with the PIVOT function, and Common Table
> > > Elements.  I can pivot the Item2 attributes into CTE, then join
> > > the results with my Item1 table.
> > >
> > > WITH MyPivot(it1ItemNo, Origin, Decaf)
> > > AS
> > > (
> > > SELECT it1ItemNo, [Origin] AS Origin, [Decaf] AS Decaf
> > > FROM it2Item2
> > > PIVOT ( MIN(Item2.Value) FOR Attribute IN (Origin, Decaf) )
> > > AS MyPivot
> > > )
> > >
> > > SELECT i.*, p.Origin, p.Decaf
> > > FROM Item1 i
> > > JOIN MyPivot p
> > > ON i.ItemNo = p.ItemNo
> > > WHERE  p.Origin = 'Columbia' AND p.Decaf = 'True'
> > >
> > >
> > > This gives me the records I want, but it seems like an awful lot of work to
> > > get there.  Again, apologies for the length of this post, but here at last
> > > are my questions:
> > >
> > > 1.  Which would be considered (in the world of good SQL design) the better
> > > way to create my tables?  Is there another way that I have not considered?
> > >
> > > 2.  If the two level design is the better way, how do I sort and select
> > > based on the attributes in the child table?  Is there a way to do this other
> > > than using the Pivot and CTE method I used above?
> > >
> > > 3.  If I used the CTE method (or something similar), what type of resources
> > > would it take.  I'm not looking at a huge database or a tremendous number of
> > > users, but I wouldn't want the server to have to run a monster query every
> > > time a user asks for data.
> > >
> > >
> > > Thanks in advance for any help you can give me.
> > >
> > > Best regards,
> > > Scott Lichtenberg
> > >
> > >
> > >
Author
3 Jun 2006 8:01 AM
arc
Scott Lichtenberg wrote:
> I have a question regarding database design.  Apologies in advance for the
> length of this post, but I'm somewhat at a loss here.
>
>
> I am creating an items table which will contain records for many different
> types of products.  Each product has a number of common attributes (cost,
> retail, etc.), but each type of product also has attributes specific to
> itself. 

A problem I have faced many times. The issue is relating "inheritance"
(which is a clearly useful OO design principle) to relational design
(which is a "proven" mathematical way of representing data). I even
though I had figured out the idealistically "correct" solution at one
point out of the various options, but each time I come back to this type
of problem, I think it over again for the specific situation at hand.

Having the attributes (name and value) as rows is not a solution that I
ever really consider. You lose domain integrity... eg, your cost
attribute is surely positive decimal, your description is varchar,
quantity in stock is positive integer, last ordered date is datetime.
But if you store them all in a generic "key->value" table, you can no
longer enforce these domain constraints other than by coding a rather
lengthy trigger, and updating that trigger each time a new attribute is
added. You also run into problems if you want to relate (in an ad hoc
manner) different types of items on some shared attributes. You can make
your trigger shorter by also including a column defining the "data type"
for the value, and basing your trigger rules off that column... but this
seems completely counter to the whole idea of a relational (or quasi
relational) database.

(Having said the above, I must admit that every database I have built
usually ends up with a table of system parameters, with a key->value
pair. But this is reserved for storing things like decimal rounding,
directory path information, arguments to external programs used by the
database, etc. It is a table of database constants and is never treated
as a "relation" for the purpose of projection etc, it's just global storage)

What we have to keep in mind is, WHY are we using a "relational"
database to store our data at all? Surely it is so we can perform
"relational" operations on it? If we take that as a given, then we have
to find some balance between the flexibility of "object" based storage,
and the power of sql based queries.

There are three solutions that I have tried.

The first is the "very wide table". Lots of nullable columns, lots of
null values. I have only used this once as part of a messaging system,
where physically remote systems needed a way to pass information back
and forth, and the class of message required different parameters. The
ease of transferring data between identically structured tables overrode
most other concerns. But once again, this was not really treated as a
relation, it's more like the global storage table I discussed above, but
  a staging area for temporary data that is still easily accessible to
sql. (Think LPARAM, WPARAM in windows messaging, but with a much bigger
param count). Again, I would not use this type of structure for "real" data.

The second and third design patterns both have "names", in that they are
common solutions, although I always forget what those names are. First I
will just describe them, and then I will go over the pros and cons I
have encountered, and the work involved, which anyone using each pattern
would be sure to encounter.

Solution 2 is to have a seperate table for every type of item. Each
table has all attributes that belong to each item. So to take your
coffee shop analogy, the "cup of coffee" products have a table
containing name, cost, cup used. The "bulk coffee" products have a table
containing name, cost, supplier, inventory. The internet access products
have name, cost, bandwidth allocation, subscription duration. As you can
see, each product is likely to have attributes in common with other
products (name and cost are ubiquitous), but those attributes are still
stored by each table for the particular product represented by that table.

Solution 3 is to have a "base class" table containing shared attributes,
and further tables which extend (inherit from?) this base table for each
specific item. Excuse my shorthand and off the cuff naming

Base table: Products[productId (PK), name, cost]
Extended: CupOfCoffeeProducts[ProductId (PK,FK), cup used]
Extended: BulkProducts[ProductId  (PK,FK), supplier, Inventory]
Extended: InternetProducts[ProductID (PK,FK), bandwidth, duration]
etc.

Now, using this data is where it gets really tricky.
Each of solution 2 and 3 can be made to appear as the other, and as the
"very wide table" solution, through the use of views, using lots of
unions and/or lots of left joins. But this can be a VERY LARGE amount of
work to set up, and can be expensive in terms of performance as well.

Why would you want to do something like this anyway?

Well, the most obvious scenario (and one obviously derived from object
oriented design) is that you might sometimes want to treat all products
as "the same" in some way. Eg, how many "products" do we sell,
regardless of type? How many did we sell last month? How much revenue
did we get from those sales? Or an even more illustrative scenario, what
if you can place an order, where each line item can be one or more of
any kind of product? For a situation like this, the base/extended
structure of solution 3 is obviously advantageous. You can create an
orders table

Orders[OrderNbr (PK), Customer]

and an ordered items table...

OrderedItems[OrderNbr (PK,FK), ProductID (FK), Quantity]

And everything looks nicely keyed and safe. This is all well and good,
and you can easily display those ordered items for invoicing or
whatever, using a fairly simple query on base attributes

select O.OrderNbr,
        O.Customer,
        P.Name,
        P.Cost,
        I.Quantity
from   Orders O
join   OrderedItems I on I.orderNbr = o.orderNbr
join   Products P on I.productId = P.productId

Easy peasy, right? But wait.. we have a problem. There is nothing in our
database that prevents an entry in, say, BulkProducts from referencing
the same ProductId as an entry in InternetProducts. That's no good.

OK, so let's add a "TypeCode" somewhere. We will have a table of valid
types, and add the typeCode column to every product table, base or
derived, eg..

Products[ProductId, TypeCode, Cost, Name]
PK = ProductId, TypeCode, FK = TypeCode

BulkProducts[ProductId, TypeCode, ...]
PK = ProductId, TypeCode
FK = ProductId, TypeCode
Constraint = TypeCode = 'BULK'
(NOTE!!!!, NOT seperate FKs on ProductId into the Products table, and
TypeCode into TypeCodes table)

etc.

We've solved the "bad pointer" problem, but might have violated some
normal form, since really ProductId itself is unique and a valid PK in
the base table. The TypeCode doesn't seem to belong in the base table at
all. In OO terms, this smells a bit like RTTI.

Of course, you can make solution 2 look like solution 3 through the use
of views and unions:

select ProductId, cost, Name
from   InternetProducts
union
select ProductId, cost, Name
from   BulkProducts
....

The obvious problem here is that we now need a way of making sure the
productID's in each table are distinct from productID's in any other
table. You could look for a natural PK, but those aren't always
available, and the datatypes might be different for different products,
which would screw up your view somewhat. You could use GUIDs, but I
wouldn't recommend that either, largely for performance reasons. And you
have the further problem that your, say, OrderedItems table can no
longer have a nice, DDL basead FK into any table on ProductID. Hmmmm.
And remember, you can just as easily make solution 3 look like solution
2, by creating a view for each subclass of product that joins the
extended table back to the base table...

create view BulkProductsEx as
select p.name, p.cost, bp.inventory (...)
from   products p
join   bulkproducts bp on bp.productid = p.productid

.... and so on for each subclass.

So far, solution 3 (with the TypeCode column included for integrity
checks) seems to have the most advantages in terms of declarative
referential integrity, it seems to be quite flexible in terms of
queries, and the drawbacks seem to be the most "academic". So does
solution 2 (completely seperate tables) have any advantages over
solution 3 (base table plus extended attribute tables)?

One advantage is performance, but this is very minor. Using solution 2
(seperate tables) You can get all attributes of any product without any
joins. But you ARE going to index your FK's, RIGHT? The cost of a
single, indexed, keyed join is not worth considering IMO.

Another advantage is that it makes your DML a bit easier. For inserts
you don't have to insert into the Product table first, then get your
scope_identity (if using identity columns), and then insert into the
subclass table. This becomes a very real issue if you ever want to
insert, update or delete more than one row at a time. The only "good"
solution I have found for multiple inserts, for instance, is to create
another set of staging tables to store what you are going to insert
which itself generaetes key values, then perform your inserts into both
the base and subclass table using the previously generated keys from the
staging area. If you persist the staging tables (but delete the data so
it doesn't just take up space), and use, say, an identity column, the
problem of making sure that primary key stays unique isn't too big a
deal. But something about it irks me a little. Still, this IS the
solution I used the most recent time I encountered this problem while
using solution 3 (base + extended tables).

Something to note about this solution is that the staging tables look
EXACTLY like the structure for solution 2. We are using solution 2 as a
"pass through" for our DML, so that we get the data retrieval advantages
of solution 3. And that really highlights the overall pattern. Solution
2 is the right solution for your DML. Solution 3 is *probably* the best
solution for your retrieval. The overall best solution means writing
code for both solution 2 AND solution 3, and using both of them together
in those different situations. Unfortunately, this can be a lot of work
in terms of the number of views, procedures, tables, keys, etc, that you
need to write.

The final problem that you are going to get in terms of retrieval is
when you encounter something that requires you to answer the question
"what are all of the attributes of productId x?". Only solution 1 (very
wide table) can answer this easily. In solution 2 (seperate tables) you
have to write code to figure out which table to query (you can't just
union them all together, since the subclasses by definition do not
contain shared attributes that can be unioned). In solution 3 you can
either do the same, or you can left join your base table with every
subclass.

Oh, and keep one further possibility in mind. What do you do when you
have multiple levels of inheritance, or "could" have, and are using
solution 3? For instance:

Base: Products[id, type, name, cost]
Base AND extended: InventoryProducts[id, type, quantity]
Extended: OTCInventoryProducts[id, type, ...something]
Extended: BulkInventoryProducts[id, type, supplier]
Base AND extended: SubscriptionProducts[id, type, duration]
Extended: InternetSubscriptionProducts[id, type, bandwidth]
Extended: MagazineSubscriptionProducts[id, type, publication]

Or do you nix the base + extended tables, and put all extended
attributes into the leaves (so that, eg, InternetSubscriptionProducts
contains the duration column, and so does MagazineSubscriptionProducts)?

I can see no "right" answer to this question. It depends on how you are
going to use the data. But all of the pros and cons (DDL, DRI, DML) of
the original argument become compounded).

I am always interested to hear about any opinions on this topic, or
other solutions people have come up with.
Author
3 Jun 2006 8:52 AM
arc
Also, as an academic (not recommended!) answer to your question on using
entity attribute values, and multiple attribute value queries...

Products[productId (PK), /*... */]

ProductAttributes[ProductId {FK}, Attribute, Value] (PK productId,
attribute)


select p.*
from   Products p
where  exists(
           select *
           from   ProductAttributes
           where  productId = p.productId
           and    Attribute = @firstAttribute
           and    Value = @firstAttributeValue)
and    exists(
           select *
           from   ProductAttributes
           where  productId = p.productId
           and    Attribute = @secondAttribute
           and    Value = @secondAttributeValue)


.... and so on for n attributes. I think this gets what you want
(products that match all values for n given attributes)



Post Thread options