Home All Groups Group Topic Archive Search About

Hierarchically arranged attributes...mission impossible?

Author
2 Mar 2006 11:17 PM
Daniel Manes
Struggling over the design of a database. Basically, we have a long
list of attributes. We've decided to arrange them hierarchically so
users can find them more easily. The problem is, this doesn't seem very
easy to implement! Let me try to explain what I'm talking about a
little more specifically.

I have a list of attributes organized hierarchically. For example:

CarAttributes
-----Configuration
----------Doors
----------EngineLocation
----------DriveWheels
-----Engine
----------Type
---------------Cylinders
---------------Arrangement
----------Specs
---------------Size
---------------Power
---------------Torque
-----Transmission
----------Type
----------Gears

I also have a lot of attribute data (for leaf attributes only). For
example:

CarID = 1, Doors = 4
CarID = 1, EngineLocation = 'Front'
CarID = 1, DriveWheels = 'Front'
CarID = 1, Cylinders = 6
CarID = 1, Arrangement = 'Inline'
CarID = 1, Size = '3.0'
....
CarID = 2, Doors = 2
CarID = 2, EngineLocation = 'Mid'

Etc.

QUESTION: How do I set up a proper relational database to support this
scheme?

I was thinking I could set up an adjacency table for the attributes.
The conundrum is how to store the attribute data.

Couple possibilities:

===+===+===+===+===+===+===+===+===

Idea #1 - I could set up my CarData table like this:

Table: CarData
    CarID int
    AttributeID int
    AttributeValue nvarchar(100)

But then my attribute values, whether they are numeric, category, or
string, are all getting stored as a string. That seems like a big
disadvantage.

===+===+===+===+===+===+===+===+===

Idea #2 - Alternatively, I could set up my CarData table like this:

Table: CarData
    CarID int
    Doors int
    EngineLocation EngineLocationID (references EngineLocation table)
    DriveWheels DriveWheelsID
    Cylinders int
    EngineArrangement EngineArrangementID (references EngineArrangement
table)
    EngineSize float
    EnginePower int
    EngineTorque int
    TransmissionType TransmissionTypeID (reference TransmissionType
table)
    TransmissionGears int

Everything is now explicitly typed but there's no apparent way to set
up a link between an attribute in the CarAttributes tree and its
corresponding column in the CarData table. For example, say the user is
looking at data on CarID = 1 and clicks on the "Engine/Type/Cylinders"
leaf. Now I need a select statement that looks something like this...

SELECT Cylinders FROM CarData WHERE CarID = 1

....but the only way I can think to get the word "Cylinders" into that
query would be through dynamic code or some hideously ugly CASE
statment, which also seems like a big disadvantage.

===+===+===+===+===+===+===+===+===

So, is there a "right" way to do this so I can have both properly typed
attribute values *and* be able to reference attribute values without
resorting to dynamic queries?

Thanks!

-Dan

Author
3 Mar 2006 1:14 AM
--CELKO--
Have you gotten a copy of TREES & HIERARCHIES IN SQL for various
approaches to this kind of problem?
Author
3 Mar 2006 1:54 AM
A.Cicak
LOL, there is a book about his problem. Daniel, you are lucky!

Show quote
"--CELKO--" <jcelko***@earthlink.net> wrote in message
news:1141348474.658630.45940@v46g2000cwv.googlegroups.com...
> Have you gotten a copy of TREES & HIERARCHIES IN SQL for various
> approaches to this kind of problem?
>
Author
3 Mar 2006 4:43 AM
David Browne
"--CELKO--" <jcelko***@earthlink.net> wrote in message
news:1141348474.658630.45940@v46g2000cwv.googlegroups.com...
> Have you gotten a copy of TREES & HIERARCHIES IN SQL for various
> approaches to this kind of problem?
>


Look at the problem: it's not a hierarchy or tree problem:


CarAttributes
-----Configuration
----------Doors
----------EngineLocation
----------DriveWheels
-----Engine
----------Type
---------------Cylinders
---------------Arrangement
----------Specs
---------------Size
---------------Power
---------------Torque
-----Transmission
----------Type
----------Gears



These are all attributes of a single entity: the car.

Perhaps an Engine is shared among multiple Cars and should be factored into
a seperate table.  But other than that this is just a simple table:

create table CAR
(
  NAME VARCHAR(50) PRMARY KEY,
  DOORS INT CHECK (DOORS < 6),
  ENGINE_LOCATION CHAR(1) CHECK (ENGINE_LOCATION IN ('F','B')),
  DRIVE_WHEELS CHAR(1) CHECK (ENGINE_LOCATION IN ('F','B')),
  ENGINE_CYLINDERS INT CHECK (ENGINE_CYNINDERS < 12),
....

dAVID
Author
3 Mar 2006 6:35 AM
Daniel Manes
Yes, David, they are all attributes of a car, but the attributes
themselves are arranged in a hierarchy. The table you are proposing
(which is basically like my Idea #2) cannot be linked to the hierarchy.
At least, not as far as I can tell.

See what I'm getting at?

Thanks,

-Dan
Author
3 Mar 2006 2:13 PM
David Browne
"Daniel Manes" <danth***@cox.net> wrote in message
news:1141367733.531758.185340@e56g2000cwe.googlegroups.com...
> Yes, David, they are all attributes of a car, but the attributes
> themselves are arranged in a hierarchy. The table you are proposing
> (which is basically like my Idea #2) cannot be linked to the hierarchy.
> At least, not as far as I can tell.
>
> See what I'm getting at?
>

I think so.  If so the relational model just doesn't quite get you there.
The relational model doesn't have support for the notion of attribute
groupings on an entity.  You can get a bit of that by factoring the entity
into smaller, related entities, but that's just a hack.

You will have to supply and store that additional metadata yourself.

I would probably use a single flat table, and also provide an XSD schema
which mapped to the table and provided the hierarchicial relationships among
the attributes.

Or adopt a naming convention for your columns which embeds the attribute
relationships

eg


SELECT ENGINE_TYPE_CYLINDERS FROM CarData WHERE CarID = 1

David
Author
3 Mar 2006 5:36 PM
Daniel Manes
Hi David,

Knowing that I'm not missing some really obvious relational concept is
actually a relief. I think I'm going to go this route:

1. Put the attribute data in one flat table (e.g., Car).
2. Put the attribute names (which will correspond to the column names
in the Car table) and hierarchy information in an adjacency table
(e.g., CarAttribute).

If my app needs data on a particular car, I'll just write a query to
grab the whole row (all the attributes) and another query to grab the
particular column names that should be displayed. Then I'll write code
in my .NET app to handle the rest.

Thanks for the help,

-Dan
Author
5 Mar 2006 4:07 AM
--CELKO--
>> Look at the problem: it's not a hierarchy or tree problem: <<

Yes, it is.  It is called a parts explosion or Bill of Materials.  Get
the book and use teh code for this.  I think that you might be thinking
in OO terms and not RDBMS.
Author
5 Mar 2006 7:15 AM
David Browne
"--CELKO--" <jcelko***@earthlink.net> wrote in message
news:1141531629.790826.299470@p10g2000cwp.googlegroups.com...
>>> Look at the problem: it's not a hierarchy or tree problem: <<
>
> Yes, it is.  It is called a parts explosion or Bill of Materials.  Get
> the book and use teh code for this.  I think that you might be thinking
> in OO terms and not RDBMS.
>

The example chosen (Car) might be mistaken for a Bill of Materials problem.
But I think it's a typical entity model, but with the the entity attributes
"grouped" for the UI.

In a Bill of Materials problem an entity is composed of other entities, each
of which is composed of other entities, etc.  Moreover all the entities
belong to a single domain (eg Parts).  Here there's only one entity and a
fixed list of attributes.  The problem concerns how to group related
attributes.

<quote>
Basically, we have a long list of attributes. We've decided to arrange them
hierarchically so
users can find them more easily.
</quote>

Some entities have a lot of attributes.  Sometimes there are logical
groupings of attributes.  For instance a Person may have Height, Weight,
Age, Income, NumberOfPets.  Height, Weight and Age are all demographic
atributes, and it would sometimes be convenient to group them together.
They shouldn't be extracted into a seperate PersonDemographicData table, but
short of that the relational model doesn't help.

David
Author
6 Mar 2006 8:30 PM
Dan Manes
David,

Exactly. With a bill of materials, each item in the hierarchy would be
a "part," and each part would have, say, a part number, a name, a
weight, and a color.

While a car (which I'm just using as a proxy for explanatory purposes)
does have many systems and subsystems, my problem is different from a
bill of materials because each attribute in the hierarchy does *not*
follow the same template. At all.

One attribute may be a floating point number while another is a string
and another is a reference to another table, etc. They're just column
names, and because there are so many of them, the interface is going to
categorize them and present them as a hierarchy so users can find them
more easily. If that can't be fully represented in a relational
database, I'll simply have to make the necessary adjustments in the
CLR.

BTW, if CELKO's book really covers this issue, that would be nice to
know, but if his book explanations are written anything like what he
just wrote on
http://blogs.msdn.com/anthonybloesch/archive/2006/02/15/Hierarchies_in_SQL_Server_2005.aspx
, I'll pass :)
Author
3 Mar 2006 6:32 AM
Daniel Manes
Actually, Joe, I haven't, but I did read this...

http://blogs.msdn.com/anthonybloesch/archive/2006/02/15/Hierarchies_in_SQL_Server_2005.aspx

....which seems to suggest that a good ol' adjacency table is just fine.
Besides being simple, it apparently runs a lot faster than the "Nested
Sets" technique.

My question, however, is not how to implement/traverse a hierarchy.
It's more about how to store the actual attribute data. If I get to a
particular leaf of the hierarchy, how do I query that particular column
in the table?

Be great to get your take on this...I hope you'll give my post a read
sometime.

Thanks,

-Dan
Author
6 Mar 2006 3:02 AM
--CELKO--
>> which seems to suggest that a good ol' adjacency table is just fine.Besides being simple, it apparently runs a lot faster than the "Nested Sets" technique. <<

I took atr quick peek at the blog and I will post to it for details on
tomorrow.

He seemed to be measring loading time, without telling us the fomat of
the data. How many thoudsand summary queries in a tree of depth 5+ did
he do?  etc.  What did he do to prevent cycles in the adjacency list
model? etc.

iI you do the adjacency list model doen right, you have to insert
(in_node, ouit_node) pairs and then check the ENTIRE schema for cycles
with a TRIGGER every time.  Then you have to look for gaps to assure it
is one tree.

This is not an option -- it is vital.  I am waiting fo r $1000/day
contract to come thru with an insurance company that forgot these
constraints.  Opps!!

AddThis Social Bookmark Button