Home All Groups Group Topic Archive Search About

Implementing natural keys

Author
27 May 2005 5:37 PM
Chris Lieb
After reading a thread about the 'merits' of natural keys, I decided that I
would play around with them a little.  A database that I am working on needs
to have the ability to repopulate some of its tables from a flat file.  By
doing this, however, the foreign keys that exist in other tables would all be
invalidated.  I need a way to get around this problem.

I have a table (CITTAS_Primary) that has a foreign key to a table
(CITTAS_OriginCityState) that contains three columns: identity/PK, City, and
State.  I decided that the best way to put a natural key on this table would
be to use a combination of City and State.  This was easy enough to change. 
However, when I went to establish the foreign key in CITTAS_Primary, I wasn't
able to link my natural key to the existing int field.  What data type would
the field have to be to reference CITTAS_OriginCityState?  If I had to create
a city and state field, it would almost be pointless to have the extra
tables, wouldn't it?

Thanks in advance.

--
Chris Lieb
UPS CACH, Hodgekins, IL
Tech Support Group - Systems/Apps

Author
27 May 2005 5:58 PM
Thomas Coleman
Either you would have to make an amalgamated key (SOMEPLACECA for Someplace, CA)
or use both columns as the PK and link to both columns for all child
relationships. (Isn't that fun? ;->) Of course, if you want to associate a zip
code with that city and state you will run into problems. There are numerous
situations where multiple zip codes are in the same city.


Thomas


Show quoteHide quote
"Chris Lieb" <ChrisL***@discussions.microsoft.com> wrote in message
news:D138B4E3-7359-4844-B73D-28F69403085F@microsoft.com...
> After reading a thread about the 'merits' of natural keys, I decided that I
> would play around with them a little.  A database that I am working on needs
> to have the ability to repopulate some of its tables from a flat file.  By
> doing this, however, the foreign keys that exist in other tables would all be
> invalidated.  I need a way to get around this problem.
>
> I have a table (CITTAS_Primary) that has a foreign key to a table
> (CITTAS_OriginCityState) that contains three columns: identity/PK, City, and
> State.  I decided that the best way to put a natural key on this table would
> be to use a combination of City and State.  This was easy enough to change.
> However, when I went to establish the foreign key in CITTAS_Primary, I wasn't
> able to link my natural key to the existing int field.  What data type would
> the field have to be to reference CITTAS_OriginCityState?  If I had to create
> a city and state field, it would almost be pointless to have the extra
> tables, wouldn't it?
>
> Thanks in advance.
>
> --
> Chris Lieb
> UPS CACH, Hodgekins, IL
> Tech Support Group - Systems/Apps
Are all your drivers up to date? click for free checkup

Author
27 May 2005 8:55 PM
Chris Lieb
I don't have to worry about zip codes since the cities are just locations of
our hubs.  This is part of a tracking system that we use to track flat cars
as they travel between hubs over rail.

Chris

Show quoteHide quote
"Thomas Coleman" wrote:

> Either you would have to make an amalgamated key (SOMEPLACECA for Someplace, CA)
> or use both columns as the PK and link to both columns for all child
> relationships. (Isn't that fun? ;->) Of course, if you want to associate a zip
> code with that city and state you will run into problems. There are numerous
> situations where multiple zip codes are in the same city.
>
>
> Thomas
>
>
> "Chris Lieb" <ChrisL***@discussions.microsoft.com> wrote in message
> news:D138B4E3-7359-4844-B73D-28F69403085F@microsoft.com...
> > After reading a thread about the 'merits' of natural keys, I decided that I
> > would play around with them a little.  A database that I am working on needs
> > to have the ability to repopulate some of its tables from a flat file.  By
> > doing this, however, the foreign keys that exist in other tables would all be
> > invalidated.  I need a way to get around this problem.
> >
> > I have a table (CITTAS_Primary) that has a foreign key to a table
> > (CITTAS_OriginCityState) that contains three columns: identity/PK, City, and
> > State.  I decided that the best way to put a natural key on this table would
> > be to use a combination of City and State.  This was easy enough to change.
> > However, when I went to establish the foreign key in CITTAS_Primary, I wasn't
> > able to link my natural key to the existing int field.  What data type would
> > the field have to be to reference CITTAS_OriginCityState?  If I had to create
> > a city and state field, it would almost be pointless to have the extra
> > tables, wouldn't it?
> >
> > Thanks in advance.
> >
> > --
> > Chris Lieb
> > UPS CACH, Hodgekins, IL
> > Tech Support Group - Systems/Apps
>
>
>
Author
27 May 2005 6:00 PM
--CELKO--
>> I decided that the best way to put a natural key on this table would be to use a combination of City and State. <<

No problem; a key can be mutli-column:

CREATE TABLE CITTAS_OriginCityState
(city_name CHAR(20) NOT NULL,
state_code CHAR(2) NOT NULL,
PRIMARY KEY (city_name, state_code),
...);

CREATE TABLE CITTAS_Primary
( ..
city_name CHAR(20) NOT NULL,
state_code CHAR(2) NOT NULL,
FOREIGN KEY (city_name, state_code)
   REFERENCES CITTAS_Primary (city_name, state_code)
   ON UPDATE CASCADE
   ON DELETE CASCADE,
..);

>> If I had to create a city and state field [sic], it would almost be
pointless to have the extra tables, wouldn't it? <<

Maybe, but we don't know what your data model looks like.  What makes
"CITTAS_OriginCityState" logically different entities from
"CITTAS_Primary"?
Or is this a matter of something called a "CITTAS" being in different
states or status?

For example, you would not have a "MalePersonnel" and a
"FemalePersonnel" table; you would a "gender" column instead.
Personnel are the same kind of entities.

You would not have a "Personnel_or_Automobile" table. People and
automobiles are clearly different.
Author
27 May 2005 8:51 PM
Chris Lieb
CITTAS is just the name of the system being deployed, to keep the tables from
getting mixed up with the other tables in the DB.  The system is used to
track a trailer from its origin to CACH to its destination.  The
CITTAS_OriginCityState table is a table of the hubs that we receive trailers
from, and, for example, CITTAS_DestinationCityState is a list of hubs that we
send trailers to.  The CITTAS_Primary table links all of these pieces
together under a unique string.  The downloads that we get are matched
against this key string to get the data about each trailer.  For instance,
the download says that a trailer Q29 passed through hub X yesterday and was
delayed due to mechanical problems.  When we get the download, we match Q29s
unique key against CITTAS_Primary to retrieve the origin and destination of
the trailer among other things.  Then, we can inform other hubs down the line
that the trailer will not arrive or change sort schedules so we don't have
people standing around wailting for a trailer that won't be arriving today.

Hope this helps

Chris

Show quoteHide quote
"--CELKO--" wrote:

> >> I decided that the best way to put a natural key on this table would be to use a combination of City and State. <<
>
> No problem; a key can be mutli-column:
>
> CREATE TABLE CITTAS_OriginCityState
> (city_name CHAR(20) NOT NULL,
>  state_code CHAR(2) NOT NULL,
>  PRIMARY KEY (city_name, state_code),
>  ...);
>
> CREATE TABLE CITTAS_Primary
> ( ..
>  city_name CHAR(20) NOT NULL,
>  state_code CHAR(2) NOT NULL,
>  FOREIGN KEY (city_name, state_code)
>    REFERENCES CITTAS_Primary (city_name, state_code)
>    ON UPDATE CASCADE
>    ON DELETE CASCADE,
>  ..);
>
>  >> If I had to create a city and state field [sic], it would almost be
> pointless to have the extra tables, wouldn't it? <<
>
> Maybe, but we don't know what your data model looks like.  What makes
> "CITTAS_OriginCityState" logically different entities from
> "CITTAS_Primary"?
> Or is this a matter of something called a "CITTAS" being in different
> states or status?
>
> For example, you would not have a "MalePersonnel" and a
> "FemalePersonnel" table; you would a "gender" column instead.
> Personnel are the same kind of entities.
>
> You would not have a "Personnel_or_Automobile" table. People and
> automobiles are clearly different.
>
>
Author
27 May 2005 8:51 PM
Chris Lieb
An idea just came to mind.  To prevent having the actual values in
CITTAS_Primary and having to add extra fields to the child tables to handle
the multiple keys, could I generate a hash of the data (City + State) and
place it in the 'ID' field.  By doing this, The number of fields in each
table stays the same and, at the same time, I still have a reason to have the
parent tables.  Since the data in this table should not be altered other that
to be repopulated, Users would not have to worry about entering the hashes
themselves and programmers could treat the 'ID' fields normally instead of
having to do multiple JOINs to relate the data in queries.  Is there a way to
do this in TSQL?  It would not have to be an industrial-strength hash like
MD5 or SHA1, just maybe something like CRC.  (Ok, maybe CRC isn't actually a
hash, but it would do the job, I do believe.)  Unfortunately, I do not have
permissions to use extended stored procedures, so that is not a viable
solution.  I figure that a UDF or SP might be the path that I have to go, but
I am sure that there are probably hashing functions out there.

Chris

Show quoteHide quote
"Chris Lieb" wrote:

> After reading a thread about the 'merits' of natural keys, I decided that I
> would play around with them a little.  A database that I am working on needs
> to have the ability to repopulate some of its tables from a flat file.  By
> doing this, however, the foreign keys that exist in other tables would all be
> invalidated.  I need a way to get around this problem.
>
> I have a table (CITTAS_Primary) that has a foreign key to a table
> (CITTAS_OriginCityState) that contains three columns: identity/PK, City, and
> State.  I decided that the best way to put a natural key on this table would
> be to use a combination of City and State.  This was easy enough to change. 
> However, when I went to establish the foreign key in CITTAS_Primary, I wasn't
> able to link my natural key to the existing int field.  What data type would
> the field have to be to reference CITTAS_OriginCityState?  If I had to create
> a city and state field, it would almost be pointless to have the extra
> tables, wouldn't it?
>
> Thanks in advance.
>
> --
> Chris Lieb
> UPS CACH, Hodgekins, IL
> Tech Support Group - Systems/Apps
Author
27 May 2005 8:57 PM
Alex Papadimoulis
> An idea just came to mind.  To prevent having the actual values in
> CITTAS_Primary and having to add extra fields to the child tables to handle
> the multiple keys, could I generate a hash of the data (City + State) and
> place it in the 'ID' field.  [...]

It's hard to tell ... but I believe you may have actually thought of a
*worse* solution than going with an IDENTITY. You need to seriously get a
grasp of relational theory before touching RDBMS. Honestly. Take the
opportunity this weekend to go to the library and get a book on the topic.

> By doing this, The number of fields in each
> table stays the same and, at the same time, I still have a reason to have the
> parent tables.

Another advantage to reading up on relational theory is that you will not
sound like at total newbie.  In relational databases, there are no fields
(only columns) or parent/child tables (only referenced/referencing). It may
seem like stupid semantics now, but it's about as bad as saying there's no
difference between a "class" and an "object" to an OO guy.

> Since the data in this table should not be altered other that
> to be repopulated, Users would not have to worry about entering the hashes
> themselves and programmers could treat the 'ID' fields normally instead of
> having to do multiple JOINs to relate the data in queries. 

Ok -- there's not even a point to explain why this is such a bad idea. Views
(virtual tables) are used to ease complex multi-join queries. Think of them
as a "layer of abstraction".

> Unfortunately, I do not have permissions to use extended stored procedures

I couldn't even immagine the mess you would create if you had such access.
There is some hope.

Author
27 May 2005 8:58 PM
Chris Lieb
First off, I recall reading somewhere about the seven rules of relational
databases, or something like that, that were set out by one of the pioneers
in the field.  One of the rules was that ALL primary keys should be
abstracted from the data in their rows.  (Sorry if the language is wrong, but
you already managed to point out the fact that I am a real n00b to relational
databases.)  I kind of have to break that rule so that when I reconstruct the
CITTAS_Primary table, all of the tables that reference it will not be
invalidated, which is what would happen if I used abstract keys.  At the same
time, I will not be seeing this app through to the front-end development (or
even the completion of the back-end, for that matter), so I don't want to
complicate all of the relations and force the new programmer to spend weeks
trying to understand it.  Using abstract IDs is pretty common here and I
don't want my IDs to act differently.  Also, I don't see why hashes would be
worse than abstract keys.  They are basically combining the data that I would
have built into multiple keys into a single, easy to use key.

Chris

Aside: I really take no offence to someone using object and class
interchengably.  After all, in most OO languages that I have encountered, all
classes inherit from Object, so any class that is created is a child of
Object. 

Background: This is my first DB project outside of Access, and it happens to
be quite a large and important one.  This project has already had two
seperate programmers work on it, and neither of them are around anymore.  I
am basically learning the whole relational model as I go.  I actually am
employed as a co-op and am currently a sophomore in college.  I am not trying
to excuse my ignorance, just explain my complete lack of experience.  After
all, how many teenagers grow up with a full blown DB server in their house
for them to play with?  Not I.  I occasionaly mess around with MySQL over the
weekends, but not too often since I have to run it on my only computer, which
really slows down under the load.  Even then, about all that I can get from
it are DB structure and basic SQL syntax.
Author
27 May 2005 9:08 PM
Alex Papadimoulis
Chris,

> First off, I recall reading somewhere about the seven rules of relational
> databases, or something like that, that were set out by one of the pioneers
> in the field.  One of the rules was that ALL primary keys should be
> abstracted from the data in their rows.

I'm going guess that you were refering to the Access Rule (#2) of Dr. Codd's
12 Rules of Relational Database Management Systems. And you got some of the
words right, but have a totally wrong impression of it. A more accurate
characterization would be that primary keys *are* made up of columns within
the row -- not abstracted from them. A ID/GUID/Hash/etc are logically the
same as a physical locator (pointer) and violate this rule.

> I kind of have to break that rule so that when I reconstruct the
> CITTAS_Primary table, all of the tables that reference it will not be
> invalidated, which is what would happen if I used abstract keys.
> [...]
> Using abstract IDs is pretty common here and I
> don't want my IDs to act differently.

Well, you're not in a good position to change things. Considering your
circumstance, I think you'd be best just going with what they have.

> Also, I don't see why hashes would be
> worse than abstract keys.  They are basically combining the data that I would
> have built into multiple keys into a single, easy to use key.

Their both bad at the logical level (and are pretty much the same). No one
will understand the hashes; they will understand IDENTITYs. If I had
PRINCIPLES OF PROGRAM DESIGN (1975) handy, I'd pull out a quote talking about
a programmers' tendancy to want to write unecessarily complex things to
satisfy a professional challenge.  Needless complexity is also a sign of a
newbiew/bad programmer. Remember that and you will do a lot better as you
advance in your career.


> Aside: I really take no offence to someone using object and class
> interchengably. 

They are fundamentally different concepts. One is an instance and the other
is a definition. Nothing to be offended by, but it's still wrong to use them
interchangably. Same goes with fields vs. columns.

> I am basically learning the whole relational model as I go.  I actually am
> employed as a co-op and am currently a sophomore in college.  I am not trying
> to excuse my ignorance, just explain my complete lack of experience.

That is a good excuse for ignorance; you're in the process of learning. Most
people who post to newsgroups are seeking hacks for their horrendous deisgn.
No less, you still should take the time to read a book on the topic. If
nothign else, it'll help you with your "A" in DATABASE 101.

> I occasionaly mess around with MySQL over the weekends

Note that comparing MySQL to any other real database is like comparing a
lawnmower engine with a car engine.  Pick up a copy of SQL Server developer
edition. It's a full-blown verison that retails for $50; you can grab it on
fleabay for like $5. I think i've even got 4 or five copies. Your school
should have it for like $10.

Author
27 May 2005 8:51 PM
Alex Papadimoulis
> After reading a thread about the 'merits' of natural keys, I decided that I
> would play around with them a little.

Welcome to the world of relational data.

> I have a table (CITTAS_Primary) that has a foreign key to a table
> (CITTAS_OriginCityState) that contains three columns: identity/PK, City, and
> State.  I decided that the best way to put a natural key on this table would
> be to use a combination of City and State. 

You're halfway there: a table describes a set and should be plural, not
singular.

> What data type would
> the field have to be to reference CITTAS_OriginCityState? 

A FK will need to be the same as the referencing PK. So, you would need to
have a City + State column.

> If I had to create
> a city and state field, it would almost be pointless to have the extra
> tables, wouldn't it?

This is one of the things oponents of the relational model said back in the
70's. As it turns out, it works much better than having a pointer (artificial
INT pk). To undestand why, you need to read a book on relational theory. I
suggest Date's INTRODUCTION TO DATABASE SYSTEMS - you can get an old edition
for a few bucks. Celko's DATA & DATABASES would be my second choice.

Author
27 May 2005 9:09 PM
David Portas
One thing you seem to be misunderstanding is that although every table
should have a Natural key, that key doesn't have to be used as the foreign
key in other tables. Often you will want to create a surrogate key as well
to be used as a reference for foreign keys.

No reason why you can't use a code as an intelligent surrogate key for each
city/state. There are plenty of schemes that exist for these things. In the
freight business there are surely directories of codes for these locations
already in use? If you don't find a suitable scheme then invent one of your
own.

--
David Portas
SQL Server MVP
--
Author
27 May 2005 10:00 PM
Chris Lieb
The way that I see it is that I need to use natural keys to keep from
invalidating other tables in the DB when I repopulate this group of tables. 
After, there is no guarantee otherwise that Atlanta, GA will always have an
ID of 4.  After I repopulate the tables, Atlanta, GA might even drop off of
the list.

As for the intelligent surrogate, each hub does have a four digit ID number
called a SLIC.  Unfortunately, not all of the entries are consistent.  For
example, in the entries key, it might have the SLIC for Cleveland, OH as the
origin, but Pittsburgh, PA will be in the Origin field.  Most of the records
don't have this problem, but the number is signifigant enough that it does
create a problem.  If it wasn't for that, this problem would be a lot easier
to solve.

Chris

Show quoteHide quote
"David Portas" wrote:

> One thing you seem to be misunderstanding is that although every table
> should have a Natural key, that key doesn't have to be used as the foreign
> key in other tables. Often you will want to create a surrogate key as well
> to be used as a reference for foreign keys.
>
> No reason why you can't use a code as an intelligent surrogate key for each
> city/state. There are plenty of schemes that exist for these things. In the
> freight business there are surely directories of codes for these locations
> already in use? If you don't find a suitable scheme then invent one of your
> own.
>
> --
> David Portas
> SQL Server MVP
> --
>
>
>
Author
27 May 2005 10:19 PM
Steve Kass
It sounds to me like your problems have nothing at all to
do with the fact that you are using artificial keys.  Your
problem is that you aren't keeping track of what they
mean anywhere, and you are storing redundant information
in your tables.

I don't know what "repopulate" means, but if Atlanta, GA
has an (artificial) ID of 4, then you should have a table
keyed on the ID where Atlanta, GA and 4 are in the same
row.  You should never have to "repopulate" this table.

As for your row that contains the SLIC for Cleveland in one
column and the name Pittsburgh in another, why are you
storing redundant origin information?  Store the SLIC, and
when you need the name, get it from the table of SLICs.
Sure, the SLIC might be wrong, if someone entered
Pittsburgh incorrectly, when Cleveland was the right value,
but your problem is redundancy.  No fact (like "the Origin
is Cleveland, OH") should be stored more than once in
your database.  You can store it in many ways, but pick
just one, to avoid the trouble you are in now.

Steve Kass
Drew University

Chris Lieb wrote:

Show quoteHide quote
>The way that I see it is that I need to use natural keys to keep from
>invalidating other tables in the DB when I repopulate this group of tables. 
>After, there is no guarantee otherwise that Atlanta, GA will always have an
>ID of 4.  After I repopulate the tables, Atlanta, GA might even drop off of
>the list.
>
>As for the intelligent surrogate, each hub does have a four digit ID number
>called a SLIC.  Unfortunately, not all of the entries are consistent.  For
>example, in the entries key, it might have the SLIC for Cleveland, OH as the
>origin, but Pittsburgh, PA will be in the Origin field.  Most of the records
>don't have this problem, but the number is signifigant enough that it does
>create a problem.  If it wasn't for that, this problem would be a lot easier
>to solve.
>
>Chris
>
>"David Portas" wrote:
>

>
>>One thing you seem to be misunderstanding is that although every table
>>should have a Natural key, that key doesn't have to be used as the foreign
>>key in other tables. Often you will want to create a surrogate key as well
>>to be used as a reference for foreign keys.
>>
>>No reason why you can't use a code as an intelligent surrogate key for each
>>city/state. There are plenty of schemes that exist for these things. In the
>>freight business there are surely directories of codes for these locations
>>already in use? If you don't find a suitable scheme then invent one of your
>>own.
>>
>>--
>>David Portas
>>SQL Server MVP
>>--
>>
>>
>>
>>   
>>

Bookmark and Share