Home All Groups Group Topic Archive Search About

Another case for IDENTITY

Author
16 Sep 2005 2:56 PM
Mark White
Not to keep beating a dead horse, oh why not.

DDL:

DDL A:
Survey(
InternalId int IDENTITY PRIMARY KEY,
RespNo int,
RespKey nvarchar(16),
Qtr char(4),
CntyCod smallint,
ImportDate datetime not null default getDate())

SurveyAnswers(
InternalId int
    FK,
QuestionId int
    FK to Question table,
Answer nvarchar(72) not null,
CONSTRAINT PKAnswer PRIMARY KEY CLUSTERED (InternalId, QuestionId)

Conversly, the same tables above for non-IDENTITY folk:
DDL B:
Survey(
RespNo int,
RespKey nvarchar(16),
Qtr char(4),
CntyCod smallint,
ImportDate datetime not null default getDate(),
CONSTRAINT PkSurvey PRIMARY KEY
CLUSTERED (RespNo, RespKey, Qtr, CntyCod))

SurveyAnswers(
RespNo int,
RespKey nvarchar(16),
Qtr char(4),
CntyCod smallint,
QuestionId int not null
    FK to Question table,
Answer nvarchar(72) not null,
CONSTRAINT PkSurvey PRIMARY KEY
CLUSTERED (RespNo, RespKey, Qtr, CntyCod, QuestionId))

Suppose I need to extract a subset of data using a constrained column:
select answer from SurveyAnswer
where InternalId in (Select InternalId from Survey
where CntyCod = 35)

Of course this is an easy join also:
select a.Answer from SurveyAnswer a
join Survey s on a.InternalId = s.InternalId
where s.CntyCod = 35

How would you construct the non-joining query to extract the same
information using DDL B?  How about the speed of that query compared to the
above?

Author
16 Sep 2005 3:35 PM
David Portas
Use EXISTS instead of IN. EXISTS performs better than IN in many cases,
although this case is a pretty contrived example, given the join
solution.

Either the pseudo-DDL and/or your table designs are flawed anyway.
Apparently RespNo, RespKey, Qtr, CntyCod are NULL in one case and not
in the other. So where is the alternate key in A? Why would you make
IDENTITY the only key of a table?

--
David Portas
SQL Server MVP
--
Author
16 Sep 2005 3:41 PM
Mark White
No, they're both null.  Where do you see the difference?

Also, there is a Unique index on Survey in DDL A, forgot to include that.

How is it flawed, please describe.  What other key needs to be included?
There is no alternate.

Show quote
"David Portas" <REMOVE_BEFORE_REPLYING_dpor***@acm.org> wrote in message
news:1126884951.795754.65850@f14g2000cwb.googlegroups.com...
> Use EXISTS instead of IN. EXISTS performs better than IN in many cases,
> although this case is a pretty contrived example, given the join
> solution.
>
> Either the pseudo-DDL and/or your table designs are flawed anyway.
> Apparently RespNo, RespKey, Qtr, CntyCod are NULL in one case and not
> in the other. So where is the alternate key in A? Why would you make
> IDENTITY the only key of a table?
>
> --
> David Portas
> SQL Server MVP
> --
>
Author
16 Sep 2005 3:55 PM
David Portas
> No, they're both null.

Wrong. In B the PK of Survey is (RespNo, RespKey, Qtr, CntyCod), which
means they are implicitly NOT NULL. Not so in A where they are NULL by
default.

> How is it flawed, please describe.  What other key needs to be included?
> There is no alternate.

If there is no alternate key then you have redundant data and
consequent loss of integrity. Even if you don't value integrity, why
would you want to store duplicate data?

--
David Portas
SQL Server MVP
--
Author
16 Sep 2005 4:02 PM
Mark White
> Wrong. In B the PK of Survey is (RespNo, RespKey, Qtr, CntyCod), which
> means they are implicitly NOT NULL. Not so in A where they are NULL by
> default.

No, sorry, you misunderstood.  I meant that they have explicit definitions
in the "real" DDL, not the DDL I posted.

> If there is no alternate key then you have redundant data and
> consequent loss of integrity.

Where is the redundant data you describe?  In SurveyAnswers?

Show quote
"David Portas" <REMOVE_BEFORE_REPLYING_dpor***@acm.org> wrote in message
news:1126886155.884526.127020@g44g2000cwa.googlegroups.com...
> > No, they're both null.
>
> Wrong. In B the PK of Survey is (RespNo, RespKey, Qtr, CntyCod), which
> means they are implicitly NOT NULL. Not so in A where they are NULL by
> default.
>
> > How is it flawed, please describe.  What other key needs to be included?
> > There is no alternate.
>
> If there is no alternate key then you have redundant data and
> consequent loss of integrity. Even if you don't value integrity, why
> would you want to store duplicate data?
>
> --
> David Portas
> SQL Server MVP
> --
>
Author
16 Sep 2005 4:10 PM
Mark White
You are correct though in the pseudo-DDL (DDL B).
In fact, that reinforces my thoughts on using IDENTITY.   RespNo, RespKey,
Qtr, CntyCod all can be null (all will have at least one not null).  I
unfortunately have no control of those columns being null.

Show quote
"Mark White" <markw2***@yahoo.com> wrote in message
news:uuOT6ftuFHA.3000@TK2MSFTNGP12.phx.gbl...
> > Wrong. In B the PK of Survey is (RespNo, RespKey, Qtr, CntyCod), which
> > means they are implicitly NOT NULL. Not so in A where they are NULL by
> > default.
>
> No, sorry, you misunderstood.  I meant that they have explicit definitions
> in the "real" DDL, not the DDL I posted.
>
> > If there is no alternate key then you have redundant data and
> > consequent loss of integrity.
>
> Where is the redundant data you describe?  In SurveyAnswers?
>
> "David Portas" <REMOVE_BEFORE_REPLYING_dpor***@acm.org> wrote in message
> news:1126886155.884526.127020@g44g2000cwa.googlegroups.com...
> > > No, they're both null.
> >
> > Wrong. In B the PK of Survey is (RespNo, RespKey, Qtr, CntyCod), which
> > means they are implicitly NOT NULL. Not so in A where they are NULL by
> > default.
> >
> > > How is it flawed, please describe.  What other key needs to be
included?
> > > There is no alternate.
> >
> > If there is no alternate key then you have redundant data and
> > consequent loss of integrity. Even if you don't value integrity, why
> > would you want to store duplicate data?
> >
> > --
> > David Portas
> > SQL Server MVP
> > --
> >
>
>
Author
16 Sep 2005 4:21 PM
David Portas
> No, sorry, you misunderstood.  I meant that they have explicit definitions
> in the "real" DDL, not the DDL I posted.

I misunderstood? Sorry about that. Obviously I should have known that I
was expected to refer to my crystal ball... :-<

> Where is the redundant data you describe?

Both tables if they don't have an alternate key. Consider:

CREATE TABLE Recipe (x INTEGER IDENTITY PRIMARY KEY, item VARCHAR(10)
NOT NULL /* No key!! */) ;

INSERT INTO recipe (item) VALUES ('Egg') ;
INSERT INTO recipe (item) VALUES ('Egg') ;
INSERT INTO recipe (item) VALUES ('Egg') ;

SELECT x, item
FROM Recipe ;

See the redundancy now? If duplicates have meaning then add a Quantity
column instead, like:

CREATE TABLE Recipe (x INTEGER IDENTITY PRIMARY KEY, item VARCHAR(10)
NOT NULL UNIQUE, qty INTEGER NOT NULL CHECK (qty>0)) ;

If not, then just add the constraint and drop the duplicates. Anything
else means your logical model breaks.

--
David Portas
SQL Server MVP
--
Author
16 Sep 2005 4:27 PM
David Portas
I missed the INSERT from my last example. Here it is:

INSERT INTO recipe (item, qty) VALUES ('Egg', 3) ;

Hopefully that was obvious anyway. The point is the extra UNIQUE
constraint.

--
David Portas
SQL Server MVP
--
Author
16 Sep 2005 4:34 PM
Mark White
Yes it was.  As mentioned though, there is a Unique constraint.

Show quote
"David Portas" <REMOVE_BEFORE_REPLYING_dpor***@acm.org> wrote in message
news:1126888069.115632.141480@o13g2000cwo.googlegroups.com...
> I missed the INSERT from my last example. Here it is:
>
> INSERT INTO recipe (item, qty) VALUES ('Egg', 3) ;
>
> Hopefully that was obvious anyway. The point is the extra UNIQUE
> constraint.
>
> --
> David Portas
> SQL Server MVP
> --
>
Author
16 Sep 2005 4:33 PM
Mark White
lol, sorry again.  I haven't described it the best in this post, my
apologies.

Also, in my second post, I mentioned there Unique covers RespNo, RespKey,
Qtr, CntyCod. This was my fault for trying to shorten the DDL.
> INSERT INTO recipe (item) VALUES ('Egg') ;
> INSERT INTO recipe (item) VALUES ('Egg') ;
> INSERT INTO recipe (item) VALUES ('Egg') ;
The second and third insert will fail.

Now, knowing there is a Unique constraint on Survey, where is the redundant
data?

Thanks for responding to the post :)

Show quote
"David Portas" <REMOVE_BEFORE_REPLYING_dpor***@acm.org> wrote in message
news:1126887716.698652.108130@o13g2000cwo.googlegroups.com...
> > No, sorry, you misunderstood.  I meant that they have explicit
definitions
> > in the "real" DDL, not the DDL I posted.
>
> I misunderstood? Sorry about that. Obviously I should have known that I
> was expected to refer to my crystal ball... :-<
>
> > Where is the redundant data you describe?
>
> Both tables if they don't have an alternate key. Consider:
>
> CREATE TABLE Recipe (x INTEGER IDENTITY PRIMARY KEY, item VARCHAR(10)
> NOT NULL /* No key!! */) ;
>
> INSERT INTO recipe (item) VALUES ('Egg') ;
> INSERT INTO recipe (item) VALUES ('Egg') ;
> INSERT INTO recipe (item) VALUES ('Egg') ;
>
> SELECT x, item
>  FROM Recipe ;
>
> See the redundancy now? If duplicates have meaning then add a Quantity
> column instead, like:
>
> CREATE TABLE Recipe (x INTEGER IDENTITY PRIMARY KEY, item VARCHAR(10)
> NOT NULL UNIQUE, qty INTEGER NOT NULL CHECK (qty>0)) ;
>
> If not, then just add the constraint and drop the duplicates. Anything
> else means your logical model breaks.
>
> --
> David Portas
> SQL Server MVP
> --
>
Author
16 Sep 2005 4:52 PM
David Portas
> INSERT INTO recipe (item) VALUES ('Egg') ;
> INSERT INTO recipe (item) VALUES ('Egg') ;
> INSERT INTO recipe (item) VALUES ('Egg') ;
> The second and third insert will fail.

Huh? You obviously didn't try out what I posted. IDENTITY does NOT
prevent duplicates.

> Now, knowing there is a Unique constraint on Survey, where is the redundant
> data?

OK. That wasn't in your original post though. However, you still don't
have a key because you already told us those columns are nullable and
we know that keys must not be. Now SQL Server violates SQL standards,
the relational model and common sense by treating NULLs as equal values
in a UNIQUE index. So given that peculiarity, you *could* theoretically
write a lot of very ropey and inefficient TSQL that would treat this as
a "real" key - by which I mean, you could theoretically reference that
column every time with code that treats NULLs as equal (like
COALESCE(x,'foo')=COALESCE(x,'foo')). You'd have to be pretty
masochistic to attempt that though! Putting that aside, almost any
reasonable person would consider your table to be denormalized because
it lacks a logical key - remembering that IDENTITY isn't part of the
logical model at all.

--
David Portas
SQL Server MVP
--
Author
16 Sep 2005 5:01 PM
Mark White
Guilty.  AND I misread.

The fact that any of those four columns can be null is bad design on the
accepted data (from a system out of my hands, I wish it was), not my design.
If the acceptable data provided a logical key, obviously all of this would
be unneeded.  The point of this is when there isn't one logical key.

Also, even if all four columns (RespNo, RespKey, Qtr, CntyCod) didn't accept
null, I would still create an Identity column.  Is it not far more efficient
to join on a 4-byte column, or four column key which contains nvarchar data?
Show quote
"David Portas" <REMOVE_BEFORE_REPLYING_dpor***@acm.org> wrote in message
news:1126889576.077806.289090@g14g2000cwa.googlegroups.com...
> > INSERT INTO recipe (item) VALUES ('Egg') ;
> > INSERT INTO recipe (item) VALUES ('Egg') ;
> > INSERT INTO recipe (item) VALUES ('Egg') ;
> > The second and third insert will fail.
>
> Huh? You obviously didn't try out what I posted. IDENTITY does NOT
> prevent duplicates.
>
> > Now, knowing there is a Unique constraint on Survey, where is the
redundant
> > data?
>
> OK. That wasn't in your original post though. However, you still don't
> have a key because you already told us those columns are nullable and
> we know that keys must not be. Now SQL Server violates SQL standards,
> the relational model and common sense by treating NULLs as equal values
> in a UNIQUE index. So given that peculiarity, you *could* theoretically
> write a lot of very ropey and inefficient TSQL that would treat this as
> a "real" key - by which I mean, you could theoretically reference that
> column every time with code that treats NULLs as equal (like
> COALESCE(x,'foo')=COALESCE(x,'foo')). You'd have to be pretty
> masochistic to attempt that though! Putting that aside, almost any
> reasonable person would consider your table to be denormalized because
> it lacks a logical key - remembering that IDENTITY isn't part of the
> logical model at all.
>
> --
> David Portas
> SQL Server MVP
> --
>
Author
16 Sep 2005 7:04 PM
David Portas
> The point of this is when there isn't one logical key.

Well that was my point too! Whether any external system has a key or not is
irrelevant. What I am saying is that the same information can be represented
relationally (with keys and without NULLs). Do you really believe that you
have some data that cannot be represented relationally? The textbook
solution is to decompose the table into several base tables representing the
known attributes without transitive dependencies (BCNF).

In this case you also have another alternative. You have already said that
you have a UNIQUE constraint (albeit with NULLs) so in fact the "nulls" are
not "real" unknowns or inapplicables - they must represent some meaningful
information because you already know that they are going to be unique in the
table. So you could either use default values or extra columns to indicate
that fact rather than represent the same data through dubious use of a
nullable constraint.

> Also, even if all four columns (RespNo, RespKey, Qtr, CntyCod) didn't
> accept
> null, I would still create an Identity column.  Is it not far more
> efficient
> to join on a 4-byte column, or four column key which contains nvarchar
> data?

Correct. That doesn't remove the need for a logical key as well though.

--
David Portas
SQL Server MVP
--
Author
16 Sep 2005 8:27 PM
Hugo Kornelis
On Fri, 16 Sep 2005 13:01:15 -0400, Mark White wrote:

>Also, even if all four columns (RespNo, RespKey, Qtr, CntyCod) didn't accept
>null, I would still create an Identity column.  Is it not far more efficient
>to join on a 4-byte column, or four column key which contains nvarchar data?

Hi Mark,

That's true. But there's also another side to the medal. Using identity
columns for foreign keys will also increase the number of times that a
join is needed at all. And though a join on a 4-bute integer column is
pretty fast, no join at all will be even faster!

Just consider how this report (based on DDL B) would look (and how it
would perform) in DDL A.

SELECT   RespNo, QuestionId, Answer, COUNT(*)
FROM     SurveyAnswers
GROUP BY RespNo, QuestionId, Answer

Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

AddThis Social Bookmark Button