|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Another case for IDENTITYDDL: DDL A: Survey( InternalId int IDENTITY PRIMARY KEY, RespNo int, RespKey nvarchar(16), Qtr char(4), CntyCod smallint, ImportDate datetime not null default getDate()) SurveyAnswers( InternalId int FK, QuestionId int FK to Question table, Answer nvarchar(72) not null, CONSTRAINT PKAnswer PRIMARY KEY CLUSTERED (InternalId, QuestionId) Conversly, the same tables above for non-IDENTITY folk: DDL B: Survey( RespNo int, RespKey nvarchar(16), Qtr char(4), CntyCod smallint, ImportDate datetime not null default getDate(), CONSTRAINT PkSurvey PRIMARY KEY CLUSTERED (RespNo, RespKey, Qtr, CntyCod)) SurveyAnswers( RespNo int, RespKey nvarchar(16), Qtr char(4), CntyCod smallint, QuestionId int not null FK to Question table, Answer nvarchar(72) not null, CONSTRAINT PkSurvey PRIMARY KEY CLUSTERED (RespNo, RespKey, Qtr, CntyCod, QuestionId)) Suppose I need to extract a subset of data using a constrained column: select answer from SurveyAnswer where InternalId in (Select InternalId from Survey where CntyCod = 35) Of course this is an easy join also: select a.Answer from SurveyAnswer a join Survey s on a.InternalId = s.InternalId where s.CntyCod = 35 How would you construct the non-joining query to extract the same information using DDL B? How about the speed of that query compared to the above? Use EXISTS instead of IN. EXISTS performs better than IN in many cases,
although this case is a pretty contrived example, given the join solution. Either the pseudo-DDL and/or your table designs are flawed anyway. Apparently RespNo, RespKey, Qtr, CntyCod are NULL in one case and not in the other. So where is the alternate key in A? Why would you make IDENTITY the only key of a table? -- David Portas SQL Server MVP -- No, they're both null. Where do you see the difference?
Also, there is a Unique index on Survey in DDL A, forgot to include that. How is it flawed, please describe. What other key needs to be included? There is no alternate. Show quote "David Portas" <REMOVE_BEFORE_REPLYING_dpor***@acm.org> wrote in message news:1126884951.795754.65850@f14g2000cwb.googlegroups.com... > Use EXISTS instead of IN. EXISTS performs better than IN in many cases, > although this case is a pretty contrived example, given the join > solution. > > Either the pseudo-DDL and/or your table designs are flawed anyway. > Apparently RespNo, RespKey, Qtr, CntyCod are NULL in one case and not > in the other. So where is the alternate key in A? Why would you make > IDENTITY the only key of a table? > > -- > David Portas > SQL Server MVP > -- > > No, they're both null. Wrong. In B the PK of Survey is (RespNo, RespKey, Qtr, CntyCod), whichmeans they are implicitly NOT NULL. Not so in A where they are NULL by default. > How is it flawed, please describe. What other key needs to be included? If there is no alternate key then you have redundant data and> There is no alternate. consequent loss of integrity. Even if you don't value integrity, why would you want to store duplicate data? -- David Portas SQL Server MVP -- > Wrong. In B the PK of Survey is (RespNo, RespKey, Qtr, CntyCod), which No, sorry, you misunderstood. I meant that they have explicit definitions> means they are implicitly NOT NULL. Not so in A where they are NULL by > default. in the "real" DDL, not the DDL I posted. > If there is no alternate key then you have redundant data and Where is the redundant data you describe? In SurveyAnswers?> consequent loss of integrity. Show quote "David Portas" <REMOVE_BEFORE_REPLYING_dpor***@acm.org> wrote in message news:1126886155.884526.127020@g44g2000cwa.googlegroups.com... > > No, they're both null. > > Wrong. In B the PK of Survey is (RespNo, RespKey, Qtr, CntyCod), which > means they are implicitly NOT NULL. Not so in A where they are NULL by > default. > > > How is it flawed, please describe. What other key needs to be included? > > There is no alternate. > > If there is no alternate key then you have redundant data and > consequent loss of integrity. Even if you don't value integrity, why > would you want to store duplicate data? > > -- > David Portas > SQL Server MVP > -- > You are correct though in the pseudo-DDL (DDL B).
In fact, that reinforces my thoughts on using IDENTITY. RespNo, RespKey, Qtr, CntyCod all can be null (all will have at least one not null). I unfortunately have no control of those columns being null. Show quote "Mark White" <markw2***@yahoo.com> wrote in message news:uuOT6ftuFHA.3000@TK2MSFTNGP12.phx.gbl... > > Wrong. In B the PK of Survey is (RespNo, RespKey, Qtr, CntyCod), which > > means they are implicitly NOT NULL. Not so in A where they are NULL by > > default. > > No, sorry, you misunderstood. I meant that they have explicit definitions > in the "real" DDL, not the DDL I posted. > > > If there is no alternate key then you have redundant data and > > consequent loss of integrity. > > Where is the redundant data you describe? In SurveyAnswers? > > "David Portas" <REMOVE_BEFORE_REPLYING_dpor***@acm.org> wrote in message > news:1126886155.884526.127020@g44g2000cwa.googlegroups.com... > > > No, they're both null. > > > > Wrong. In B the PK of Survey is (RespNo, RespKey, Qtr, CntyCod), which > > means they are implicitly NOT NULL. Not so in A where they are NULL by > > default. > > > > > How is it flawed, please describe. What other key needs to be included? > > > There is no alternate. > > > > If there is no alternate key then you have redundant data and > > consequent loss of integrity. Even if you don't value integrity, why > > would you want to store duplicate data? > > > > -- > > David Portas > > SQL Server MVP > > -- > > > > > No, sorry, you misunderstood. I meant that they have explicit definitions I misunderstood? Sorry about that. Obviously I should have known that I> in the "real" DDL, not the DDL I posted. was expected to refer to my crystal ball... :-< > Where is the redundant data you describe? Both tables if they don't have an alternate key. Consider:CREATE TABLE Recipe (x INTEGER IDENTITY PRIMARY KEY, item VARCHAR(10) NOT NULL /* No key!! */) ; INSERT INTO recipe (item) VALUES ('Egg') ; INSERT INTO recipe (item) VALUES ('Egg') ; INSERT INTO recipe (item) VALUES ('Egg') ; SELECT x, item FROM Recipe ; See the redundancy now? If duplicates have meaning then add a Quantity column instead, like: CREATE TABLE Recipe (x INTEGER IDENTITY PRIMARY KEY, item VARCHAR(10) NOT NULL UNIQUE, qty INTEGER NOT NULL CHECK (qty>0)) ; If not, then just add the constraint and drop the duplicates. Anything else means your logical model breaks. -- David Portas SQL Server MVP -- I missed the INSERT from my last example. Here it is:
INSERT INTO recipe (item, qty) VALUES ('Egg', 3) ; Hopefully that was obvious anyway. The point is the extra UNIQUE constraint. -- David Portas SQL Server MVP -- Yes it was. As mentioned though, there is a Unique constraint.
Show quote "David Portas" <REMOVE_BEFORE_REPLYING_dpor***@acm.org> wrote in message news:1126888069.115632.141480@o13g2000cwo.googlegroups.com... > I missed the INSERT from my last example. Here it is: > > INSERT INTO recipe (item, qty) VALUES ('Egg', 3) ; > > Hopefully that was obvious anyway. The point is the extra UNIQUE > constraint. > > -- > David Portas > SQL Server MVP > -- > lol, sorry again. I haven't described it the best in this post, my
apologies. Also, in my second post, I mentioned there Unique covers RespNo, RespKey, Qtr, CntyCod. This was my fault for trying to shorten the DDL. > INSERT INTO recipe (item) VALUES ('Egg') ; The second and third insert will fail.> INSERT INTO recipe (item) VALUES ('Egg') ; > INSERT INTO recipe (item) VALUES ('Egg') ; Now, knowing there is a Unique constraint on Survey, where is the redundant data? Thanks for responding to the post :) Show quote "David Portas" <REMOVE_BEFORE_REPLYING_dpor***@acm.org> wrote in message news:1126887716.698652.108130@o13g2000cwo.googlegroups.com... > > No, sorry, you misunderstood. I meant that they have explicit definitions > > in the "real" DDL, not the DDL I posted. > > I misunderstood? Sorry about that. Obviously I should have known that I > was expected to refer to my crystal ball... :-< > > > Where is the redundant data you describe? > > Both tables if they don't have an alternate key. Consider: > > CREATE TABLE Recipe (x INTEGER IDENTITY PRIMARY KEY, item VARCHAR(10) > NOT NULL /* No key!! */) ; > > INSERT INTO recipe (item) VALUES ('Egg') ; > INSERT INTO recipe (item) VALUES ('Egg') ; > INSERT INTO recipe (item) VALUES ('Egg') ; > > SELECT x, item > FROM Recipe ; > > See the redundancy now? If duplicates have meaning then add a Quantity > column instead, like: > > CREATE TABLE Recipe (x INTEGER IDENTITY PRIMARY KEY, item VARCHAR(10) > NOT NULL UNIQUE, qty INTEGER NOT NULL CHECK (qty>0)) ; > > If not, then just add the constraint and drop the duplicates. Anything > else means your logical model breaks. > > -- > David Portas > SQL Server MVP > -- > > INSERT INTO recipe (item) VALUES ('Egg') ; Huh? You obviously didn't try out what I posted. IDENTITY does NOT> INSERT INTO recipe (item) VALUES ('Egg') ; > INSERT INTO recipe (item) VALUES ('Egg') ; > The second and third insert will fail. prevent duplicates. > Now, knowing there is a Unique constraint on Survey, where is the redundant OK. That wasn't in your original post though. However, you still don't> data? have a key because you already told us those columns are nullable and we know that keys must not be. Now SQL Server violates SQL standards, the relational model and common sense by treating NULLs as equal values in a UNIQUE index. So given that peculiarity, you *could* theoretically write a lot of very ropey and inefficient TSQL that would treat this as a "real" key - by which I mean, you could theoretically reference that column every time with code that treats NULLs as equal (like COALESCE(x,'foo')=COALESCE(x,'foo')). You'd have to be pretty masochistic to attempt that though! Putting that aside, almost any reasonable person would consider your table to be denormalized because it lacks a logical key - remembering that IDENTITY isn't part of the logical model at all. -- David Portas SQL Server MVP -- Guilty. AND I misread.
The fact that any of those four columns can be null is bad design on the accepted data (from a system out of my hands, I wish it was), not my design. If the acceptable data provided a logical key, obviously all of this would be unneeded. The point of this is when there isn't one logical key. Also, even if all four columns (RespNo, RespKey, Qtr, CntyCod) didn't accept null, I would still create an Identity column. Is it not far more efficient to join on a 4-byte column, or four column key which contains nvarchar data? Show quote "David Portas" <REMOVE_BEFORE_REPLYING_dpor***@acm.org> wrote in message news:1126889576.077806.289090@g14g2000cwa.googlegroups.com... > > INSERT INTO recipe (item) VALUES ('Egg') ; > > INSERT INTO recipe (item) VALUES ('Egg') ; > > INSERT INTO recipe (item) VALUES ('Egg') ; > > The second and third insert will fail. > > Huh? You obviously didn't try out what I posted. IDENTITY does NOT > prevent duplicates. > > > Now, knowing there is a Unique constraint on Survey, where is the redundant > > data? > > OK. That wasn't in your original post though. However, you still don't > have a key because you already told us those columns are nullable and > we know that keys must not be. Now SQL Server violates SQL standards, > the relational model and common sense by treating NULLs as equal values > in a UNIQUE index. So given that peculiarity, you *could* theoretically > write a lot of very ropey and inefficient TSQL that would treat this as > a "real" key - by which I mean, you could theoretically reference that > column every time with code that treats NULLs as equal (like > COALESCE(x,'foo')=COALESCE(x,'foo')). You'd have to be pretty > masochistic to attempt that though! Putting that aside, almost any > reasonable person would consider your table to be denormalized because > it lacks a logical key - remembering that IDENTITY isn't part of the > logical model at all. > > -- > David Portas > SQL Server MVP > -- > > The point of this is when there isn't one logical key. Well that was my point too! Whether any external system has a key or not is irrelevant. What I am saying is that the same information can be represented relationally (with keys and without NULLs). Do you really believe that you have some data that cannot be represented relationally? The textbook solution is to decompose the table into several base tables representing the known attributes without transitive dependencies (BCNF). In this case you also have another alternative. You have already said that you have a UNIQUE constraint (albeit with NULLs) so in fact the "nulls" are not "real" unknowns or inapplicables - they must represent some meaningful information because you already know that they are going to be unique in the table. So you could either use default values or extra columns to indicate that fact rather than represent the same data through dubious use of a nullable constraint. > Also, even if all four columns (RespNo, RespKey, Qtr, CntyCod) didn't Correct. That doesn't remove the need for a logical key as well though.> accept > null, I would still create an Identity column. Is it not far more > efficient > to join on a 4-byte column, or four column key which contains nvarchar > data? -- David Portas SQL Server MVP -- On Fri, 16 Sep 2005 13:01:15 -0400, Mark White wrote:
>Also, even if all four columns (RespNo, RespKey, Qtr, CntyCod) didn't accept Hi Mark,>null, I would still create an Identity column. Is it not far more efficient >to join on a 4-byte column, or four column key which contains nvarchar data? That's true. But there's also another side to the medal. Using identity columns for foreign keys will also increase the number of times that a join is needed at all. And though a join on a 4-bute integer column is pretty fast, no join at all will be even faster! Just consider how this report (based on DDL B) would look (and how it would perform) in DDL A. SELECT RespNo, QuestionId, Answer, COUNT(*) FROM SurveyAnswers GROUP BY RespNo, QuestionId, Answer Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address) |
|||||||||||||||||||||||