Home All Groups Group Topic Archive Search About

is there a set-based solution to this task?

Author
14 Jul 2005 8:02 PM
jason
given the following simplified tables:

create table apples (
     appleid int not null,
     column1 varchar(50) null,
     column2 varchar(50) null,
     orangeid int null)

create table oranges (
     orangeid int not null,
     column1 varchar(50) null,
     column2 varchar(50) null)

and the following simplified sql statement:

insert into oranges (
     column1,
     column2)
select
     column1,
     column2
from apples

is there a set-based modifiecation i can make to this batch to populate
the orangeid column of the apples table with the corresponding orange
row that was inserted into oranges with the information from apples?

i can get something similar accomplished by putting a temporary appleid
column on the oranges table, and include that in the insert. then i
could subsequently update the orangeid column in the apples table with
the orangeid from table oranges where the two appleid's matched, and
then delete the appleid column from oranges. but i'm curious if there
is a set-based batch that would accomplish the same goal?

(in case someone feels like making snarky comments about duplicating
data - column1 and column2 are going to be removed from apples when the
process is complete. this is just the first step in a normalization
process)

thanks for any help,

jason

Author
14 Jul 2005 8:05 PM
jason
addition/correction: forgot to mention that orangeid of the oranges
table is an identity column. that could be important, in case the set
based solution requires the SCOPE_IDENTITY property
Author
14 Jul 2005 8:38 PM
David Portas
You said you want to normalize the design so Step One is to add a candidate
key to the Oranges table:

ALTER TABLE oranges ALTER COLUMN column1 VARCHAR(50) NOT NULL
ALTER TABLE oranges ALTER COLUMN column2 VARCHAR(50) NOT NULL

ALTER TABLE oranges ADD CONSTRAINT ak_oranges UNIQUE (column1, column2)

Now eliminate the NULLs and duplicates. You could supply defaults for the
NULLs or you could just exclude them, depending on your requirements. If you
can't do either then column1 and column2 rightfully belong in separate
tables - either that or there's another key that you haven't told us about.

insert into oranges (
     column1,
     column2)
select DISTINCT
     column1,
     column2
from apples
WHERE column1 IS NOT NULL
AND column2 IS NOT NULL

Now the UPDATE is easy:

UPDATE apples
SET orangeid =
  (SELECT orangeid
   FROM oranges
   WHERE column1 = apples.column1
    AND column2 = apples.column2)

The problem with the Oranges table you posted is that IDENTITY is the only
key. That should never be. Otherwise you'll probably have to resort to a
nasty kludge using temp tables and IDENTITY_INSERT.

--
David Portas
SQL Server MVP
--
Author
15 Jul 2005 1:46 PM
jason
yeah, there is another field that can act as the alternate key that i
did leave out, using that to perform the id fk update should have been
obvious to me! thanks for the direction!

i agree about the logical keying precept, but i'm having a hell of a
time convincing the people i work for the importance of this, on one
table in particular. they primary key, and only key, is an identity
column, and they hand that identity id to the users as a sort of
"reference number". nothing else about that table is unique, so of
course transforming it is a complete pain. if i can just convince them
to come up with an alternate key on this last table, i'll be a happy
camper.

jason
Author
15 Jul 2005 8:06 PM
--CELKO--
>>  forgot to mention that orangeid of the oranges table is an identity column. that could be important, in case the set based solution requires the SCOPE_IDENTITY property <<

Then we know that you do not have any keys in the tables, don't we?
You are writing non-relational code.
Author
16 Jul 2005 4:13 PM
jason
actually there are keys and there are alternate keys on both tables. i
wrote a non-relational, simplified example, which didn't seem to stop
people from offering helpful advice.

thanksomuch.
Author
14 Jul 2005 9:07 PM
Steve Kass
Jason,

  You can do this with a trigger (see below), or you
can set the orangeid values in the apples table first,
based on the actual (column1,column2) values, say
by ranking them alphabetically, with appleid as a tiebreaker.
The trigger below is not as good a solution, as I note in
the comments:

-- will not work if there are duplicate (column1, column2) values
create trigger oranges_ins on oranges
for insert as
  update apples set
    orangeid = i.orangeid
  from inserted as i
  where i.column1 = apples.column1
  and i.column2 = apples.column2
  -- needs COALESCE or similar to work if either column is NULL
  and apples.orangeid is null
go

You could also give up the idea of making orangeid an identity, and
insert (column1, column2, appleid) into oranges(column1, column2,
orangeid), then update the values to be sequential if you need (in
both tables).

Steve Kass
Drew University

jason wrote:

Show quote
>given the following simplified tables:
>
>create table apples (
>     appleid int not null,
>     column1 varchar(50) null,
>     column2 varchar(50) null,
>     orangeid int null)
>
>create table oranges (
>     orangeid int not null,
>     column1 varchar(50) null,
>     column2 varchar(50) null)
>
>and the following simplified sql statement:
>
>insert into oranges (
>     column1,
>     column2)
>select
>     column1,
>     column2
>from apples
>
>is there a set-based modifiecation i can make to this batch to populate
>the orangeid column of the apples table with the corresponding orange
>row that was inserted into oranges with the information from apples?
>
>i can get something similar accomplished by putting a temporary appleid
>column on the oranges table, and include that in the insert. then i
>could subsequently update the orangeid column in the apples table with
>the orangeid from table oranges where the two appleid's matched, and
>then delete the appleid column from oranges. but i'm curious if there
>is a set-based batch that would accomplish the same goal?
>
>(in case someone feels like making snarky comments about duplicating
>data - column1 and column2 are going to be removed from apples when the
>process is complete. this is just the first step in a normalization
>process)
>
>thanks for any help,
>
>jason
>

>
Author
15 Jul 2005 1:48 PM
jason
ahh, making it non-identity is an intriguing notion. question: if i
start off with a non-identity field, perform the insertion preserving
the id values from apples, and then turn the oranges id field into an
identity field afterward, what will happen? is this generally good /
generally bad?
Author
15 Jul 2005 2:03 PM
David Portas
You can't directly add the IDENTITY property to an existing column. You
can't UPDATE an IDENTITY column either. What you have to do is
re-create the entire table and then re-insert the data, using the
IDENTITY_INSERT option to populate the IDENTITY column. Won't solve the
design problem though.

--
David Portas
SQL Server MVP
--
Author
15 Jul 2005 7:55 PM
MGFoster
David Portas wrote:
> You can't directly add the IDENTITY property to an existing column. You
> can't UPDATE an IDENTITY column either. What you have to do is
> re-create the entire table and then re-insert the data, using the
> IDENTITY_INSERT option to populate the IDENTITY column. Won't solve the
> design problem though.
>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Actually, using EM, you can change the Identity column to an Integer,
import the new data (using QA, or whatever you're using to run the
UPDATE/INSERT batch), and then, using EM again, change the Integer
column back to an Identity column.  EM does all the work, drop,
recreate, reload the table, and you don't have to issue all those
commands.

The problem will be duplicate values in the Identity column.  If you can
ensure there are not any duplicates values then it should work OK.

--
MGFoster:::mgf00 <at> earthlink <decimal-point> net
Oakland, CA (USA)

-----BEGIN PGP SIGNATURE-----
Version: PGP for Personal Privacy 5.0
Charset: noconv

iQA/AwUBQtgUuIechKqOuFEgEQJ/CQCeIXXFEkn0kq3l19RBwqS+Q/gAE74An0Tx
w8mZD/V9SdT+IKcjj0pouwnK
=YStg
-----END PGP SIGNATURE-----
Author
15 Jul 2005 8:02 PM
Aaron Bertrand [SQL Server MVP]
> UPDATE/INSERT batch), and then, using EM again, change the Integer
> column back to an Identity column.  EM does all the work,

I don't think anyone's objection is about having to type out the DROP TABLE
/ CREATE TABLE commands.  It's that the act of dropping and re-creating the
table is going to be impractical, at least in many of the applications I
work with...

AddThis Social Bookmark Button