Home All Groups Group Topic Archive Search About

wierd sql query problem

Author
13 Jul 2006 5:37 PM
Smokey Grindle
I dont have an DDL for this so I am just going to write it out by hand

CREATE TABLEA
(
    AddressID        BIGINT,
    Address1        nvarchar(100),
    Address2        nvarchar(200),
    zipcode            nvarchar(5)
)
Primary key is AddressID

CREATE Table JunctionA
(
    AddressID        BIGINT,
    PersonID        BIGINT,
    Description    nvarchar(300)
)


and say I have this same data

TableA===========

1        123 test st.    ''    12345
2        123 test st    ''    12345
3        230 1st st    ''    12345
4        345 2nd st    ''    90122

Junction==========
1    2    desc1
1    4    desc3
1    3    desc3
2    2    desc2
3    2    desc34
3    1    desc9

as you can see there are 2 addresses in tablea (ID 1 AND 2) that are the
same, and in the junction table, person 2 references both of them! how can I
fix this with a query so every item in tableA is unique while making all the
items in the junction change their references from the ones I want to delete
in tablea to the one that stays?

Author
13 Jul 2006 5:55 PM
Anith Sen
>> how can I fix this with a query so every item in tableA is unique while
>> making all the items in the junction change their references from the
>> ones I want to delete in tablea to the one that stays?

Delete the duplicate addresses in tableA and declare a key, either use a
composite one on the columns that are supposed to be unique or a simple
surrogate with a unique constraint.

--
Anith
Author
13 Jul 2006 6:06 PM
Smokey Grindle
thats not the problem... the problem is I need to translate the existing
data first (and there are tens of thousands of records) so that all the
items in the junction has only one of the address tablea's id... then i can
delete them and set up the unique constraint... but the problem im asking is
how to translate all the existing junction rows to this before i can delete
them... i wanted to use the lowest ID number as the one we keep and delete
the other duplicates automatically after updateing the records un the
junction table to match the good records in tablea

Show quote
"Anith Sen" <an***@bizdatasolutions.com> wrote in message
news:eqP1GWqpGHA.2292@TK2MSFTNGP05.phx.gbl...
>>> how can I fix this with a query so every item in tableA is unique while
>>> making all the items in the junction change their references from the
>>> ones I want to delete in tablea to the one that stays?
>
> Delete the duplicate addresses in tableA and declare a key, either use a
> composite one on the columns that are supposed to be unique or a simple
> surrogate with a unique constraint.
>
> --
> Anith
>
Author
13 Jul 2006 6:05 PM
--CELKO--
This was pretty sloppy even for an example -- DDL is easy to write; do
you really have a need for BIGINT?  Usually huge and proprietary are
bad things.  Why did you invent variable length zip coes and address
lines that exceed the USPS CHAR(35)?

CREATE TABLE Addresses
(address_id INTEGER NOT NULL PRIMARY KEY,
address1 CHAR(35) NOT NULL,
address2 CHAR(35) NOT NULL,
zip_code CHAR (5) NOT NULL
   CHECK (zip_code LIKE '[0-9][0-9][0-9][0-9][0-9]'),
UNIQUE (address1, address2, zip_code) -- prevents your problem!
);

The correct terms are "referencing" and "reference" tables.  I have no
idea who invented "Junction" as a term for a table that holds a
relationship.

CREATE TABLE Mailings
(address_id INTEGER NOT NULL
  REFERENCES Addresses(address_id)
  ON UPDATE CASCADE
  ON DELETE CASCADE,
person_id INTEGER NOT NULL
  REFERENCES Personnel(person_id)
  ON UPDATE CASCADE
  ON DELETE CASCADE,
mailing_description VARCHAR(300) NOT NULL, -- 300? guess or research?
PRIMARY KEY (address_id, person_id)
);

Look at how the DRI actions and the constraints are doing most of your
work for you.  The UNIQUE() constraint can be expensive and hard to
write because you are dealing with English ("Lane" = "Ln", "Street" =
"St" = "Str").  You probably should look at Melisa Data or SSA software
for cleaning address data -- SSA has a great little booklet on the
problems that will scare you to death.
Author
13 Jul 2006 6:09 PM
Roy Harvey
On 13 Jul 2006 11:05:31 -0700, "--CELKO--" <jcelko***@earthlink.net>
wrote:

>I have no
>idea who invented "Junction" as a term for a table that holds a
>relationship.

It happened at least 25 years ago, I don't imagine that naming the
individual at this late date would serve much purpose.

Roy
Author
13 Jul 2006 6:12 PM
Smokey Grindle
I know its bad... just inheriting code that I'm trying to fix......


Show quote
"--CELKO--" <jcelko***@earthlink.net> wrote in message
news:1152813931.916591.214420@m73g2000cwd.googlegroups.com...
> This was pretty sloppy even for an example -- DDL is easy to write; do
> you really have a need for BIGINT?  Usually huge and proprietary are
> bad things.  Why did you invent variable length zip coes and address
> lines that exceed the USPS CHAR(35)?
>
> CREATE TABLE Addresses
> (address_id INTEGER NOT NULL PRIMARY KEY,
> address1 CHAR(35) NOT NULL,
> address2 CHAR(35) NOT NULL,
> zip_code CHAR (5) NOT NULL
>   CHECK (zip_code LIKE '[0-9][0-9][0-9][0-9][0-9]'),
> UNIQUE (address1, address2, zip_code) -- prevents your problem!
> );
>
> The correct terms are "referencing" and "reference" tables.  I have no
> idea who invented "Junction" as a term for a table that holds a
> relationship.
>
> CREATE TABLE Mailings
> (address_id INTEGER NOT NULL
>  REFERENCES Addresses(address_id)
>  ON UPDATE CASCADE
>  ON DELETE CASCADE,
> person_id INTEGER NOT NULL
>  REFERENCES Personnel(person_id)
>  ON UPDATE CASCADE
>  ON DELETE CASCADE,
> mailing_description VARCHAR(300) NOT NULL, -- 300? guess or research?
> PRIMARY KEY (address_id, person_id)
> );
>
> Look at how the DRI actions and the constraints are doing most of your
> work for you.  The UNIQUE() constraint can be expensive and hard to
> write because you are dealing with English ("Lane" = "Ln", "Street" =
> "St" = "Str").  You probably should look at Melisa Data or SSA software
> for cleaning address data -- SSA has a great little booklet on the
> problems that will scare you to death.
>
Author
13 Jul 2006 6:06 PM
Alejandro Mesa
Try,

create table #t1 (
AddressID BIGINT unique,
c1 int
)

create table #t2 (
AddressID BIGINT unique,
new_AddressID BIGINT
)

insert into #t1(AddressID, c1)
select AddressID, checksum(Address1, Address2, zipcode)
from TABLEA

insert into #t2(AddressID, new_AddressID)
select t1.AddressID, t2.AddressID
from #t1 as t1 inner join #t1 as t2
on t1.c1 = t2.c1 and t2.AddressID = (
select min(t3.AddressID)
from #t1 as t3
where t3.c1 = t1.c1 and t3.AddressID <= t1.AddressID
)
where t1.AddressID != t2.AddressID

update JunctionA
set AddressID = (select new_AddressID from #t2 as t2 where t2.AddressID =
JunctionA.AddressID)
where exists (select * from #t2 as t2 where t2.AddressID =
JunctionA.AddressID)

delete TABLEA
where exists(select * from #t2 as t2 where t2.AddressID = TABLEA.AddressID)
go


AMB


Show quote
"Smokey Grindle" wrote:

> I dont have an DDL for this so I am just going to write it out by hand
>
> CREATE TABLEA
> (
>     AddressID        BIGINT,
>     Address1        nvarchar(100),
>     Address2        nvarchar(200),
>     zipcode            nvarchar(5)
> )
> Primary key is AddressID
>
> CREATE Table JunctionA
> (
>     AddressID        BIGINT,
>     PersonID        BIGINT,
>     Description    nvarchar(300)
> )
>
>
> and say I have this same data
>
> TableA===========
>
> 1        123 test st.    ''    12345
> 2        123 test st    ''    12345
> 3        230 1st st    ''    12345
> 4        345 2nd st    ''    90122
>
> Junction==========
> 1    2    desc1
> 1    4    desc3
> 1    3    desc3
> 2    2    desc2
> 3    2    desc34
> 3    1    desc9
>
> as you can see there are 2 addresses in tablea (ID 1 AND 2) that are the
> same, and in the junction table, person 2 references both of them! how can I
> fix this with a query so every item in tableA is unique while making all the
> items in the junction change their references from the ones I want to delete
> in tablea to the one that stays?
>
>
>
Author
13 Jul 2006 6:22 PM
Smokey Grindle
thanks thats exactly what I was looking for, forgot all about checksum's..


Show quote
"Alejandro Mesa" <AlejandroM***@discussions.microsoft.com> wrote in message
news:6510963A-EB4B-4A4C-87BB-A80BFC536BD5@microsoft.com...
> Try,
>
> create table #t1 (
> AddressID BIGINT unique,
> c1 int
> )
>
> create table #t2 (
> AddressID BIGINT unique,
> new_AddressID BIGINT
> )
>
> insert into #t1(AddressID, c1)
> select AddressID, checksum(Address1, Address2, zipcode)
> from TABLEA
>
> insert into #t2(AddressID, new_AddressID)
> select t1.AddressID, t2.AddressID
> from #t1 as t1 inner join #t1 as t2
> on t1.c1 = t2.c1 and t2.AddressID = (
> select min(t3.AddressID)
> from #t1 as t3
> where t3.c1 = t1.c1 and t3.AddressID <= t1.AddressID
> )
> where t1.AddressID != t2.AddressID
>
> update JunctionA
> set AddressID = (select new_AddressID from #t2 as t2 where t2.AddressID =
> JunctionA.AddressID)
> where exists (select * from #t2 as t2 where t2.AddressID =
> JunctionA.AddressID)
>
> delete TABLEA
> where exists(select * from #t2 as t2 where t2.AddressID =
> TABLEA.AddressID)
> go
>
>
> AMB
>
>
> "Smokey Grindle" wrote:
>
>> I dont have an DDL for this so I am just going to write it out by hand
>>
>> CREATE TABLEA
>> (
>>     AddressID        BIGINT,
>>     Address1        nvarchar(100),
>>     Address2        nvarchar(200),
>>     zipcode            nvarchar(5)
>> )
>> Primary key is AddressID
>>
>> CREATE Table JunctionA
>> (
>>     AddressID        BIGINT,
>>     PersonID        BIGINT,
>>     Description    nvarchar(300)
>> )
>>
>>
>> and say I have this same data
>>
>> TableA===========
>>
>> 1        123 test st.    ''    12345
>> 2        123 test st    ''    12345
>> 3        230 1st st    ''    12345
>> 4        345 2nd st    ''    90122
>>
>> Junction==========
>> 1    2    desc1
>> 1    4    desc3
>> 1    3    desc3
>> 2    2    desc2
>> 3    2    desc34
>> 3    1    desc9
>>
>> as you can see there are 2 addresses in tablea (ID 1 AND 2) that are the
>> same, and in the junction table, person 2 references both of them! how
>> can I
>> fix this with a query so every item in tableA is unique while making all
>> the
>> items in the junction change their references from the ones I want to
>> delete
>> in tablea to the one that stays?
>>
>>
>>
Author
13 Jul 2006 6:13 PM
Roy Harvey
Is there any UNIQUE contraint (PK, constraint, index) on JunctionA?
Perhaps on (AddressID, PersonID)?  If so, it might not be as simple as
you hope.  If we change the AddressID we could be intruducing a
duplicate key.  Remember you said the same person had multiple (though
identical) junctions, so changing one of the person's junctions to
point to the same address as the other, while that one already exists.
Only if the unique key includes the description will we get away with
that change.

(And if there is NOT a PK or unique constraint, why not?)

So anyway, this would update JunctionA so that all only the lowest of
the AddressID values are referenced.

UPDATE JunctionA
   SET AddressID = X.KeepMe
  FROM (select Address1, Address2, zipcode,
               min(AddressID) as KeepMe
          from TableA
         group by Address1, Address2, zipcode
        having count(*) > 1) as X
WHERE JunctionA.Address1 = X.Address1
   AND JunctionA.Address2 = X.Address2
   AND JunctionA.zipcode = X.zipcode
   AND JunctionA.AddressID <> X.KeepMe

Roy Harvey
Beacon Falls, CT


On Thu, 13 Jul 2006 13:37:04 -0400, "Smokey Grindle"
<nospam@dontspamme.com> wrote:

Show quote
>I dont have an DDL for this so I am just going to write it out by hand
>
>CREATE TABLEA
>(
>    AddressID        BIGINT,
>    Address1        nvarchar(100),
>    Address2        nvarchar(200),
>    zipcode            nvarchar(5)
>)
>Primary key is AddressID
>
>CREATE Table JunctionA
>(
>    AddressID        BIGINT,
>    PersonID        BIGINT,
>    Description    nvarchar(300)
>)
>
>
>and say I have this same data
>
>TableA===========
>
>1        123 test st.    ''    12345
>2        123 test st    ''    12345
>3        230 1st st    ''    12345
>4        345 2nd st    ''    90122
>
>Junction==========
>1    2    desc1
>1    4    desc3
>1    3    desc3
>2    2    desc2
>3    2    desc34
>3    1    desc9
>
>as you can see there are 2 addresses in tablea (ID 1 AND 2) that are the
>same, and in the junction table, person 2 references both of them! how can I
>fix this with a query so every item in tableA is unique while making all the
>items in the junction change their references from the ones I want to delete
>in tablea to the one that stays?
>
Author
13 Jul 2006 6:17 PM
Arnie Rowland
This indicates one of the situations where an artifical key (AddressID)
allows problems to occur that would not happen if there was a composite key
OR UNIQUE index.

You do NOT want two addresses with the same combinations of (Address1 +
Address2 + Zipcode). That is an excellent 'natural' key.

Rather than creating kludges around the design failure, you would get more
return on your time investment by cleaning upo the Address table and adding
the proper constriants to prevent any future duplication.

--
Arnie Rowland*
"To be successful, your heart must accompany your knowledge."



Show quote
"Smokey Grindle" <nospam@dontspamme.com> wrote in message
news:%23azHfMqpGHA.4196@TK2MSFTNGP04.phx.gbl...
>I dont have an DDL for this so I am just going to write it out by hand
>
> CREATE TABLEA
> (
>    AddressID        BIGINT,
>    Address1        nvarchar(100),
>    Address2        nvarchar(200),
>    zipcode            nvarchar(5)
> )
> Primary key is AddressID
>
> CREATE Table JunctionA
> (
>    AddressID        BIGINT,
>    PersonID        BIGINT,
>    Description    nvarchar(300)
> )
>
>
> and say I have this same data
>
> TableA===========
>
> 1        123 test st.    ''    12345
> 2        123 test st    ''    12345
> 3        230 1st st    ''    12345
> 4        345 2nd st    ''    90122
>
> Junction==========
> 1    2    desc1
> 1    4    desc3
> 1    3    desc3
> 2    2    desc2
> 3    2    desc34
> 3    1    desc9
>
> as you can see there are 2 addresses in tablea (ID 1 AND 2) that are the
> same, and in the junction table, person 2 references both of them! how can
> I fix this with a query so every item in tableA is unique while making all
> the items in the junction change their references from the ones I want to
> delete in tablea to the one that stays?
>
Author
13 Jul 2006 6:23 PM
Smokey Grindle
thats exactly what I am trying to do... clean it up so i can get a unique
constraint in here on the address columns

Show quote
"Arnie Rowland" <ar***@1568.com> wrote in message
news:%23FOqniqpGHA.1592@TK2MSFTNGP04.phx.gbl...
> This indicates one of the situations where an artifical key (AddressID)
> allows problems to occur that would not happen if there was a composite
> key OR UNIQUE index.
>
> You do NOT want two addresses with the same combinations of (Address1 +
> Address2 + Zipcode). That is an excellent 'natural' key.
>
> Rather than creating kludges around the design failure, you would get more
> return on your time investment by cleaning upo the Address table and
> adding the proper constriants to prevent any future duplication.
>
> --
> Arnie Rowland*
> "To be successful, your heart must accompany your knowledge."
>
>
>
> "Smokey Grindle" <nospam@dontspamme.com> wrote in message
> news:%23azHfMqpGHA.4196@TK2MSFTNGP04.phx.gbl...
>>I dont have an DDL for this so I am just going to write it out by hand
>>
>> CREATE TABLEA
>> (
>>    AddressID        BIGINT,
>>    Address1        nvarchar(100),
>>    Address2        nvarchar(200),
>>    zipcode            nvarchar(5)
>> )
>> Primary key is AddressID
>>
>> CREATE Table JunctionA
>> (
>>    AddressID        BIGINT,
>>    PersonID        BIGINT,
>>    Description    nvarchar(300)
>> )
>>
>>
>> and say I have this same data
>>
>> TableA===========
>>
>> 1        123 test st.    ''    12345
>> 2        123 test st    ''    12345
>> 3        230 1st st    ''    12345
>> 4        345 2nd st    ''    90122
>>
>> Junction==========
>> 1    2    desc1
>> 1    4    desc3
>> 1    3    desc3
>> 2    2    desc2
>> 3    2    desc34
>> 3    1    desc9
>>
>> as you can see there are 2 addresses in tablea (ID 1 AND 2) that are the
>> same, and in the junction table, person 2 references both of them! how
>> can I fix this with a query so every item in tableA is unique while
>> making all the items in the junction change their references from the
>> ones I want to delete in tablea to the one that stays?
>>
>
>

AddThis Social Bookmark Button