|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
wierd sql query problemCREATE TABLEA ( AddressID BIGINT, Address1 nvarchar(100), Address2 nvarchar(200), zipcode nvarchar(5) ) Primary key is AddressID CREATE Table JunctionA ( AddressID BIGINT, PersonID BIGINT, Description nvarchar(300) ) and say I have this same data TableA=========== 1 123 test st. '' 12345 2 123 test st '' 12345 3 230 1st st '' 12345 4 345 2nd st '' 90122 Junction========== 1 2 desc1 1 4 desc3 1 3 desc3 2 2 desc2 3 2 desc34 3 1 desc9 as you can see there are 2 addresses in tablea (ID 1 AND 2) that are the same, and in the junction table, person 2 references both of them! how can I fix this with a query so every item in tableA is unique while making all the items in the junction change their references from the ones I want to delete in tablea to the one that stays? >> how can I fix this with a query so every item in tableA is unique while Delete the duplicate addresses in tableA and declare a key, either use a >> making all the items in the junction change their references from the >> ones I want to delete in tablea to the one that stays? composite one on the columns that are supposed to be unique or a simple surrogate with a unique constraint. -- Anith thats not the problem... the problem is I need to translate the existing
data first (and there are tens of thousands of records) so that all the items in the junction has only one of the address tablea's id... then i can delete them and set up the unique constraint... but the problem im asking is how to translate all the existing junction rows to this before i can delete them... i wanted to use the lowest ID number as the one we keep and delete the other duplicates automatically after updateing the records un the junction table to match the good records in tablea Show quote "Anith Sen" <an***@bizdatasolutions.com> wrote in message news:eqP1GWqpGHA.2292@TK2MSFTNGP05.phx.gbl... >>> how can I fix this with a query so every item in tableA is unique while >>> making all the items in the junction change their references from the >>> ones I want to delete in tablea to the one that stays? > > Delete the duplicate addresses in tableA and declare a key, either use a > composite one on the columns that are supposed to be unique or a simple > surrogate with a unique constraint. > > -- > Anith > This was pretty sloppy even for an example -- DDL is easy to write; do
you really have a need for BIGINT? Usually huge and proprietary are bad things. Why did you invent variable length zip coes and address lines that exceed the USPS CHAR(35)? CREATE TABLE Addresses (address_id INTEGER NOT NULL PRIMARY KEY, address1 CHAR(35) NOT NULL, address2 CHAR(35) NOT NULL, zip_code CHAR (5) NOT NULL CHECK (zip_code LIKE '[0-9][0-9][0-9][0-9][0-9]'), UNIQUE (address1, address2, zip_code) -- prevents your problem! ); The correct terms are "referencing" and "reference" tables. I have no idea who invented "Junction" as a term for a table that holds a relationship. CREATE TABLE Mailings (address_id INTEGER NOT NULL REFERENCES Addresses(address_id) ON UPDATE CASCADE ON DELETE CASCADE, person_id INTEGER NOT NULL REFERENCES Personnel(person_id) ON UPDATE CASCADE ON DELETE CASCADE, mailing_description VARCHAR(300) NOT NULL, -- 300? guess or research? PRIMARY KEY (address_id, person_id) ); Look at how the DRI actions and the constraints are doing most of your work for you. The UNIQUE() constraint can be expensive and hard to write because you are dealing with English ("Lane" = "Ln", "Street" = "St" = "Str"). You probably should look at Melisa Data or SSA software for cleaning address data -- SSA has a great little booklet on the problems that will scare you to death. On 13 Jul 2006 11:05:31 -0700, "--CELKO--" <jcelko***@earthlink.net> It happened at least 25 years ago, I don't imagine that naming thewrote: >I have no >idea who invented "Junction" as a term for a table that holds a >relationship. individual at this late date would serve much purpose. Roy I know its bad... just inheriting code that I'm trying to fix......
Show quote "--CELKO--" <jcelko***@earthlink.net> wrote in message news:1152813931.916591.214420@m73g2000cwd.googlegroups.com... > This was pretty sloppy even for an example -- DDL is easy to write; do > you really have a need for BIGINT? Usually huge and proprietary are > bad things. Why did you invent variable length zip coes and address > lines that exceed the USPS CHAR(35)? > > CREATE TABLE Addresses > (address_id INTEGER NOT NULL PRIMARY KEY, > address1 CHAR(35) NOT NULL, > address2 CHAR(35) NOT NULL, > zip_code CHAR (5) NOT NULL > CHECK (zip_code LIKE '[0-9][0-9][0-9][0-9][0-9]'), > UNIQUE (address1, address2, zip_code) -- prevents your problem! > ); > > The correct terms are "referencing" and "reference" tables. I have no > idea who invented "Junction" as a term for a table that holds a > relationship. > > CREATE TABLE Mailings > (address_id INTEGER NOT NULL > REFERENCES Addresses(address_id) > ON UPDATE CASCADE > ON DELETE CASCADE, > person_id INTEGER NOT NULL > REFERENCES Personnel(person_id) > ON UPDATE CASCADE > ON DELETE CASCADE, > mailing_description VARCHAR(300) NOT NULL, -- 300? guess or research? > PRIMARY KEY (address_id, person_id) > ); > > Look at how the DRI actions and the constraints are doing most of your > work for you. The UNIQUE() constraint can be expensive and hard to > write because you are dealing with English ("Lane" = "Ln", "Street" = > "St" = "Str"). You probably should look at Melisa Data or SSA software > for cleaning address data -- SSA has a great little booklet on the > problems that will scare you to death. > Try,
create table #t1 ( AddressID BIGINT unique, c1 int ) create table #t2 ( AddressID BIGINT unique, new_AddressID BIGINT ) insert into #t1(AddressID, c1) select AddressID, checksum(Address1, Address2, zipcode) from TABLEA insert into #t2(AddressID, new_AddressID) select t1.AddressID, t2.AddressID from #t1 as t1 inner join #t1 as t2 on t1.c1 = t2.c1 and t2.AddressID = ( select min(t3.AddressID) from #t1 as t3 where t3.c1 = t1.c1 and t3.AddressID <= t1.AddressID ) where t1.AddressID != t2.AddressID update JunctionA set AddressID = (select new_AddressID from #t2 as t2 where t2.AddressID = JunctionA.AddressID) where exists (select * from #t2 as t2 where t2.AddressID = JunctionA.AddressID) delete TABLEA where exists(select * from #t2 as t2 where t2.AddressID = TABLEA.AddressID) go AMB Show quote "Smokey Grindle" wrote: > I dont have an DDL for this so I am just going to write it out by hand > > CREATE TABLEA > ( > AddressID BIGINT, > Address1 nvarchar(100), > Address2 nvarchar(200), > zipcode nvarchar(5) > ) > Primary key is AddressID > > CREATE Table JunctionA > ( > AddressID BIGINT, > PersonID BIGINT, > Description nvarchar(300) > ) > > > and say I have this same data > > TableA=========== > > 1 123 test st. '' 12345 > 2 123 test st '' 12345 > 3 230 1st st '' 12345 > 4 345 2nd st '' 90122 > > Junction========== > 1 2 desc1 > 1 4 desc3 > 1 3 desc3 > 2 2 desc2 > 3 2 desc34 > 3 1 desc9 > > as you can see there are 2 addresses in tablea (ID 1 AND 2) that are the > same, and in the junction table, person 2 references both of them! how can I > fix this with a query so every item in tableA is unique while making all the > items in the junction change their references from the ones I want to delete > in tablea to the one that stays? > > > thanks thats exactly what I was looking for, forgot all about checksum's..
Show quote "Alejandro Mesa" <AlejandroM***@discussions.microsoft.com> wrote in message news:6510963A-EB4B-4A4C-87BB-A80BFC536BD5@microsoft.com... > Try, > > create table #t1 ( > AddressID BIGINT unique, > c1 int > ) > > create table #t2 ( > AddressID BIGINT unique, > new_AddressID BIGINT > ) > > insert into #t1(AddressID, c1) > select AddressID, checksum(Address1, Address2, zipcode) > from TABLEA > > insert into #t2(AddressID, new_AddressID) > select t1.AddressID, t2.AddressID > from #t1 as t1 inner join #t1 as t2 > on t1.c1 = t2.c1 and t2.AddressID = ( > select min(t3.AddressID) > from #t1 as t3 > where t3.c1 = t1.c1 and t3.AddressID <= t1.AddressID > ) > where t1.AddressID != t2.AddressID > > update JunctionA > set AddressID = (select new_AddressID from #t2 as t2 where t2.AddressID = > JunctionA.AddressID) > where exists (select * from #t2 as t2 where t2.AddressID = > JunctionA.AddressID) > > delete TABLEA > where exists(select * from #t2 as t2 where t2.AddressID = > TABLEA.AddressID) > go > > > AMB > > > "Smokey Grindle" wrote: > >> I dont have an DDL for this so I am just going to write it out by hand >> >> CREATE TABLEA >> ( >> AddressID BIGINT, >> Address1 nvarchar(100), >> Address2 nvarchar(200), >> zipcode nvarchar(5) >> ) >> Primary key is AddressID >> >> CREATE Table JunctionA >> ( >> AddressID BIGINT, >> PersonID BIGINT, >> Description nvarchar(300) >> ) >> >> >> and say I have this same data >> >> TableA=========== >> >> 1 123 test st. '' 12345 >> 2 123 test st '' 12345 >> 3 230 1st st '' 12345 >> 4 345 2nd st '' 90122 >> >> Junction========== >> 1 2 desc1 >> 1 4 desc3 >> 1 3 desc3 >> 2 2 desc2 >> 3 2 desc34 >> 3 1 desc9 >> >> as you can see there are 2 addresses in tablea (ID 1 AND 2) that are the >> same, and in the junction table, person 2 references both of them! how >> can I >> fix this with a query so every item in tableA is unique while making all >> the >> items in the junction change their references from the ones I want to >> delete >> in tablea to the one that stays? >> >> >> Is there any UNIQUE contraint (PK, constraint, index) on JunctionA?
Perhaps on (AddressID, PersonID)? If so, it might not be as simple as you hope. If we change the AddressID we could be intruducing a duplicate key. Remember you said the same person had multiple (though identical) junctions, so changing one of the person's junctions to point to the same address as the other, while that one already exists. Only if the unique key includes the description will we get away with that change. (And if there is NOT a PK or unique constraint, why not?) So anyway, this would update JunctionA so that all only the lowest of the AddressID values are referenced. UPDATE JunctionA SET AddressID = X.KeepMe FROM (select Address1, Address2, zipcode, min(AddressID) as KeepMe from TableA group by Address1, Address2, zipcode having count(*) > 1) as X WHERE JunctionA.Address1 = X.Address1 AND JunctionA.Address2 = X.Address2 AND JunctionA.zipcode = X.zipcode AND JunctionA.AddressID <> X.KeepMe Roy Harvey Beacon Falls, CT On Thu, 13 Jul 2006 13:37:04 -0400, "Smokey Grindle" <nospam@dontspamme.com> wrote: Show quote >I dont have an DDL for this so I am just going to write it out by hand > >CREATE TABLEA >( > AddressID BIGINT, > Address1 nvarchar(100), > Address2 nvarchar(200), > zipcode nvarchar(5) >) >Primary key is AddressID > >CREATE Table JunctionA >( > AddressID BIGINT, > PersonID BIGINT, > Description nvarchar(300) >) > > >and say I have this same data > >TableA=========== > >1 123 test st. '' 12345 >2 123 test st '' 12345 >3 230 1st st '' 12345 >4 345 2nd st '' 90122 > >Junction========== >1 2 desc1 >1 4 desc3 >1 3 desc3 >2 2 desc2 >3 2 desc34 >3 1 desc9 > >as you can see there are 2 addresses in tablea (ID 1 AND 2) that are the >same, and in the junction table, person 2 references both of them! how can I >fix this with a query so every item in tableA is unique while making all the >items in the junction change their references from the ones I want to delete >in tablea to the one that stays? > This indicates one of the situations where an artifical key (AddressID)
allows problems to occur that would not happen if there was a composite key OR UNIQUE index. You do NOT want two addresses with the same combinations of (Address1 + Address2 + Zipcode). That is an excellent 'natural' key. Rather than creating kludges around the design failure, you would get more return on your time investment by cleaning upo the Address table and adding the proper constriants to prevent any future duplication. -- Show quoteArnie Rowland* "To be successful, your heart must accompany your knowledge." "Smokey Grindle" <nospam@dontspamme.com> wrote in message news:%23azHfMqpGHA.4196@TK2MSFTNGP04.phx.gbl... >I dont have an DDL for this so I am just going to write it out by hand > > CREATE TABLEA > ( > AddressID BIGINT, > Address1 nvarchar(100), > Address2 nvarchar(200), > zipcode nvarchar(5) > ) > Primary key is AddressID > > CREATE Table JunctionA > ( > AddressID BIGINT, > PersonID BIGINT, > Description nvarchar(300) > ) > > > and say I have this same data > > TableA=========== > > 1 123 test st. '' 12345 > 2 123 test st '' 12345 > 3 230 1st st '' 12345 > 4 345 2nd st '' 90122 > > Junction========== > 1 2 desc1 > 1 4 desc3 > 1 3 desc3 > 2 2 desc2 > 3 2 desc34 > 3 1 desc9 > > as you can see there are 2 addresses in tablea (ID 1 AND 2) that are the > same, and in the junction table, person 2 references both of them! how can > I fix this with a query so every item in tableA is unique while making all > the items in the junction change their references from the ones I want to > delete in tablea to the one that stays? > thats exactly what I am trying to do... clean it up so i can get a unique
constraint in here on the address columns Show quote "Arnie Rowland" <ar***@1568.com> wrote in message news:%23FOqniqpGHA.1592@TK2MSFTNGP04.phx.gbl... > This indicates one of the situations where an artifical key (AddressID) > allows problems to occur that would not happen if there was a composite > key OR UNIQUE index. > > You do NOT want two addresses with the same combinations of (Address1 + > Address2 + Zipcode). That is an excellent 'natural' key. > > Rather than creating kludges around the design failure, you would get more > return on your time investment by cleaning upo the Address table and > adding the proper constriants to prevent any future duplication. > > -- > Arnie Rowland* > "To be successful, your heart must accompany your knowledge." > > > > "Smokey Grindle" <nospam@dontspamme.com> wrote in message > news:%23azHfMqpGHA.4196@TK2MSFTNGP04.phx.gbl... >>I dont have an DDL for this so I am just going to write it out by hand >> >> CREATE TABLEA >> ( >> AddressID BIGINT, >> Address1 nvarchar(100), >> Address2 nvarchar(200), >> zipcode nvarchar(5) >> ) >> Primary key is AddressID >> >> CREATE Table JunctionA >> ( >> AddressID BIGINT, >> PersonID BIGINT, >> Description nvarchar(300) >> ) >> >> >> and say I have this same data >> >> TableA=========== >> >> 1 123 test st. '' 12345 >> 2 123 test st '' 12345 >> 3 230 1st st '' 12345 >> 4 345 2nd st '' 90122 >> >> Junction========== >> 1 2 desc1 >> 1 4 desc3 >> 1 3 desc3 >> 2 2 desc2 >> 3 2 desc34 >> 3 1 desc9 >> >> as you can see there are 2 addresses in tablea (ID 1 AND 2) that are the >> same, and in the junction table, person 2 references both of them! how >> can I fix this with a query so every item in tableA is unique while >> making all the items in the junction change their references from the >> ones I want to delete in tablea to the one that stays? >> > > |
|||||||||||||||||||||||