|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Delete 'Non-Unique' Rows Question (with DDL)I'm working on a query which will ultimately be used in a stored proc for a report. This report will quite simply list the components that are unique to a particular customers goods. I've created a mechanism for traversing our BOM structure, and getting a list of all components used in any customers parts. I then intended to clear out all the components that were used by more than one customer. Unfortunately, after about 5 hours effort, I've come up short. Perhaps someone can offer a suggestion. The DDL is posted below. What I'm looking for, ultimately, is to get a 'selection' containing only the two "1000" partnumbers, as components "1001" and "1002" are also used by customer "B". Also, Customer "A" has two products which use the "1000" component, but its still unique to them, so we want to keep it in the results. You can see the DELETE statement I was trying, but it misses the mark rather wildly. I'm open to tackling this problem a different way to, if that would provide a clearer solution. In the meanwhile, I'll keep plugging away at it, perhaps I can get lucky. Thanks for your help, Brian Ackermann ------------------------------------------ -- DDL ------------------------------------------ DROP TABLE zz_report GO CREATE TABLE zz_report ( partNumber varchar(10) , customer varchar(10) ) INSERT zz_report VALUES ('1000', 'A') INSERT zz_report VALUES ('1000', 'A') INSERT zz_report VALUES ('1001', 'A') INSERT zz_report VALUES ('1002', 'A') INSERT zz_report VALUES ('1001', 'B') INSERT zz_report VALUES ('1002', 'B') INSERT zz_report VALUES ('1002', 'B') DELETE zz_report FROM zz_report R WHERE EXISTS ( SELECT Z.partnumber FROM zz_report Z WHERE R.partnumber = Z.partnumber GROUP BY Z.customer, Z.partnumber HAVING count(*) <> 1 ) SELECT * FROM zz_report GO DROP TABLE zz_report GO a shot in the dark:
DELETE zz_report FROM zz_report R WHERE EXISTS ( SELECT 1 FROM zz_report Z WHERE R.partnumber = Z.partnumber AND z.customer > r.customer ) partNumber customer ---------- ---------- 1000 A 1000 A 1001 B 1002 B 1002 B Try using a select, without deleting anything...
Select a.Customer, a.partnumber from zz_report as a where not exists (select 1 from zz_report b where b.partnumber = a.partnumber and b.customer <> a.customer) Show quote "Brian J. Ackermann" <brianj***@allmail.net> wrote in message news:gdudnZReWYKmavDZnZ2dnUVZ_tednZ2d@giganews.com... > Hi all, > > I'm working on a query which will ultimately be used in a stored proc > for a report. This report will quite simply list the components that > are unique to a particular customers goods. I've created a mechanism > for traversing our BOM structure, and getting a list of all components > used in any customers parts. I then intended to clear out all the > components that were used by more than one customer. Unfortunately, > after about 5 hours effort, I've come up short. Perhaps someone can > offer a suggestion. > > The DDL is posted below. What I'm looking for, ultimately, is to get a > 'selection' containing only the two "1000" partnumbers, as components > "1001" and "1002" are also used by customer "B". Also, Customer "A" has > two products which use the "1000" component, but its still unique to > them, so we want to keep it in the results. > > You can see the DELETE statement I was trying, but it misses the mark > rather wildly. I'm open to tackling this problem a different way to, if > that would provide a clearer solution. In the meanwhile, I'll keep > plugging away at it, perhaps I can get lucky. > > Thanks for your help, > > Brian Ackermann > > > ------------------------------------------ > -- DDL > ------------------------------------------ > DROP TABLE zz_report > GO > > CREATE TABLE zz_report > ( > partNumber varchar(10) > , customer varchar(10) > ) > > > INSERT zz_report VALUES ('1000', 'A') > INSERT zz_report VALUES ('1000', 'A') > INSERT zz_report VALUES ('1001', 'A') > INSERT zz_report VALUES ('1002', 'A') > INSERT zz_report VALUES ('1001', 'B') > INSERT zz_report VALUES ('1002', 'B') > INSERT zz_report VALUES ('1002', 'B') > > > DELETE zz_report FROM zz_report R > WHERE EXISTS > ( > SELECT Z.partnumber > FROM zz_report Z > WHERE R.partnumber = Z.partnumber > GROUP BY Z.customer, Z.partnumber > HAVING count(*) <> 1 > ) > > > > SELECT * FROM zz_report > > GO > > DROP TABLE zz_report > GO Jim Underwood wrote:
> Try using a select, without deleting anything... Thanks Jim!> > Select a.Customer, a.partnumber > from zz_report as a > where not exists (select 1 > from zz_report b > where b.partnumber = a.partnumber > and b.customer <> a.customer) This works, at least as far as the small dataset I provided. I'm running it now on my real data. Unfortunately, its taking quite a long time to process (about a minute and a half). I'll report back my findings in a few. Brian J. Ackermann wrote
> Thanks Jim! Well, sadly, its not what I am needing. The very first part I checked > > This works, at least as far as the small dataset I provided. I'm > running it now on my real data. Unfortunately, its taking quite a long > time to process (about a minute and a half). I'll report back my > findings in a few. > (part "000001") is used in just about every single finished product produced here, and consequently, is used by nearly every customer. I'll poke around with it some, because the idea seems good to me... Brian Brian J. Ackermann wrote:
> Well, sadly, its not what I am needing. The very first part I checked Okay, I found a bit of garbage data (some customers with NULL values). > (part "000001") is used in just about every single finished product > produced here, and consequently, is used by nearly every customer. I'll > poke around with it some, because the idea seems good to me... > > Brian Removing that, I've found that the first hundred items are perfect. The logic seems strong, so I'm going to go with it. But, do you have any suggestions for picking up the pace a little bit. That little query you provided me with takes about 00:01:05 to run. The table is about 19,000 rows. Perhaps theres some kind of optimization we can try on it? Thanks again, I appreciate your help. Brian Ackermann Well, if you post full DDL showing your actual table and indexes, someone
may be able to offer a suggestion. If partnumber and customer form the primary key of this table (or have a unique constraint) , then you might try something like... Select customer, partnumber from zz_report where partnumber in ( select partnumber from zz_report group by partnumber having count(customer) = 1 ) Having an index on partnumber may help in either case. Note, this is an index with partnumber as the first column, not the second. Show quote "Brian J. Ackermann" <brianj***@allmail.net> wrote in message news:446E1199.8070504@allmail.net... > Brian J. Ackermann wrote: > > Well, sadly, its not what I am needing. The very first part I checked > > (part "000001") is used in just about every single finished product > > produced here, and consequently, is used by nearly every customer. I'll > > poke around with it some, because the idea seems good to me... > > > > Brian > > Okay, I found a bit of garbage data (some customers with NULL values). > Removing that, I've found that the first hundred items are perfect. The > logic seems strong, so I'm going to go with it. > > But, do you have any suggestions for picking up the pace a little bit. > That little query you provided me with takes about 00:01:05 to run. The > table is about 19,000 rows. Perhaps theres some kind of optimization we > can try on it? > > Thanks again, I appreciate your help. > > Brian Ackermann Jim Underwood wrote:
> Well, if you post full DDL showing your actual table and indexes, someone The DDL is pretty much as I gave it. Its just a temporary table, with > may be able to offer a suggestion. only those two fields. I'll try putting an index on it, and see if that make a big difference. I expect it will. Thanks again for your excellent help. Brian. Do you really need the temporary table? If the data is coming from an
existing table, why not access it directly? Or is this data coming from an application, and not the database? Show quote "Brian J. Ackermann" <brianj***@allmail.net> wrote in message news:hbSdnfRYaMeOh_PZRVn-uw@giganews.com... > Jim Underwood wrote: > > Well, if you post full DDL showing your actual table and indexes, someone > > may be able to offer a suggestion. > > The DDL is pretty much as I gave it. Its just a temporary table, with > only those two fields. I'll try putting an index on it, and see if that > make a big difference. I expect it will. > > Thanks again for your excellent help. > > Brian. > > Jim Underwood wrote:
> Do you really need the temporary table? If the data is coming from an I am not sure if I _need_ it, but I can't really think of a better way > existing table, why not access it directly? Or is this data coming from an > application, and not the database? > to get to it. All of that data comes out of our Bill of Material Tables, which is a hierarchical data structure, and some of our parts are only 2 levels deep, and some are 5 levels deep, and even that could change in the future. Presently, I am using a cursor to loop over all our 'non-obsolete' parts, and then a UDF which navigates the BOM, pulling out all the components along the way. The results of this UDF I put into the temp table. I'm quite certain there are other ways to tackle this problem, but that seemed the most likely to generate easily understood results. Brian Sounds like an interesting challenge, but one that would be appropriate for
a seperate post. Show quote "Brian J. Ackermann" <brianj***@allmail.net> wrote in message news:j_6dnZ8NPYuZtPPZnZ2dneKdnZydnZ2d@giganews.com... > Jim Underwood wrote: > > Do you really need the temporary table? If the data is coming from an > > existing table, why not access it directly? Or is this data coming from an > > application, and not the database? > > > I am not sure if I _need_ it, but I can't really think of a better way > to get to it. All of that data comes out of our Bill of Material > Tables, which is a hierarchical data structure, and some of our parts > are only 2 levels deep, and some are 5 levels deep, and even that could > change in the future. > > Presently, I am using a cursor to loop over all our 'non-obsolete' > parts, and then a UDF which navigates the BOM, pulling out all the > components along the way. The results of this UDF I put into the temp > table. > > I'm quite certain there are other ways to tackle this problem, but that > seemed the most likely to generate easily understood results. > > Brian Brian, you haven't mentioned the version of SQL Server you are working with.
If it's SQL Server 2005, you can use the following query to isolate the rows you are after: WITH MinMax AS ( SELECT partNumber, customer, MIN(customer) OVER(PARTITION BY partNumber) AS mn, MAX(customer) OVER(PARTITION BY partNumber) AS mx FROM zz_report ) SELECT partNumber, customer FROM MinMax WHERE mn = mx; And the following query to delete the ones you wish to get rid of: WITH MinMax AS ( SELECT partNumber, customer, MIN(customer) OVER(PARTITION BY partNumber) AS mn, MAX(customer) OVER(PARTITION BY partNumber) AS mx FROM zz_report ) DELETE FROM MinMax WHERE mn < mx; -- Show quoteBG, SQL Server MVP www.SolidQualityLearning.com www.insidetsql.com Anything written in this message represents my view, my own view, and nothing but my view (WITH SCHEMABINDING), so help me my T-SQL code. "Brian J. Ackermann" <brianj***@allmail.net> wrote in message news:gdudnZReWYKmavDZnZ2dnUVZ_tednZ2d@giganews.com... > Hi all, > > I'm working on a query which will ultimately be used in a stored proc for > a report. This report will quite simply list the components that are > unique to a particular customers goods. I've created a mechanism for > traversing our BOM structure, and getting a list of all components used in > any customers parts. I then intended to clear out all the components that > were used by more than one customer. Unfortunately, after about 5 hours > effort, I've come up short. Perhaps someone can offer a suggestion. > > The DDL is posted below. What I'm looking for, ultimately, is to get a > 'selection' containing only the two "1000" partnumbers, as components > "1001" and "1002" are also used by customer "B". Also, Customer "A" has > two products which use the "1000" component, but its still unique to them, > so we want to keep it in the results. > > You can see the DELETE statement I was trying, but it misses the mark > rather wildly. I'm open to tackling this problem a different way to, if > that would provide a clearer solution. In the meanwhile, I'll keep > plugging away at it, perhaps I can get lucky. > > Thanks for your help, > > Brian Ackermann > > > ------------------------------------------ > -- DDL > ------------------------------------------ > DROP TABLE zz_report > GO > > CREATE TABLE zz_report > ( > partNumber varchar(10) > , customer varchar(10) > ) > > > INSERT zz_report VALUES ('1000', 'A') > INSERT zz_report VALUES ('1000', 'A') > INSERT zz_report VALUES ('1001', 'A') > INSERT zz_report VALUES ('1002', 'A') > INSERT zz_report VALUES ('1001', 'B') > INSERT zz_report VALUES ('1002', 'B') > INSERT zz_report VALUES ('1002', 'B') > > > DELETE zz_report FROM zz_report R > WHERE EXISTS > ( > SELECT Z.partnumber > FROM zz_report Z > WHERE R.partnumber = Z.partnumber > GROUP BY Z.customer, Z.partnumber > HAVING count(*) <> 1 > ) > > > > SELECT * FROM zz_report > > GO > > DROP TABLE zz_report > GO |
|||||||||||||||||||||||