|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Question for T-SQL Gurusthis information to create a rollup key that can be used to identify the entire group. Some groups will have one email some will have many. Is there a way to approach this problem with a set based approach? I know I can do it with a cursor but I prefer not to. Please feel free to offer schema and/or modeling changes. Thanks for looking! CREATE TABLE dbo.email_key ( email_id INT identity ,email VARCHAR(100) ,rollup_id INT ) CREATE TABLE dbo.email_email ( email_email_id INT identity ,email_id INT ,rel_email_id INT ) CREATE TABLE dbo.rollup_key ( rollup_id INT identity ,primary_email_id INT) TRUNCATE TABLE dbo.email_key TRUNCATE TABLE dbo.email_email TRUNCATE TABLE dbo.rollup_key --SAMPLE DATA INSERT INTO dbo.email_key (email) SELECT 's**@abc.com' INSERT INTO dbo.email_key (email) SELECT 'd***@abc.com' INSERT INTO dbo.email_key (email) SELECT 'j**@abc.com' INSERT INTO dbo.email_key (email) SELECT 'm***@abc.com' --SAMPLE DATA INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,2 INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,3 INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,1 INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 3,1 INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,3 --ONE POSSIBLE END RESULT INSERT INTO dbo.rollup_key (primary_email_id) SELECT 1 INSERT INTO dbo.rollup_key (primary_email_id) SELECT 4 --RUN A COUPLE UPDATES TO SET email_key.rollup_id Dave,
I cannot fully understanding your problem, but I guess that it relates to the "recursive common table expression (CTE)" feature that is newly added in SQL Server 2005. - WITH common_table_expression (Transact-SQL) http://msdn2.microsoft.com/en-us/library/ms175972.aspx J. Using multiple anchor and recursive members D. Using a recursive CTE to display multiple levels of recursion E. Using a recursive CTE to display two levels of recursion F. Using a recursive CTE to display a hierarchical list -- Show quoteMartin Poon MVP - SQL Server ==================================== "Dave" <daveg***@gmail.com> ??? news:1152893141.610957.311880@m73g2000cwd.googlegroups.com ???... > I have a table that relates emails together. I would like to leverage > this information to create a rollup key that can be used to identify > the entire group. Some groups will have one email some will have many. > > Is there a way to approach this problem with a set based approach? I > know I can do it with a cursor but I prefer not to. > > Please feel free to offer schema and/or modeling changes. > > Thanks for looking! > > > > CREATE TABLE dbo.email_key ( > email_id INT identity > ,email VARCHAR(100) > ,rollup_id INT > ) > > CREATE TABLE dbo.email_email ( > email_email_id INT identity > ,email_id INT > ,rel_email_id INT > ) > > CREATE TABLE dbo.rollup_key ( > rollup_id INT identity > ,primary_email_id INT) > > > TRUNCATE TABLE dbo.email_key > TRUNCATE TABLE dbo.email_email > TRUNCATE TABLE dbo.rollup_key > > --SAMPLE DATA > INSERT INTO dbo.email_key (email) SELECT 's**@abc.com' > INSERT INTO dbo.email_key (email) SELECT 'd***@abc.com' > INSERT INTO dbo.email_key (email) SELECT 'j**@abc.com' > INSERT INTO dbo.email_key (email) SELECT 'm***@abc.com' > > --SAMPLE DATA > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,2 > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,3 > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,1 > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 3,1 > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,3 > > --ONE POSSIBLE END RESULT > INSERT INTO dbo.rollup_key (primary_email_id) SELECT 1 > INSERT INTO dbo.rollup_key (primary_email_id) SELECT 4 > --RUN A COUPLE UPDATES TO SET email_key.rollup_id > Thanks, I forgot to mention that we are using SQL Server 2000
Martin C K Poon [MVP] wrote: Show quote > Dave, > > I cannot fully understanding your problem, but I guess that it relates to > the "recursive common table expression (CTE)" feature that is newly added in > SQL Server 2005. > > - WITH common_table_expression (Transact-SQL) > http://msdn2.microsoft.com/en-us/library/ms175972.aspx > J. Using multiple anchor and recursive members > D. Using a recursive CTE to display multiple levels of recursion > E. Using a recursive CTE to display two levels of recursion > F. Using a recursive CTE to display a hierarchical list > > -- > Martin Poon > MVP - SQL Server > ==================================== > "Dave" <daveg***@gmail.com> ??? > news:1152893141.610957.311880@m73g2000cwd.googlegroups.com ???... > > I have a table that relates emails together. I would like to leverage > > this information to create a rollup key that can be used to identify > > the entire group. Some groups will have one email some will have many. > > > > Is there a way to approach this problem with a set based approach? I > > know I can do it with a cursor but I prefer not to. > > > > Please feel free to offer schema and/or modeling changes. > > > > Thanks for looking! > > > > > > > > CREATE TABLE dbo.email_key ( > > email_id INT identity > > ,email VARCHAR(100) > > ,rollup_id INT > > ) > > > > CREATE TABLE dbo.email_email ( > > email_email_id INT identity > > ,email_id INT > > ,rel_email_id INT > > ) > > > > CREATE TABLE dbo.rollup_key ( > > rollup_id INT identity > > ,primary_email_id INT) > > > > > > TRUNCATE TABLE dbo.email_key > > TRUNCATE TABLE dbo.email_email > > TRUNCATE TABLE dbo.rollup_key > > > > --SAMPLE DATA > > INSERT INTO dbo.email_key (email) SELECT 's**@abc.com' > > INSERT INTO dbo.email_key (email) SELECT 'd***@abc.com' > > INSERT INTO dbo.email_key (email) SELECT 'j**@abc.com' > > INSERT INTO dbo.email_key (email) SELECT 'm***@abc.com' > > > > --SAMPLE DATA > > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,2 > > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,3 > > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,1 > > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 3,1 > > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,3 > > > > --ONE POSSIBLE END RESULT > > INSERT INTO dbo.rollup_key (primary_email_id) SELECT 1 > > INSERT INTO dbo.rollup_key (primary_email_id) SELECT 4 > > --RUN A COUPLE UPDATES TO SET email_key.rollup_id > > >> I have a table that relates emails together. I would like to leverage this information to create a rollup key that can be used to identify the entire group. << Roll ups are mathematical operations done on a hierachy. Google ROLLUPand CUBE for details. Are you talking about building a tree structure to show a newsgroup thread? >> Is there a way to approach this problem with a set based approach? << Get a copy of TREES & HIERARCHIES IN SQL for several approaches forthis kind of problem. You mgiht also want to get a basic data modeling book, so you will not have table names that contain "key" as affix, will stop using IDENTITY and will craet tables with relational keys. Also, think about bad "email_email" is as a data element name. The proper syntax isd "INSERT INTO .. VALUES()", and not the proprietary SELECT you used. Does anyone have any suggestions?
Dave wrote: Show quote > I have a table that relates emails together. I would like to leverage > this information to create a rollup key that can be used to identify > the entire group. Some groups will have one email some will have many. > > Is there a way to approach this problem with a set based approach? I > know I can do it with a cursor but I prefer not to. > > Please feel free to offer schema and/or modeling changes. > > Thanks for looking! > > > > CREATE TABLE dbo.email_key ( > email_id INT identity > ,email VARCHAR(100) > ,rollup_id INT > ) > > CREATE TABLE dbo.email_email ( > email_email_id INT identity > ,email_id INT > ,rel_email_id INT > ) > > CREATE TABLE dbo.rollup_key ( > rollup_id INT identity > ,primary_email_id INT) > > > TRUNCATE TABLE dbo.email_key > TRUNCATE TABLE dbo.email_email > TRUNCATE TABLE dbo.rollup_key > > --SAMPLE DATA > INSERT INTO dbo.email_key (email) SELECT 's**@abc.com' > INSERT INTO dbo.email_key (email) SELECT 'd***@abc.com' > INSERT INTO dbo.email_key (email) SELECT 'j**@abc.com' > INSERT INTO dbo.email_key (email) SELECT 'm***@abc.com' > > --SAMPLE DATA > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,2 > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,3 > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,1 > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 3,1 > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,3 > > --ONE POSSIBLE END RESULT > INSERT INTO dbo.rollup_key (primary_email_id) SELECT 1 > INSERT INTO dbo.rollup_key (primary_email_id) SELECT 4 > --RUN A COUPLE UPDATES TO SET email_key.rollup_id On 17 Jul 2006 08:55:56 -0700, "Dave" <daveg***@gmail.com> wrote: Perhaps nobody understood what you were trying to do - I know I>Does anyone have any suggestions? didn't. Perhaps explaining it again, with a few more words, would clarify it. Roy I want to be able to aggregate my reports based on the person not his
email. Sometimes a person has more than one email so I would like to create a new table that will uniquely identify each person. I would like to have an intersection table that will relate this person (rollup) to all of his emails. I have already created a relationship table for email called dbo.email__email that relates the multiple emails. Now I just need to create the person (rollup) table with a unique key and then create the intersection table between person and email. I can accomplish this using a cursor but it takes forever. I would like to know if there is a set bases approach to this problem. Roy Harvey wrote: Show quote > On 17 Jul 2006 08:55:56 -0700, "Dave" <daveg***@gmail.com> wrote: > > >Does anyone have any suggestions? > > Perhaps nobody understood what you were trying to do - I know I > didn't. Perhaps explaining it again, with a few more words, would > clarify it. > > Roy On 18 Jul 2006 08:38:49 -0700, "Dave" <daveg***@gmail.com> wrote: What rules did you follow in creating this table? What constraints>I have already created a relationship table for email called >dbo.email__email that relates the multiple emails. >CREATE TABLE dbo.email_email ( > email_email_id INT identity > ,email_id INT > ,rel_email_id INT >) are in place to enforce those rules? Do you allow A linked to B and B linked to A? Do you allow A linked to B, B linked to C? Do you allow A linked to B, B linked to C, and C linked to A? Or A to B, B to C, C to D, A to D? If five emails, A, B, C, D and E, are all one person, there are a great many different ways to enter links. Many include redundancy. Many include "loops". If there is only one email for a person, does that mean that they do not appear in the table at all? One common limit used in such a situation is to order the pairing, so that the one of the emails with the lower value is in the first bucket and the higher in the second. That avoids at least some of the redundancy. Another would be to require the one on the right to be unique within the table, which would eliminate many more bits of confusion. Roy Harvey Beacon Falls, CT Thanks for your reply Roy!
The original question assumes the data in email__email is accurate. Also the following logic exists If A linked B then B linked A If A linked B & B linked C then A linked C Sample Data email rel_email 1 2 2 1 1 3 3 1 2 3 3 2 Single emails do not appear in this table though they can be easily handled with a different query. I have actually found a solution that I believe is based off your first suggestion. It has passed the initial QA and I think it is solid. The query went from 3 hours to less than 2 seconds :-) (Code is posted below). Is there a good book that lists these different T-SQL techniques for common problems? I have learned a handful and this one will be a very nice addition to my tool belt. Thanks again! --SAMPLE CODE /* email_id rel_email_id 1 4 1 3 4 1 1 3 */ --drop table #e SELECT distinct CASE WHEN ee1.email_id < ee2.email_id THEN ee1.email__email_id ELSE ee2.email__email_id END email__email_id ,CASE WHEN ee1.email_id < ee2.email_id THEN ee1.email_id ELSE ee2.email_id END email_id ,CASE WHEN ee1.email_id < ee2.email_id THEN ee1.rel_email_id ELSE ee2.rel_email_id END rel_email_id INTO #e FROM dbo.email__email ee1 INNER JOIN dbo.email__email ee2 ON ee2.email_id = ee1.rel_email_id AND ee2.rel_email_id = ee1.email_id /* email_id rel_email_id 1 4 1 3 1 2 2 4 2 3 3 4 */ --select count(*) --from #e --drop table #e2 SELECT email_id parent_email_id ,rel_email_id INTO #e2 from #e WHERE email_id NOT IN (SELECT rel_email_id FROM #e) Roy Harvey wrote: Show quote > On 18 Jul 2006 08:38:49 -0700, "Dave" <daveg***@gmail.com> wrote: > > >I have already created a relationship table for email called > >dbo.email__email that relates the multiple emails. > > >CREATE TABLE dbo.email_email ( > > email_email_id INT identity > > ,email_id INT > > ,rel_email_id INT > >) > > What rules did you follow in creating this table? What constraints > are in place to enforce those rules? Do you allow A linked to B and B > linked to A? Do you allow A linked to B, B linked to C? Do you allow > A linked to B, B linked to C, and C linked to A? Or A to B, B to C, C > to D, A to D? If five emails, A, B, C, D and E, are all one person, > there are a great many different ways to enter links. Many include > redundancy. Many include "loops". > > If there is only one email for a person, does that mean that they do > not appear in the table at all? > > One common limit used in such a situation is to order the pairing, so > that the one of the emails with the lower value is in the first bucket > and the higher in the second. That avoids at least some of the > redundancy. Another would be to require the one on the right to be > unique within the table, which would eliminate many more bits of > confusion. > > Roy Harvey > Beacon Falls, CT If every email for a person is linked to EVERY other, BOTH ways, this
should be pretty simple. SELECT min(email_id) as parent_email_id, rel_email_id FROM email_email WHERE email_id < rel_email_id GROUP BY rel_email_id ORDER BY 1, 2 As for books, I bought Ken Henderson's "The Guru's Guide to Transact-SQL" when it first came out, and while I did not read it in detail - I already had a pretty good handle on things - it did look very good. I haven't kept up with the newer books out since then. Roy Harvey Beacon Falls, CT Show quote On 18 Jul 2006 12:48:29 -0700, "Dave" <daveg***@gmail.com> wrote: >Thanks for your reply Roy! > >The original question assumes the data in email__email is accurate. > >Also the following logic exists >If A linked B > then B linked A > >If A linked B & > B linked C > then A linked C > >Sample Data >email rel_email >1 2 >2 1 >1 3 >3 1 >2 3 >3 2 > >Single emails do not appear in this table though they can be easily >handled with a different query. > > >I have actually found a solution that I believe is based off your first >suggestion. It has passed the initial QA and I think it is solid. The >query went from 3 hours to less than 2 seconds :-) (Code is posted >below). > >Is there a good book that lists these different T-SQL techniques for >common problems? I have learned a handful and this one will be a very >nice addition to my tool belt. > >Thanks again! > > > >--SAMPLE CODE >/* >email_id rel_email_id >1 4 >1 3 >4 1 >1 3 >*/ > >--drop table #e >SELECT distinct > CASE > WHEN ee1.email_id < ee2.email_id THEN ee1.email__email_id > ELSE ee2.email__email_id > END email__email_id > ,CASE > WHEN ee1.email_id < ee2.email_id THEN ee1.email_id > ELSE ee2.email_id > END email_id > ,CASE > WHEN ee1.email_id < ee2.email_id THEN ee1.rel_email_id > ELSE ee2.rel_email_id > END rel_email_id >INTO #e >FROM dbo.email__email ee1 > INNER JOIN dbo.email__email ee2 ON ee2.email_id = ee1.rel_email_id > AND ee2.rel_email_id = ee1.email_id > >/* >email_id rel_email_id >1 4 >1 3 >1 2 >2 4 >2 3 >3 4 >*/ >--select count(*) >--from #e > >--drop table #e2 >SELECT email_id parent_email_id > ,rel_email_id >INTO #e2 >from #e >WHERE email_id NOT IN (SELECT rel_email_id FROM #e) > > > > > > >Roy Harvey wrote: >> On 18 Jul 2006 08:38:49 -0700, "Dave" <daveg***@gmail.com> wrote: >> >> >I have already created a relationship table for email called >> >dbo.email__email that relates the multiple emails. >> >> >CREATE TABLE dbo.email_email ( >> > email_email_id INT identity >> > ,email_id INT >> > ,rel_email_id INT >> >) >> >> What rules did you follow in creating this table? What constraints >> are in place to enforce those rules? Do you allow A linked to B and B >> linked to A? Do you allow A linked to B, B linked to C? Do you allow >> A linked to B, B linked to C, and C linked to A? Or A to B, B to C, C >> to D, A to D? If five emails, A, B, C, D and E, are all one person, >> there are a great many different ways to enter links. Many include >> redundancy. Many include "loops". >> >> If there is only one email for a person, does that mean that they do >> not appear in the table at all? >> >> One common limit used in such a situation is to order the pairing, so >> that the one of the emails with the lower value is in the first bucket >> and the higher in the second. That avoids at least some of the >> redundancy. Another would be to require the one on the right to be >> unique within the table, which would eliminate many more bits of >> confusion. >> >> Roy Harvey >> Beacon Falls, CT |
|||||||||||||||||||||||