Home All Groups Group Topic Archive Search About

Question for T-SQL Gurus

Author
14 Jul 2006 4:05 PM
Dave
I have a table that relates emails together.  I would like to leverage
this information to create a rollup key that can be used to identify
the entire group.  Some groups will have one email some will have many.

Is there a way to approach this problem with a set based approach?  I
know I can do it with a cursor but I prefer not to.

Please feel free to offer schema and/or modeling changes.

Thanks for looking!



CREATE TABLE dbo.email_key (
    email_id INT identity
    ,email VARCHAR(100)
    ,rollup_id INT
)

CREATE TABLE dbo.email_email (
    email_email_id INT identity
    ,email_id INT
    ,rel_email_id INT
)

CREATE TABLE dbo.rollup_key (
    rollup_id INT identity
    ,primary_email_id INT)


TRUNCATE TABLE dbo.email_key
TRUNCATE TABLE dbo.email_email
TRUNCATE TABLE dbo.rollup_key

--SAMPLE DATA
INSERT INTO dbo.email_key (email) SELECT 's**@abc.com'
INSERT INTO dbo.email_key (email) SELECT 'd***@abc.com'
INSERT INTO dbo.email_key (email) SELECT 'j**@abc.com'
INSERT INTO dbo.email_key (email) SELECT 'm***@abc.com'

--SAMPLE DATA
INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,2
INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,3
INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,1
INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 3,1
INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,3

--ONE POSSIBLE END RESULT
INSERT INTO dbo.rollup_key (primary_email_id) SELECT 1
INSERT INTO dbo.rollup_key (primary_email_id) SELECT 4
    --RUN A COUPLE UPDATES TO SET email_key.rollup_id

Author
14 Jul 2006 5:45 PM
Martin C K Poon [MVP]
Dave,

I cannot fully understanding your problem, but I guess that it relates to
the "recursive common table expression (CTE)" feature that is newly added in
SQL Server 2005.

- WITH common_table_expression (Transact-SQL)
http://msdn2.microsoft.com/en-us/library/ms175972.aspx
J. Using multiple anchor and recursive members
D. Using a recursive CTE to display multiple levels of recursion
E. Using a recursive CTE to display two levels of recursion
F. Using a recursive CTE to display a hierarchical list

--
Martin Poon
MVP - SQL Server
====================================
Show quote
"Dave" <daveg***@gmail.com> ???
news:1152893141.610957.311880@m73g2000cwd.googlegroups.com ???...
> I have a table that relates emails together.  I would like to leverage
> this information to create a rollup key that can be used to identify
> the entire group.  Some groups will have one email some will have many.
>
> Is there a way to approach this problem with a set based approach?  I
> know I can do it with a cursor but I prefer not to.
>
> Please feel free to offer schema and/or modeling changes.
>
> Thanks for looking!
>
>
>
> CREATE TABLE dbo.email_key (
> email_id INT identity
> ,email VARCHAR(100)
> ,rollup_id INT
> )
>
> CREATE TABLE dbo.email_email (
> email_email_id INT identity
> ,email_id INT
> ,rel_email_id INT
> )
>
> CREATE TABLE dbo.rollup_key (
> rollup_id INT identity
> ,primary_email_id INT)
>
>
> TRUNCATE TABLE dbo.email_key
> TRUNCATE TABLE dbo.email_email
> TRUNCATE TABLE dbo.rollup_key
>
> --SAMPLE DATA
> INSERT INTO dbo.email_key (email) SELECT 's**@abc.com'
> INSERT INTO dbo.email_key (email) SELECT 'd***@abc.com'
> INSERT INTO dbo.email_key (email) SELECT 'j**@abc.com'
> INSERT INTO dbo.email_key (email) SELECT 'm***@abc.com'
>
> --SAMPLE DATA
> INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,2
> INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,3
> INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,1
> INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 3,1
> INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,3
>
> --ONE POSSIBLE END RESULT
> INSERT INTO dbo.rollup_key (primary_email_id) SELECT 1
> INSERT INTO dbo.rollup_key (primary_email_id) SELECT 4
> --RUN A COUPLE UPDATES TO SET email_key.rollup_id
>
Author
14 Jul 2006 6:10 PM
Dave
Thanks, I forgot to mention that we are using SQL Server 2000
Martin C K Poon [MVP] wrote:
Show quote
> Dave,
>
> I cannot fully understanding your problem, but I guess that it relates to
> the "recursive common table expression (CTE)" feature that is newly added in
> SQL Server 2005.
>
> - WITH common_table_expression (Transact-SQL)
> http://msdn2.microsoft.com/en-us/library/ms175972.aspx
> J. Using multiple anchor and recursive members
> D. Using a recursive CTE to display multiple levels of recursion
> E. Using a recursive CTE to display two levels of recursion
> F. Using a recursive CTE to display a hierarchical list
>
> --
> Martin Poon
> MVP - SQL Server
> ====================================
> "Dave" <daveg***@gmail.com> ???
> news:1152893141.610957.311880@m73g2000cwd.googlegroups.com ???...
> > I have a table that relates emails together.  I would like to leverage
> > this information to create a rollup key that can be used to identify
> > the entire group.  Some groups will have one email some will have many.
> >
> > Is there a way to approach this problem with a set based approach?  I
> > know I can do it with a cursor but I prefer not to.
> >
> > Please feel free to offer schema and/or modeling changes.
> >
> > Thanks for looking!
> >
> >
> >
> > CREATE TABLE dbo.email_key (
> > email_id INT identity
> > ,email VARCHAR(100)
> > ,rollup_id INT
> > )
> >
> > CREATE TABLE dbo.email_email (
> > email_email_id INT identity
> > ,email_id INT
> > ,rel_email_id INT
> > )
> >
> > CREATE TABLE dbo.rollup_key (
> > rollup_id INT identity
> > ,primary_email_id INT)
> >
> >
> > TRUNCATE TABLE dbo.email_key
> > TRUNCATE TABLE dbo.email_email
> > TRUNCATE TABLE dbo.rollup_key
> >
> > --SAMPLE DATA
> > INSERT INTO dbo.email_key (email) SELECT 's**@abc.com'
> > INSERT INTO dbo.email_key (email) SELECT 'd***@abc.com'
> > INSERT INTO dbo.email_key (email) SELECT 'j**@abc.com'
> > INSERT INTO dbo.email_key (email) SELECT 'm***@abc.com'
> >
> > --SAMPLE DATA
> > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,2
> > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,3
> > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,1
> > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 3,1
> > INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,3
> >
> > --ONE POSSIBLE END RESULT
> > INSERT INTO dbo.rollup_key (primary_email_id) SELECT 1
> > INSERT INTO dbo.rollup_key (primary_email_id) SELECT 4
> > --RUN A COUPLE UPDATES TO SET email_key.rollup_id
> >
Author
15 Jul 2006 5:53 PM
--CELKO--
>> I have a table that relates emails together.  I would like to leverage this information to create a rollup key that can be used to identify the entire group.  <<

Roll ups are mathematical operations done on a hierachy.  Google ROLLUP
and CUBE for details.  Are you talking about building a tree structure
to show a newsgroup thread?

>> Is there a way to approach this problem with a set based approach?  <<

Get a copy of TREES & HIERARCHIES IN SQL for several approaches for
this kind of problem.

You mgiht also want to get a basic data modeling book, so you will not
have table names  that contain "key" as affix, will stop using IDENTITY
and will craet tables with relational keys.  Also, think about bad
"email_email" is as a data element name.  The proper syntax isd "INSERT
INTO .. VALUES()", and not the proprietary SELECT you used.
Author
17 Jul 2006 3:55 PM
Dave
Does anyone have any suggestions?

Dave wrote:
Show quote
> I have a table that relates emails together.  I would like to leverage
> this information to create a rollup key that can be used to identify
> the entire group.  Some groups will have one email some will have many.
>
> Is there a way to approach this problem with a set based approach?  I
> know I can do it with a cursor but I prefer not to.
>
> Please feel free to offer schema and/or modeling changes.
>
> Thanks for looking!
>
>
>
> CREATE TABLE dbo.email_key (
>     email_id INT identity
>     ,email VARCHAR(100)
>     ,rollup_id INT
> )
>
> CREATE TABLE dbo.email_email (
>     email_email_id INT identity
>     ,email_id INT
>     ,rel_email_id INT
> )
>
> CREATE TABLE dbo.rollup_key (
>     rollup_id INT identity
>     ,primary_email_id INT)
>
>
> TRUNCATE TABLE dbo.email_key
> TRUNCATE TABLE dbo.email_email
> TRUNCATE TABLE dbo.rollup_key
>
> --SAMPLE DATA
> INSERT INTO dbo.email_key (email) SELECT 's**@abc.com'
> INSERT INTO dbo.email_key (email) SELECT 'd***@abc.com'
> INSERT INTO dbo.email_key (email) SELECT 'j**@abc.com'
> INSERT INTO dbo.email_key (email) SELECT 'm***@abc.com'
>
> --SAMPLE DATA
> INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,2
> INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 1,3
> INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,1
> INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 3,1
> INSERT INTO dbo.email_email (email_id,rel_email_id) SELECT 2,3
>
> --ONE POSSIBLE END RESULT
> INSERT INTO dbo.rollup_key (primary_email_id) SELECT 1
> INSERT INTO dbo.rollup_key (primary_email_id) SELECT 4
>     --RUN A COUPLE UPDATES TO SET email_key.rollup_id
Author
17 Jul 2006 10:52 PM
Roy Harvey
On 17 Jul 2006 08:55:56 -0700, "Dave" <daveg***@gmail.com> wrote:

>Does anyone have any suggestions?

Perhaps nobody understood what you were trying to do - I know I
didn't.  Perhaps explaining it again, with a few more words, would
clarify it.

Roy
Author
18 Jul 2006 3:38 PM
Dave
I want to be able to aggregate my reports based on the person not his
email.  Sometimes a person has more than one email so I would like to
create a new table that will uniquely identify each person.  I would
like to have an intersection table that will relate this person
(rollup) to all of his emails.

I have already created a relationship table for email called
dbo.email__email that relates the multiple emails.

Now I just need to create the person (rollup) table with a unique key
and then create the intersection table between person and email.  I can
accomplish this using a cursor but it takes forever.  I would like to
know if there is a set bases approach to this problem.




Roy Harvey wrote:
Show quote
> On 17 Jul 2006 08:55:56 -0700, "Dave" <daveg***@gmail.com> wrote:
>
> >Does anyone have any suggestions?
>
> Perhaps nobody understood what you were trying to do - I know I
> didn't.  Perhaps explaining it again, with a few more words, would
> clarify it.
>
> Roy
Author
18 Jul 2006 5:25 PM
Roy Harvey
On 18 Jul 2006 08:38:49 -0700, "Dave" <daveg***@gmail.com> wrote:

>I have already created a relationship table for email called
>dbo.email__email that relates the multiple emails.

>CREATE TABLE dbo.email_email (
>    email_email_id INT identity
>    ,email_id INT
>    ,rel_email_id INT
>)

What rules did you follow in creating this table?  What constraints
are in place to enforce those rules?  Do you allow A linked to B and B
linked to A?  Do you allow A linked to B, B linked to C?  Do you allow
A linked to B, B linked to C, and C linked to A?  Or A to B, B to C, C
to D, A to D?  If five emails, A, B, C, D and E, are all one person,
there are a great many different ways to enter links.  Many include
redundancy.  Many include "loops".

If there is only one email for a person, does that mean that they do
not appear in the table at all?

One common limit used in such a situation is to order the pairing, so
that the one of the emails with the lower value is in the first bucket
and the higher in the second.  That avoids at least some of the
redundancy.  Another would be to require the one on the right to be
unique within the table, which would eliminate many more bits of
confusion.

Roy Harvey
Beacon Falls, CT
Author
18 Jul 2006 7:48 PM
Dave
Thanks for your reply Roy!

The original question assumes the data in email__email is accurate.

Also the following logic exists
If A linked B
  then B linked A

If A linked B &
  B linked C
    then A linked C

Sample Data
email   rel_email
1         2
2         1
1         3
3         1
2         3
3         2

Single emails do not appear in this table though they can be easily
handled with a different query.


I have actually found a solution that I believe is based off your first
suggestion.  It has passed the initial QA and I think it is solid.  The
query went from 3 hours to less than 2 seconds :-)  (Code is posted
below).

Is there a good book that lists these different T-SQL techniques for
common problems?  I have learned a handful and this one will be a very
nice addition to my tool belt.

Thanks again!



--SAMPLE CODE
/*
email_id    rel_email_id
1        4
1       3
4        1
1        3
*/

--drop table #e
SELECT distinct
    CASE
        WHEN ee1.email_id < ee2.email_id THEN ee1.email__email_id
        ELSE ee2.email__email_id
    END email__email_id
    ,CASE
        WHEN ee1.email_id < ee2.email_id THEN ee1.email_id
        ELSE ee2.email_id
    END email_id
    ,CASE
        WHEN ee1.email_id < ee2.email_id THEN ee1.rel_email_id
        ELSE ee2.rel_email_id
    END rel_email_id
INTO #e
FROM dbo.email__email ee1
    INNER JOIN dbo.email__email ee2 ON ee2.email_id = ee1.rel_email_id
                                        AND ee2.rel_email_id = ee1.email_id

/*
email_id    rel_email_id
1        4
1       3
1        2
2        4
2        3
3        4
*/
--select count(*)
--from #e

--drop table #e2
SELECT email_id parent_email_id
    ,rel_email_id
INTO #e2
from #e
WHERE email_id NOT IN (SELECT rel_email_id FROM #e)






Roy Harvey wrote:
Show quote
> On 18 Jul 2006 08:38:49 -0700, "Dave" <daveg***@gmail.com> wrote:
>
> >I have already created a relationship table for email called
> >dbo.email__email that relates the multiple emails.
>
> >CREATE TABLE dbo.email_email (
> >    email_email_id INT identity
> >    ,email_id INT
> >    ,rel_email_id INT
> >)
>
> What rules did you follow in creating this table?  What constraints
> are in place to enforce those rules?  Do you allow A linked to B and B
> linked to A?  Do you allow A linked to B, B linked to C?  Do you allow
> A linked to B, B linked to C, and C linked to A?  Or A to B, B to C, C
> to D, A to D?  If five emails, A, B, C, D and E, are all one person,
> there are a great many different ways to enter links.  Many include
> redundancy.  Many include "loops".
>
> If there is only one email for a person, does that mean that they do
> not appear in the table at all?
>
> One common limit used in such a situation is to order the pairing, so
> that the one of the emails with the lower value is in the first bucket
> and the higher in the second.  That avoids at least some of the
> redundancy.  Another would be to require the one on the right to be
> unique within the table, which would eliminate many more bits of
> confusion.
>
> Roy Harvey
> Beacon Falls, CT
Author
18 Jul 2006 8:07 PM
Roy Harvey
If every email for a person is linked to EVERY other, BOTH ways, this
should be pretty simple.

SELECT min(email_id) as parent_email_id,
       rel_email_id
  FROM email_email
WHERE email_id < rel_email_id
GROUP BY rel_email_id
ORDER BY 1, 2

As for books, I bought Ken Henderson's "The Guru's Guide to
Transact-SQL" when it first came out, and while I did not read it in
detail - I already had a pretty good handle on things - it did look
very good.  I haven't kept up with the newer books out since then.

Roy Harvey
Beacon Falls, CT

Show quote
On 18 Jul 2006 12:48:29 -0700, "Dave" <daveg***@gmail.com> wrote:

>Thanks for your reply Roy!
>
>The original question assumes the data in email__email is accurate.
>
>Also the following logic exists
>If A linked B
>  then B linked A
>
>If A linked B &
>  B linked C
>    then A linked C
>
>Sample Data
>email   rel_email
>1         2
>2         1
>1         3
>3         1
>2         3
>3         2
>
>Single emails do not appear in this table though they can be easily
>handled with a different query.
>
>
>I have actually found a solution that I believe is based off your first
>suggestion.  It has passed the initial QA and I think it is solid.  The
>query went from 3 hours to less than 2 seconds :-)  (Code is posted
>below).
>
>Is there a good book that lists these different T-SQL techniques for
>common problems?  I have learned a handful and this one will be a very
>nice addition to my tool belt.
>
>Thanks again!
>
>
>
>--SAMPLE CODE
>/*
>email_id    rel_email_id
>1        4
>1       3
>4        1
>1        3
>*/
>
>--drop table #e
>SELECT distinct
>    CASE
>        WHEN ee1.email_id < ee2.email_id THEN ee1.email__email_id
>        ELSE ee2.email__email_id
>    END email__email_id
>    ,CASE
>        WHEN ee1.email_id < ee2.email_id THEN ee1.email_id
>        ELSE ee2.email_id
>    END email_id
>    ,CASE
>        WHEN ee1.email_id < ee2.email_id THEN ee1.rel_email_id
>        ELSE ee2.rel_email_id
>    END rel_email_id
>INTO #e
>FROM dbo.email__email ee1
>    INNER JOIN dbo.email__email ee2 ON ee2.email_id = ee1.rel_email_id
>                                        AND ee2.rel_email_id = ee1.email_id
>
>/*
>email_id    rel_email_id
>1        4
>1       3
>1        2
>2        4
>2        3
>3        4
>*/
>--select count(*)
>--from #e
>
>--drop table #e2
>SELECT email_id parent_email_id
>    ,rel_email_id
>INTO #e2
>from #e
>WHERE email_id NOT IN (SELECT rel_email_id FROM #e)
>
>
>
>
>
>
>Roy Harvey wrote:
>> On 18 Jul 2006 08:38:49 -0700, "Dave" <daveg***@gmail.com> wrote:
>>
>> >I have already created a relationship table for email called
>> >dbo.email__email that relates the multiple emails.
>>
>> >CREATE TABLE dbo.email_email (
>> >    email_email_id INT identity
>> >    ,email_id INT
>> >    ,rel_email_id INT
>> >)
>>
>> What rules did you follow in creating this table?  What constraints
>> are in place to enforce those rules?  Do you allow A linked to B and B
>> linked to A?  Do you allow A linked to B, B linked to C?  Do you allow
>> A linked to B, B linked to C, and C linked to A?  Or A to B, B to C, C
>> to D, A to D?  If five emails, A, B, C, D and E, are all one person,
>> there are a great many different ways to enter links.  Many include
>> redundancy.  Many include "loops".
>>
>> If there is only one email for a person, does that mean that they do
>> not appear in the table at all?
>>
>> One common limit used in such a situation is to order the pairing, so
>> that the one of the emails with the lower value is in the first bucket
>> and the higher in the second.  That avoids at least some of the
>> redundancy.  Another would be to require the one on the right to be
>> unique within the table, which would eliminate many more bits of
>> confusion.
>>
>> Roy Harvey
>> Beacon Falls, CT

AddThis Social Bookmark Button