Home All Groups Group Topic Archive Search About
Author
8 Jul 2005 12:06 AM
Mike Labosh
Maybe it's a personality flaw, but I have always had trouble figuring out
the meaning of sub-selects inside a WHERE NOT EXISTS clause.  My brain just
can't visualize it.

SELECT Stuff
FROM TableA
WHERE NOT EXISTS (
    SELECT OtherStuff FROM TableB
)

I am gasping to find a rephrasing of a general form that uses some kind of
OUTER JOIN WHERE xxx IS NULL.  THAT makes sense to me.  All these upside
down where not exists (select big long huge statement) queries we do here
are just unreadable to me.

Also, is there a performance difference between the above SQL statement and
the rephrasing of it that uses outer joins?  Or is it statement specific?

--
Peace & happy computing,

Mike Labosh, MCSD

"Mr. McKittrick, after very careful consideration, I have
come to the conclusion that this new system SUCKS."
-- General Barringer, "War Games"

Author
8 Jul 2005 12:39 AM
Brian Selzer
SELECT A.k, A.x
    FROM A
    WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.k = a.k)

is logically equivalent to

SELECT A.k, A.x
    FROM A
     WHERE A.k IN (SELECT k FROM B)

is logically equivalent to

SELECT DISTINCT A.k, A.x
    FROM A
        LEFT OUTER JOIN B
            ON (B.k = A.k)
    WHERE B.k IS NULL

is logically equivalent to

SELECT A.k, A.x
    FROM A
        LEFT OUTER JOIN B
            ON (B.k = A.k)
    WHERE B.k IS NULL
    GROUP BY A.k, A.x

Note that the key value from A exists in each query.  That's because the
relationship between A and B could be 1..*, so with a left join you could
get several identical rows.

As far as performance, the query optimizer may produce different execution
plans.  The first is a correlated subquery and may produce an iteration.
The second is just a subquery, and will probably produce a hash or merge
join, but may also produce an iteration.  The third and fourth are left
outer joins, and will probably produce a hash or merge join, and most
probably the same execution plan.  Depending on available indexes the query
optimizer may produce the same plan for each of the above.  Your best bet is
to try them with real data and find out.


Show quote
"Mike Labosh" <mlab***@hotmail.com> wrote in message
news:e5zPsD1gFHA.2560@TK2MSFTNGP10.phx.gbl...
> Maybe it's a personality flaw, but I have always had trouble figuring out
> the meaning of sub-selects inside a WHERE NOT EXISTS clause.  My brain
just
> can't visualize it.
>
> SELECT Stuff
> FROM TableA
> WHERE NOT EXISTS (
>     SELECT OtherStuff FROM TableB
> )
>
> I am gasping to find a rephrasing of a general form that uses some kind of
> OUTER JOIN WHERE xxx IS NULL.  THAT makes sense to me.  All these upside
> down where not exists (select big long huge statement) queries we do here
> are just unreadable to me.
>
> Also, is there a performance difference between the above SQL statement
and
> the rephrasing of it that uses outer joins?  Or is it statement specific?
>
> --
> Peace & happy computing,
>
> Mike Labosh, MCSD
>
> "Mr. McKittrick, after very careful consideration, I have
> come to the conclusion that this new system SUCKS."
> -- General Barringer, "War Games"
>
>
Author
8 Jul 2005 3:22 PM
Gert-Jan Strik
Brian,

I have to correct this. If one uses an OUTER JOIN to determine the
non-related rows (the Outer Join variant of Not Exists), then a DISTINCT
or GROUP BY is not necessary. In fact, it will give incorrect results if
there are any duplicates in the outer table. This is demonstrated in the
example below.

CREATE TABLE #t1 (id int)
CREATE TABLE #t2 (id int)
INSERT INTO #t1 VALUES (1)
INSERT INTO #t1 VALUES (1)
INSERT INTO #t1 VALUES (2)

-- incorrect result
SELECT DISTINCT T1.id
FROM      #t1 T1
LEFT JOIN #t2 T2 ON T2.id=T1.id
WHERE T2.id IS NULL

-- incorrect result
SELECT T1.id
FROM      #t1 T1
LEFT JOIN #t2 T2 ON T2.id=T1.id
WHERE T2.id IS NULL
GROUP BY T1.id

-- correct result
SELECT T1.id
FROM      #t1 T1
LEFT JOIN #t2 T2 ON T2.id=T1.id
WHERE T2.id IS NULL

DROP TABLE #t1
DROP TABLE #t2

Gert-Jan


Brian Selzer wrote:
Show quote
>
> SELECT A.k, A.x
>     FROM A
>     WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.k = a.k)
>
> is logically equivalent to
>
> SELECT A.k, A.x
>     FROM A
>      WHERE A.k IN (SELECT k FROM B)
>
> is logically equivalent to
>
> SELECT DISTINCT A.k, A.x
>     FROM A
>         LEFT OUTER JOIN B
>             ON (B.k = A.k)
>     WHERE B.k IS NULL
>
> is logically equivalent to
>
> SELECT A.k, A.x
>     FROM A
>         LEFT OUTER JOIN B
>             ON (B.k = A.k)
>     WHERE B.k IS NULL
>     GROUP BY A.k, A.x
>
> Note that the key value from A exists in each query.  That's because the
> relationship between A and B could be 1..*, so with a left join you could
> get several identical rows.
>
> As far as performance, the query optimizer may produce different execution
> plans.  The first is a correlated subquery and may produce an iteration.
> The second is just a subquery, and will probably produce a hash or merge
> join, but may also produce an iteration.  The third and fourth are left
> outer joins, and will probably produce a hash or merge join, and most
> probably the same execution plan.  Depending on available indexes the query
> optimizer may produce the same plan for each of the above.  Your best bet is
> to try them with real data and find out.
>
> "Mike Labosh" <mlab***@hotmail.com> wrote in message
> news:e5zPsD1gFHA.2560@TK2MSFTNGP10.phx.gbl...
> > Maybe it's a personality flaw, but I have always had trouble figuring out
> > the meaning of sub-selects inside a WHERE NOT EXISTS clause.  My brain
> just
> > can't visualize it.
> >
> > SELECT Stuff
> > FROM TableA
> > WHERE NOT EXISTS (
> >     SELECT OtherStuff FROM TableB
> > )
> >
> > I am gasping to find a rephrasing of a general form that uses some kind of
> > OUTER JOIN WHERE xxx IS NULL.  THAT makes sense to me.  All these upside
> > down where not exists (select big long huge statement) queries we do here
> > are just unreadable to me.
> >
> > Also, is there a performance difference between the above SQL statement
> and
> > the rephrasing of it that uses outer joins?  Or is it statement specific?
> >
> > --
> > Peace & happy computing,
> >
> > Mike Labosh, MCSD
> >
> > "Mr. McKittrick, after very careful consideration, I have
> > come to the conclusion that this new system SUCKS."
> > -- General Barringer, "War Games"
> >
> >
Author
8 Jul 2005 12:39 AM
--CELKO--
>> Maybe it's a Personality Flaw, but .. <<

Nah!  you have a wrong mental model!  Just read it as bad  English or
good Greek or formal logic.  This is a language problem.  "There ain't
nobody that can do this!" is valid but confusing to English speakers.

>> All these upside down where not exists (select big long huge statement) queries we do here are just unreadable to me. <<

Sorry, but that is how us math geeks think.  Have you bought  a basic
logic book yet?  I thought that my 8-th book might be "logic for
programmers" ... if there is an demand.

>>  is there a performance difference between the above SQL statement and the rephrasing of it that uses outer joins? <<

Depends on the SQL engine.  DB2 likes EXISTS() predicates and Ingres
optimizes  so it does n ot matter, etc.

AddThis Social Bookmark Button