Home All Groups Group Topic Archive Search About
Author
16 Sep 2005 3:50 PM
SimonC
Dear All,

Please can you tell me why the following query takes 4-5 mins:
SELECT * FROM db1.dbo.table1
WHERE column1 IN (SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x')

The IN () query
SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x'
gives:
field
a1
a2
a3
a4

Whereas the query below takes 0-1 sec.?
SELECT * FROM db1.dbo.table1
WHERE column1 IN ('a1','a2','a3','a4')

db1.dbo.table1 has 25,000,000 rows, an indexed identity column and column1
is indexed
db2.dbo.table1 has 1,300 rows and no indices.

Both dbs are on the same disk system (RAID5) and cannot go anywhere else.
Server is MS SBS 2003 (SP1), Twin 3.2gHz Xeon, 2gB RAM, 4x160gB SCSI RAID5

I worked around it using a stored procedure:
Building a string variable via a cursor from the IN () query.
The string variable is then used in the main query.
Running this stored procedure takes a second.

Cheers,
Simon.

Author
16 Sep 2005 3:54 PM
Jerry Spivey
Simon,

Have you tried using EXISTS instead of IN?

HTH

Jerry
Show quote
"SimonC" <Sim***@discussions.microsoft.com> wrote in message
news:6231236C-1AFD-44B6-892F-22541B50B6F0@microsoft.com...
> Dear All,
>
> Please can you tell me why the following query takes 4-5 mins:
> SELECT * FROM db1.dbo.table1
> WHERE column1 IN (SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x')
>
> The IN () query
> SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x'
> gives:
> field
> a1
> a2
> a3
> a4
>
> Whereas the query below takes 0-1 sec.?
> SELECT * FROM db1.dbo.table1
> WHERE column1 IN ('a1','a2','a3','a4')
>
> db1.dbo.table1 has 25,000,000 rows, an indexed identity column and column1
> is indexed
> db2.dbo.table1 has 1,300 rows and no indices.
>
> Both dbs are on the same disk system (RAID5) and cannot go anywhere else.
> Server is MS SBS 2003 (SP1), Twin 3.2gHz Xeon, 2gB RAM, 4x160gB SCSI RAID5
>
> I worked around it using a stored procedure:
> Building a string variable via a cursor from the IN () query.
> The string variable is then used in the main query.
> Running this stored procedure takes a second.
>
> Cheers,
> Simon.
Author
16 Sep 2005 4:21 PM
Perayu
It will speed up a lot if you can add index for column2 in db2.dbo.table1.

Perayu

Show quote
"SimonC" <Sim***@discussions.microsoft.com> wrote in message
news:6231236C-1AFD-44B6-892F-22541B50B6F0@microsoft.com...
> Dear All,
>
> Please can you tell me why the following query takes 4-5 mins:
> SELECT * FROM db1.dbo.table1
> WHERE column1 IN (SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x')
>
> The IN () query
> SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x'
> gives:
> field
> a1
> a2
> a3
> a4
>
> Whereas the query below takes 0-1 sec.?
> SELECT * FROM db1.dbo.table1
> WHERE column1 IN ('a1','a2','a3','a4')
>
> db1.dbo.table1 has 25,000,000 rows, an indexed identity column and column1
> is indexed
> db2.dbo.table1 has 1,300 rows and no indices.
>
> Both dbs are on the same disk system (RAID5) and cannot go anywhere else.
> Server is MS SBS 2003 (SP1), Twin 3.2gHz Xeon, 2gB RAM, 4x160gB SCSI RAID5
>
> I worked around it using a stored procedure:
> Building a string variable via a cursor from the IN () query.
> The string variable is then used in the main query.
> Running this stored procedure takes a second.
>
> Cheers,
> Simon.
Author
16 Sep 2005 4:33 PM
SimonC
PS
Index on identity col in db1.dbo.table1 is clustered, column1 index is
ordinary.
Did execution plans...:
Quick one (1sec):
Non Clustered Index Seek ->13k rows (0%)
then Bookmark Lookup -> 13k rows(100%)
Slow one (4min):
Branch1- Table Scan of db2.dbo.table1 -> 4 rows
Branch2- Clustered Index Scan -> 25m rows (76%)
Streams together
then Hash Match/Right Semi Join ->13k rows (24%)


Show quote
"SimonC" wrote:

> Dear All,
>
> Please can you tell me why the following query takes 4-5 mins:
> SELECT * FROM db1.dbo.table1
> WHERE column1 IN (SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x')
>
> The IN () query
> SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x'
> gives:
> field
> a1
> a2
> a3
> a4
>
> Whereas the query below takes 0-1 sec.?
> SELECT * FROM db1.dbo.table1
> WHERE column1 IN ('a1','a2','a3','a4')
>
> db1.dbo.table1 has 25,000,000 rows, an indexed identity column and column1
> is indexed
> db2.dbo.table1 has 1,300 rows and no indices.
>
> Both dbs are on the same disk system (RAID5) and cannot go anywhere else.
> Server is MS SBS 2003 (SP1), Twin 3.2gHz Xeon, 2gB RAM, 4x160gB SCSI RAID5
>
> I worked around it using a stored procedure:
> Building a string variable via a cursor from the IN () query.
> The string variable is then used in the main query.
> Running this stored procedure takes a second.
>
> Cheers,
> Simon.
Author
16 Sep 2005 5:09 PM
SimonC
Thanks for suggestions so far, but...
EXISTS gave same execution plan and time as IN (plus I had to think a bit
more about syntax!)
Indices on columns in db2.dbo.table1 had no effect.

Show quote
"SimonC" wrote:

> PS
> Index on identity col in db1.dbo.table1 is clustered, column1 index is
> ordinary.
> Did execution plans...:
> Quick one (1sec):
> Non Clustered Index Seek ->13k rows (0%)
> then Bookmark Lookup -> 13k rows(100%)
> Slow one (4min):
> Branch1- Table Scan of db2.dbo.table1 -> 4 rows
> Branch2- Clustered Index Scan -> 25m rows (76%)
> Streams together
> then Hash Match/Right Semi Join ->13k rows (24%)
>
>
> "SimonC" wrote:
>
> > Dear All,
> >
> > Please can you tell me why the following query takes 4-5 mins:
> > SELECT * FROM db1.dbo.table1
> > WHERE column1 IN (SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x')
> >
> > The IN () query
> > SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x'
> > gives:
> > field
> > a1
> > a2
> > a3
> > a4
> >
> > Whereas the query below takes 0-1 sec.?
> > SELECT * FROM db1.dbo.table1
> > WHERE column1 IN ('a1','a2','a3','a4')
> >
> > db1.dbo.table1 has 25,000,000 rows, an indexed identity column and column1
> > is indexed
> > db2.dbo.table1 has 1,300 rows and no indices.
> >
> > Both dbs are on the same disk system (RAID5) and cannot go anywhere else.
> > Server is MS SBS 2003 (SP1), Twin 3.2gHz Xeon, 2gB RAM, 4x160gB SCSI RAID5
> >
> > I worked around it using a stored procedure:
> > Building a string variable via a cursor from the IN () query.
> > The string variable is then used in the main query.
> > Running this stored procedure takes a second.
> >
> > Cheers,
> > Simon.
Author
16 Sep 2005 5:02 PM
Brian Selzer
The reason you're query is slow is that both IN and EXISTS are correlated
subqueries.  To speed this up I would try adding a covering index on
db2.dbo.table1(column2, column1).  You could also try OPTION(HASH JOIN) to
coerce the optimizer to flatten the correlated subquery into a join.

Show quote
"SimonC" <Sim***@discussions.microsoft.com> wrote in message
news:6231236C-1AFD-44B6-892F-22541B50B6F0@microsoft.com...
> Dear All,
>
> Please can you tell me why the following query takes 4-5 mins:
> SELECT * FROM db1.dbo.table1
> WHERE column1 IN (SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x')
>
> The IN () query
> SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x'
> gives:
> field
> a1
> a2
> a3
> a4
>
> Whereas the query below takes 0-1 sec.?
> SELECT * FROM db1.dbo.table1
> WHERE column1 IN ('a1','a2','a3','a4')
>
> db1.dbo.table1 has 25,000,000 rows, an indexed identity column and column1
> is indexed
> db2.dbo.table1 has 1,300 rows and no indices.
>
> Both dbs are on the same disk system (RAID5) and cannot go anywhere else.
> Server is MS SBS 2003 (SP1), Twin 3.2gHz Xeon, 2gB RAM, 4x160gB SCSI RAID5
>
> I worked around it using a stored procedure:
> Building a string variable via a cursor from the IN () query.
> The string variable is then used in the main query.
> Running this stored procedure takes a second.
>
> Cheers,
> Simon.
Author
16 Sep 2005 5:13 PM
SimonC
Brian,
Tried index on db2.dbo.table1(column2) and one on db2.dbo.table1(column1)
haven't tried db2.dbo.table1(column2,column1)
Tried doing join instead of subquery but had same (long) time

Show quote
"Brian Selzer" wrote:

> The reason you're query is slow is that both IN and EXISTS are correlated
> subqueries.  To speed this up I would try adding a covering index on
> db2.dbo.table1(column2, column1).  You could also try OPTION(HASH JOIN) to
> coerce the optimizer to flatten the correlated subquery into a join.
>
> "SimonC" <Sim***@discussions.microsoft.com> wrote in message
> news:6231236C-1AFD-44B6-892F-22541B50B6F0@microsoft.com...
> > Dear All,
> >
> > Please can you tell me why the following query takes 4-5 mins:
> > SELECT * FROM db1.dbo.table1
> > WHERE column1 IN (SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x')
> >
> > The IN () query
> > SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x'
> > gives:
> > field
> > a1
> > a2
> > a3
> > a4
> >
> > Whereas the query below takes 0-1 sec.?
> > SELECT * FROM db1.dbo.table1
> > WHERE column1 IN ('a1','a2','a3','a4')
> >
> > db1.dbo.table1 has 25,000,000 rows, an indexed identity column and column1
> > is indexed
> > db2.dbo.table1 has 1,300 rows and no indices.
> >
> > Both dbs are on the same disk system (RAID5) and cannot go anywhere else.
> > Server is MS SBS 2003 (SP1), Twin 3.2gHz Xeon, 2gB RAM, 4x160gB SCSI RAID5
> >
> > I worked around it using a stored procedure:
> > Building a string variable via a cursor from the IN () query.
> > The string variable is then used in the main query.
> > Running this stored procedure takes a second.
> >
> > Cheers,
> > Simon.
>
>
>
Author
16 Sep 2005 8:21 PM
Gert-Jan Strik
Simon,

you could try this:
- Make sure there is an index on db2.dbo.table1(column2, column1), or a
nonclustered index on db2.dbo.table1(column2) and a clustered index on
db2.dbo.table1(column1).
- Make sure there is an index on db1.dbo.table1(column1), preferably
clustered
- Rewrite the query as below:

SELECT db1.*
FROM (
  SELECT column1
  FROM db2.dbo.table1
  WHERE column2='x'
) AS db2
INNER JOIN db1.dbo.table1 AS db1
ON db1.Column1=db2.Column1

If the performance is still 4-5 minutes, then you could try to force a
particular plan, in this case:

SELECT db1.*
FROM (
  SELECT column1
  FROM db2.dbo.table1
  WHERE column2='x'
) AS db2
INNER LOOP JOIN db1.dbo.table1 AS db1
ON db1.Column1=db2.Column1
OPTION (force order)

Hope this helps,
Gert-Jan


SimonC wrote:
Show quote
>
> Dear All,
>
> Please can you tell me why the following query takes 4-5 mins:
> SELECT * FROM db1.dbo.table1
> WHERE column1 IN (SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x')
>
> The IN () query
> SELECT column1 FROM db2.dbo.table1 WHERE column2 = 'x'
> gives:
> field
> a1
> a2
> a3
> a4
>
> Whereas the query below takes 0-1 sec.?
> SELECT * FROM db1.dbo.table1
> WHERE column1 IN ('a1','a2','a3','a4')
>
> db1.dbo.table1 has 25,000,000 rows, an indexed identity column and column1
> is indexed
> db2.dbo.table1 has 1,300 rows and no indices.
>
> Both dbs are on the same disk system (RAID5) and cannot go anywhere else.
> Server is MS SBS 2003 (SP1), Twin 3.2gHz Xeon, 2gB RAM, 4x160gB SCSI RAID5
>
> I worked around it using a stored procedure:
> Building a string variable via a cursor from the IN () query.
> The string variable is then used in the main query.
> Running this stored procedure takes a second.
>
> Cheers,
> Simon.

AddThis Social Bookmark Button