Home All Groups Group Topic Archive Search About

How to find gaps in sequential key

Author
28 Jul 2005 5:04 PM
D Babin
Is it possible to write an SQL query to find gaps in a sequential key field?

Key Field
7
8
10
11
14

I would like the query to return the gaps
9
12
13

Or better yet, the range of the gaps
9,9
12,13

Any suggestions?

Author
28 Jul 2005 5:10 PM
Aaron Bertrand [SQL Server MVP]
Yes, use a numbers table.

http://www.aspfaq.com/2516



Show quote
"D Babin" <DBa***@discussions.microsoft.com> wrote in message
news:485C3250-0E8C-4217-9984-92D86B7C19DA@microsoft.com...
> Is it possible to write an SQL query to find gaps in a sequential key
> field?
>
> Key Field
> 7
> 8
> 10
> 11
> 14
>
> I would like the query to return the gaps
> 9
> 12
> 13
>
> Or better yet, the range of the gaps
> 9,9
> 12,13
>
> Any suggestions?
Author
28 Jul 2005 5:14 PM
Anith Sen
Do:

SELECT t1.col + 1 AS "start",
       MIN( t2.col ) - 1 AS "end"
  FROM tbl t1
INNER JOIN tbl t2
    ON t1.col < t2.col
GROUP BY t1.col
HAVING MIN( t2.col ) - t1.col > 1 ;

--
Anith
Author
28 Jul 2005 5:15 PM
David Portas
Here's one method:

SELECT T1.keycol+1, MIN(T2.keycol)-1
FROM YourTable AS T1
JOIN YourTable AS T2
  ON T1.keycol < T2.keycol
GROUP BY T1.keycol
HAVING T1.keycol+1 < MIN(T2.keycol)

--
David Portas
SQL Server MVP
--
Author
28 Jul 2005 10:09 PM
frank chang
Equivalently in SQL-92 syntax

select t1.field + 1 as "begin", t2.field - 1 as "end"
from testa t1
inner join testa t2 on t1.field < t2.field
and t2.field - t1.field > 1
and
not exists (select a.field from testa a
where a.field > t1.field and a.field < t2.field)

Show quote
"D Babin" wrote:

> Is it possible to write an SQL query to find gaps in a sequential key field?
>
> Key Field
> 7
> 8
> 10
> 11
> 14
>
> I would like the query to return the gaps
> 9
> 12
> 13
>
> Or better yet, the range of the gaps
> 9,9
> 12,13
>
> Any suggestions?
Author
29 Jul 2005 5:01 AM
jsfromynr
Hi D Babin,
create table YourTable(keycol int)
insert into YourTable values(7)
--insert into YourTable values(8 )
--insert into YourTable values(9 )
insert into YourTable values(11 )
insert into YourTable values(14 )

-- The following query give wrong results when 8 and 9 values are not
there
SELECT T1.keycol+1, MIN(T2.keycol)-1
FROM YourTable AS T1
JOIN YourTable AS T2
  ON T1.keycol < T2.keycol
GROUP BY T1.keycol
HAVING T1.keycol+1 < MIN(T2.keycol)

-- If you are looking for a single column Then I go with Aaron
Bertrand's Solution
-- Here sequence is an auxiallry table having numbers from 1-99999
select seq from sequence S1,(select  min(keycol) M1,max (keycol) M2
from YourTable) S2
where S1.seq between S2.M1 and S2.M2
and S1.SEQ NOT IN (SELECT keycol from YourTable)

drop table yourtable

With warm regards
Jatinder Singh
D Babin wrote:
Show quote
> Is it possible to write an SQL query to find gaps in a sequential key field?
>
> Key Field
> 7
> 8
> 10
> 11
> 14
>
> I would like the query to return the gaps
> 9
> 12
> 13
>
> Or better yet, the range of the gaps
> 9,9
> 12,13
>
> Any suggestions?
Author
29 Jul 2005 2:00 PM
frank chang
Jatinder Singh, I read your previous post and i have some questions for you
about your post

> -- The following query give wrong results when 8 and 9 values are not
> there
> SELECT T1.keycol+1, MIN(T2.keycol)-1
>  FROM YourTable AS T1
>  JOIN YourTable AS T2
>   ON T1.keycol < T2.keycol
>  GROUP BY T1.keycol
>  HAVING T1.keycol+1 < MIN(T2.keycol)
>

I tested this query by David Portas and it works fine even when the 8 and 9
values are not there.

> -- Here sequence is an auxiallry table having numbers from 1-99999
> select seq from sequence S1,(select  min(keycol) M1,max (keycol) M2
> from YourTable) S2
> where S1.seq between S2.M1 and S2.M2
> and S1.SEQ NOT IN (SELECT keycol from YourTable)

Your query  result set does not show that the beginning and end of the range
gaps.
Also, why are you creating an auxiliary sequence table from 1 to 99999? It
just uses more memory in SQL Server cache that could be used by some more
important table. Thank you.



Show quote
"jsfromynr" wrote:

> Hi D Babin,
> create table YourTable(keycol int)
> insert into YourTable values(7)
> --insert into YourTable values(8 )
> --insert into YourTable values(9 )
> insert into YourTable values(11 )
> insert into YourTable values(14 )
>
> -- The following query give wrong results when 8 and 9 values are not
> there
> SELECT T1.keycol+1, MIN(T2.keycol)-1
>  FROM YourTable AS T1
>  JOIN YourTable AS T2
>   ON T1.keycol < T2.keycol
>  GROUP BY T1.keycol
>  HAVING T1.keycol+1 < MIN(T2.keycol)
>
> -- If you are looking for a single column Then I go with Aaron
> Bertrand's Solution
> -- Here sequence is an auxiallry table having numbers from 1-99999
> select seq from sequence S1,(select  min(keycol) M1,max (keycol) M2
> from YourTable) S2
> where S1.seq between S2.M1 and S2.M2
> and S1.SEQ NOT IN (SELECT keycol from YourTable)
>
> drop table yourtable
>
> With warm regards
> Jatinder Singh
Author
29 Jul 2005 2:06 PM
Aaron Bertrand [SQL Server MVP]
> Also, why are you creating an auxiliary sequence table from 1 to 99999? It
> just uses more memory in SQL Server cache that could be used by some more
> important table.

A numbers table is probably more useful, and less of a burden
performance-wise, than you seem to think.

http://www.aspfaq.com/2516
Author
29 Jul 2005 4:21 PM
frank chang
Aaron Bertrand, Thank you for your reply. I am trying to identify what the
benefits of the numbers table technique are using  Jatinder Singh's select
statement

select seq from sequence S1,(select min(keycol) M1,max (keycol) M2
from YourTable) S2
where S1.seq between S2.M1 and S2.M2
and S1.SEQ NOT IN (SELECT keycol from YourTable)

if we use

CREATE TABLE SEQUENCE
(
    SEQ INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
)

the Sql server execution plan show that  a Clustered Index Seek is used for
"where S1.seq between S2.M1 and S2.M2".

if we use

CREATE TABLE SEQUENCE
(
    SEQ INT IDENTITY(1,1)
)

the SQL Server Execution Plan shows that a table scan is used for
"where S1.seq between S2.M1 and S2.M2".

So, it appears that the primary clustered index on the number tables improves
the SQL Server execution plan. But how can I see the real benefits of the
numbered
table approach? Maybe, SQL Profiler can reveal them. Again, I don't have the
actual data set that you experimented with on your blog but I am curious how
many rows were in it. Could you please identifity the specific benefits of
the numbered table
approach in this type application(i.e. finding gaps)? Thank you.







Show quote
"Aaron Bertrand [SQL Server MVP]" wrote:

> > Also, why are you creating an auxiliary sequence table from 1 to 99999? It
> > just uses more memory in SQL Server cache that could be used by some more
> > important table.
>
> A numbers table is probably more useful, and less of a burden
> performance-wise, than you seem to think.
>
> http://www.aspfaq.com/2516
>
>
>
Author
1 Aug 2005 5:40 AM
jsfromynr
Hi Frank,
     Sorry for responding to your post too late, I gave answer for the
first part
Is it possible to write an SQL query to find gaps in a sequential key
field?
Key Field
7
8
10
11
14


Part (1) I would like the query to return the gaps
9
12
13

Part (2) Or better yet, the range of the gaps
9,9
12,13


I didnot tried Part (2) for which David's or Anith's Query is
excellent. I was just showing a way to use Auxillary table. My sincere
apology for not attempting it in whole.

With warm regards
Jatinder Singh


frank chang wrote:
Show quote
> Jatinder Singh, I read your previous post and i have some questions for you
> about your post
>
> > -- The following query give wrong results when 8 and 9 values are not
> > there
> > SELECT T1.keycol+1, MIN(T2.keycol)-1
> >  FROM YourTable AS T1
> >  JOIN YourTable AS T2
> >   ON T1.keycol < T2.keycol
> >  GROUP BY T1.keycol
> >  HAVING T1.keycol+1 < MIN(T2.keycol)
> >
>
> I tested this query by David Portas and it works fine even when the 8 and 9
> values are not there.
>
> > -- Here sequence is an auxiallry table having numbers from 1-99999
> > select seq from sequence S1,(select  min(keycol) M1,max (keycol) M2
> > from YourTable) S2
> > where S1.seq between S2.M1 and S2.M2
> > and S1.SEQ NOT IN (SELECT keycol from YourTable)
>
> Your query  result set does not show that the beginning and end of the range
> gaps.
> Also, why are you creating an auxiliary sequence table from 1 to 99999? It
> just uses more memory in SQL Server cache that could be used by some more
> important table. Thank you.
>
>
>
> "jsfromynr" wrote:
>
> > Hi D Babin,
> > create table YourTable(keycol int)
> > insert into YourTable values(7)
> > --insert into YourTable values(8 )
> > --insert into YourTable values(9 )
> > insert into YourTable values(11 )
> > insert into YourTable values(14 )
> >
> > -- The following query give wrong results when 8 and 9 values are not
> > there
> > SELECT T1.keycol+1, MIN(T2.keycol)-1
> >  FROM YourTable AS T1
> >  JOIN YourTable AS T2
> >   ON T1.keycol < T2.keycol
> >  GROUP BY T1.keycol
> >  HAVING T1.keycol+1 < MIN(T2.keycol)
> >
> > -- If you are looking for a single column Then I go with Aaron
> > Bertrand's Solution
> > -- Here sequence is an auxiallry table having numbers from 1-99999
> > select seq from sequence S1,(select  min(keycol) M1,max (keycol) M2
> > from YourTable) S2
> > where S1.seq between S2.M1 and S2.M2
> > and S1.SEQ NOT IN (SELECT keycol from YourTable)
> >
> > drop table yourtable
> >
> > With warm regards
> > Jatinder Singh
Author
29 Jul 2005 8:33 PM
frank chang
Jatinder Singh, I modified your query (using Aaron Bertrand's suggestion) to
select the beginning and end of the range gaps (in sequential order):

select number from numberstest S1,
(select  t1.keycol M1, min (t2.keycol) M2
from testing t1, testing t2
where t1.keycol + 1 < t2.keycol
group by t1.keycol
) S2 -- S2 is an derived table which avoids having to use temporary table
where S1.number between S2.M1 and S2.M2
and  NOT EXISTS (SELECT keycol from testing where keycol = S1.number)

Please tell me this if this runs faster than my original query from yesterday:

select t1.keycol + 1 as "begin", t2.keycol - 1 as "end"
from testing t1
inner join testing t2 on t1.keycol  < t2.keycol
and t2.keycol - t1.keycol > 1
and
not exists (select a.keycol from testing a
where a.keycol > t1.keycol and a.keycol < t2.keycol)

Thank you.


Show quote
"jsfromynr" wrote:

> Hi D Babin,
> create table YourTable(keycol int)
> insert into YourTable values(7)
> --insert into YourTable values(8 )
> --insert into YourTable values(9 )
> insert into YourTable values(11 )
> insert into YourTable values(14 )
>
> -- The following query give wrong results when 8 and 9 values are not
> there
> SELECT T1.keycol+1, MIN(T2.keycol)-1
>  FROM YourTable AS T1
>  JOIN YourTable AS T2
>   ON T1.keycol < T2.keycol
>  GROUP BY T1.keycol
>  HAVING T1.keycol+1 < MIN(T2.keycol)
>
> -- If you are looking for a single column Then I go with Aaron
> Bertrand's Solution
> -- Here sequence is an auxiallry table having numbers from 1-99999
> select seq from sequence S1,(select  min(keycol) M1,max (keycol) M2
> from YourTable) S2
> where S1.seq between S2.M1 and S2.M2
> and S1.SEQ NOT IN (SELECT keycol from YourTable)
>
> drop table yourtable
>
> With warm regards
> Jatinder Singh
> D Babin wrote:
> > Is it possible to write an SQL query to find gaps in a sequential key field?
> >
> > Key Field
> > 7
> > 8
> > 10
> > 11
> > 14
> >
> > I would like the query to return the gaps
> > 9
> > 12
> > 13
> >
> > Or better yet, the range of the gaps
> > 9,9
> > 12,13
> >
> > Any suggestions?
>
>
Author
29 Jul 2005 8:57 PM
Aaron Bertrand [SQL Server MVP]
> Please tell me this if this runs faster

Aren't you in a better position to tell us which runs faster?
Author
31 Jul 2005 12:04 PM
frank chang
Aaron Bertrand, Here are  the results using SQL Profiler

create table #tmp3
(
    rownum int identity,
    value  int
)

insert into #tmp3
select number  Y1 from numberstest S1,
(select  t1.keycol M1, min (t2.keycol) M2
from testing t1, testing t2
where t1.keycol + 1 < t2.keycol
group by t1.keycol
) S2
where S1.number between S2.M1 and S2.M2
and  NOT EXISTS (SELECT keycol from testing where keycol = S1.number)

This insert is required becuase the original poster wanted the beginning and
end of the range gaps in separate columns. Also , SQL Server does not allow
derived tables in correlated subqueries. This insert took 423 msec

select t1.value, t2.value
from #tmp3 t1
inner join #tmp3 t2 on t1.rownum + 1 = t2.rownum
and t1.rownum % 2 =1

This select took 385 msec

On the other hand, this solution takes 390 msec

select t1.keycol + 1 as "begin", t2.keycol - 1 as "end"
from testing t1
inner join testing t2 on t1.keycol < t2.keycol
and t2.keycol - t1.keycol > 1
and
not exists (select a.keycol from testing a
where a.keycol > t1.keycol and a.keycol < t2.keycol)


Here is the DDL for table testing

create table testing(keycol int)
insert into testing values(7)
insert into testing values(8 )
insert into testing values(9 )
insert into testing values(11 )
insert into testing values(14 )
insert into testing values(17)
insert into testing values(18 )
insert into testing values(19 )
insert into testing values(21 )
insert into testing values(24 )
insert into testing values(27)
insert into testing values(28 )
insert into testing values(29 )
insert into testing values(31 )
insert into testing values(34 )



Show quote
"Aaron Bertrand [SQL Server MVP]" wrote:

> > Please tell me this if this runs faster
>
> Aren't you in a better position to tell us which runs faster?
>
>
>
Author
31 Jul 2005 1:02 PM
Dieter Noeth
D Babin wrote:

Show quote
> Is it possible to write an SQL query to find gaps in a sequential key field?
>
> Key Field
> 7
> 8
> 10
> 11
> 14
>
> I would like the query to return the gaps
> 9
> 12
> 13
>
> Or better yet, the range of the gaps
> 9,9
> 12,13

I use this one, it's especially fast if there's a clustered index on keycol:

select
   keycol + 1,
   next_val - 1
from
(
select
   keycol,
   (select min( keycol) from testing t2
    where t2.keycol > t.keycol) as next_val
from testing t
) dt
where next_val - keycol > 1

Dieter
Author
7 Aug 2005 5:29 PM
frank chang
Dieter, Yes, your SQL query does fly when you have an clustered index on the
key field column. However, a relational table can have only one clustered
index . The designer of your relational table may have already used a
clustered index on other column(s) of the table.
          A Clustered index is B-tree data structure. I recently wrote a C++
class which implements a B-tree. You would be amazed at the complexity of the
B-tree when dealing with deletes, updates and inserts. For example, B-tree
page splits ,which occur during updates. deletes or inserts, consume precious
CPU and I/O resources.
         Dieter, I examined the execution plan for your SQL query without an
index for a table with 100000 rows. SQL Server creates a temporary index in
tempdb to process your query. This index is very large and consumes a lot of
CPU and I/O resources. As a result. the query takes almost an hour to run. I
would be glad to hear your thoughts on this issue. My email is
frank_chan***@hotmail.com.

Show quote
"Dieter Noeth" wrote:

> D Babin wrote:
>
> > Is it possible to write an SQL query to find gaps in a sequential key field?
> >
> > Key Field
> > 7
> > 8
> > 10
> > 11
> > 14
> >
> > I would like the query to return the gaps
> > 9
> > 12
> > 13
> >
> > Or better yet, the range of the gaps
> > 9,9
> > 12,13
>
> I use this one, it's especially fast if there's a clustered index on keycol:
>
> select
>    keycol + 1,
>    next_val - 1
> from
> (
> select
>    keycol,
>    (select min( keycol) from testing t2
>     where t2.keycol > t.keycol) as next_val
> from testing t
> ) dt
> where next_val - keycol > 1
>
> Dieter
>
Author
8 Aug 2005 6:26 AM
jsfromynr
Hi There,
You can try this link to get  understanding  of how David's and Anith's
Query Worked.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsqlpro04/html/sp04d8.asp

With warm regards
Jatinder Singh
Author
8 Aug 2005 4:56 PM
frank chang
Jatinder, Thank you for the link from Sql Server magazine. I just finished
reading it and compared Dieter's, David's and Anith's queries with the row
based approach using my test data of 100000 rows. The row-based solution ran
80% faster than the set-based solutions. So, I guess it always is not
necessary to use a clustered index or a set-based solution to improve the
processing time for large data sets.

Show quote
"jsfromynr" wrote:

> Hi There,
> You can try this link to get  understanding  of how David's and Anith's
> Query Worked.
>
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsqlpro04/html/sp04d8.asp
>
> With warm regards
> Jatinder Singh
>
>
Author
9 Aug 2005 6:26 AM
jsfromynr
Hi Frank,
   Can a Row Based Solution be faster than Set Based ? I donot think
so. Take a look at Table 1 where performance is compared.

With warm regards
Jatinder Singh

frank chang wrote:
Show quote
> Jatinder, Thank you for the link from Sql Server magazine. I just finished
> reading it and compared Dieter's, David's and Anith's queries with the row
> based approach using my test data of 100000 rows. The row-based solution ran
> 80% faster than the set-based solutions. So, I guess it always is not
> necessary to use a clustered index or a set-based solution to improve the
> processing time for large data sets.
>
> "jsfromynr" wrote:
>
> > Hi There,
> > You can try this link to get  understanding  of how David's and Anith's
> > Query Worked.
> >
> > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsqlpro04/html/sp04d8.asp
> >
> > With warm regards
> > Jatinder Singh
> >
> >
Author
9 Aug 2005 11:24 AM
frank chang
Jatinder, Are you reading the same article as I am? In the referenced link,
on Page 6, Alexander Kozak writes "But if you check the estimated execution
plan and realize that a sequence of numbers has high fragmentation, it will
probably behoove you to run a row-based solution." I followed Alexander
Kozak's test and created a 100000 row sequence by deleting every modulo 2 row.
           BTW, What is your opinion of the numbers solution that you and
Aaron Betrand originally proposed for this problem? Alexander Kozaks's paper
did not analyze this option.

  Best regards.
         frank_chan***@hotmail.com

Show quote
"jsfromynr" wrote:

> Hi Frank,
>    Can a Row Based Solution be faster than Set Based ? I donot think
> so. Take a look at Table 1 where performance is compared.
>
> With warm regards
> Jatinder Singh
>
> frank chang wrote:
> > Jatinder, Thank you for the link from Sql Server magazine. I just finished
> > reading it and compared Dieter's, David's and Anith's queries with the row
> > based approach using my test data of 100000 rows. The row-based solution ran
> > 80% faster than the set-based solutions. So, I guess it always is not
> > necessary to use a clustered index or a set-based solution to improve the
> > processing time for large data sets.
> >
> > "jsfromynr" wrote:
> >
> > > Hi There,
> > > You can try this link to get  understanding  of how David's and Anith's
> > > Query Worked.
> > >
> > > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsqlpro04/html/sp04d8.asp
> > >
> > > With warm regards
> > > Jatinder Singh
> > >
> > >
>
>
Author
10 Aug 2005 6:11 AM
jsfromynr
Hi Frank,

Table 1. Test results for the solutions in Listing 1 and Listing 5.
Numbers of rows (islands)
80317 (6)
5,199,633 (315)
31,210,600 (3611)
3,466,421 (1,733,317)

Row-based solution for islands (Listing 1)
34 sec   80317 (6)
3 min, 41 sec  5,199,633 (315)
21 min  31,210,600 (3611)
6 min, 41 sec  3,466,421 (1,733,317)

Set-based solution #1 for islands (Listing 5)
190 ms  80317 (6)
11 sec  5,199,633 (315)
2 min, 41 sec  31,210,600 (3611)
NA  3,466,421 (1,733,317)

This is what I am referring to .
With warm regards
Jatinder Singh
Author
10 Aug 2005 1:26 PM
frank chang
Jatinder, Yes, I read these results. But they apply do a dataset different
than mine. As I previously mentioned in my last post on page 6 of Mr. Kozak's
article, he confirms my findings on my data set. Have you had time to ponder
the merits of the number sequence technique that you and Aaron Bertrand
proposed.? Jatinder, are you with Sapient in Cambridge,MA or India? Best
regards.


Show quote
"jsfromynr" wrote:

> Hi Frank,
>
> Table 1. Test results for the solutions in Listing 1 and Listing 5.
> Numbers of rows (islands)
>  80317 (6)
>  5,199,633 (315)
>  31,210,600 (3611)
>  3,466,421 (1,733,317)
>
> Row-based solution for islands (Listing 1)
>  34 sec   80317 (6)
>  3 min, 41 sec  5,199,633 (315)
>  21 min  31,210,600 (3611)
>  6 min, 41 sec  3,466,421 (1,733,317)
>
> Set-based solution #1 for islands (Listing 5)
>  190 ms  80317 (6)
>  11 sec  5,199,633 (315)
>  2 min, 41 sec  31,210,600 (3611)
>  NA  3,466,421 (1,733,317)

> This is what I am referring to .
> With warm regards
> Jatinder Singh
>
>
Author
10 Aug 2005 2:39 PM
jsfromynr
Hi Frank,

I am from India. I had not tested it for performance or so.

With warm regards
Jatinder Singh

AddThis Social Bookmark Button