|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
How to find gaps in sequential keyKey Field 7 8 10 11 14 I would like the query to return the gaps 9 12 13 Or better yet, the range of the gaps 9,9 12,13 Any suggestions? Yes, use a numbers table.
http://www.aspfaq.com/2516 Show quote "D Babin" <DBa***@discussions.microsoft.com> wrote in message news:485C3250-0E8C-4217-9984-92D86B7C19DA@microsoft.com... > Is it possible to write an SQL query to find gaps in a sequential key > field? > > Key Field > 7 > 8 > 10 > 11 > 14 > > I would like the query to return the gaps > 9 > 12 > 13 > > Or better yet, the range of the gaps > 9,9 > 12,13 > > Any suggestions? Do:
SELECT t1.col + 1 AS "start", MIN( t2.col ) - 1 AS "end" FROM tbl t1 INNER JOIN tbl t2 ON t1.col < t2.col GROUP BY t1.col HAVING MIN( t2.col ) - t1.col > 1 ; -- Anith Here's one method:
SELECT T1.keycol+1, MIN(T2.keycol)-1 FROM YourTable AS T1 JOIN YourTable AS T2 ON T1.keycol < T2.keycol GROUP BY T1.keycol HAVING T1.keycol+1 < MIN(T2.keycol) -- David Portas SQL Server MVP -- Equivalently in SQL-92 syntax
select t1.field + 1 as "begin", t2.field - 1 as "end" from testa t1 inner join testa t2 on t1.field < t2.field and t2.field - t1.field > 1 and not exists (select a.field from testa a where a.field > t1.field and a.field < t2.field) Show quote "D Babin" wrote: > Is it possible to write an SQL query to find gaps in a sequential key field? > > Key Field > 7 > 8 > 10 > 11 > 14 > > I would like the query to return the gaps > 9 > 12 > 13 > > Or better yet, the range of the gaps > 9,9 > 12,13 > > Any suggestions? Hi D Babin,
create table YourTable(keycol int) insert into YourTable values(7) --insert into YourTable values(8 ) --insert into YourTable values(9 ) insert into YourTable values(11 ) insert into YourTable values(14 ) -- The following query give wrong results when 8 and 9 values are not there SELECT T1.keycol+1, MIN(T2.keycol)-1 FROM YourTable AS T1 JOIN YourTable AS T2 ON T1.keycol < T2.keycol GROUP BY T1.keycol HAVING T1.keycol+1 < MIN(T2.keycol) -- If you are looking for a single column Then I go with Aaron Bertrand's Solution -- Here sequence is an auxiallry table having numbers from 1-99999 select seq from sequence S1,(select min(keycol) M1,max (keycol) M2 from YourTable) S2 where S1.seq between S2.M1 and S2.M2 and S1.SEQ NOT IN (SELECT keycol from YourTable) drop table yourtable With warm regards Jatinder Singh D Babin wrote: Show quote > Is it possible to write an SQL query to find gaps in a sequential key field? > > Key Field > 7 > 8 > 10 > 11 > 14 > > I would like the query to return the gaps > 9 > 12 > 13 > > Or better yet, the range of the gaps > 9,9 > 12,13 > > Any suggestions? Jatinder Singh, I read your previous post and i have some questions for you
about your post > -- The following query give wrong results when 8 and 9 values are not I tested this query by David Portas and it works fine even when the 8 and 9 > there > SELECT T1.keycol+1, MIN(T2.keycol)-1 > FROM YourTable AS T1 > JOIN YourTable AS T2 > ON T1.keycol < T2.keycol > GROUP BY T1.keycol > HAVING T1.keycol+1 < MIN(T2.keycol) > values are not there. > -- Here sequence is an auxiallry table having numbers from 1-99999 Your query result set does not show that the beginning and end of the range > select seq from sequence S1,(select min(keycol) M1,max (keycol) M2 > from YourTable) S2 > where S1.seq between S2.M1 and S2.M2 > and S1.SEQ NOT IN (SELECT keycol from YourTable) gaps. Also, why are you creating an auxiliary sequence table from 1 to 99999? It just uses more memory in SQL Server cache that could be used by some more important table. Thank you. Show quote "jsfromynr" wrote: > Hi D Babin, > create table YourTable(keycol int) > insert into YourTable values(7) > --insert into YourTable values(8 ) > --insert into YourTable values(9 ) > insert into YourTable values(11 ) > insert into YourTable values(14 ) > > -- The following query give wrong results when 8 and 9 values are not > there > SELECT T1.keycol+1, MIN(T2.keycol)-1 > FROM YourTable AS T1 > JOIN YourTable AS T2 > ON T1.keycol < T2.keycol > GROUP BY T1.keycol > HAVING T1.keycol+1 < MIN(T2.keycol) > > -- If you are looking for a single column Then I go with Aaron > Bertrand's Solution > -- Here sequence is an auxiallry table having numbers from 1-99999 > select seq from sequence S1,(select min(keycol) M1,max (keycol) M2 > from YourTable) S2 > where S1.seq between S2.M1 and S2.M2 > and S1.SEQ NOT IN (SELECT keycol from YourTable) > > drop table yourtable > > With warm regards > Jatinder Singh > Also, why are you creating an auxiliary sequence table from 1 to 99999? It A numbers table is probably more useful, and less of a burden > just uses more memory in SQL Server cache that could be used by some more > important table. performance-wise, than you seem to think. http://www.aspfaq.com/2516 Aaron Bertrand, Thank you for your reply. I am trying to identify what the
benefits of the numbers table technique are using Jatinder Singh's select statement select seq from sequence S1,(select min(keycol) M1,max (keycol) M2 from YourTable) S2 where S1.seq between S2.M1 and S2.M2 and S1.SEQ NOT IN (SELECT keycol from YourTable) if we use CREATE TABLE SEQUENCE ( SEQ INT IDENTITY(1,1) PRIMARY KEY CLUSTERED ) the Sql server execution plan show that a Clustered Index Seek is used for "where S1.seq between S2.M1 and S2.M2". if we use CREATE TABLE SEQUENCE ( SEQ INT IDENTITY(1,1) ) the SQL Server Execution Plan shows that a table scan is used for "where S1.seq between S2.M1 and S2.M2". So, it appears that the primary clustered index on the number tables improves the SQL Server execution plan. But how can I see the real benefits of the numbered table approach? Maybe, SQL Profiler can reveal them. Again, I don't have the actual data set that you experimented with on your blog but I am curious how many rows were in it. Could you please identifity the specific benefits of the numbered table approach in this type application(i.e. finding gaps)? Thank you. Show quote "Aaron Bertrand [SQL Server MVP]" wrote: > > Also, why are you creating an auxiliary sequence table from 1 to 99999? It > > just uses more memory in SQL Server cache that could be used by some more > > important table. > > A numbers table is probably more useful, and less of a burden > performance-wise, than you seem to think. > > http://www.aspfaq.com/2516 > > > Hi Frank,
Sorry for responding to your post too late, I gave answer for the first part Is it possible to write an SQL query to find gaps in a sequential key field? Key Field 7 8 10 11 14 Part (1) I would like the query to return the gaps 9 12 13 Part (2) Or better yet, the range of the gaps 9,9 12,13 I didnot tried Part (2) for which David's or Anith's Query is excellent. I was just showing a way to use Auxillary table. My sincere apology for not attempting it in whole. With warm regards Jatinder Singh frank chang wrote: Show quote > Jatinder Singh, I read your previous post and i have some questions for you > about your post > > > -- The following query give wrong results when 8 and 9 values are not > > there > > SELECT T1.keycol+1, MIN(T2.keycol)-1 > > FROM YourTable AS T1 > > JOIN YourTable AS T2 > > ON T1.keycol < T2.keycol > > GROUP BY T1.keycol > > HAVING T1.keycol+1 < MIN(T2.keycol) > > > > I tested this query by David Portas and it works fine even when the 8 and 9 > values are not there. > > > -- Here sequence is an auxiallry table having numbers from 1-99999 > > select seq from sequence S1,(select min(keycol) M1,max (keycol) M2 > > from YourTable) S2 > > where S1.seq between S2.M1 and S2.M2 > > and S1.SEQ NOT IN (SELECT keycol from YourTable) > > Your query result set does not show that the beginning and end of the range > gaps. > Also, why are you creating an auxiliary sequence table from 1 to 99999? It > just uses more memory in SQL Server cache that could be used by some more > important table. Thank you. > > > > "jsfromynr" wrote: > > > Hi D Babin, > > create table YourTable(keycol int) > > insert into YourTable values(7) > > --insert into YourTable values(8 ) > > --insert into YourTable values(9 ) > > insert into YourTable values(11 ) > > insert into YourTable values(14 ) > > > > -- The following query give wrong results when 8 and 9 values are not > > there > > SELECT T1.keycol+1, MIN(T2.keycol)-1 > > FROM YourTable AS T1 > > JOIN YourTable AS T2 > > ON T1.keycol < T2.keycol > > GROUP BY T1.keycol > > HAVING T1.keycol+1 < MIN(T2.keycol) > > > > -- If you are looking for a single column Then I go with Aaron > > Bertrand's Solution > > -- Here sequence is an auxiallry table having numbers from 1-99999 > > select seq from sequence S1,(select min(keycol) M1,max (keycol) M2 > > from YourTable) S2 > > where S1.seq between S2.M1 and S2.M2 > > and S1.SEQ NOT IN (SELECT keycol from YourTable) > > > > drop table yourtable > > > > With warm regards > > Jatinder Singh Jatinder Singh, I modified your query (using Aaron Bertrand's suggestion) to
select the beginning and end of the range gaps (in sequential order): select number from numberstest S1, (select t1.keycol M1, min (t2.keycol) M2 from testing t1, testing t2 where t1.keycol + 1 < t2.keycol group by t1.keycol ) S2 -- S2 is an derived table which avoids having to use temporary table where S1.number between S2.M1 and S2.M2 and NOT EXISTS (SELECT keycol from testing where keycol = S1.number) Please tell me this if this runs faster than my original query from yesterday: select t1.keycol + 1 as "begin", t2.keycol - 1 as "end" from testing t1 inner join testing t2 on t1.keycol < t2.keycol and t2.keycol - t1.keycol > 1 and not exists (select a.keycol from testing a where a.keycol > t1.keycol and a.keycol < t2.keycol) Thank you. Show quote "jsfromynr" wrote: > Hi D Babin, > create table YourTable(keycol int) > insert into YourTable values(7) > --insert into YourTable values(8 ) > --insert into YourTable values(9 ) > insert into YourTable values(11 ) > insert into YourTable values(14 ) > > -- The following query give wrong results when 8 and 9 values are not > there > SELECT T1.keycol+1, MIN(T2.keycol)-1 > FROM YourTable AS T1 > JOIN YourTable AS T2 > ON T1.keycol < T2.keycol > GROUP BY T1.keycol > HAVING T1.keycol+1 < MIN(T2.keycol) > > -- If you are looking for a single column Then I go with Aaron > Bertrand's Solution > -- Here sequence is an auxiallry table having numbers from 1-99999 > select seq from sequence S1,(select min(keycol) M1,max (keycol) M2 > from YourTable) S2 > where S1.seq between S2.M1 and S2.M2 > and S1.SEQ NOT IN (SELECT keycol from YourTable) > > drop table yourtable > > With warm regards > Jatinder Singh > D Babin wrote: > > Is it possible to write an SQL query to find gaps in a sequential key field? > > > > Key Field > > 7 > > 8 > > 10 > > 11 > > 14 > > > > I would like the query to return the gaps > > 9 > > 12 > > 13 > > > > Or better yet, the range of the gaps > > 9,9 > > 12,13 > > > > Any suggestions? > > > Please tell me this if this runs faster Aren't you in a better position to tell us which runs faster?Aaron Bertrand, Here are the results using SQL Profiler
create table #tmp3 ( rownum int identity, value int ) insert into #tmp3 select number Y1 from numberstest S1, (select t1.keycol M1, min (t2.keycol) M2 from testing t1, testing t2 where t1.keycol + 1 < t2.keycol group by t1.keycol ) S2 where S1.number between S2.M1 and S2.M2 and NOT EXISTS (SELECT keycol from testing where keycol = S1.number) This insert is required becuase the original poster wanted the beginning and end of the range gaps in separate columns. Also , SQL Server does not allow derived tables in correlated subqueries. This insert took 423 msec select t1.value, t2.value from #tmp3 t1 inner join #tmp3 t2 on t1.rownum + 1 = t2.rownum and t1.rownum % 2 =1 This select took 385 msec On the other hand, this solution takes 390 msec select t1.keycol + 1 as "begin", t2.keycol - 1 as "end" from testing t1 inner join testing t2 on t1.keycol < t2.keycol and t2.keycol - t1.keycol > 1 and not exists (select a.keycol from testing a where a.keycol > t1.keycol and a.keycol < t2.keycol) Here is the DDL for table testing create table testing(keycol int) insert into testing values(7) insert into testing values(8 ) insert into testing values(9 ) insert into testing values(11 ) insert into testing values(14 ) insert into testing values(17) insert into testing values(18 ) insert into testing values(19 ) insert into testing values(21 ) insert into testing values(24 ) insert into testing values(27) insert into testing values(28 ) insert into testing values(29 ) insert into testing values(31 ) insert into testing values(34 ) Show quote "Aaron Bertrand [SQL Server MVP]" wrote: > > Please tell me this if this runs faster > > Aren't you in a better position to tell us which runs faster? > > > D Babin wrote:
Show quote > Is it possible to write an SQL query to find gaps in a sequential key field? I use this one, it's especially fast if there's a clustered index on keycol:> > Key Field > 7 > 8 > 10 > 11 > 14 > > I would like the query to return the gaps > 9 > 12 > 13 > > Or better yet, the range of the gaps > 9,9 > 12,13 select keycol + 1, next_val - 1 from ( select keycol, (select min( keycol) from testing t2 where t2.keycol > t.keycol) as next_val from testing t ) dt where next_val - keycol > 1 Dieter Dieter, Yes, your SQL query does fly when you have an clustered index on the
key field column. However, a relational table can have only one clustered index . The designer of your relational table may have already used a clustered index on other column(s) of the table. A Clustered index is B-tree data structure. I recently wrote a C++ class which implements a B-tree. You would be amazed at the complexity of the B-tree when dealing with deletes, updates and inserts. For example, B-tree page splits ,which occur during updates. deletes or inserts, consume precious CPU and I/O resources. Dieter, I examined the execution plan for your SQL query without an index for a table with 100000 rows. SQL Server creates a temporary index in tempdb to process your query. This index is very large and consumes a lot of CPU and I/O resources. As a result. the query takes almost an hour to run. I would be glad to hear your thoughts on this issue. My email is frank_chan***@hotmail.com. Show quote "Dieter Noeth" wrote: > D Babin wrote: > > > Is it possible to write an SQL query to find gaps in a sequential key field? > > > > Key Field > > 7 > > 8 > > 10 > > 11 > > 14 > > > > I would like the query to return the gaps > > 9 > > 12 > > 13 > > > > Or better yet, the range of the gaps > > 9,9 > > 12,13 > > I use this one, it's especially fast if there's a clustered index on keycol: > > select > keycol + 1, > next_val - 1 > from > ( > select > keycol, > (select min( keycol) from testing t2 > where t2.keycol > t.keycol) as next_val > from testing t > ) dt > where next_val - keycol > 1 > > Dieter > Hi There,
You can try this link to get understanding of how David's and Anith's Query Worked. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsqlpro04/html/sp04d8.asp With warm regards Jatinder Singh Jatinder, Thank you for the link from Sql Server magazine. I just finished
reading it and compared Dieter's, David's and Anith's queries with the row based approach using my test data of 100000 rows. The row-based solution ran 80% faster than the set-based solutions. So, I guess it always is not necessary to use a clustered index or a set-based solution to improve the processing time for large data sets. Show quote "jsfromynr" wrote: > Hi There, > You can try this link to get understanding of how David's and Anith's > Query Worked. > > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsqlpro04/html/sp04d8.asp > > With warm regards > Jatinder Singh > > Hi Frank,
Can a Row Based Solution be faster than Set Based ? I donot think so. Take a look at Table 1 where performance is compared. With warm regards Jatinder Singh frank chang wrote: Show quote > Jatinder, Thank you for the link from Sql Server magazine. I just finished > reading it and compared Dieter's, David's and Anith's queries with the row > based approach using my test data of 100000 rows. The row-based solution ran > 80% faster than the set-based solutions. So, I guess it always is not > necessary to use a clustered index or a set-based solution to improve the > processing time for large data sets. > > "jsfromynr" wrote: > > > Hi There, > > You can try this link to get understanding of how David's and Anith's > > Query Worked. > > > > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsqlpro04/html/sp04d8.asp > > > > With warm regards > > Jatinder Singh > > > > Jatinder, Are you reading the same article as I am? In the referenced link,
on Page 6, Alexander Kozak writes "But if you check the estimated execution plan and realize that a sequence of numbers has high fragmentation, it will probably behoove you to run a row-based solution." I followed Alexander Kozak's test and created a 100000 row sequence by deleting every modulo 2 row. BTW, What is your opinion of the numbers solution that you and Aaron Betrand originally proposed for this problem? Alexander Kozaks's paper did not analyze this option. Best regards. frank_chan***@hotmail.com Show quote "jsfromynr" wrote: > Hi Frank, > Can a Row Based Solution be faster than Set Based ? I donot think > so. Take a look at Table 1 where performance is compared. > > With warm regards > Jatinder Singh > > frank chang wrote: > > Jatinder, Thank you for the link from Sql Server magazine. I just finished > > reading it and compared Dieter's, David's and Anith's queries with the row > > based approach using my test data of 100000 rows. The row-based solution ran > > 80% faster than the set-based solutions. So, I guess it always is not > > necessary to use a clustered index or a set-based solution to improve the > > processing time for large data sets. > > > > "jsfromynr" wrote: > > > > > Hi There, > > > You can try this link to get understanding of how David's and Anith's > > > Query Worked. > > > > > > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsqlpro04/html/sp04d8.asp > > > > > > With warm regards > > > Jatinder Singh > > > > > > > > Hi Frank,
Table 1. Test results for the solutions in Listing 1 and Listing 5. Numbers of rows (islands) 80317 (6) 5,199,633 (315) 31,210,600 (3611) 3,466,421 (1,733,317) Row-based solution for islands (Listing 1) 34 sec 80317 (6) 3 min, 41 sec 5,199,633 (315) 21 min 31,210,600 (3611) 6 min, 41 sec 3,466,421 (1,733,317) Set-based solution #1 for islands (Listing 5) 190 ms 80317 (6) 11 sec 5,199,633 (315) 2 min, 41 sec 31,210,600 (3611) NA 3,466,421 (1,733,317) This is what I am referring to . With warm regards Jatinder Singh Jatinder, Yes, I read these results. But they apply do a dataset different
than mine. As I previously mentioned in my last post on page 6 of Mr. Kozak's article, he confirms my findings on my data set. Have you had time to ponder the merits of the number sequence technique that you and Aaron Bertrand proposed.? Jatinder, are you with Sapient in Cambridge,MA or India? Best regards. Show quote "jsfromynr" wrote: > Hi Frank, > > Table 1. Test results for the solutions in Listing 1 and Listing 5. > Numbers of rows (islands) > 80317 (6) > 5,199,633 (315) > 31,210,600 (3611) > 3,466,421 (1,733,317) > > Row-based solution for islands (Listing 1) > 34 sec 80317 (6) > 3 min, 41 sec 5,199,633 (315) > 21 min 31,210,600 (3611) > 6 min, 41 sec 3,466,421 (1,733,317) > > Set-based solution #1 for islands (Listing 5) > 190 ms 80317 (6) > 11 sec 5,199,633 (315) > 2 min, 41 sec 31,210,600 (3611) > NA 3,466,421 (1,733,317) > > This is what I am referring to . > With warm regards > Jatinder Singh > > |
|||||||||||||||||||||||