|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
"Where date" problemI bumped into this issue: If I run the following statement : "select sum(total) from sales where date>=dateadd(m,-14,getdate())" I get the result in 2-3 seconds, but if I run the following: "select sum(total) from sales where date>='2005-01-01 00:00'" it takes more than 40 seconds. the "sales" table has approx. 40 million rows, and there is a clustered index with "date" column as the first column. This dousn't make sence to me, for 2 reasons: 1. both the statements should read almost the same data (actually, the faster one reads more data!) 2. the faster one has to compare the date of each row to a result of a formula, while the slower one compares to a constant. Any help will be appriciated. Thx, Gilad -- It's never too late to be who you might have been. - George Elliot I'm confused:
You say that > "select sum(total) from sales where date>=dateadd(m,-14,getdate())" is quick. Let us call this query A.And that > "select sum(total) from sales where date>='2005-01-01 00:00'" is slow. Let us call this query B.It seems to me like A will return more rows than B, but you say that: > 1. both the statements should read almost the same data (actually, the Since you have a clustered index on the date column, SQL Server will probably navigate the index > faster one reads more data!) tree to find the first rows that matches the condition and read the rows in the data pages from there. The more rows, the higher cost (more pages to read, more rows to be returned). To me, the fact that B is slower than A is not at all surprising, it is expected. You also say: > 2. the faster one has to compare the date of each row to a result of a The formula will only be calculated once for the whole query, and that calculation will only take a > formula, while the slower one compares to a constant. few microseconds. I suggest you study a bit on how indexes are structured and used if you still find this surprising. -- Show quoteTibor Karaszi, SQL Server MVP http://www.karaszi.com/sqlserver/default.asp http://www.solidqualitylearning.com/ Blog: http://solidqualitylearning.com/blogs/tibor/ "גלעד" <@discussions.microsoft.com> wrote in message news:D983E12D-A0FB-4023-B95C-3BF232B566CB@microsoft.com... > Hi all, > > I bumped into this issue: > > If I run the following statement : > "select sum(total) from sales where date>=dateadd(m,-14,getdate())" > I get the result in 2-3 seconds, but if I run the following: > "select sum(total) from sales where date>='2005-01-01 00:00'" > it takes more than 40 seconds. > > the "sales" table has approx. 40 million rows, and there is a clustered > index with "date" column as the first column. > > This dousn't make sence to me, for 2 reasons: > 1. both the statements should read almost the same data (actually, the > faster one reads more data!) > 2. the faster one has to compare the date of each row to a result of a > formula, while the slower one compares to a constant. > > Any help will be appriciated. > > Thx, > Gilad > > -- > It's never too late to be who you might have been. - George Elliot Tibor,
Since today it's Feb. 17, the dateadd function returns '2004-12-17'. This is why I said that query A will return more rows then the one where I specifically compared against '2005-01-01' (query B). However, the results are almost the same, because the difference is 13.5 months and 14 months. If the difference between the 2 statements was 3 seconds to 4 seconds, I wouldn't be puzzled, but 3 seconds to 40 - I can't understand. And the main issue is that query A, which reads 14 months of data, runs 3 seconds, and query B, which reads 13.5 months of data, runs 40 seconds! Any ideas? -- Show quoteIt''''''''s never too late to be who you might have been. - George Elliot "Tibor Karaszi" wrote: > I'm confused: > > You say that > > "select sum(total) from sales where date>=dateadd(m,-14,getdate())" > is quick. Let us call this query A. > > And that > > "select sum(total) from sales where date>='2005-01-01 00:00'" > is slow. Let us call this query B. > > It seems to me like A will return more rows than B, but you say that: > > 1. both the statements should read almost the same data (actually, the > > faster one reads more data!) > > Since you have a clustered index on the date column, SQL Server will probably navigate the index > tree to find the first rows that matches the condition and read the rows in the data pages from > there. The more rows, the higher cost (more pages to read, more rows to be returned). To me, the > fact that B is slower than A is not at all surprising, it is expected. > > You also say: > > 2. the faster one has to compare the date of each row to a result of a > > formula, while the slower one compares to a constant. > > The formula will only be calculated once for the whole query, and that calculation will only take a > few microseconds. > > I suggest you study a bit on how indexes are structured and used if you still find this surprising. > -- > Tibor Karaszi, SQL Server MVP > http://www.karaszi.com/sqlserver/default.asp > http://www.solidqualitylearning.com/ > Blog: http://solidqualitylearning.com/blogs/tibor/ > > > "גלעד" <@discussions.microsoft.com> wrote in message > news:D983E12D-A0FB-4023-B95C-3BF232B566CB@microsoft.com... > > Hi all, > > > > I bumped into this issue: > > > > If I run the following statement : > > "select sum(total) from sales where date>=dateadd(m,-14,getdate())" > > I get the result in 2-3 seconds, but if I run the following: > > "select sum(total) from sales where date>='2005-01-01 00:00'" > > it takes more than 40 seconds. > > > > the "sales" table has approx. 40 million rows, and there is a clustered > > index with "date" column as the first column. > > > > This dousn't make sence to me, for 2 reasons: > > 1. both the statements should read almost the same data (actually, the > > faster one reads more data!) > > 2. the faster one has to compare the date of each row to a result of a > > formula, while the slower one compares to a constant. > > > > Any help will be appriciated. > > > > Thx, > > Gilad > > > > -- > > It's never too late to be who you might have been. - George Elliot > > Sorry, I missed that you took a difference of months, for some reason I read it as days...
I now can understand why you are puzzled. Did you check the execution plan? Are they the same? -- Show quoteTibor Karaszi, SQL Server MVP http://www.karaszi.com/sqlserver/default.asp http://www.solidqualitylearning.com/ Blog: http://solidqualitylearning.com/blogs/tibor/ "גלעד" <@discussions.microsoft.com> wrote in message news:88EBEF92-3B02-4046-A55C-CA2BFFBAC81A@microsoft.com... > Tibor, > > Since today it's Feb. 17, the dateadd function returns '2004-12-17'. This is > why I said that query A will return more rows then the one where I > specifically compared against '2005-01-01' (query B). > > However, the results are almost the same, because the difference is 13.5 > months and 14 months. > > If the difference between the 2 statements was 3 seconds to 4 seconds, I > wouldn't be puzzled, but 3 seconds to 40 - I can't understand. > > And the main issue is that query A, which reads 14 months of data, runs 3 > seconds, and query B, which reads 13.5 months of data, runs 40 seconds! > > Any ideas? > -- > It''''''''s never too late to be who you might have been. - George Elliot > > > "Tibor Karaszi" wrote: > >> I'm confused: >> >> You say that >> > "select sum(total) from sales where date>=dateadd(m,-14,getdate())" >> is quick. Let us call this query A. >> >> And that >> > "select sum(total) from sales where date>='2005-01-01 00:00'" >> is slow. Let us call this query B. >> >> It seems to me like A will return more rows than B, but you say that: >> > 1. both the statements should read almost the same data (actually, the >> > faster one reads more data!) >> >> Since you have a clustered index on the date column, SQL Server will probably navigate the index >> tree to find the first rows that matches the condition and read the rows in the data pages from >> there. The more rows, the higher cost (more pages to read, more rows to be returned). To me, the >> fact that B is slower than A is not at all surprising, it is expected. >> >> You also say: >> > 2. the faster one has to compare the date of each row to a result of a >> > formula, while the slower one compares to a constant. >> >> The formula will only be calculated once for the whole query, and that calculation will only take >> a >> few microseconds. >> >> I suggest you study a bit on how indexes are structured and used if you still find this >> surprising. >> -- >> Tibor Karaszi, SQL Server MVP >> http://www.karaszi.com/sqlserver/default.asp >> http://www.solidqualitylearning.com/ >> Blog: http://solidqualitylearning.com/blogs/tibor/ >> >> >> "גלעד" <@discussions.microsoft.com> wrote in message >> news:D983E12D-A0FB-4023-B95C-3BF232B566CB@microsoft.com... >> > Hi all, >> > >> > I bumped into this issue: >> > >> > If I run the following statement : >> > "select sum(total) from sales where date>=dateadd(m,-14,getdate())" >> > I get the result in 2-3 seconds, but if I run the following: >> > "select sum(total) from sales where date>='2005-01-01 00:00'" >> > it takes more than 40 seconds. >> > >> > the "sales" table has approx. 40 million rows, and there is a clustered >> > index with "date" column as the first column. >> > >> > This dousn't make sence to me, for 2 reasons: >> > 1. both the statements should read almost the same data (actually, the >> > faster one reads more data!) >> > 2. the faster one has to compare the date of each row to a result of a >> > formula, while the slower one compares to a constant. >> > >> > Any help will be appriciated. >> > >> > Thx, >> > Gilad >> > >> > -- >> > It's never too late to be who you might have been. - George Elliot >> >> Tibor,
Found the problem. Query B chose a different index (the full query is more complicated). I use a table hint to use the clustered index and now it works just fine. Thanks for your help, Gilad -- Show quoteIt''''''''s never too late to be who you might have been. - George Elliot "Tibor Karaszi" wrote: > I'm confused: > > You say that > > "select sum(total) from sales where date>=dateadd(m,-14,getdate())" > is quick. Let us call this query A. > > And that > > "select sum(total) from sales where date>='2005-01-01 00:00'" > is slow. Let us call this query B. > > It seems to me like A will return more rows than B, but you say that: > > 1. both the statements should read almost the same data (actually, the > > faster one reads more data!) > > Since you have a clustered index on the date column, SQL Server will probably navigate the index > tree to find the first rows that matches the condition and read the rows in the data pages from > there. The more rows, the higher cost (more pages to read, more rows to be returned). To me, the > fact that B is slower than A is not at all surprising, it is expected. > > You also say: > > 2. the faster one has to compare the date of each row to a result of a > > formula, while the slower one compares to a constant. > > The formula will only be calculated once for the whole query, and that calculation will only take a > few microseconds. > > I suggest you study a bit on how indexes are structured and used if you still find this surprising. > -- > Tibor Karaszi, SQL Server MVP > http://www.karaszi.com/sqlserver/default.asp > http://www.solidqualitylearning.com/ > Blog: http://solidqualitylearning.com/blogs/tibor/ > > > "גלעד" <@discussions.microsoft.com> wrote in message > news:D983E12D-A0FB-4023-B95C-3BF232B566CB@microsoft.com... > > Hi all, > > > > I bumped into this issue: > > > > If I run the following statement : > > "select sum(total) from sales where date>=dateadd(m,-14,getdate())" > > I get the result in 2-3 seconds, but if I run the following: > > "select sum(total) from sales where date>='2005-01-01 00:00'" > > it takes more than 40 seconds. > > > > the "sales" table has approx. 40 million rows, and there is a clustered > > index with "date" column as the first column. > > > > This dousn't make sence to me, for 2 reasons: > > 1. both the statements should read almost the same data (actually, the > > faster one reads more data!) > > 2. the faster one has to compare the date of each row to a result of a > > formula, while the slower one compares to a constant. > > > > Any help will be appriciated. > > > > Thx, > > Gilad > > > > -- > > It's never too late to be who you might have been. - George Elliot > > Run the queries in Query Analyzer using the Show Execution Plan menu option.
Once done, compare the Physical Operation, Logical Operation, CPU Cost, and I/O Cost. Graphically Displaying the Execution Plan Using SQL Query Analyzer: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/optimsql/odp_tun_1_5pde.asp Also, take into consideration SQL Server's buffer cache. The execution plan may be different, but they both ultimately read the same pages, so I would expect whatever query run 2nd to execute faster. The only way to really compare the runtime duration of two similar queries is to stop / restart the SQL Server service prior to execution for each so that the buffers are completly purged. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_sa_7usz.asp Show quote "????" <@discussions.microsoft.com> wrote in message news:D983E12D-A0FB-4023-B95C-3BF232B566CB@microsoft.com... > Hi all, > > I bumped into this issue: > > If I run the following statement : > "select sum(total) from sales where date>=dateadd(m,-14,getdate())" > I get the result in 2-3 seconds, but if I run the following: > "select sum(total) from sales where date>='2005-01-01 00:00'" > it takes more than 40 seconds. > > the "sales" table has approx. 40 million rows, and there is a clustered > index with "date" column as the first column. > > This dousn't make sence to me, for 2 reasons: > 1. both the statements should read almost the same data (actually, the > faster one reads more data!) > 2. the faster one has to compare the date of each row to a result of a > formula, while the slower one compares to a constant. > > Any help will be appriciated. > > Thx, > Gilad > > -- > It's never too late to be who you might have been. - George Elliot JT,
Found the problem. Query B chose a different index (the full query is more complicated). I use a table hint to use the clustered index and now it works just fine. Thanks for your help, Gilad -- Show quoteIt''''''''s never too late to be who you might have been. - George Elliot "JT" wrote: > Run the queries in Query Analyzer using the Show Execution Plan menu option. > Once done, compare the Physical Operation, Logical Operation, CPU Cost, and > I/O Cost. > Graphically Displaying the Execution Plan Using SQL Query Analyzer: > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/optimsql/odp_tun_1_5pde.asp > > Also, take into consideration SQL Server's buffer cache. The execution plan > may be different, but they both ultimately read the same pages, so I would > expect whatever query run 2nd to execute faster. The only way to really > compare the runtime duration of two similar queries is to stop / restart the > SQL Server service prior to execution for each so that the buffers are > completly purged. > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_sa_7usz.asp > > "????" <@discussions.microsoft.com> wrote in message > news:D983E12D-A0FB-4023-B95C-3BF232B566CB@microsoft.com... > > Hi all, > > > > I bumped into this issue: > > > > If I run the following statement : > > "select sum(total) from sales where date>=dateadd(m,-14,getdate())" > > I get the result in 2-3 seconds, but if I run the following: > > "select sum(total) from sales where date>='2005-01-01 00:00'" > > it takes more than 40 seconds. > > > > the "sales" table has approx. 40 million rows, and there is a clustered > > index with "date" column as the first column. > > > > This dousn't make sence to me, for 2 reasons: > > 1. both the statements should read almost the same data (actually, the > > faster one reads more data!) > > 2. the faster one has to compare the date of each row to a result of a > > formula, while the slower one compares to a constant. > > > > Any help will be appriciated. > > > > Thx, > > Gilad > > > > -- > > It's never too late to be who you might have been. - George Elliot > > > If you can, try updating the statistics on sales, then retry your original
queries without the index hint: UPDATE STATISTICS sales WITH FULLSCAN select sum(total) from sales where date>=dateadd(m,-14,getdate()) select sum(total) from sales where date>='2005-01-01 00:00' It would be interesting to see what the different between the queries would be after the statistics were updated. -- Show quote"גלעד" wrote: > JT, > > Found the problem. > Query B chose a different index (the full query is more complicated). > I use a table hint to use the clustered index and now it works just fine. > > Thanks for your help, > Gilad > > -- > It''''''''s never too late to be who you might have been. - George Elliot > > > "JT" wrote: > > > Run the queries in Query Analyzer using the Show Execution Plan menu option. > > Once done, compare the Physical Operation, Logical Operation, CPU Cost, and > > I/O Cost. > > Graphically Displaying the Execution Plan Using SQL Query Analyzer: > > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/optimsql/odp_tun_1_5pde.asp > > > > Also, take into consideration SQL Server's buffer cache. The execution plan > > may be different, but they both ultimately read the same pages, so I would > > expect whatever query run 2nd to execute faster. The only way to really > > compare the runtime duration of two similar queries is to stop / restart the > > SQL Server service prior to execution for each so that the buffers are > > completly purged. > > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_sa_7usz.asp > > > > "????" <@discussions.microsoft.com> wrote in message > > news:D983E12D-A0FB-4023-B95C-3BF232B566CB@microsoft.com... > > > Hi all, > > > > > > I bumped into this issue: > > > > > > If I run the following statement : > > > "select sum(total) from sales where date>=dateadd(m,-14,getdate())" > > > I get the result in 2-3 seconds, but if I run the following: > > > "select sum(total) from sales where date>='2005-01-01 00:00'" > > > it takes more than 40 seconds. > > > > > > the "sales" table has approx. 40 million rows, and there is a clustered > > > index with "date" column as the first column. > > > > > > This dousn't make sence to me, for 2 reasons: > > > 1. both the statements should read almost the same data (actually, the > > > faster one reads more data!) > > > 2. the faster one has to compare the date of each row to a result of a > > > formula, while the slower one compares to a constant. > > > > > > Any help will be appriciated. > > > > > > Thx, > > > Gilad > > > > > > -- > > > It's never too late to be who you might have been. - George Elliot > > > > > > |
|||||||||||||||||||||||