|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Proper Index On a LIKE and GROUP BY queryI have very big table. I don't know the exact size, but the total size of the DB is ~80 GB and the table has ~18 million rows, its columns types are: int, varchar 500, float, char 500, char 100. I'm trying to do the following query and it takes a very long time: SELECT col1, count(*), sum(col3) FROM table WHERE col5 LIKE 'string%' GROUP BY col1 I thought that the most important thing in this kind of query is to build a clustered index on col1 since this is the column that has the group by clause but since the query is very slow I wonder whether I have the right index, or maybe I should build another one on col5 (that has the LIKE cluse). Will an index help when searching for a LIKE 'string%' values? Should I change the clustered index to another columns? I'm using SQL Server 2000. Thanks, Roee. Roee wrote:
> I thought that the most important thing in this kind of query is to build a You should definitely have an index on col5. Whether the optimizer uses> clustered index on col1 since this is the column that has the group by clause > but since the query is very slow I wonder whether I have the right index, or > maybe I should build another one on col5 (that has the LIKE cluse). > > Will an index help when searching for a LIKE 'string%' values? it depends on what you end up searching for. Doing a LIKE '%' will result in a table scan, whereas a LIKE 'abceef%' should produce an index seek. You could try a covering index (col5, col1, col3) as well. The best indexing is almost always "it depends", but I'd try a couple of
things. If the number of rows that would qualify for you "LIKE 'string%' clause is small (3% or less), you would probably benefit from a clustered indes on col5, or even a non-clustered index on col5, col1, and col3 (in that order) as a covering index. Only you can determine the best via testing. I would also re-evaluate the char(500) and char(100) datatypes. Char will always use the full width, and if much of your data in those two columns are smaller that the specified size, you're wasting space. Varchar may significantly reduce the table size, and improve query performance. HTH Vern Rabe Show quote "Roee" wrote: > Hello, > > I have very big table. I don't know the exact size, but the total size of > the DB is ~80 GB and the table has ~18 million rows, its columns types are: > int, varchar 500, float, char 500, char 100. > > I'm trying to do the following query and it takes a very long time: > SELECT col1, count(*), sum(col3) > FROM table > WHERE col5 LIKE 'string%' > GROUP BY col1 > > I thought that the most important thing in this kind of query is to build a > clustered index on col1 since this is the column that has the group by clause > but since the query is very slow I wonder whether I have the right index, or > maybe I should build another one on col5 (that has the LIKE cluse). > > Will an index help when searching for a LIKE 'string%' values? > Should I change the clustered index to another columns? > > I'm using SQL Server 2000. > > Thanks, > Roee. Hi,
First, it seems that the number of rows that qualify for the LIKE clause varies from 70,000 to 1.5 million, so it hard to say of the this number is smaller than 3% since 3% of 18 million is 540,000. (just to be clear, I'm talking about the total number of rows the query returns, not per each col1) But I will definitly try to change the char columns to varchar, and I will try to add the index on col5, col1, col3. Out of curiosity, both you and Chris suggested that the index should be on col3 as well, where obviously - since this column is summed - sql server needs to scan all the rows that qualify in order to calculate the sum result. So why is the index on this column important? Is it because the information is positioned on the index itself and there will be no need to access the actual rows of the table (on the disk I mean)? this is what you mean by "covering index"? Thanks again, Roee. Show quote "Vern Rabe" wrote: > The best indexing is almost always "it depends", but I'd try a couple of > things. If the number of rows that would qualify for you "LIKE 'string%' > clause is small (3% or less), you would probably benefit from a clustered > indes on col5, or even a non-clustered index on col5, col1, and col3 (in that > order) as a covering index. Only you can determine the best via testing. > > I would also re-evaluate the char(500) and char(100) datatypes. Char will > always use the full width, and if much of your data in those two columns are > smaller that the specified size, you're wasting space. Varchar may > significantly reduce the table size, and improve query performance. > > HTH > Vern Rabe > > "Roee" wrote: > > > Hello, > > > > I have very big table. I don't know the exact size, but the total size of > > the DB is ~80 GB and the table has ~18 million rows, its columns types are: > > int, varchar 500, float, char 500, char 100. > > > > I'm trying to do the following query and it takes a very long time: > > SELECT col1, count(*), sum(col3) > > FROM table > > WHERE col5 LIKE 'string%' > > GROUP BY col1 > > > > I thought that the most important thing in this kind of query is to build a > > clustered index on col1 since this is the column that has the group by clause > > but since the query is very slow I wonder whether I have the right index, or > > maybe I should build another one on col5 (that has the LIKE cluse). > > > > Will an index help when searching for a LIKE 'string%' values? > > Should I change the clustered index to another columns? > > > > I'm using SQL Server 2000. > > > > Thanks, > > Roee. > So why is the index on this column important? Is it because the Yes, that's exactly what a covering index does -- it contains all the > information > is positioned on the index itself and there will be no need to access the > actual rows of the table (on the disk I mean)? this is what you mean by > "covering index"? columns needed by query so that the base table or clustered index is never accessed resulting in fewer disk I/O operations Regards, Gail -- Show quoteGail Erickson [MS] SQL Server Documentation Team This posting is provided "AS IS" with no warranties, and confers no rights Download the latest version of Books Online from http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx "Roee" <R***@discussions.microsoft.com> wrote in message news:E61F2965-DE8B-42E2-8F53-9C7C2029F113@microsoft.com... > Hi, > > First, it seems that the number of rows that qualify for the LIKE clause > varies from 70,000 to 1.5 million, so it hard to say of the this number is > smaller than 3% since 3% of 18 million is 540,000. > (just to be clear, I'm talking about the total number of rows the query > returns, not per each col1) > > But I will definitly try to change the char columns to varchar, and I will > try to add the index on col5, col1, col3. > > Out of curiosity, both you and Chris suggested that the index should be on > col3 as well, where obviously - since this column is summed - sql server > needs to scan all the rows that qualify in order to calculate the sum > result. > So why is the index on this column important? Is it because the > information > is positioned on the index itself and there will be no need to access the > actual rows of the table (on the disk I mean)? this is what you mean by > "covering index"? > > Thanks again, > Roee. > > "Vern Rabe" wrote: > >> The best indexing is almost always "it depends", but I'd try a couple of >> things. If the number of rows that would qualify for you "LIKE 'string%' >> clause is small (3% or less), you would probably benefit from a clustered >> indes on col5, or even a non-clustered index on col5, col1, and col3 (in >> that >> order) as a covering index. Only you can determine the best via testing. >> >> I would also re-evaluate the char(500) and char(100) datatypes. Char will >> always use the full width, and if much of your data in those two columns >> are >> smaller that the specified size, you're wasting space. Varchar may >> significantly reduce the table size, and improve query performance. >> >> HTH >> Vern Rabe >> >> "Roee" wrote: >> >> > Hello, >> > >> > I have very big table. I don't know the exact size, but the total size >> > of >> > the DB is ~80 GB and the table has ~18 million rows, its columns types >> > are: >> > int, varchar 500, float, char 500, char 100. >> > >> > I'm trying to do the following query and it takes a very long time: >> > SELECT col1, count(*), sum(col3) >> > FROM table >> > WHERE col5 LIKE 'string%' >> > GROUP BY col1 >> > >> > I thought that the most important thing in this kind of query is to >> > build a >> > clustered index on col1 since this is the column that has the group by >> > clause >> > but since the query is very slow I wonder whether I have the right >> > index, or >> > maybe I should build another one on col5 (that has the LIKE cluse). >> > >> > Will an index help when searching for a LIKE 'string%' values? >> > Should I change the clustered index to another columns? >> > >> > I'm using SQL Server 2000. >> > >> > Thanks, >> > Roee. Yes, you got it.
A 'covering' index is one where all of the values sought are in the index itself and therefore there is no need to go to the actual data table for any values. An INDEX on col5, col1 is a good candidate to be selected by the query processor since the WHERE clause filters on col5, then groups on col1. By adding col3 to the mix, there is no need to go to the datatable for any of the requested values. One stop shopping. -- Show quoteArnie Rowland, Ph.D. Westwood Consulting, Inc Most good judgment comes from experience. Most experience comes from bad judgment. - Anonymous "Roee" <R***@discussions.microsoft.com> wrote in message news:E61F2965-DE8B-42E2-8F53-9C7C2029F113@microsoft.com... > Hi, > > First, it seems that the number of rows that qualify for the LIKE clause > varies from 70,000 to 1.5 million, so it hard to say of the this number is > smaller than 3% since 3% of 18 million is 540,000. > (just to be clear, I'm talking about the total number of rows the query > returns, not per each col1) > > But I will definitly try to change the char columns to varchar, and I will > try to add the index on col5, col1, col3. > > Out of curiosity, both you and Chris suggested that the index should be on > col3 as well, where obviously - since this column is summed - sql server > needs to scan all the rows that qualify in order to calculate the sum > result. > So why is the index on this column important? Is it because the > information > is positioned on the index itself and there will be no need to access the > actual rows of the table (on the disk I mean)? this is what you mean by > "covering index"? > > Thanks again, > Roee. > > "Vern Rabe" wrote: > >> The best indexing is almost always "it depends", but I'd try a couple of >> things. If the number of rows that would qualify for you "LIKE 'string%' >> clause is small (3% or less), you would probably benefit from a clustered >> indes on col5, or even a non-clustered index on col5, col1, and col3 (in >> that >> order) as a covering index. Only you can determine the best via testing. >> >> I would also re-evaluate the char(500) and char(100) datatypes. Char will >> always use the full width, and if much of your data in those two columns >> are >> smaller that the specified size, you're wasting space. Varchar may >> significantly reduce the table size, and improve query performance. >> >> HTH >> Vern Rabe >> >> "Roee" wrote: >> >> > Hello, >> > >> > I have very big table. I don't know the exact size, but the total size >> > of >> > the DB is ~80 GB and the table has ~18 million rows, its columns types >> > are: >> > int, varchar 500, float, char 500, char 100. >> > >> > I'm trying to do the following query and it takes a very long time: >> > SELECT col1, count(*), sum(col3) >> > FROM table >> > WHERE col5 LIKE 'string%' >> > GROUP BY col1 >> > >> > I thought that the most important thing in this kind of query is to >> > build a >> > clustered index on col1 since this is the column that has the group by >> > clause >> > but since the query is very slow I wonder whether I have the right >> > index, or >> > maybe I should build another one on col5 (that has the LIKE cluse). >> > >> > Will an index help when searching for a LIKE 'string%' values? >> > Should I change the clustered index to another columns? >> > >> > I'm using SQL Server 2000. >> > >> > Thanks, >> > Roee. And to add to what Gail and Arnie said, a covering index should be
non-clustered. A clustered index is the data itself and would defeat the purpose of a covering index. Vern Rabe Show quote "Roee" wrote: > Hi, > > First, it seems that the number of rows that qualify for the LIKE clause > varies from 70,000 to 1.5 million, so it hard to say of the this number is > smaller than 3% since 3% of 18 million is 540,000. > (just to be clear, I'm talking about the total number of rows the query > returns, not per each col1) > > But I will definitly try to change the char columns to varchar, and I will > try to add the index on col5, col1, col3. > > Out of curiosity, both you and Chris suggested that the index should be on > col3 as well, where obviously - since this column is summed - sql server > needs to scan all the rows that qualify in order to calculate the sum result. > So why is the index on this column important? Is it because the information > is positioned on the index itself and there will be no need to access the > actual rows of the table (on the disk I mean)? this is what you mean by > "covering index"? > > Thanks again, > Roee. > > "Vern Rabe" wrote: > > > The best indexing is almost always "it depends", but I'd try a couple of > > things. If the number of rows that would qualify for you "LIKE 'string%' > > clause is small (3% or less), you would probably benefit from a clustered > > indes on col5, or even a non-clustered index on col5, col1, and col3 (in that > > order) as a covering index. Only you can determine the best via testing. > > > > I would also re-evaluate the char(500) and char(100) datatypes. Char will > > always use the full width, and if much of your data in those two columns are > > smaller that the specified size, you're wasting space. Varchar may > > significantly reduce the table size, and improve query performance. > > > > HTH > > Vern Rabe > > > > "Roee" wrote: > > > > > Hello, > > > > > > I have very big table. I don't know the exact size, but the total size of > > > the DB is ~80 GB and the table has ~18 million rows, its columns types are: > > > int, varchar 500, float, char 500, char 100. > > > > > > I'm trying to do the following query and it takes a very long time: > > > SELECT col1, count(*), sum(col3) > > > FROM table > > > WHERE col5 LIKE 'string%' > > > GROUP BY col1 > > > > > > I thought that the most important thing in this kind of query is to build a > > > clustered index on col1 since this is the column that has the group by clause > > > but since the query is very slow I wonder whether I have the right index, or > > > maybe I should build another one on col5 (that has the LIKE cluse). > > > > > > Will an index help when searching for a LIKE 'string%' values? > > > Should I change the clustered index to another columns? > > > > > > I'm using SQL Server 2000. > > > > > > Thanks, > > > Roee. Thanks everybody.
I'm working on it now, will probably take a few hours (if not an entire day...) to change all the columns (there are actually 10 columns that needs to be altered since there are 5 tables of that kind - can't put all the data in one table, too big), and then build the indices. Hope for good results... Show quote :) "Vern Rabe" wrote: > And to add to what Gail and Arnie said, a covering index should be > non-clustered. A clustered index is the data itself and would defeat the > purpose of a covering index. > > Vern Rabe > > "Roee" wrote: > > > Hi, > > > > First, it seems that the number of rows that qualify for the LIKE clause > > varies from 70,000 to 1.5 million, so it hard to say of the this number is > > smaller than 3% since 3% of 18 million is 540,000. > > (just to be clear, I'm talking about the total number of rows the query > > returns, not per each col1) > > > > But I will definitly try to change the char columns to varchar, and I will > > try to add the index on col5, col1, col3. > > > > Out of curiosity, both you and Chris suggested that the index should be on > > col3 as well, where obviously - since this column is summed - sql server > > needs to scan all the rows that qualify in order to calculate the sum result. > > So why is the index on this column important? Is it because the information > > is positioned on the index itself and there will be no need to access the > > actual rows of the table (on the disk I mean)? this is what you mean by > > "covering index"? > > > > Thanks again, > > Roee. > > > > "Vern Rabe" wrote: > > > > > The best indexing is almost always "it depends", but I'd try a couple of > > > things. If the number of rows that would qualify for you "LIKE 'string%' > > > clause is small (3% or less), you would probably benefit from a clustered > > > indes on col5, or even a non-clustered index on col5, col1, and col3 (in that > > > order) as a covering index. Only you can determine the best via testing. > > > > > > I would also re-evaluate the char(500) and char(100) datatypes. Char will > > > always use the full width, and if much of your data in those two columns are > > > smaller that the specified size, you're wasting space. Varchar may > > > significantly reduce the table size, and improve query performance. > > > > > > HTH > > > Vern Rabe > > > > > > "Roee" wrote: > > > > > > > Hello, > > > > > > > > I have very big table. I don't know the exact size, but the total size of > > > > the DB is ~80 GB and the table has ~18 million rows, its columns types are: > > > > int, varchar 500, float, char 500, char 100. > > > > > > > > I'm trying to do the following query and it takes a very long time: > > > > SELECT col1, count(*), sum(col3) > > > > FROM table > > > > WHERE col5 LIKE 'string%' > > > > GROUP BY col1 > > > > > > > > I thought that the most important thing in this kind of query is to build a > > > > clustered index on col1 since this is the column that has the group by clause > > > > but since the query is very slow I wonder whether I have the right index, or > > > > maybe I should build another one on col5 (that has the LIKE cluse). > > > > > > > > Will an index help when searching for a LIKE 'string%' values? > > > > Should I change the clustered index to another columns? > > > > > > > > I'm using SQL Server 2000. > > > > > > > > Thanks, > > > > Roee. |
|||||||||||||||||||||||