Home All Groups Group Topic Archive Search About
Author
8 Jul 2005 1:16 PM
Robin Tucker
Reading the above thread about "Count" got me thinking about why it's
generally considered bad practice to use Count(*) (or Count anything come to
think of it).  Considering that rows are "fixed length", surely it's easy
for the database to count all the rows in a table without scanning the whole
table?

How does it work under the hood so to speak, as to make the case of counting
the rows in a table so slow?

Author
8 Jul 2005 1:26 PM
Aaron Bertrand [SQL Server MVP]
It does an index scan if an index exists.  On a large table, this can be
very expensive.  On a heap, it can be even worse because of the I/O required
to do a scan... though I'm sure you don't have any heaps, right?

I am not sure why speed is a concern.  How often are you blindly throwing
"SELECT COUNT(*) FROM MegaTable" at the database?

sysindexes is a better place to get this information if you are more worried
about speed than accuracy.  And you can improve accuracy by running DBCC
UPDATEUSAGE regularly.




Show quote
"Robin Tucker" <idontwanttobespammedanymore@reallyidont.com> wrote in
message news:daluao$bsj$1$8302bc10@news.demon.co.uk...
> Reading the above thread about "Count" got me thinking about why it's
> generally considered bad practice to use Count(*) (or Count anything come
> to think of it).  Considering that rows are "fixed length", surely it's
> easy for the database to count all the rows in a table without scanning
> the whole table?
>
> How does it work under the hood so to speak, as to make the case of
> counting the rows in a table so slow?
>
>
>
Author
8 Jul 2005 2:05 PM
Daniel Wilson
I often do "SELECT COUNT(*) From MegaTable WHERE Condition1, ...n"

If it really is a huge table & I really query it often (I'm thinking of one
place I query every 30 seconds and the table exceeds 1,000,000 records) I
create an index that covers the conditions in my WHERE clause.

Is there a better way to do this?

dwilson

Show quote
"Aaron Bertrand [SQL Server MVP]" <ten.xoc@dnartreb.noraa> wrote in message
news:ezYCOC8gFHA.2060@TK2MSFTNGP10.phx.gbl...
> It does an index scan if an index exists.  On a large table, this can be
> very expensive.  On a heap, it can be even worse because of the I/O
required
> to do a scan... though I'm sure you don't have any heaps, right?
>
> I am not sure why speed is a concern.  How often are you blindly throwing
> "SELECT COUNT(*) FROM MegaTable" at the database?
>
> sysindexes is a better place to get this information if you are more
worried
> about speed than accuracy.  And you can improve accuracy by running DBCC
> UPDATEUSAGE regularly.
>
>
>
>
> "Robin Tucker" <idontwanttobespammedanymore@reallyidont.com> wrote in
> message news:daluao$bsj$1$8302bc10@news.demon.co.uk...
> > Reading the above thread about "Count" got me thinking about why it's
> > generally considered bad practice to use Count(*) (or Count anything
come
> > to think of it).  Considering that rows are "fixed length", surely it's
> > easy for the database to count all the rows in a table without scanning
> > the whole table?
> >
> > How does it work under the hood so to speak, as to make the case of
> > counting the rows in a table so slow?
> >
> >
> >
>
>
Author
8 Jul 2005 1:28 PM
Mike Epprecht (SQL MVP)
Hi

COUNT(*) has to physically look at each row in a table and count it.

A Row does not match a page, as multiple rows can exist on a page so there
is no quick way to count it.

"Inside SQL Server 2000" by Kalen Delaney is a excellent reference on
internals.

Regards
--------------------------------
Mike Epprecht, Microsoft SQL Server MVP
Zurich, Switzerland

MVP Program: http://www.microsoft.com/mvp

Blog: http://www.msmvps.com/epprecht/



Show quote
"Robin Tucker" wrote:

> Reading the above thread about "Count" got me thinking about why it's
> generally considered bad practice to use Count(*) (or Count anything come to
> think of it).  Considering that rows are "fixed length", surely it's easy
> for the database to count all the rows in a table without scanning the whole
> table?
>
> How does it work under the hood so to speak, as to make the case of counting
> the rows in a table so slow?
>
>
>
>
Author
8 Jul 2005 1:34 PM
Steve Kass
Robin,

  If you are thinking that the table is stored like an
array, and the length can be divided by the size of
an element, you are misguided here.  To store data
this way would be terribly inefficient.  For example,
to delete 1 row from a 1,000,000-row table would
require moving the data from 500,000 rows on
average.  Instead, SQL Server deletes the target
row from its data page only, and only by inspecting
each data page can the query processor determine
how many rows are on each page, and therefore in
the table.

The storage model for a SQL Server table is much
different from that of an array.  COUNT(*) will be
evaluated using the leaf level of the narrowest table index,
but each data page of that index must be considered.

If it is important to evaluate COUNT(*) efficiently, you
can create an indexed view to maintain the row count.

See http://groups.google.co.uk/groups?q=AF626850-1EB2-4886-959A-040461F99ACD
for an example.

Steve Kass
Drew University


Robin Tucker wrote:

Show quote
>Reading the above thread about "Count" got me thinking about why it's
>generally considered bad practice to use Count(*) (or Count anything come to
>think of it).  Considering that rows are "fixed length", surely it's easy
>for the database to count all the rows in a table without scanning the whole
>table?
>
>How does it work under the hood so to speak, as to make the case of counting
>the rows in a table so slow?
>
>
>

>
Author
8 Jul 2005 1:50 PM
Robin Tucker
Ahhhh, yes thats it.

I visualise a table as a large array which is totally wrong.

(I don't use Count (*) FROM GiganticTable by the way in any of my production
code, I was just curious to know why I avoid it!).


Show quote
"Steve Kass" <sk***@drew.edu> wrote in message
news:uAUCVH8gFHA.2072@TK2MSFTNGP14.phx.gbl...
> Robin,
>
>  If you are thinking that the table is stored like an
> array, and the length can be divided by the size of
> an element, you are misguided here.  To store data
> this way would be terribly inefficient.  For example,
> to delete 1 row from a 1,000,000-row table would
> require moving the data from 500,000 rows on
> average.  Instead, SQL Server deletes the target
> row from its data page only, and only by inspecting
> each data page can the query processor determine
> how many rows are on each page, and therefore in
> the table.
>
> The storage model for a SQL Server table is much
> different from that of an array.  COUNT(*) will be
> evaluated using the leaf level of the narrowest table index,
> but each data page of that index must be considered.
>
> If it is important to evaluate COUNT(*) efficiently, you
> can create an indexed view to maintain the row count.
>
> See
> http://groups.google.co.uk/groups?q=AF626850-1EB2-4886-959A-040461F99ACD
> for an example.
>
> Steve Kass
> Drew University
>
>
> Robin Tucker wrote:
>
>>Reading the above thread about "Count" got me thinking about why it's
>>generally considered bad practice to use Count(*) (or Count anything come
>>to think of it).  Considering that rows are "fixed length", surely it's
>>easy for the database to count all the rows in a table without scanning
>>the whole table?
>>
>>How does it work under the hood so to speak, as to make the case of
>>counting the rows in a table so slow?
>>
>>
>>
>>
Author
8 Jul 2005 3:46 PM
JT
It is probably best not to design an application that depends on making
frequent record counts. If there is an identity column on the table, if the
identity started at 1, and if there are no deletes from the table, then
querying the maximum value of this column should give you a quick and
correct count. If the identity started at something other than 1 or if there
are gaps, then you can make whatever + / - adjustment is needed to the
result for a somewhat accurate count.

Show quote
"Robin Tucker" <idontwanttobespammedanymore@reallyidont.com> wrote in
message news:daluao$bsj$1$8302bc10@news.demon.co.uk...
> Reading the above thread about "Count" got me thinking about why it's
> generally considered bad practice to use Count(*) (or Count anything come
to
> think of it).  Considering that rows are "fixed length", surely it's easy
> for the database to count all the rows in a table without scanning the
whole
> table?
>
> How does it work under the hood so to speak, as to make the case of
counting
> the rows in a table so slow?
>
>
>
Author
8 Jul 2005 4:38 PM
Thomas Coleman
That, IMO, would be a extraordinarily bad use of the Identity column. It is far
too easy to create gaps in your sequence and not realize that gaps exist. A
better way would be to have a small table and a couple of triggers that update
the count into this table on each insert or delete. I suppose my first
recommendation would be better indexing, second would be the indexed view and
last would be the trigger idea.



Thomas


Show quote
"JT" <some***@microsoft.com> wrote in message
news:%233%23b%23R9gFHA.1468@TK2MSFTNGP14.phx.gbl...
> It is probably best not to design an application that depends on making
> frequent record counts. If there is an identity column on the table, if the
> identity started at 1, and if there are no deletes from the table, then
> querying the maximum value of this column should give you a quick and
> correct count. If the identity started at something other than 1 or if there
> are gaps, then you can make whatever + / - adjustment is needed to the
> result for a somewhat accurate count.
>
> "Robin Tucker" <idontwanttobespammedanymore@reallyidont.com> wrote in
> message news:daluao$bsj$1$8302bc10@news.demon.co.uk...
>> Reading the above thread about "Count" got me thinking about why it's
>> generally considered bad practice to use Count(*) (or Count anything come
> to
>> think of it).  Considering that rows are "fixed length", surely it's easy
>> for the database to count all the rows in a table without scanning the
> whole
>> table?
>>
>> How does it work under the hood so to speak, as to make the case of
> counting
>> the rows in a table so slow?
>>
>>
>>
>
>
Author
8 Jul 2005 4:44 PM
Aaron Bertrand [SQL Server MVP]
> better way would be to have a small table and a couple of triggers that
> update the count into this table on each insert or delete.

Or better yet, to avoid bogging down OLTP operations and transactions, a
scheduled job that polls the table periodically and grabs a count
(optionally with nolock), or updates usage and uses sysindexes.  If it is a
high activity database, this will be less work than maintaining counts for
every operation, and while on a less busy system you will take counts more
often than you may need, it will not interfere with any of your application
code (and will not require triggers).

A
Author
8 Jul 2005 6:08 PM
Thomas Coleman
Agreed. After I wrote the trigger idea I thought about and realized I wouldn't
want to do it that way. Either way, using an Identity is definitely not the way
to go.


Thomas

Show quote
"Aaron Bertrand [SQL Server MVP]" <ten.xoc@dnartreb.noraa> wrote in message
news:e6YSNx9gFHA.2632@TK2MSFTNGP09.phx.gbl...
>> better way would be to have a small table and a couple of triggers that
>> update the count into this table on each insert or delete.
>
> Or better yet, to avoid bogging down OLTP operations and transactions, a
> scheduled job that polls the table periodically and grabs a count (optionally
> with nolock), or updates usage and uses sysindexes.  If it is a high activity
> database, this will be less work than maintaining counts for every operation,
> and while on a less busy system you will take counts more often than you may
> need, it will not interfere with any of your application code (and will not
> require triggers).
>
> A
>
Author
8 Jul 2005 6:49 PM
JT
Yes, in an OLTP system there can be gaps in identity values due to aborted
transactions, deletions, etc. and this would make it unreliable.

In the case where I used max(id) to get a count of rows, it is part of a
datawarehousing ETL process where a text file is bulk copied into a new
table, and a count of the records inserted is needed for inclusion on a
report. In this specific situation, I've found the count to be consistently
accurate.

Show quote
"JT" <some***@microsoft.com> wrote in message
news:%233%23b%23R9gFHA.1468@TK2MSFTNGP14.phx.gbl...
> It is probably best not to design an application that depends on making
> frequent record counts. If there is an identity column on the table, if
the
> identity started at 1, and if there are no deletes from the table, then
> querying the maximum value of this column should give you a quick and
> correct count. If the identity started at something other than 1 or if
there
> are gaps, then you can make whatever + / - adjustment is needed to the
> result for a somewhat accurate count.
>
> "Robin Tucker" <idontwanttobespammedanymore@reallyidont.com> wrote in
> message news:daluao$bsj$1$8302bc10@news.demon.co.uk...
> > Reading the above thread about "Count" got me thinking about why it's
> > generally considered bad practice to use Count(*) (or Count anything
come
> to
> > think of it).  Considering that rows are "fixed length", surely it's
easy
> > for the database to count all the rows in a table without scanning the
> whole
> > table?
> >
> > How does it work under the hood so to speak, as to make the case of
> counting
> > the rows in a table so slow?
> >
> >
> >
>
>
Author
8 Jul 2005 7:04 PM
Aaron Bertrand [SQL Server MVP]
> In the case where I used max(id) to get a count of rows, it is part of a
> datawarehousing ETL process where a text file is bulk copied into a new
> table, and a count of the records inserted is needed

So why not do SELECT @@ROWCOUNT right after the BULK INSERT statement?
Adding a surrogate key just to keep track of rowcounts is not a very
efficient way to do this...
Author
8 Jul 2005 7:25 PM
JT
The ETL is performed by a Visual Basic application using the command line
version of the bulk copy program, so I don't have access to @@rowcount. The
identity column is also the table's primary key. Record count is always
instantaneous and accurate.

Show quote
"Aaron Bertrand [SQL Server MVP]" <ten.xoc@dnartreb.noraa> wrote in message
news:OjoxO$%23gFHA.2916@TK2MSFTNGP14.phx.gbl...
> > In the case where I used max(id) to get a count of rows, it is part of a
> > datawarehousing ETL process where a text file is bulk copied into a new
> > table, and a count of the records inserted is needed
>
> So why not do SELECT @@ROWCOUNT right after the BULK INSERT statement?
> Adding a surrogate key just to keep track of rowcounts is not a very
> efficient way to do this...
>
>
Author
9 Jul 2005 8:33 PM
Colin Dawson
You'll never need to do a select count(*) again.  I've been looking at the
sys table very closely over the last few weeks.   There is actually a field
in one of the systables that contains the number of rows in the table!!!!!!!
Talk about great features hidden in the back of sql server!

So how do you get the count?  And I'll bet that it'll be quicker than
practically every select count(*) no matter how many indexes you create.

Here's a stored procedures to do it....   (you'll need to have the stored
procedure in the same database at the table, but I'm sure that someone will
alter this to work anywhere)   Hey, MS how about shipping that version in
the master database in SQL 2005?   Maybe call it sp_rowcount

Ceate Procedure ProcRowCount
  @TableName sysname
As
Select
  si.rows
From sysindexes si
Join sysobjects so On so.id = si.id
Where si.indid <= 1
and so.name = @TableName
Go

I've not tested this version of the stored procedure, it should work though.
You could change this so that it's a UDF, that way you can use it just like
select count(*).



Show quote
"Robin Tucker" <idontwanttobespammedanymore@reallyidont.com> wrote in
message news:daluao$bsj$1$8302bc10@news.demon.co.uk...
> Reading the above thread about "Count" got me thinking about why it's
> generally considered bad practice to use Count(*) (or Count anything come
> to think of it).  Considering that rows are "fixed length", surely it's
> easy for the database to count all the rows in a table without scanning
> the whole table?
>
> How does it work under the hood so to speak, as to make the case of
> counting the rows in a table so slow?
>
>
>
Author
9 Jul 2005 8:51 PM
Mike Epprecht (SQL MVP)
Hi

Sysindexes is not guaranteed to be up to date. In the trade off for
performance, some system tables are only updated at a later stage.

There have been many discussions on this in these groups.

Regards
--------------------------------
Mike Epprecht, Microsoft SQL Server MVP
Zurich, Switzerland

IM: m***@epprecht.net

MVP Program: http://www.microsoft.com/mvp

Blog: http://www.msmvps.com/epprecht/

Show quote
"Colin Dawson" <newsgro***@cjdawson.com> wrote in message
news:kwWze.66356$G8.5246@text.news.blueyonder.co.uk...
> You'll never need to do a select count(*) again.  I've been looking at the
> sys table very closely over the last few weeks.   There is actually a
> field in one of the systables that contains the number of rows in the
> table!!!!!!! Talk about great features hidden in the back of sql server!
>
> So how do you get the count?  And I'll bet that it'll be quicker than
> practically every select count(*) no matter how many indexes you create.
>
> Here's a stored procedures to do it....   (you'll need to have the stored
> procedure in the same database at the table, but I'm sure that someone
> will alter this to work anywhere)   Hey, MS how about shipping that
> version in the master database in SQL 2005?   Maybe call it sp_rowcount
>
> Ceate Procedure ProcRowCount
>  @TableName sysname
> As
> Select
>  si.rows
> From sysindexes si
> Join sysobjects so On so.id = si.id
> Where si.indid <= 1
> and so.name = @TableName
> Go
>
> I've not tested this version of the stored procedure, it should work
> though. You could change this so that it's a UDF, that way you can use it
> just like select count(*).
>
>
>
> "Robin Tucker" <idontwanttobespammedanymore@reallyidont.com> wrote in
> message news:daluao$bsj$1$8302bc10@news.demon.co.uk...
>> Reading the above thread about "Count" got me thinking about why it's
>> generally considered bad practice to use Count(*) (or Count anything come
>> to think of it).  Considering that rows are "fixed length", surely it's
>> easy for the database to count all the rows in a table without scanning
>> the whole table?
>>
>> How does it work under the hood so to speak, as to make the case of
>> counting the rows in a table so slow?
>>
>>
>>
>
>
Author
11 Jul 2005 1:47 PM
Aaron Bertrand [SQL Server MVP]
> You'll never need to do a select count(*) again.  I've been looking at the
> sys table very closely over the last few weeks.   There is actually a
> field in one of the systables that contains the number of rows in the
> table!!!!!!!

Yes, however it is only updated periodically.  So, to rely on a row count
for any table, you will have to do DBCC UPDATEUSAGE first.

A

AddThis Social Bookmark Button