Home All Groups Group Topic Archive Search About

My table design - where did I go wrong?

Author
29 Jul 2005 6:58 PM
Rich
CREATE TABLE Sales1
(
  varchar (10) CustID,
  varchar (10) TransID,
  datetime SaleDate,
  money  S1,
  money  S2,
  money  S3,
  money  S4,
  numeric V1
)

CREATE NONCLUSTERED INDEX Sales1_CustID ON Sales1 (CustID)

CREATE NONCLUSTERED INDEX Sales1_SaleDate ON Sales1 (SaleDate)

"money" is just the right size for my fields. This table has 9,500,000
records.

Although I need to do this select in less than a half second, it takes 1
full minute:
SELECT MAX(S1) FROM Sales1 WHERE SaleDate > '1/1/2005'

And I need to do this select in less than a half second, but it takes 3
minutes:
SELECT AVG(S1 / S2) FROM Sales1 WHERE S3 > S4

Am I supposed to create a new field with these values pre-calculated? I hope
not, because I have several other formulas - up to 500 different types of
selects which are all similar.

I am testing with SQL 2000 Developer. XEON CPU, fast SATA hard drive and 2GB
of memory.

Thanks

Author
29 Jul 2005 7:59 PM
Mike Epprecht (SQL MVP)
Hi

The 1st query can use the index your created on SaleDate.

The 2nd query has no option, but to read every row  (9.5 million) in the
table to find where S3 > S4, and then based on that subset, so the S1/S2
evaluation.
Either add and index, to restrict the select to something that can use one
of the indexes.

Regards
--------------------------------
Mike Epprecht, Microsoft SQL Server MVP
Zurich, Switzerland

IM: m***@epprecht.net

MVP Program: http://www.microsoft.com/mvp

Blog: http://www.msmvps.com/epprecht/

"Rich" <no@spam.invalid> wrote in message
news:G_uGe.53624$4o.5775@fed1read06...
Show quote
> CREATE TABLE Sales1
> (
>  varchar (10) CustID,
>  varchar (10) TransID,
>  datetime SaleDate,
>  money  S1,
>  money  S2,
>  money  S3,
>  money  S4,
>  numeric V1
> )
>
> CREATE NONCLUSTERED INDEX Sales1_CustID ON Sales1 (CustID)
>
> CREATE NONCLUSTERED INDEX Sales1_SaleDate ON Sales1 (SaleDate)
>
> "money" is just the right size for my fields. This table has 9,500,000
> records.
>
> Although I need to do this select in less than a half second, it takes 1
> full minute:
> SELECT MAX(S1) FROM Sales1 WHERE SaleDate > '1/1/2005'
>
> And I need to do this select in less than a half second, but it takes 3
> minutes:
> SELECT AVG(S1 / S2) FROM Sales1 WHERE S3 > S4
>
> Am I supposed to create a new field with these values pre-calculated? I
> hope
> not, because I have several other formulas - up to 500 different types of
> selects which are all similar.
>
> I am testing with SQL 2000 Developer. XEON CPU, fast SATA hard drive and
> 2GB
> of memory.
>
> Thanks
>
>
>
Author
29 Jul 2005 9:37 PM
Rich
Shoot... SQL server might not work for me. I have to be able to calculate
sums, averages, etc. based on a number of different fields, and there are
millions of records. I have to get results within milliseconds. Is there a
way I can tell SQL server to load *everything* into memory?

Show quote
"Mike Epprecht (SQL MVP)" <m***@epprecht.net> wrote in message
news:OVlgKgHlFHA.1440@TK2MSFTNGP14.phx.gbl...
> Hi
>
> The 1st query can use the index your created on SaleDate.
>
> The 2nd query has no option, but to read every row  (9.5 million) in the
> table to find where S3 > S4, and then based on that subset, so the S1/S2
> evaluation.
> Either add and index, to restrict the select to something that can use one
> of the indexes.
>
> Regards
> --------------------------------
> Mike Epprecht, Microsoft SQL Server MVP
> Zurich, Switzerland
>
> IM: m***@epprecht.net
>
> MVP Program: http://www.microsoft.com/mvp
>
> Blog: http://www.msmvps.com/epprecht/
>
> "Rich" <no@spam.invalid> wrote in message
> news:G_uGe.53624$4o.5775@fed1read06...
> > CREATE TABLE Sales1
> > (
> >  varchar (10) CustID,
> >  varchar (10) TransID,
> >  datetime SaleDate,
> >  money  S1,
> >  money  S2,
> >  money  S3,
> >  money  S4,
> >  numeric V1
> > )
> >
> > CREATE NONCLUSTERED INDEX Sales1_CustID ON Sales1 (CustID)
> >
> > CREATE NONCLUSTERED INDEX Sales1_SaleDate ON Sales1 (SaleDate)
> >
> > "money" is just the right size for my fields. This table has 9,500,000
> > records.
> >
> > Although I need to do this select in less than a half second, it takes 1
> > full minute:
> > SELECT MAX(S1) FROM Sales1 WHERE SaleDate > '1/1/2005'
> >
> > And I need to do this select in less than a half second, but it takes 3
> > minutes:
> > SELECT AVG(S1 / S2) FROM Sales1 WHERE S3 > S4
> >
> > Am I supposed to create a new field with these values pre-calculated? I
> > hope
> > not, because I have several other formulas - up to 500 different types
of
> > selects which are all similar.
> >
> > I am testing with SQL 2000 Developer. XEON CPU, fast SATA hard drive and
> > 2GB
> > of memory.
> >
> > Thanks
> >
> >
> >
>
>
Author
29 Jul 2005 10:03 PM
Colin Dawson
How about thinking a little outside the norm.  Denormalisation may be useful
here.
The idea is that you have a table which contains the results of the query
already done.

Then you could add triggers or maybe a SQL Job to re-calculate the
aggregates, and fill in a table.

Averages (or more correctly the mean value) can be calculated easily, during
the insert.  Update's and delete's may be a little more tricky, but then
that's there the JOB comes in.  If it's not critical that the aggregate is
calculated NOW, but can be done lets say once a day then a SQL Job could be
setup to run re-calculate every so often.  Does it really matter if it takes
a minute if no one is sat there waiting for it?   OK, not what you wanted to
hear I know, but it's a possibility.

Another thought is that you may want to take a look into using composite
indexes.  For you second query, try this index it may help to improve the
performance (my SQL box is at work so I can't try it)

Before you create the index, run the query and get SQL to display the Actual
Execution plan.  You'll notice that the Query is performing a Table Scan,
(or if you've now created a clustered index, as Clustered Index scan).   Use
SQL Profiler to take a reading of the Number of Reads that SQL needs to
execute your query.

Now Create the following index

Create NonClustered Index Sales1_S3_S4_S1_S2 On Sales1 ( S3, S4, S1, S2 )

If I've got it right, when you re-execute the query, the execution plan will
show that it's performing an Index Scan Or possibly even and Index Seek (but
I don't think this will happen)  Now use SQL Profiler to get the number of
reads, you'll notice that it was alot lower.  Also the execution time should
be lower too.    Now turn off the Execution Plan, and stop the profiler to
get the production speed of the query.

I know that I've rambled on a bit, but I hope it gives you some ideas.

Regards

Colin Dawson
www.cjdawson.com
"Rich" <no@spam.invalid> wrote in message
news:EjxGe.53631$4o.43600@fed1read06...
Show quote
> Shoot... SQL server might not work for me. I have to be able to calculate
> sums, averages, etc. based on a number of different fields, and there are
> millions of records. I have to get results within milliseconds. Is there a
> way I can tell SQL server to load *everything* into memory?
>
> "Mike Epprecht (SQL MVP)" <m***@epprecht.net> wrote in message
> news:OVlgKgHlFHA.1440@TK2MSFTNGP14.phx.gbl...
>> Hi
>>
>> The 1st query can use the index your created on SaleDate.
>>
>> The 2nd query has no option, but to read every row  (9.5 million) in the
>> table to find where S3 > S4, and then based on that subset, so the S1/S2
>> evaluation.
>> Either add and index, to restrict the select to something that can use
>> one
>> of the indexes.
>>
>> Regards
>> --------------------------------
>> Mike Epprecht, Microsoft SQL Server MVP
>> Zurich, Switzerland
>>
>> IM: m***@epprecht.net
>>
>> MVP Program: http://www.microsoft.com/mvp
>>
>> Blog: http://www.msmvps.com/epprecht/
>>
>> "Rich" <no@spam.invalid> wrote in message
>> news:G_uGe.53624$4o.5775@fed1read06...
>> > CREATE TABLE Sales1
>> > (
>> >  varchar (10) CustID,
>> >  varchar (10) TransID,
>> >  datetime SaleDate,
>> >  money  S1,
>> >  money  S2,
>> >  money  S3,
>> >  money  S4,
>> >  numeric V1
>> > )
>> >
>> > CREATE NONCLUSTERED INDEX Sales1_CustID ON Sales1 (CustID)
>> >
>> > CREATE NONCLUSTERED INDEX Sales1_SaleDate ON Sales1 (SaleDate)
>> >
>> > "money" is just the right size for my fields. This table has 9,500,000
>> > records.
>> >
>> > Although I need to do this select in less than a half second, it takes
>> > 1
>> > full minute:
>> > SELECT MAX(S1) FROM Sales1 WHERE SaleDate > '1/1/2005'
>> >
>> > And I need to do this select in less than a half second, but it takes 3
>> > minutes:
>> > SELECT AVG(S1 / S2) FROM Sales1 WHERE S3 > S4
>> >
>> > Am I supposed to create a new field with these values pre-calculated? I
>> > hope
>> > not, because I have several other formulas - up to 500 different types
> of
>> > selects which are all similar.
>> >
>> > I am testing with SQL 2000 Developer. XEON CPU, fast SATA hard drive
>> > and
>> > 2GB
>> > of memory.
>> >
>> > Thanks
>> >
>> >
>> >
>>
>>
>
>
Author
29 Jul 2005 11:00 PM
Rich
Thanks I'll give that a try. But I have to recalculate those 500 fields
every 5 minutes... do you suppose that "Real Time OLAP" would help in this
case?

Show quote
"Colin Dawson" <newsgro***@cjdawson.com> wrote in message
news:iJxGe.79457$G8.17569@text.news.blueyonder.co.uk...
> How about thinking a little outside the norm.  Denormalisation may be
useful
> here.
> The idea is that you have a table which contains the results of the query
> already done.
>
> Then you could add triggers or maybe a SQL Job to re-calculate the
> aggregates, and fill in a table.
>
> Averages (or more correctly the mean value) can be calculated easily,
during
> the insert.  Update's and delete's may be a little more tricky, but then
> that's there the JOB comes in.  If it's not critical that the aggregate is
> calculated NOW, but can be done lets say once a day then a SQL Job could
be
> setup to run re-calculate every so often.  Does it really matter if it
takes
> a minute if no one is sat there waiting for it?   OK, not what you wanted
to
> hear I know, but it's a possibility.
>
> Another thought is that you may want to take a look into using composite
> indexes.  For you second query, try this index it may help to improve the
> performance (my SQL box is at work so I can't try it)
>
> Before you create the index, run the query and get SQL to display the
Actual
> Execution plan.  You'll notice that the Query is performing a Table Scan,
> (or if you've now created a clustered index, as Clustered Index scan).
Use
> SQL Profiler to take a reading of the Number of Reads that SQL needs to
> execute your query.
>
> Now Create the following index
>
> Create NonClustered Index Sales1_S3_S4_S1_S2 On Sales1 ( S3, S4, S1, S2 )
>
> If I've got it right, when you re-execute the query, the execution plan
will
> show that it's performing an Index Scan Or possibly even and Index Seek
(but
> I don't think this will happen)  Now use SQL Profiler to get the number of
> reads, you'll notice that it was alot lower.  Also the execution time
should
> be lower too.    Now turn off the Execution Plan, and stop the profiler to
> get the production speed of the query.
>
> I know that I've rambled on a bit, but I hope it gives you some ideas.
>
> Regards
>
> Colin Dawson
> www.cjdawson.com
> "Rich" <no@spam.invalid> wrote in message
> news:EjxGe.53631$4o.43600@fed1read06...
> > Shoot... SQL server might not work for me. I have to be able to
calculate
> > sums, averages, etc. based on a number of different fields, and there
are
> > millions of records. I have to get results within milliseconds. Is there
a
> > way I can tell SQL server to load *everything* into memory?
> >
> > "Mike Epprecht (SQL MVP)" <m***@epprecht.net> wrote in message
> > news:OVlgKgHlFHA.1440@TK2MSFTNGP14.phx.gbl...
> >> Hi
> >>
> >> The 1st query can use the index your created on SaleDate.
> >>
> >> The 2nd query has no option, but to read every row  (9.5 million) in
the
> >> table to find where S3 > S4, and then based on that subset, so the
S1/S2
> >> evaluation.
> >> Either add and index, to restrict the select to something that can use
> >> one
> >> of the indexes.
> >>
> >> Regards
> >> --------------------------------
> >> Mike Epprecht, Microsoft SQL Server MVP
> >> Zurich, Switzerland
> >>
> >> IM: m***@epprecht.net
> >>
> >> MVP Program: http://www.microsoft.com/mvp
> >>
> >> Blog: http://www.msmvps.com/epprecht/
> >>
> >> "Rich" <no@spam.invalid> wrote in message
> >> news:G_uGe.53624$4o.5775@fed1read06...
> >> > CREATE TABLE Sales1
> >> > (
> >> >  varchar (10) CustID,
> >> >  varchar (10) TransID,
> >> >  datetime SaleDate,
> >> >  money  S1,
> >> >  money  S2,
> >> >  money  S3,
> >> >  money  S4,
> >> >  numeric V1
> >> > )
> >> >
> >> > CREATE NONCLUSTERED INDEX Sales1_CustID ON Sales1 (CustID)
> >> >
> >> > CREATE NONCLUSTERED INDEX Sales1_SaleDate ON Sales1 (SaleDate)
> >> >
> >> > "money" is just the right size for my fields. This table has
9,500,000
Show quote
> >> > records.
> >> >
> >> > Although I need to do this select in less than a half second, it
takes
> >> > 1
> >> > full minute:
> >> > SELECT MAX(S1) FROM Sales1 WHERE SaleDate > '1/1/2005'
> >> >
> >> > And I need to do this select in less than a half second, but it takes
3
> >> > minutes:
> >> > SELECT AVG(S1 / S2) FROM Sales1 WHERE S3 > S4
> >> >
> >> > Am I supposed to create a new field with these values pre-calculated?
I
> >> > hope
> >> > not, because I have several other formulas - up to 500 different
types
> > of
> >> > selects which are all similar.
> >> >
> >> > I am testing with SQL 2000 Developer. XEON CPU, fast SATA hard drive
> >> > and
> >> > 2GB
> >> > of memory.
> >> >
> >> > Thanks
> >> >
> >> >
> >> >
> >>
> >>
> >
> >
>
>
Author
31 Jul 2005 10:14 AM
Colin Dawson
Honestly, I've not looked at OLAP yet.   I really think that I should do at
some point soon.

re-calculating 500 fields every five mins! ouch.   I know that this idea
could end up in a complete tangle, but maybe it would be worth splitting the
workload down.     For example if you're working out averages, use the real
data to work out the averages for portions of the data, then to get the
average for the whole, use the results from the portions to calculate the
whole.  Also does the entire table have to be re-calculated?  If not, don't
do the whole table.    For example....

Say that you're talking about sales for the last week.  Today is Sunday, and
as the information is entered into the system each day, and the previous
days information doesn't get altered, there's no need to re-calulate
yesterdays average, it's static.

Calculating the average for the week so far is as simple as calculating the
average for monday to saturday - i.e. The six seperate averages for added
together, then divided by 6.  Then all you need to do is include todays
average, which will only ever be on a small dataset compared to the entire
table.  Bet this can be calulated in a matter of ms.

I can't give you any specifics on how to get the best performance from your
database, for the simple reasong that I don't have a full and complete
understanding of your data.  This is the key to getting maximum performance
out of any database.

Regards

Colin Dawson
www.cjdawson.com



"Rich" <no@spam.invalid> wrote in message
news:exyGe.53638$4o.28608@fed1read06...
Show quote
> Thanks I'll give that a try. But I have to recalculate those 500 fields
> every 5 minutes... do you suppose that "Real Time OLAP" would help in this
> case?
>
> "Colin Dawson" <newsgro***@cjdawson.com> wrote in message
> news:iJxGe.79457$G8.17569@text.news.blueyonder.co.uk...
>> How about thinking a little outside the norm.  Denormalisation may be
> useful
>> here.
>> The idea is that you have a table which contains the results of the query
>> already done.
>>
>> Then you could add triggers or maybe a SQL Job to re-calculate the
>> aggregates, and fill in a table.
>>
>> Averages (or more correctly the mean value) can be calculated easily,
> during
>> the insert.  Update's and delete's may be a little more tricky, but then
>> that's there the JOB comes in.  If it's not critical that the aggregate
>> is
>> calculated NOW, but can be done lets say once a day then a SQL Job could
> be
>> setup to run re-calculate every so often.  Does it really matter if it
> takes
>> a minute if no one is sat there waiting for it?   OK, not what you wanted
> to
>> hear I know, but it's a possibility.
>>
>> Another thought is that you may want to take a look into using composite
>> indexes.  For you second query, try this index it may help to improve the
>> performance (my SQL box is at work so I can't try it)
>>
>> Before you create the index, run the query and get SQL to display the
> Actual
>> Execution plan.  You'll notice that the Query is performing a Table Scan,
>> (or if you've now created a clustered index, as Clustered Index scan).
> Use
>> SQL Profiler to take a reading of the Number of Reads that SQL needs to
>> execute your query.
>>
>> Now Create the following index
>>
>> Create NonClustered Index Sales1_S3_S4_S1_S2 On Sales1 ( S3, S4, S1, S2 )
>>
>> If I've got it right, when you re-execute the query, the execution plan
> will
>> show that it's performing an Index Scan Or possibly even and Index Seek
> (but
>> I don't think this will happen)  Now use SQL Profiler to get the number
>> of
>> reads, you'll notice that it was alot lower.  Also the execution time
> should
>> be lower too.    Now turn off the Execution Plan, and stop the profiler
>> to
>> get the production speed of the query.
>>
>> I know that I've rambled on a bit, but I hope it gives you some ideas.
>>
>> Regards
>>
>> Colin Dawson
>> www.cjdawson.com
>> "Rich" <no@spam.invalid> wrote in message
>> news:EjxGe.53631$4o.43600@fed1read06...
>> > Shoot... SQL server might not work for me. I have to be able to
> calculate
>> > sums, averages, etc. based on a number of different fields, and there
> are
>> > millions of records. I have to get results within milliseconds. Is
>> > there
> a
>> > way I can tell SQL server to load *everything* into memory?
>> >
>> > "Mike Epprecht (SQL MVP)" <m***@epprecht.net> wrote in message
>> > news:OVlgKgHlFHA.1440@TK2MSFTNGP14.phx.gbl...
>> >> Hi
>> >>
>> >> The 1st query can use the index your created on SaleDate.
>> >>
>> >> The 2nd query has no option, but to read every row  (9.5 million) in
> the
>> >> table to find where S3 > S4, and then based on that subset, so the
> S1/S2
>> >> evaluation.
>> >> Either add and index, to restrict the select to something that can use
>> >> one
>> >> of the indexes.
>> >>
>> >> Regards
>> >> --------------------------------
>> >> Mike Epprecht, Microsoft SQL Server MVP
>> >> Zurich, Switzerland
>> >>
>> >> IM: m***@epprecht.net
>> >>
>> >> MVP Program: http://www.microsoft.com/mvp
>> >>
>> >> Blog: http://www.msmvps.com/epprecht/
>> >>
>> >> "Rich" <no@spam.invalid> wrote in message
>> >> news:G_uGe.53624$4o.5775@fed1read06...
>> >> > CREATE TABLE Sales1
>> >> > (
>> >> >  varchar (10) CustID,
>> >> >  varchar (10) TransID,
>> >> >  datetime SaleDate,
>> >> >  money  S1,
>> >> >  money  S2,
>> >> >  money  S3,
>> >> >  money  S4,
>> >> >  numeric V1
>> >> > )
>> >> >
>> >> > CREATE NONCLUSTERED INDEX Sales1_CustID ON Sales1 (CustID)
>> >> >
>> >> > CREATE NONCLUSTERED INDEX Sales1_SaleDate ON Sales1 (SaleDate)
>> >> >
>> >> > "money" is just the right size for my fields. This table has
> 9,500,000
>> >> > records.
>> >> >
>> >> > Although I need to do this select in less than a half second, it
> takes
>> >> > 1
>> >> > full minute:
>> >> > SELECT MAX(S1) FROM Sales1 WHERE SaleDate > '1/1/2005'
>> >> >
>> >> > And I need to do this select in less than a half second, but it
>> >> > takes
> 3
>> >> > minutes:
>> >> > SELECT AVG(S1 / S2) FROM Sales1 WHERE S3 > S4
>> >> >
>> >> > Am I supposed to create a new field with these values
>> >> > pre-calculated?
> I
>> >> > hope
>> >> > not, because I have several other formulas - up to 500 different
> types
>> > of
>> >> > selects which are all similar.
>> >> >
>> >> > I am testing with SQL 2000 Developer. XEON CPU, fast SATA hard drive
>> >> > and
>> >> > 2GB
>> >> > of memory.
>> >> >
>> >> > Thanks
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >
>> >
>>
>>
>
>
Author
29 Jul 2005 10:01 PM
--CELKO--
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications.  It is very hard to debug code when you do not let us
see it.

But based on your weird, vague personal language:

1) You have no key because in NULL-able
2) You don't know the proper format for dates in SQL
3) You don't know about the math errors in the deprecated MONEY data
type - use DECIMAL(s,p) instead.
4)  You don't know how to declare NUMERIC (s,p) data
5)  Either you were sloppy or you don't know what a repeating group and
1NF are.
6) You don't know that rows are not records and columns are not fields.


if you cared about the people helping your for free, would your DDL
look like this?

CREATE TABLE SalesTransactions
(trans_nbr CHAR(10) NOT NULL PRIMARY KEY,
cust_id CHAR(10) NOT NULL
   REFERENCES Customers (cust_id)
   ON UPDATE CASCADE,
sales_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL
  CHECK (<<SET to 00:00:00 Hrs >>),
repeating_grp_1 DECIMAL (12,4) DEFAULT 0.00 NOT NULL,
repeating_grp_2 DECIMAL (12,4) DEFAULT 0.00 NOT NULL,
repeating_grp_3 DECIMAL (12,4) DEFAULT 0.00 NOT NULL,
repeating_grp_4 DECIMAL (12,4) DEFAULT 0.00 NOT NULL,
vague_thing_1 NUMERIC(12,4) DEFAULT 0.00 NOT NULL);

You can use a CASE expession with aggregates to do all of these things
in one statement.  You will simply have to make a table scan for some
of them, so indexing will not help. Do them all at once or put them in
a VIEW.

CREATE VIEW Godzilla (..)
AS
SELECT MAX(CASE WHEN sales_date SaleDate > '2005-01-01'
                              THEN repeating_grp_1 ELSE NULL END)
             AS max_1,
             AVG (CASE WHEN repeating_grp_3 > repeating_grp_S4
                                THEN repeating_grp_1/repeating_grp_2)
ELSE NULL END
             AS foobar_avg,
             etc.
  FROM SalesTransactions1;

But this table feels like a set of design flaws.

AddThis Social Bookmark Button