|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
My table design - where did I go wrong?( varchar (10) CustID, varchar (10) TransID, datetime SaleDate, money S1, money S2, money S3, money S4, numeric V1 ) CREATE NONCLUSTERED INDEX Sales1_CustID ON Sales1 (CustID) CREATE NONCLUSTERED INDEX Sales1_SaleDate ON Sales1 (SaleDate) "money" is just the right size for my fields. This table has 9,500,000 records. Although I need to do this select in less than a half second, it takes 1 full minute: SELECT MAX(S1) FROM Sales1 WHERE SaleDate > '1/1/2005' And I need to do this select in less than a half second, but it takes 3 minutes: SELECT AVG(S1 / S2) FROM Sales1 WHERE S3 > S4 Am I supposed to create a new field with these values pre-calculated? I hope not, because I have several other formulas - up to 500 different types of selects which are all similar. I am testing with SQL 2000 Developer. XEON CPU, fast SATA hard drive and 2GB of memory. Thanks Hi
The 1st query can use the index your created on SaleDate. The 2nd query has no option, but to read every row (9.5 million) in the table to find where S3 > S4, and then based on that subset, so the S1/S2 evaluation. Either add and index, to restrict the select to something that can use one of the indexes. Regards -------------------------------- Mike Epprecht, Microsoft SQL Server MVP Zurich, Switzerland IM: m***@epprecht.net MVP Program: http://www.microsoft.com/mvp Blog: http://www.msmvps.com/epprecht/ "Rich" <no@spam.invalid> wrote in message news:G_uGe.53624$4o.5775@fed1read06...Show quote > CREATE TABLE Sales1 > ( > varchar (10) CustID, > varchar (10) TransID, > datetime SaleDate, > money S1, > money S2, > money S3, > money S4, > numeric V1 > ) > > CREATE NONCLUSTERED INDEX Sales1_CustID ON Sales1 (CustID) > > CREATE NONCLUSTERED INDEX Sales1_SaleDate ON Sales1 (SaleDate) > > "money" is just the right size for my fields. This table has 9,500,000 > records. > > Although I need to do this select in less than a half second, it takes 1 > full minute: > SELECT MAX(S1) FROM Sales1 WHERE SaleDate > '1/1/2005' > > And I need to do this select in less than a half second, but it takes 3 > minutes: > SELECT AVG(S1 / S2) FROM Sales1 WHERE S3 > S4 > > Am I supposed to create a new field with these values pre-calculated? I > hope > not, because I have several other formulas - up to 500 different types of > selects which are all similar. > > I am testing with SQL 2000 Developer. XEON CPU, fast SATA hard drive and > 2GB > of memory. > > Thanks > > > Shoot... SQL server might not work for me. I have to be able to calculate
sums, averages, etc. based on a number of different fields, and there are millions of records. I have to get results within milliseconds. Is there a way I can tell SQL server to load *everything* into memory? Show quote "Mike Epprecht (SQL MVP)" <m***@epprecht.net> wrote in message news:OVlgKgHlFHA.1440@TK2MSFTNGP14.phx.gbl... > Hi > > The 1st query can use the index your created on SaleDate. > > The 2nd query has no option, but to read every row (9.5 million) in the > table to find where S3 > S4, and then based on that subset, so the S1/S2 > evaluation. > Either add and index, to restrict the select to something that can use one > of the indexes. > > Regards > -------------------------------- > Mike Epprecht, Microsoft SQL Server MVP > Zurich, Switzerland > > IM: m***@epprecht.net > > MVP Program: http://www.microsoft.com/mvp > > Blog: http://www.msmvps.com/epprecht/ > > "Rich" <no@spam.invalid> wrote in message > news:G_uGe.53624$4o.5775@fed1read06... > > CREATE TABLE Sales1 > > ( > > varchar (10) CustID, > > varchar (10) TransID, > > datetime SaleDate, > > money S1, > > money S2, > > money S3, > > money S4, > > numeric V1 > > ) > > > > CREATE NONCLUSTERED INDEX Sales1_CustID ON Sales1 (CustID) > > > > CREATE NONCLUSTERED INDEX Sales1_SaleDate ON Sales1 (SaleDate) > > > > "money" is just the right size for my fields. This table has 9,500,000 > > records. > > > > Although I need to do this select in less than a half second, it takes 1 > > full minute: > > SELECT MAX(S1) FROM Sales1 WHERE SaleDate > '1/1/2005' > > > > And I need to do this select in less than a half second, but it takes 3 > > minutes: > > SELECT AVG(S1 / S2) FROM Sales1 WHERE S3 > S4 > > > > Am I supposed to create a new field with these values pre-calculated? I > > hope > > not, because I have several other formulas - up to 500 different types of > > selects which are all similar. > > > > I am testing with SQL 2000 Developer. XEON CPU, fast SATA hard drive and > > 2GB > > of memory. > > > > Thanks > > > > > > > > How about thinking a little outside the norm. Denormalisation may be useful
here. The idea is that you have a table which contains the results of the query already done. Then you could add triggers or maybe a SQL Job to re-calculate the aggregates, and fill in a table. Averages (or more correctly the mean value) can be calculated easily, during the insert. Update's and delete's may be a little more tricky, but then that's there the JOB comes in. If it's not critical that the aggregate is calculated NOW, but can be done lets say once a day then a SQL Job could be setup to run re-calculate every so often. Does it really matter if it takes a minute if no one is sat there waiting for it? OK, not what you wanted to hear I know, but it's a possibility. Another thought is that you may want to take a look into using composite indexes. For you second query, try this index it may help to improve the performance (my SQL box is at work so I can't try it) Before you create the index, run the query and get SQL to display the Actual Execution plan. You'll notice that the Query is performing a Table Scan, (or if you've now created a clustered index, as Clustered Index scan). Use SQL Profiler to take a reading of the Number of Reads that SQL needs to execute your query. Now Create the following index Create NonClustered Index Sales1_S3_S4_S1_S2 On Sales1 ( S3, S4, S1, S2 ) If I've got it right, when you re-execute the query, the execution plan will show that it's performing an Index Scan Or possibly even and Index Seek (but I don't think this will happen) Now use SQL Profiler to get the number of reads, you'll notice that it was alot lower. Also the execution time should be lower too. Now turn off the Execution Plan, and stop the profiler to get the production speed of the query. I know that I've rambled on a bit, but I hope it gives you some ideas. Regards Colin Dawson www.cjdawson.com "Rich" <no@spam.invalid> wrote in message news:EjxGe.53631$4o.43600@fed1read06...Show quote > Shoot... SQL server might not work for me. I have to be able to calculate > sums, averages, etc. based on a number of different fields, and there are > millions of records. I have to get results within milliseconds. Is there a > way I can tell SQL server to load *everything* into memory? > > "Mike Epprecht (SQL MVP)" <m***@epprecht.net> wrote in message > news:OVlgKgHlFHA.1440@TK2MSFTNGP14.phx.gbl... >> Hi >> >> The 1st query can use the index your created on SaleDate. >> >> The 2nd query has no option, but to read every row (9.5 million) in the >> table to find where S3 > S4, and then based on that subset, so the S1/S2 >> evaluation. >> Either add and index, to restrict the select to something that can use >> one >> of the indexes. >> >> Regards >> -------------------------------- >> Mike Epprecht, Microsoft SQL Server MVP >> Zurich, Switzerland >> >> IM: m***@epprecht.net >> >> MVP Program: http://www.microsoft.com/mvp >> >> Blog: http://www.msmvps.com/epprecht/ >> >> "Rich" <no@spam.invalid> wrote in message >> news:G_uGe.53624$4o.5775@fed1read06... >> > CREATE TABLE Sales1 >> > ( >> > varchar (10) CustID, >> > varchar (10) TransID, >> > datetime SaleDate, >> > money S1, >> > money S2, >> > money S3, >> > money S4, >> > numeric V1 >> > ) >> > >> > CREATE NONCLUSTERED INDEX Sales1_CustID ON Sales1 (CustID) >> > >> > CREATE NONCLUSTERED INDEX Sales1_SaleDate ON Sales1 (SaleDate) >> > >> > "money" is just the right size for my fields. This table has 9,500,000 >> > records. >> > >> > Although I need to do this select in less than a half second, it takes >> > 1 >> > full minute: >> > SELECT MAX(S1) FROM Sales1 WHERE SaleDate > '1/1/2005' >> > >> > And I need to do this select in less than a half second, but it takes 3 >> > minutes: >> > SELECT AVG(S1 / S2) FROM Sales1 WHERE S3 > S4 >> > >> > Am I supposed to create a new field with these values pre-calculated? I >> > hope >> > not, because I have several other formulas - up to 500 different types > of >> > selects which are all similar. >> > >> > I am testing with SQL 2000 Developer. XEON CPU, fast SATA hard drive >> > and >> > 2GB >> > of memory. >> > >> > Thanks >> > >> > >> > >> >> > > Thanks I'll give that a try. But I have to recalculate those 500 fields
every 5 minutes... do you suppose that "Real Time OLAP" would help in this case? Show quote "Colin Dawson" <newsgro***@cjdawson.com> wrote in message 9,500,000news:iJxGe.79457$G8.17569@text.news.blueyonder.co.uk... > How about thinking a little outside the norm. Denormalisation may be useful > here. > The idea is that you have a table which contains the results of the query > already done. > > Then you could add triggers or maybe a SQL Job to re-calculate the > aggregates, and fill in a table. > > Averages (or more correctly the mean value) can be calculated easily, during > the insert. Update's and delete's may be a little more tricky, but then > that's there the JOB comes in. If it's not critical that the aggregate is > calculated NOW, but can be done lets say once a day then a SQL Job could be > setup to run re-calculate every so often. Does it really matter if it takes > a minute if no one is sat there waiting for it? OK, not what you wanted to > hear I know, but it's a possibility. > > Another thought is that you may want to take a look into using composite > indexes. For you second query, try this index it may help to improve the > performance (my SQL box is at work so I can't try it) > > Before you create the index, run the query and get SQL to display the Actual > Execution plan. You'll notice that the Query is performing a Table Scan, > (or if you've now created a clustered index, as Clustered Index scan). Use > SQL Profiler to take a reading of the Number of Reads that SQL needs to > execute your query. > > Now Create the following index > > Create NonClustered Index Sales1_S3_S4_S1_S2 On Sales1 ( S3, S4, S1, S2 ) > > If I've got it right, when you re-execute the query, the execution plan will > show that it's performing an Index Scan Or possibly even and Index Seek (but > I don't think this will happen) Now use SQL Profiler to get the number of > reads, you'll notice that it was alot lower. Also the execution time should > be lower too. Now turn off the Execution Plan, and stop the profiler to > get the production speed of the query. > > I know that I've rambled on a bit, but I hope it gives you some ideas. > > Regards > > Colin Dawson > www.cjdawson.com > "Rich" <no@spam.invalid> wrote in message > news:EjxGe.53631$4o.43600@fed1read06... > > Shoot... SQL server might not work for me. I have to be able to calculate > > sums, averages, etc. based on a number of different fields, and there are > > millions of records. I have to get results within milliseconds. Is there a > > way I can tell SQL server to load *everything* into memory? > > > > "Mike Epprecht (SQL MVP)" <m***@epprecht.net> wrote in message > > news:OVlgKgHlFHA.1440@TK2MSFTNGP14.phx.gbl... > >> Hi > >> > >> The 1st query can use the index your created on SaleDate. > >> > >> The 2nd query has no option, but to read every row (9.5 million) in the > >> table to find where S3 > S4, and then based on that subset, so the S1/S2 > >> evaluation. > >> Either add and index, to restrict the select to something that can use > >> one > >> of the indexes. > >> > >> Regards > >> -------------------------------- > >> Mike Epprecht, Microsoft SQL Server MVP > >> Zurich, Switzerland > >> > >> IM: m***@epprecht.net > >> > >> MVP Program: http://www.microsoft.com/mvp > >> > >> Blog: http://www.msmvps.com/epprecht/ > >> > >> "Rich" <no@spam.invalid> wrote in message > >> news:G_uGe.53624$4o.5775@fed1read06... > >> > CREATE TABLE Sales1 > >> > ( > >> > varchar (10) CustID, > >> > varchar (10) TransID, > >> > datetime SaleDate, > >> > money S1, > >> > money S2, > >> > money S3, > >> > money S4, > >> > numeric V1 > >> > ) > >> > > >> > CREATE NONCLUSTERED INDEX Sales1_CustID ON Sales1 (CustID) > >> > > >> > CREATE NONCLUSTERED INDEX Sales1_SaleDate ON Sales1 (SaleDate) > >> > > >> > "money" is just the right size for my fields. This table has Show quote > >> > records. > >> > > >> > Although I need to do this select in less than a half second, it takes > >> > 1 > >> > full minute: > >> > SELECT MAX(S1) FROM Sales1 WHERE SaleDate > '1/1/2005' > >> > > >> > And I need to do this select in less than a half second, but it takes 3 > >> > minutes: > >> > SELECT AVG(S1 / S2) FROM Sales1 WHERE S3 > S4 > >> > > >> > Am I supposed to create a new field with these values pre-calculated? I > >> > hope > >> > not, because I have several other formulas - up to 500 different types > > of > >> > selects which are all similar. > >> > > >> > I am testing with SQL 2000 Developer. XEON CPU, fast SATA hard drive > >> > and > >> > 2GB > >> > of memory. > >> > > >> > Thanks > >> > > >> > > >> > > >> > >> > > > > > > Honestly, I've not looked at OLAP yet. I really think that I should do at
some point soon. re-calculating 500 fields every five mins! ouch. I know that this idea could end up in a complete tangle, but maybe it would be worth splitting the workload down. For example if you're working out averages, use the real data to work out the averages for portions of the data, then to get the average for the whole, use the results from the portions to calculate the whole. Also does the entire table have to be re-calculated? If not, don't do the whole table. For example.... Say that you're talking about sales for the last week. Today is Sunday, and as the information is entered into the system each day, and the previous days information doesn't get altered, there's no need to re-calulate yesterdays average, it's static. Calculating the average for the week so far is as simple as calculating the average for monday to saturday - i.e. The six seperate averages for added together, then divided by 6. Then all you need to do is include todays average, which will only ever be on a small dataset compared to the entire table. Bet this can be calulated in a matter of ms. I can't give you any specifics on how to get the best performance from your database, for the simple reasong that I don't have a full and complete understanding of your data. This is the key to getting maximum performance out of any database. Regards Colin Dawson www.cjdawson.com "Rich" <no@spam.invalid> wrote in message news:exyGe.53638$4o.28608@fed1read06...Show quote > Thanks I'll give that a try. But I have to recalculate those 500 fields > every 5 minutes... do you suppose that "Real Time OLAP" would help in this > case? > > "Colin Dawson" <newsgro***@cjdawson.com> wrote in message > news:iJxGe.79457$G8.17569@text.news.blueyonder.co.uk... >> How about thinking a little outside the norm. Denormalisation may be > useful >> here. >> The idea is that you have a table which contains the results of the query >> already done. >> >> Then you could add triggers or maybe a SQL Job to re-calculate the >> aggregates, and fill in a table. >> >> Averages (or more correctly the mean value) can be calculated easily, > during >> the insert. Update's and delete's may be a little more tricky, but then >> that's there the JOB comes in. If it's not critical that the aggregate >> is >> calculated NOW, but can be done lets say once a day then a SQL Job could > be >> setup to run re-calculate every so often. Does it really matter if it > takes >> a minute if no one is sat there waiting for it? OK, not what you wanted > to >> hear I know, but it's a possibility. >> >> Another thought is that you may want to take a look into using composite >> indexes. For you second query, try this index it may help to improve the >> performance (my SQL box is at work so I can't try it) >> >> Before you create the index, run the query and get SQL to display the > Actual >> Execution plan. You'll notice that the Query is performing a Table Scan, >> (or if you've now created a clustered index, as Clustered Index scan). > Use >> SQL Profiler to take a reading of the Number of Reads that SQL needs to >> execute your query. >> >> Now Create the following index >> >> Create NonClustered Index Sales1_S3_S4_S1_S2 On Sales1 ( S3, S4, S1, S2 ) >> >> If I've got it right, when you re-execute the query, the execution plan > will >> show that it's performing an Index Scan Or possibly even and Index Seek > (but >> I don't think this will happen) Now use SQL Profiler to get the number >> of >> reads, you'll notice that it was alot lower. Also the execution time > should >> be lower too. Now turn off the Execution Plan, and stop the profiler >> to >> get the production speed of the query. >> >> I know that I've rambled on a bit, but I hope it gives you some ideas. >> >> Regards >> >> Colin Dawson >> www.cjdawson.com >> "Rich" <no@spam.invalid> wrote in message >> news:EjxGe.53631$4o.43600@fed1read06... >> > Shoot... SQL server might not work for me. I have to be able to > calculate >> > sums, averages, etc. based on a number of different fields, and there > are >> > millions of records. I have to get results within milliseconds. Is >> > there > a >> > way I can tell SQL server to load *everything* into memory? >> > >> > "Mike Epprecht (SQL MVP)" <m***@epprecht.net> wrote in message >> > news:OVlgKgHlFHA.1440@TK2MSFTNGP14.phx.gbl... >> >> Hi >> >> >> >> The 1st query can use the index your created on SaleDate. >> >> >> >> The 2nd query has no option, but to read every row (9.5 million) in > the >> >> table to find where S3 > S4, and then based on that subset, so the > S1/S2 >> >> evaluation. >> >> Either add and index, to restrict the select to something that can use >> >> one >> >> of the indexes. >> >> >> >> Regards >> >> -------------------------------- >> >> Mike Epprecht, Microsoft SQL Server MVP >> >> Zurich, Switzerland >> >> >> >> IM: m***@epprecht.net >> >> >> >> MVP Program: http://www.microsoft.com/mvp >> >> >> >> Blog: http://www.msmvps.com/epprecht/ >> >> >> >> "Rich" <no@spam.invalid> wrote in message >> >> news:G_uGe.53624$4o.5775@fed1read06... >> >> > CREATE TABLE Sales1 >> >> > ( >> >> > varchar (10) CustID, >> >> > varchar (10) TransID, >> >> > datetime SaleDate, >> >> > money S1, >> >> > money S2, >> >> > money S3, >> >> > money S4, >> >> > numeric V1 >> >> > ) >> >> > >> >> > CREATE NONCLUSTERED INDEX Sales1_CustID ON Sales1 (CustID) >> >> > >> >> > CREATE NONCLUSTERED INDEX Sales1_SaleDate ON Sales1 (SaleDate) >> >> > >> >> > "money" is just the right size for my fields. This table has > 9,500,000 >> >> > records. >> >> > >> >> > Although I need to do this select in less than a half second, it > takes >> >> > 1 >> >> > full minute: >> >> > SELECT MAX(S1) FROM Sales1 WHERE SaleDate > '1/1/2005' >> >> > >> >> > And I need to do this select in less than a half second, but it >> >> > takes > 3 >> >> > minutes: >> >> > SELECT AVG(S1 / S2) FROM Sales1 WHERE S3 > S4 >> >> > >> >> > Am I supposed to create a new field with these values >> >> > pre-calculated? > I >> >> > hope >> >> > not, because I have several other formulas - up to 500 different > types >> > of >> >> > selects which are all similar. >> >> > >> >> > I am testing with SQL 2000 Developer. XEON CPU, fast SATA hard drive >> >> > and >> >> > 2GB >> >> > of memory. >> >> > >> >> > Thanks >> >> > >> >> > >> >> > >> >> >> >> >> > >> > >> >> > > Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in your schema are. Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. But based on your weird, vague personal language: 1) You have no key because in NULL-able 2) You don't know the proper format for dates in SQL 3) You don't know about the math errors in the deprecated MONEY data type - use DECIMAL(s,p) instead. 4) You don't know how to declare NUMERIC (s,p) data 5) Either you were sloppy or you don't know what a repeating group and 1NF are. 6) You don't know that rows are not records and columns are not fields. if you cared about the people helping your for free, would your DDL look like this? CREATE TABLE SalesTransactions (trans_nbr CHAR(10) NOT NULL PRIMARY KEY, cust_id CHAR(10) NOT NULL REFERENCES Customers (cust_id) ON UPDATE CASCADE, sales_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL CHECK (<<SET to 00:00:00 Hrs >>), repeating_grp_1 DECIMAL (12,4) DEFAULT 0.00 NOT NULL, repeating_grp_2 DECIMAL (12,4) DEFAULT 0.00 NOT NULL, repeating_grp_3 DECIMAL (12,4) DEFAULT 0.00 NOT NULL, repeating_grp_4 DECIMAL (12,4) DEFAULT 0.00 NOT NULL, vague_thing_1 NUMERIC(12,4) DEFAULT 0.00 NOT NULL); You can use a CASE expession with aggregates to do all of these things in one statement. You will simply have to make a table scan for some of them, so indexing will not help. Do them all at once or put them in a VIEW. CREATE VIEW Godzilla (..) AS SELECT MAX(CASE WHEN sales_date SaleDate > '2005-01-01' THEN repeating_grp_1 ELSE NULL END) AS max_1, AVG (CASE WHEN repeating_grp_3 > repeating_grp_S4 THEN repeating_grp_1/repeating_grp_2) ELSE NULL END AS foobar_avg, etc. FROM SalesTransactions1; But this table feels like a set of design flaws. |
|||||||||||||||||||||||