|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Handling large amounts of data for reporting?I have an application which logs a considerable amount of data. Each day, we log about 50,000 to 100,000 rows of data. We like to report on this data... currently I'm using a stored procedure to calculate the statistics, however since this is an ad hoc, reports take a while to generate. So how do you guys handle large amounts of data? Is there a good way to precalculate a a set of statistics to handle ad hoc queries (i.e. By Hour, By Day, By Week, By Month). Our application also provides near realtime statistics... so precalculation has to be done on a continual basis. Does ..NET have any statistics classes that might help out with this sort of thing? I don't think .NET performance counters will work since they don't log persistent data. Any ideas? Thanks! -- Lucas Tam (REMOVEn***@rogers.com) Please delete "REMOVE" from the e-mail address when replying. http://members.ebay.com/aboutme/coolspot18/ > So how do you guys handle large amounts of data? Usually, you can run queries in the background that will aggregate to the various dimensions you want to measure (e.g. ). In one current system, we aggregate data by hour, then when someone wants to do a "this week" or "this month" type of query, we simply union the aggregated data (super fast) from before the last hour where stats were rolled up, against the raw data (which is also relatively fast, using the same kind of query that would "roll it up") and then apply group by to the derived table. This way the majority of the crunching is already done, and you are applying operations on raw data to only the last hour or less. You might also look at Analysis Services or other BI products, but I think the lag of rolling data into cubes/dimensions will fail your realtime requirement. You may also want to explore partitioned views depending on the types
of queries you're looking at; we partion our data daily (about 800k rows a day), and the optimizer knows wehre to look for the data we want to retrieve. Stu Do you need to report against a big table that grows 100K rows daily or
do you need to report against a table that has 100K rows? 100,000 rows of data is usually NOT a large amount on decent hardware. "AK" <AK_TIREDOFSPAM@hotmail.COM> wrote in news:1125092267.248510.101290 @g49g2000cwa.googlegroups.com:> Do you need to report against a big table that grows 100K rows daily or I need to report against a table that grows by ~100K row daily.> do you need to report against a table that has 100K rows? > > 100,000 rows of data is usually NOT a large amount on decent hardware. -- Lucas Tam (REMOVEn***@rogers.com) Please delete "REMOVE" from the e-mail address when replying. http://members.ebay.com/aboutme/coolspot18/ Summarized datasets, federated views, and OLAP cubes some of the basic
components, but the general solution is called data warehousing. Basically, the concept is to export a pre-calculated and pre-summarized subset of data in the OLTP system another database or server for the purpose of reporting or analysis. The goal is to design a system that is optimized to suit your own specific analytic needs, so the appropriate implementation details can very from one warehouse to the next. This document provides a good overview: http://userfs.cec.wustl.edu/~cse530/2004/Data-Warehousing-Combined.ppt Show quote "Lucas Tam" <REMOVEn***@rogers.com> wrote in message news:Xns96BEAE3D2946Bnntprogerscom@127.0.0.1... > Hi all, > > > I have an application which logs a considerable amount of data. Each day, > we log about 50,000 to 100,000 rows of data. > > We like to report on this data... currently I'm using a stored procedure > to > calculate the statistics, however since this is an ad hoc, reports take a > while to generate. > > So how do you guys handle large amounts of data? Is there a good way to > precalculate a a set of statistics to handle ad hoc queries (i.e. By Hour, > By Day, By Week, By Month). Our application also provides near realtime > statistics... so precalculation has to be done on a continual basis. Does > .NET have any statistics classes that might help out with this sort of > thing? I don't think .NET performance counters will work since they don't > log persistent data. > > Any ideas? > > Thanks! > > -- > Lucas Tam (REMOVEn***@rogers.com) > Please delete "REMOVE" from the e-mail address when replying. > http://members.ebay.com/aboutme/coolspot18/ "JT" <some***@microsoft.com> wrote in Hmmm... that's the problem.news:Oq9egaKrFHA.3452@TK2MSFTNGP14.phx.gbl: > The goal is to design a system that is optimized to suit your > own specific analytic needs, so the appropriate implementation details > can very from one warehouse to the next. Alot of the data our application generates is being used by external clients, and they all have differing reporting requirements. Not to mention, management seems to change the way queries are calculated every so often. Any ideas? Would the Star/Snowflake structures be my best solution? Thanks! -- Lucas Tam (REMOVEn***@rogers.com) Please delete "REMOVE" from the e-mail address when replying. Newmarket Volvo Sucks! http://newmarketvolvo.tripod.com |
|||||||||||||||||||||||