|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Optimizing this crazy setup...scans hundreds of meters in a single run, each meter containing 40 individual data acquisitions, each acquisiton containing 36 or so data values. Needless to say, this results in a bunch of data, and my current looping simply can't handle it. I'm writing a C# utility to compile matrices of data that represent the run taken, and performing some statistical analysis on it. After profiling the code with a large enough dataset (our test runs were just a few meters and ran quickly), I found the bottleneck, the matrix creation. Table Layout (from parent to child): [Strip (run) - holds efficiency] - [Acquisition - holds time, distance] - [PSPCData - holds location in acquisition, data value]. I currently SELECT all acquisitions from each strip. I loop through each one in C#, and perform a SELECT on PSPCData where the acquisition's GUID matches, in the order specified. I then take each Data value, divide by time and then divide that by the Strip's efficiency. I place it into a huge Matrix. Needless to say, if you're doing the math, I'm looping through 4000 acquisitions, doing a SELECT statement for each, performing calculations on it, etc. And this is taking forever. On my development box, it takes but 5-10 minutes or so, but on the handheld machines to be used in the field (400MB memory, 1.3GHz processor), it can take up to an hour. That's far too long, these need to generate in a few minutes max. So, I'm not sure how to proceed. Obviously the bottleneck is in the sheer amount of queries being performed to the DB. I lack the SQL experience to know all the tricks of the trade, so I'm begging for help. I really need to ask the database for a matrix of data, where each value has already had the calibration (time, efficiency) applied to it. Only SQL isn't great for returning matrices, so I'm unsure of how to accomplish this in the most efficient way. Anyone have any ideas? You'll win my eternal gratitude! :) David Harris "Dave" <omgro***@gmail.com> wrote in message .. . .news:1156515328.473273.99550@m79g2000cwm.googlegroups.com... > Ok, here's the skinny. We survey surfaces for radiation. Our equipment > scans hundreds of meters in a single run, each meter containing 40 > individual data acquisitions, each acquisiton containing 36 or so data > values. Needless to say, this results in a bunch of data, and my > current looping simply can't handle it. > > I think you primarilly need help formulating a query to return this data all > I really need to ask the database for a matrix of data, where each > value has already had the calibration (time, efficiency) applied to it. > Only SQL isn't great for returning matrices, so I'm unsure of how to > accomplish this in the most efficient way. Anyone have any ideas? > You'll win my eternal gratitude! :) > at once. To have enough information to help you need to post table DDL and sample inserts, and describe the matrix loading process in some detail, perhaps positing the C# definition of the type you are trying to load. David Also, even if you had to do all this calculating with C#, doing it on the
local device is likely a big mistake, as you have to round trip over the network to the database for every row. At a minimum you should be retrieving the entire data set then processing the rows one at a time on an application or web server, and only returning the final matrix to the client. Ideally, everything would be done on the database and returned to the callling application without any extra processing after the fact, but this may not be realistic. Whatever math you are doing in C# could almost certainly be done in a stored procedure, and possibly with a single SQL statement. 4000 rows with 36 collumns each is relatively little data. Post your table structures, and the required calculations, along with a small data sample (a dozen rows should be fine) and the expected results. Your C# code may or may not help, but it couldn't hurt to attach it as a text file. For information on posting DDL and sample data, click the link below... http://www.aspfaq.com/etiquette.asp?id=5006 Show quote "David Browne" <davidbaxterbrowne no potted m***@hotmail.com> wrote in message news:OYHOsHFyGHA.5068@TK2MSFTNGP02.phx.gbl... > > "Dave" <omgro***@gmail.com> wrote in message > news:1156515328.473273.99550@m79g2000cwm.googlegroups.com... > > Ok, here's the skinny. We survey surfaces for radiation. Our equipment > > scans hundreds of meters in a single run, each meter containing 40 > > individual data acquisitions, each acquisiton containing 36 or so data > > values. Needless to say, this results in a bunch of data, and my > > current looping simply can't handle it. > > > > . . . > > > > > I really need to ask the database for a matrix of data, where each > > value has already had the calibration (time, efficiency) applied to it. > > Only SQL isn't great for returning matrices, so I'm unsure of how to > > accomplish this in the most efficient way. Anyone have any ideas? > > You'll win my eternal gratitude! :) > > > > I think you primarilly need help formulating a query to return this data all > at once. To have enough information to help you need to post table DDL and > sample inserts, and describe the matrix loading process in some detail, > perhaps positing the C# definition of the type you are trying to load. > > > > David > > Jim Underwood wrote:
> Ideally, everything would be done on the database and returned to the See my reply to David Browne, I'm not able to get the DDL right away,> callling application without any extra processing after the fact, but this > may not be realistic. Whatever math you are doing in C# could almost > certainly be done in a stored procedure, and possibly with a single SQL > statement. but I'll post it as soon as I get there. I'd love to be using a server to process all the data, but unfortunately, we don't have the liberty of using networks at all, as many of the sites we scan for radiation are secure or government facilities, which don't even allow infrared. That actually caused a lot of problems in data warehousing, which has yet to be implemented well enough for field use. I agree that the math is simple and can easily be done in SQL Server, my problem is lack of experience with T-SQL in creating something that compile a usable table in C#. The statistical manipulations I do to the matrix after it is built are extremely complex and should be done in C#, but they also take up about 1% of the current processing time, which is negligible at this point. If I can get SQL to generate the matrix, then I can get processing time back under a minute, which is more than acceptable. I'm hoping the little I was able to post from memory will at least get an idea or two going. Hopefully I can go get the DDL within the next few hours... Thanks for the help! First, naming your GUIDs for the tables that are in makes things easier. If
a key always has the same name no matter what table it appears in then joining becomes much easier. The SQL will be much mroe intuitive to anyone else trying to understand it. PSPCData ----------------- AcquisitionGUID - varchar(60) BinIndex - int(4) Data - int(4) Acquisition ----------------- AcquisitionGUID - varchar(60) StripGUID - varchar(55) AcqNumber - int(4) TimeMs - int(4) Strip ------------------ StripGUID - varchar(55) SurveysGUID - varchar(50) StripNumber - int(4) Efficiency - float(8) Surveys ------------------- SurveysGUID - varchar(50) SurveyName - varchar(40) Based on the scaled down table definitions you provided, you might start with a view like this... create view dbo.SurveyDataView as Select PSPC.AcquisitionGUID , PSPC.BinIndex , PSPC.Data , Acq.AcqNumber , Acq.TimeMs , Strip.StripGUID , Strip.StripNumber , Strip.Efficiency , Surv.SurveysGUID , Surv.SurveyName from Acquisition as Acq inner join PSPCData as PSPC on Acq.AcquisitionGUID = PSPC.AcquisitionGUID inner join Strip as Strip on Acq.StripGUID = Strip.StripGUID inner join Surveys as Surv on Strip.SurveysGUID = Surv.SurveysGUID Now this view is returning all of your data in a single record set. Just filter by SurveysGUID. Select <column list> from dbo.SurveyDataView as surv where surv.SurveysGUID = 'guidvalue' Return this recordset to your client and access one row at a time, looping through the recordset. This should speed things up a bit, although you can still do a lot more with the SQL. We can begin doing some of the math like such... select AcquisitionGUID , BinIndex , Data , AcqNumber , TimeMs , StripGUID , StripNumber , Efficiency , SurveysGUID , SurveyName , (TimeMs * 1000) as CPS (TimeMs * 60 / 1000) as CPM (TimeMs * 60 / 1000) / Efficiency as DPM from dbo.SurveyDataView order by StripNumber Now you have your dataset, with all the rows, and some priliminary calculations, ordered by StripNumber. How much close does that get you to what you need? Show quote "Dave" <omgro***@gmail.com> wrote in message news:1156519274.279572.154990@h48g2000cwc.googlegroups.com... > Jim Underwood wrote: > > Ideally, everything would be done on the database and returned to the > > callling application without any extra processing after the fact, but this > > may not be realistic. Whatever math you are doing in C# could almost > > certainly be done in a stored procedure, and possibly with a single SQL > > statement. > > See my reply to David Browne, I'm not able to get the DDL right away, > but I'll post it as soon as I get there. I'd love to be using a server > to process all the data, but unfortunately, we don't have the liberty > of using networks at all, as many of the sites we scan for radiation > are secure or government facilities, which don't even allow infrared. > That actually caused a lot of problems in data warehousing, which has > yet to be implemented well enough for field use. I agree that the math > is simple and can easily be done in SQL Server, my problem is lack of > experience with T-SQL in creating something that compile a usable table > in C#. The statistical manipulations I do to the matrix after it is > built are extremely complex and should be done in C#, but they also > take up about 1% of the current processing time, which is negligible at > this point. If I can get SQL to generate the matrix, then I can get > processing time back under a minute, which is more than acceptable. I'm > hoping the little I was able to post from memory will at least get an > idea or two going. Hopefully I can go get the DDL within the next few > hours... Thanks for the help! > David Browne wrote:
> I think you primarilly need help formulating a query to return this data all Forgive me here, I'm not beside the data right now, so if you need> at once. To have enough information to help you need to post table DDL and > sample inserts, and describe the matrix loading process in some detail, > perhaps positing the C# definition of the type you are trying to load. actual INSERTs, then I can get that later. Also if you wish me to set up an example database (using SQL Server 2005), I'm willing to do that and send the backup via email. Basically, I have a 2d array of int's in C#, where I need the final values. It's declared as: int[,] maindata = new int[ totalacqs, totalpixels ]; where totalacqs is the total number of acquisitions (zero-based so no need for -1), and totalpixels is the length of each acquisition in data points. I'm thinking that if I can get a table back with the columns (x-location, y-location, value) (specifying index in the 2d array and the value), that will be a very quick single loop to do. But I'm open to whatever option someone wishes to provide. You should note: - All columns NOT NULL - GUID is my own special varchar identifier... totally inefficient but believe me it's the only way things work with our data warehousing - All relationships are obvious from the GUID columns - These tables are scaled down. Most of them hold 20 odd columns of other data, not relevant to the current setup Here are the tables: PSPCData ----------------- AcquisitionGUID - varchar(60) BinIndex - int(4) Data - int(4) Acquisition ----------------- GUID - varchar(60) StripGUID - varchar(55) AcqNumber - int(4) TimeMs - int(4) Strip ------------------ GUID - varchar(55) SurveysGUID - varchar(50) StripNumber - int(4) Efficiency - float(8) Surveys ------------------- GUID - varchar(50) SurveyName - varchar(40) ------------------------------------------------- Basically, I wish to give it a surveyGUID, and have it return the matrix table with values representing disintegrations per minute (dpm). I will show you how this is calculated: The value Data in PSPCData is in "counts" (number of pulses we receive for the detector). This can be put into "counts per second (cps)" by dividing by (TimeMs of Acquisition / 1000 (to put into seconds)). This can become "counts per minute (cpm)" by multiplying by 60. DPM is accomplished by dividing the cpm value by the Efficiency of that strip. The matrix should proceed from 0 to upper bound of the X dimension in order of Strips (select all strips with SurveyGUID = what I specify, order by StripNumber), then for each strip in order of Acquisitions (by AcqNumber), and then the Y dimension will be the calculated dpm values of PSPCData, ordered by BinIndex. All that make sense? Let me know if you wish an example DB to look at, I can send when I get back to my desk. Thanks again, David Harris
Show quote
"Dave" <omgro***@gmail.com> wrote in message I think you really just need to join Surveys, Strip, Acquisition and news:1156518481.053690.166080@75g2000cwc.googlegroups.com... > David Browne wrote: >> I think you primarilly need help formulating a query to return this data >> all >> at once. To have enough information to help you need to post table DDL >> and >> sample inserts, and describe the matrix loading process in some detail, >> perhaps positing the C# definition of the type you are trying to load. > > Forgive me here, I'm not beside the data right now, so if you need > actual INSERTs, then I can get that later. Also if you wish me to set > up an example database (using SQL Server 2005), I'm willing to do that > and send the backup via email. Basically, I have a 2d array of int's in > C#, where I need the final values. It's declared as: int[,] maindata = > new int[ totalacqs, totalpixels ]; where totalacqs is the total number > of acquisitions (zero-based so no need for -1), and totalpixels is the > length of each acquisition in data points. I'm thinking that if I can > get a table back with the columns (x-location, y-location, value) > (specifying index in the 2d array and the value), that will be a very > quick single loop to do. But I'm open to whatever option someone wishes > to provide. > > You should note: > - All columns NOT NULL > - GUID is my own special varchar identifier... totally inefficient but > believe me it's the only way things work with our data warehousing > - All relationships are obvious from the GUID columns > - These tables are scaled down. Most of them hold 20 odd columns of > other data, not relevant to the current setup > PSPCData, do a little arithmetic and order the results. Then load the data structure relying on the ordering of the query results. Here's an example using the tables you described. The Utils.Sequence function is used to generate sample data. David create schema Utils go create FUNCTION [Utils].[Sequence]( @from BIGINT, @to BIGINT ) RETURNS @Sequence TABLE(Value int primary key) AS BEGIN WITH L0(c) AS ( SELECT 1 UNION ALL SELECT 1 ), L1(c) AS ( SELECT 1 FROM L0 AS A, L0 AS B ), L2(c) AS( SELECT 1 FROM L1 AS A, L1 AS B ), L3(c) AS( SELECT 1 FROM L2 AS A, L2 AS B ), L4(c) AS( SELECT 1 FROM L3 AS A, L3 AS B ), L5(c) AS( SELECT 1 FROM L4 AS A, L4 AS B ), L6(c) AS( SELECT 1 FROM L5 AS A, L5 AS B ), Nums AS ( SELECT ROW_NUMBER() OVER (ORDER BY c) AS n FROM L5 with (noexpand) ) insert into @Sequence(Value) SELECT n+@from-1 AS Value FROM Nums WHERE n <= @to-@from+1 RETURN END go create table Surveys ( GUID varchar(50) primary key, SurveyName varchar(40) not null ) create table Strip ( GUID varchar(55) not null primary key, SurveysGUID varchar(50) not null references Surveys, StripNumber int not null, Efficiency float not null ) create table Acquisition ( GUID varchar(60) not null primary key, StripGUID varchar(55) not null references Strip, AcqNumber int not null, TimeMs int not null ) create table PSPCData ( AcquisitionGUID varchar(60) references Acquisition, BinIndex int not null, Data int, primary key (AcquisitionGUID, BinIndex) ) go insert into surveys (guid,surveyname) values ('survey a', 'survey a') insert into strip(GUID,SurveysGUID,StripNumber, Efficiency) select s.GUID + '-' + cast(seq.value as varchar(10)), s.GUID, seq.Value, (100 - ((seq.Value * 13) % 100)) / 100.0 Efficiency from Surveys s cross apply utils.Sequence(0,99) seq insert into Acquisition (GUID, StripGuid, AcqNumber, TimeMs) select s.GUID + '-' + cast(seq.value as varchar(10)) GUID, s.GUID StripGuid, seq.Value AcqNumber, 1+ ((seq.Value * 17) % 200) TimeMs from Strip s cross apply utils.Sequence(0,7 + s.StripNumber % 15) seq order by StripGuid, AcqNumber insert into PSPCData(AcquisitionGUID,Binindex,Data) select a.GUID AcquisitionGUID, seq.Value Binindex, 1 + ((a.TimeMs * seq.Value * 27) % 87) Data from Acquisition a cross apply utils.Sequence(0,99) seq order by AcquisitionGUID, BinIndex go select Surveys.SurveyName, Strip.StripNumber, d.BinIndex, 60 * d.Data / (a.TimeMS/1000.0) * Strip.Efficiency DPM from Surveys join Strip on Strip.SurveysGUID = Surveys.GUID join Acquisition a on a.StripGuid = Strip.GUID join PSPCData d on d.AcquisitionGUID = a.GUID order by Strip.StripNumber, a.AcqNumber, d.BinIndex David Browne wrote:
> I think you really just need to join Surveys, Strip, Acquisition and Thank you David and Jim... I didn't even think it'd be easy as crafting> PSPCData, do a little arithmetic and order the results. Then load the data > structure relying on the ordering of the query results. a good SELECT statement, but I now have this utility running in under a minute on the slow handheld PC, and everyone is breathing sighs of relief. :) Appreciate your help very much! David Harris |
|||||||||||||||||||||||