Home All Groups Group Topic Archive Search About

Optimizing this crazy setup...

Author
25 Aug 2006 2:15 PM
Dave
Ok, here's the skinny. We survey surfaces for radiation. Our equipment
scans hundreds of meters in a single run, each meter containing 40
individual data acquisitions, each acquisiton containing 36 or so data
values. Needless to say, this results in a bunch of data, and my
current looping simply can't handle it.

I'm writing a C# utility to compile matrices of data that represent the
run taken, and performing some statistical analysis on it. After
profiling the code with a large enough dataset (our test runs were just
a few meters and ran quickly), I found the bottleneck, the matrix
creation.

Table Layout (from parent to child): [Strip (run) - holds efficiency] -
[Acquisition - holds time, distance] - [PSPCData - holds location in
acquisition, data value].

I currently SELECT all acquisitions from each strip. I loop through
each one in C#, and perform a SELECT on PSPCData where the
acquisition's GUID matches, in the order specified. I then take each
Data value, divide by time and then divide that by the Strip's
efficiency. I place it into a huge Matrix.

Needless to say, if you're doing the math, I'm looping through 4000
acquisitions, doing a SELECT statement for each, performing
calculations on it, etc. And this is taking forever. On my development
box, it takes but 5-10 minutes or so, but on the handheld machines to
be used in the field (400MB memory, 1.3GHz processor), it can take up
to an hour. That's far too long, these need to generate in a few
minutes max.

So, I'm not sure how to proceed. Obviously the bottleneck is in the
sheer amount of queries being performed to the DB. I lack the SQL
experience to know all the tricks of the trade, so I'm begging for
help.

I really need to ask the database for a matrix of data, where each
value has already had the calibration (time, efficiency) applied to it.
Only SQL isn't great for returning matrices, so I'm unsure of how to
accomplish this in the most efficient way. Anyone have any ideas?
You'll win my eternal gratitude! :)

David Harris

Author
25 Aug 2006 2:22 PM
David Browne
"Dave" <omgro***@gmail.com> wrote in message
news:1156515328.473273.99550@m79g2000cwm.googlegroups.com...
> Ok, here's the skinny. We survey surfaces for radiation. Our equipment
> scans hundreds of meters in a single run, each meter containing 40
> individual data acquisitions, each acquisiton containing 36 or so data
> values. Needless to say, this results in a bunch of data, and my
> current looping simply can't handle it.
>

.. . .

>
> I really need to ask the database for a matrix of data, where each
> value has already had the calibration (time, efficiency) applied to it.
> Only SQL isn't great for returning matrices, so I'm unsure of how to
> accomplish this in the most efficient way. Anyone have any ideas?
> You'll win my eternal gratitude! :)
>

I think you primarilly need help formulating a query to return this data all
at once.  To have enough information to help you need to post table DDL and
sample inserts, and describe the matrix loading process in some detail,
perhaps positing the C# definition of the type you are trying to load.



David
Author
25 Aug 2006 3:05 PM
Jim Underwood
Also, even if you had to do all this calculating with C#, doing it on the
local device is likely a big mistake, as you have to round trip over the
network to the database for every row.  At a minimum you should be
retrieving the entire data set then processing the rows one at a time on an
application or web server, and only returning the final matrix to the
client.

Ideally, everything would be done on the database and returned to the
callling application without any extra processing after the fact, but this
may not be realistic.  Whatever math you are doing in C# could almost
certainly be done in a stored procedure, and possibly with a single SQL
statement.

4000 rows with 36 collumns each is relatively little data.  Post your table
structures, and the required calculations, along with a small data sample (a
dozen rows should be fine) and the expected results.  Your C# code may or
may not help, but it couldn't hurt to attach it as a text file.  For
information on posting DDL and sample data, click the link below...

http://www.aspfaq.com/etiquette.asp?id=5006

Show quote
"David Browne" <davidbaxterbrowne no potted m***@hotmail.com> wrote in
message news:OYHOsHFyGHA.5068@TK2MSFTNGP02.phx.gbl...
>
> "Dave" <omgro***@gmail.com> wrote in message
> news:1156515328.473273.99550@m79g2000cwm.googlegroups.com...
> > Ok, here's the skinny. We survey surfaces for radiation. Our equipment
> > scans hundreds of meters in a single run, each meter containing 40
> > individual data acquisitions, each acquisiton containing 36 or so data
> > values. Needless to say, this results in a bunch of data, and my
> > current looping simply can't handle it.
> >
>
> . . .
>
> >
> > I really need to ask the database for a matrix of data, where each
> > value has already had the calibration (time, efficiency) applied to it.
> > Only SQL isn't great for returning matrices, so I'm unsure of how to
> > accomplish this in the most efficient way. Anyone have any ideas?
> > You'll win my eternal gratitude! :)
> >
>
> I think you primarilly need help formulating a query to return this data
all
> at once.  To have enough information to help you need to post table DDL
and
> sample inserts, and describe the matrix loading process in some detail,
> perhaps positing the C# definition of the type you are trying to load.
>
>
>
> David
>
>
Author
25 Aug 2006 3:21 PM
Dave
Jim Underwood wrote:
> Ideally, everything would be done on the database and returned to the
> callling application without any extra processing after the fact, but this
> may not be realistic.  Whatever math you are doing in C# could almost
> certainly be done in a stored procedure, and possibly with a single SQL
> statement.

See my reply to David Browne, I'm not able to get the DDL right away,
but I'll post it as soon as I get there. I'd love to be using a server
to process all the data, but unfortunately, we don't have the liberty
of using networks at all, as many of the sites we scan for radiation
are secure or government facilities, which don't even allow infrared.
That actually caused a lot of problems in data warehousing, which has
yet to be implemented well enough for field use. I agree that the math
is simple and can easily be done in SQL Server, my problem is lack of
experience with T-SQL in creating something that compile a usable table
in C#. The statistical manipulations I do to the matrix after it is
built are extremely complex and should be done in C#, but they also
take up about 1% of the current processing time, which is negligible at
this point. If I can get SQL to generate the matrix, then I can get
processing time back under a minute, which is more than acceptable. I'm
hoping the little I was able to post from memory will at least get an
idea or two going. Hopefully I can go get the DDL within the next few
hours... Thanks for the help!
Author
25 Aug 2006 3:41 PM
Jim Underwood
First, naming your GUIDs for the tables that are in makes things easier.  If
a key always has the same name no matter what table it appears in then
joining becomes much easier.  The SQL will be much mroe intuitive to anyone
else trying to understand it.

PSPCData
-----------------
AcquisitionGUID - varchar(60)
BinIndex - int(4)
Data - int(4)

Acquisition
-----------------
AcquisitionGUID - varchar(60)
StripGUID - varchar(55)
AcqNumber - int(4)
TimeMs - int(4)

Strip
------------------
StripGUID - varchar(55)
SurveysGUID - varchar(50)
StripNumber - int(4)
Efficiency - float(8)

Surveys
-------------------
SurveysGUID - varchar(50)
SurveyName - varchar(40)

Based on the scaled down table definitions you provided, you might start
with a view like this...

create view dbo.SurveyDataView as
Select
PSPC.AcquisitionGUID ,
PSPC.BinIndex ,
PSPC.Data ,
Acq.AcqNumber ,
Acq.TimeMs ,
Strip.StripGUID ,
Strip.StripNumber ,
Strip.Efficiency ,
Surv.SurveysGUID ,
Surv.SurveyName
from Acquisition  as Acq
inner join PSPCData as PSPC
  on Acq.AcquisitionGUID = PSPC.AcquisitionGUID
inner join Strip as Strip
  on Acq.StripGUID = Strip.StripGUID
inner join Surveys as Surv
  on Strip.SurveysGUID = Surv.SurveysGUID


Now this view is returning all of your data in a single record set.  Just
filter by SurveysGUID.

Select <column list> from dbo.SurveyDataView as surv
where surv.SurveysGUID = 'guidvalue'

Return this recordset to your client and access one row at a time, looping
through the recordset.  This should speed things up a bit, although you can
still do a lot more with the SQL.

We can begin doing some of the math like such...

select  AcquisitionGUID ,
BinIndex ,
Data ,
AcqNumber ,
TimeMs ,
StripGUID ,
StripNumber ,
Efficiency ,
SurveysGUID ,
SurveyName ,
(TimeMs * 1000) as CPS
(TimeMs * 60 / 1000) as CPM
(TimeMs * 60 / 1000) / Efficiency as DPM
from dbo.SurveyDataView
order by StripNumber

Now you have your dataset, with all the rows, and some priliminary
calculations, ordered by StripNumber.  How much close does that get you to
what you need?


Show quote
"Dave" <omgro***@gmail.com> wrote in message
news:1156519274.279572.154990@h48g2000cwc.googlegroups.com...
> Jim Underwood wrote:
> > Ideally, everything would be done on the database and returned to the
> > callling application without any extra processing after the fact, but
this
> > may not be realistic.  Whatever math you are doing in C# could almost
> > certainly be done in a stored procedure, and possibly with a single SQL
> > statement.
>
> See my reply to David Browne, I'm not able to get the DDL right away,
> but I'll post it as soon as I get there. I'd love to be using a server
> to process all the data, but unfortunately, we don't have the liberty
> of using networks at all, as many of the sites we scan for radiation
> are secure or government facilities, which don't even allow infrared.
> That actually caused a lot of problems in data warehousing, which has
> yet to be implemented well enough for field use. I agree that the math
> is simple and can easily be done in SQL Server, my problem is lack of
> experience with T-SQL in creating something that compile a usable table
> in C#. The statistical manipulations I do to the matrix after it is
> built are extremely complex and should be done in C#, but they also
> take up about 1% of the current processing time, which is negligible at
> this point. If I can get SQL to generate the matrix, then I can get
> processing time back under a minute, which is more than acceptable. I'm
> hoping the little I was able to post from memory will at least get an
> idea or two going. Hopefully I can go get the DDL within the next few
> hours... Thanks for the help!
>
Author
25 Aug 2006 3:08 PM
Dave
David Browne wrote:
> I think you primarilly need help formulating a query to return this data all
> at once.  To have enough information to help you need to post table DDL and
> sample inserts, and describe the matrix loading process in some detail,
> perhaps positing the C# definition of the type you are trying to load.

Forgive me here, I'm not beside the data right now, so if you need
actual INSERTs, then I can get that later. Also if you wish me to set
up an example database (using SQL Server 2005), I'm willing to do that
and send the backup via email. Basically, I have a 2d array of int's in
C#, where I need the final values. It's declared as: int[,] maindata =
new int[ totalacqs, totalpixels ]; where totalacqs is the total number
of acquisitions (zero-based so no need for -1), and totalpixels is the
length of each acquisition in data points. I'm thinking that if I can
get a table back with the columns (x-location, y-location, value)
(specifying index in the 2d array and the value), that will be a very
quick single loop to do. But I'm open to whatever option someone wishes
to provide.

You should note:
- All columns NOT NULL
- GUID is my own special varchar identifier... totally inefficient but
believe me it's the only way things work with our data warehousing
- All relationships are obvious from the GUID columns
- These tables are scaled down. Most of them hold 20 odd columns of
other data, not relevant to the current setup

Here are the tables:

PSPCData
-----------------
AcquisitionGUID - varchar(60)
BinIndex - int(4)
Data - int(4)

Acquisition
-----------------
GUID - varchar(60)
StripGUID - varchar(55)
AcqNumber - int(4)
TimeMs - int(4)

Strip
------------------
GUID - varchar(55)
SurveysGUID - varchar(50)
StripNumber - int(4)
Efficiency - float(8)

Surveys
-------------------
GUID - varchar(50)
SurveyName - varchar(40)
-------------------------------------------------

Basically, I wish to give it a surveyGUID, and have it return the
matrix table with values representing disintegrations per minute (dpm).
I will show you how this is calculated:

The value Data in PSPCData is in "counts" (number of pulses we receive
for the detector). This can be put into "counts per second (cps)" by
dividing by (TimeMs of Acquisition / 1000 (to put into seconds)). This
can become "counts per minute (cpm)" by multiplying by 60. DPM is
accomplished by dividing the cpm value by the Efficiency of that strip.

The matrix should proceed from 0 to upper bound of the X dimension in
order of Strips (select all strips with SurveyGUID = what I specify,
order by StripNumber), then for each strip in order of Acquisitions (by
AcqNumber), and then the Y dimension will be the calculated dpm values
of PSPCData, ordered by BinIndex.

All that make sense? Let me know if you wish an example DB to look at,
I can send when I get back to my desk.

Thanks again,

David Harris
Author
25 Aug 2006 3:58 PM
David Browne
Show quote
"Dave" <omgro***@gmail.com> wrote in message
news:1156518481.053690.166080@75g2000cwc.googlegroups.com...
> David Browne wrote:
>> I think you primarilly need help formulating a query to return this data
>> all
>> at once.  To have enough information to help you need to post table DDL
>> and
>> sample inserts, and describe the matrix loading process in some detail,
>> perhaps positing the C# definition of the type you are trying to load.
>
> Forgive me here, I'm not beside the data right now, so if you need
> actual INSERTs, then I can get that later. Also if you wish me to set
> up an example database (using SQL Server 2005), I'm willing to do that
> and send the backup via email. Basically, I have a 2d array of int's in
> C#, where I need the final values. It's declared as: int[,] maindata =
> new int[ totalacqs, totalpixels ]; where totalacqs is the total number
> of acquisitions (zero-based so no need for -1), and totalpixels is the
> length of each acquisition in data points. I'm thinking that if I can
> get a table back with the columns (x-location, y-location, value)
> (specifying index in the 2d array and the value), that will be a very
> quick single loop to do. But I'm open to whatever option someone wishes
> to provide.
>
> You should note:
> - All columns NOT NULL
> - GUID is my own special varchar identifier... totally inefficient but
> believe me it's the only way things work with our data warehousing
> - All relationships are obvious from the GUID columns
> - These tables are scaled down. Most of them hold 20 odd columns of
> other data, not relevant to the current setup
>


I think you really just need to join Surveys, Strip, Acquisition and
PSPCData, do a little arithmetic and order the results.  Then load the data
structure relying on the ordering of the query results.

Here's an example using the tables you described.  The Utils.Sequence
function is used to generate sample data.


David


create schema Utils
go

create FUNCTION [Utils].[Sequence](
  @from BIGINT,
  @to   BIGINT
) RETURNS @Sequence TABLE(Value int primary key)
AS
BEGIN

   WITH L0(c) AS (
    SELECT 1 UNION ALL SELECT 1
  ), L1(c) AS (
    SELECT 1 FROM L0 AS A, L0 AS B
  ), L2(c) AS(
    SELECT 1 FROM L1 AS A, L1 AS B
  ), L3(c) AS(
    SELECT 1 FROM L2 AS A, L2 AS B
  ), L4(c) AS(
    SELECT 1 FROM L3 AS A, L3 AS B
  ), L5(c) AS(
    SELECT 1 FROM L4 AS A, L4 AS B
  ), L6(c) AS(
    SELECT 1 FROM L5 AS A, L5 AS B
  ), Nums AS (
    SELECT
      ROW_NUMBER() OVER (ORDER BY c) AS n
    FROM L5 with (noexpand)
  )
    insert into @Sequence(Value)
    SELECT n+@from-1 AS Value
    FROM Nums
    WHERE n <= @to-@from+1

  RETURN
END

go

create table Surveys
(
  GUID varchar(50) primary key,
  SurveyName varchar(40) not null
)

create table Strip
(
  GUID varchar(55) not null primary key,
  SurveysGUID varchar(50) not null
    references Surveys,
  StripNumber int not null,
  Efficiency float not null
)

create table Acquisition
(
  GUID varchar(60) not null primary key,
  StripGUID varchar(55) not null
    references Strip,
  AcqNumber int not null,
  TimeMs int not null
)


create table PSPCData
(
  AcquisitionGUID varchar(60)
    references Acquisition,
  BinIndex  int not null,
  Data int,
  primary key (AcquisitionGUID, BinIndex)
)

go

insert into surveys (guid,surveyname)
values ('survey a', 'survey a')


insert into strip(GUID,SurveysGUID,StripNumber, Efficiency)
select
  s.GUID + '-' + cast(seq.value as varchar(10)),
  s.GUID,
  seq.Value,
  (100 - ((seq.Value * 13) % 100)) / 100.0 Efficiency
from Surveys s
cross apply utils.Sequence(0,99) seq

insert into Acquisition (GUID, StripGuid, AcqNumber, TimeMs)
select
   s.GUID + '-' + cast(seq.value as varchar(10)) GUID,
   s.GUID StripGuid,
   seq.Value AcqNumber,
   1+ ((seq.Value * 17) % 200) TimeMs
from Strip s
cross apply utils.Sequence(0,7 + s.StripNumber % 15)  seq
order by StripGuid, AcqNumber

insert into PSPCData(AcquisitionGUID,Binindex,Data)
select
  a.GUID AcquisitionGUID,
  seq.Value Binindex,
  1 + ((a.TimeMs * seq.Value * 27) % 87) Data
from Acquisition a
cross apply utils.Sequence(0,99) seq
order by AcquisitionGUID, BinIndex

go

select
  Surveys.SurveyName,
  Strip.StripNumber,
  d.BinIndex,
  60 * d.Data / (a.TimeMS/1000.0)  * Strip.Efficiency DPM
from Surveys
join Strip
  on Strip.SurveysGUID = Surveys.GUID
join Acquisition a
  on a.StripGuid = Strip.GUID
join PSPCData d
  on d.AcquisitionGUID = a.GUID
order by
  Strip.StripNumber,
  a.AcqNumber,
  d.BinIndex
Author
25 Aug 2006 5:50 PM
Dave
David Browne wrote:
> I think you really just need to join Surveys, Strip, Acquisition and
> PSPCData, do a little arithmetic and order the results.  Then load the data
> structure relying on the ordering of the query results.

Thank you David and Jim... I didn't even think it'd be easy as crafting
a good SELECT statement, but I now have this utility running in under a
minute on the slow handheld PC, and everyone is breathing sighs of
relief. :)

Appreciate your help very much!

David Harris

AddThis Social Bookmark Button