Home All Groups Group Topic Archive Search About

How can I calculate a running YTD sum

Author
18 Aug 2006 4:05 PM
Arthur Dent
I could use some guidance here.  I would like to know how I can
calculate a running YTD sum value in the following scenario:


-- START CREATE EXAMPLE

CREATE TABLE dbo.tblTestData
    (
    generic_id int NOT NULL IDENTITY (1, 1),
    data_date smalldatetime NOT NULL,
    metric_value int NOT NULL
    ) ON [PRIMARY]
GO

ALTER TABLE dbo.tblTestData ADD CONSTRAINT
    PK_tblTestData PRIMARY KEY CLUSTERED
    (
    generic_id
    ) ON [PRIMARY]
GO

DECLARE @data_date AS SMALLDATETIME
SET @data_date = '1/1/2005'

WHILE @data_date <= GETDATE()
BEGIN

    INSERT INTO dbo.tblTestData (data_date, metric_value)
    VALUES (@data_date, CAST(RAND() * 100 AS INT))

    SET @data_date = DATEADD(mm, 1, @data_date)

END

-- END CREATE EXAMPLE

This creates a table with some bogus data by month.  Something like
this:

generic_id  data_date               metric_value
----------- ----------------------- ------------
1           2005-01-01 00:00:00     82
2           2005-02-01 00:00:00     95
3           2005-03-01 00:00:00     79
4           2005-04-01 00:00:00     65
5           2005-05-01 00:00:00     19
6           2005-06-01 00:00:00     84
7           2005-07-01 00:00:00     54
8           2005-08-01 00:00:00     47
9           2005-09-01 00:00:00     28
10          2005-10-01 00:00:00     52
11          2005-11-01 00:00:00     59
12          2005-12-01 00:00:00     69
13          2006-01-01 00:00:00     35
14          2006-02-01 00:00:00     6
15          2006-03-01 00:00:00     50
16          2006-04-01 00:00:00     45
17          2006-05-01 00:00:00     31
18          2006-06-01 00:00:00     78
19          2006-07-01 00:00:00     35
20          2006-08-01 00:00:00     16

I want to query that table calculating a new column, call it
"metric_value_ytd_sum".  It would add up the metric_value numbers each
month starting at the beginning of the given year up to the month of
the current row.  Using the example output above, the output might be
as follows:

generic_id  data_date               metric_value  metric_value_ytd_sum
----------- ----------------------- ------------- --------------------
1           2005-01-01 00:00:00     82            82
2           2005-02-01 00:00:00     95            177
3           2005-03-01 00:00:00     79            256
..
..
..

Also it's important to note that as the data moves from one year to the
next the YTD calculation needs to be sensative to this (it shouldn't do
a running sum on the whole dataset, probably some group by year
somewhere?)

Anyway, I get the feeling this is the type of thing I'll feel silly
about asking once I understand a solution, but just the same I've been
unable to figure it out so far and would appreciate any assistance.

Thanks!

Author
18 Aug 2006 4:15 PM
Aaron Bertrand [SQL Server MVP]
> I want to query that table calculating a new column, call it
> "metric_value_ytd_sum".

Why do you need to create a new column that has to be maintained constantly,
when you can always generate this new column at query time?  Something like
this (untested):

SELECT generic_id,
    data_date,
    metric_value,
    ytd_sum = (SELECT SUM(metric_value) FROM table
        WHERE data_date < outer.data_date
        AND data_date >= DATEADD(MONTH, 1-MONTH(outer.data_date),
        DATEADD(DAY, 1-DAY(outer.data_date), outer.data_date))
    )
FROM
    table outer
ORDER BY generic_id;


I strongly recommend against storing this data, mostly because it's
redundant information.



  It would add up the metric_value numbers each
Show quote
> month starting at the beginning of the given year up to the month of
> the current row.  Using the example output above, the output might be
> as follows:
>
> generic_id  data_date               metric_value  metric_value_ytd_sum
> ----------- ----------------------- ------------- --------------------
> 1           2005-01-01 00:00:00     82            82
> 2           2005-02-01 00:00:00     95            177
> 3           2005-03-01 00:00:00     79            256
> .
> .
> .
>
> Also it's important to note that as the data moves from one year to the
> next the YTD calculation needs to be sensative to this (it shouldn't do
> a running sum on the whole dataset, probably some group by year
> somewhere?)
>
> Anyway, I get the feeling this is the type of thing I'll feel silly
> about asking once I understand a solution, but just the same I've been
> unable to figure it out so far and would appreciate any assistance.
>
> Thanks!
>
Author
18 Aug 2006 4:19 PM
Aaron Bertrand [SQL Server MVP]
>        WHERE data_date < outer.data_date

Sorry, should be <=
Author
18 Aug 2006 4:34 PM
Arthur Dent
Perfect, thanks again!

Aaron Bertrand [SQL Server MVP] wrote:
Show quote
> >        WHERE data_date < outer.data_date
>
> Sorry, should be <=
Author
18 Aug 2006 4:32 PM
Arthur Dent
You're absolutely correct, I didn't mean to imply that I wanted to
store it as another column in the table, but rather have the value
calculated in the query at runtime as you state.

Your example is exactly what I was looking for.  The results were
off-set by a month, but I adjusted by making the lower bound "<="
instead of "<".  Also my db engine didn't like the alias "outer" so I
changed it:

SELECT generic_id, data_date, metric_value,
     (SELECT     SUM(metric_value) AS metric_value_total
     FROM tblTestData
     WHERE (data_date <= outerTable.data_date) AND
          (data_date >= DATEADD(MONTH, 1 - MONTH(outerTable.data_date),

          DATEADD(DAY, 1 - DAY(outerTable.data_date),
outerTable.data_date)))
     ) AS ytd_sum
FROM tblTestData AS outerTable
ORDER BY generic_id

Thanks for being so helpful!

Aaron Bertrand [SQL Server MVP] wrote:
Show quote
> > I want to query that table calculating a new column, call it
> > "metric_value_ytd_sum".
>
> Why do you need to create a new column that has to be maintained constantly,
> when you can always generate this new column at query time?  Something like
> this (untested):
>
> SELECT generic_id,
>     data_date,
>     metric_value,
>     ytd_sum = (SELECT SUM(metric_value) FROM table
>         WHERE data_date < outer.data_date
>         AND data_date >= DATEADD(MONTH, 1-MONTH(outer.data_date),
>         DATEADD(DAY, 1-DAY(outer.data_date), outer.data_date))
>     )
> FROM
>     table outer
> ORDER BY generic_id;
>
>
> I strongly recommend against storing this data, mostly because it's
> redundant information.
>
>
>
>   It would add up the metric_value numbers each
> > month starting at the beginning of the given year up to the month of
> > the current row.  Using the example output above, the output might be
> > as follows:
> >
> > generic_id  data_date               metric_value  metric_value_ytd_sum
> > ----------- ----------------------- ------------- --------------------
> > 1           2005-01-01 00:00:00     82            82
> > 2           2005-02-01 00:00:00     95            177
> > 3           2005-03-01 00:00:00     79            256
> > .
> > .
> > .
> >
> > Also it's important to note that as the data moves from one year to the
> > next the YTD calculation needs to be sensative to this (it shouldn't do
> > a running sum on the whole dataset, probably some group by year
> > somewhere?)
> >
> > Anyway, I get the feeling this is the type of thing I'll feel silly
> > about asking once I understand a solution, but just the same I've been
> > unable to figure it out so far and would appreciate any assistance.
> >
> > Thanks!
> >
Author
18 Aug 2006 4:22 PM
lord.zoltar
Arthur Dent wrote:
Show quote
> I want to query that table calculating a new column, call it
> "metric_value_ytd_sum".  It would add up the metric_value numbers each
> month starting at the beginning of the given year up to the month of
> the current row.  Using the example output above, the output might be
> as follows:
>
> generic_id  data_date               metric_value  metric_value_ytd_sum
> ----------- ----------------------- ------------- --------------------
> 1           2005-01-01 00:00:00     82            82
> 2           2005-02-01 00:00:00     95            177
> 3           2005-03-01 00:00:00     79            256
> .
> .
> .
>
> Also it's important to note that as the data moves from one year to the
> next the YTD calculation needs to be sensative to this (it shouldn't do
> a running sum on the whole dataset, probably some group by year
> somewhere?)
>
> Anyway, I get the feeling this is the type of thing I'll feel silly
> about asking once I understand a solution, but just the same I've been
> unable to figure it out so far and would appreciate any assistance.
>
> Thanks!


hmm...

I don't have access to a system to test on right now, maybe this will
work (lol! I'm not even 100% sure the syntax is legit, but it's an
interesting problem so I'm posting anyway! ;) ) ?

SELECT generic_id AS generic_id_0,
    data_date AS data_date_0,
    metric_value AS metric_value_0,
    metric_value_ytd_sum AS SUM(tmp.metric_value_1)
FROM tableWhoseNameIForgot,
    (SELECT generic_id AS generic_id_1,
        data_date AS data_date_1,
        metric_value AS metric_value_1
    FROM tableWhoseNameIForgot
    WHERE data_date_1 <= data_date_0
    ) AS tmp

I've seen this type of thing before in reporting tools, but those had
been implemented with loops in code that calculated the ytd sums and
then stored those in a different table which was used to generate
reports, the source tables were not used directly for reports. I was
told it was faster that way, but I'm doubtful that is true for much of
that tool in question...
Author
18 Aug 2006 4:42 PM
Arthur Dent
Whoa Lord Zoltar!   (just sounds funny to say)  I'm not sure I follow
(syntax is doesn't parse as you thought it wouldn't) but it looks like
you're thinking along the lines of a derrived table rather than a
subquery in the select clause as Aaron's solution suggested.  I'd be
curious to see if that way may work too... might come in handy as this
problem becomes more complex (multipe YTD calucalations going on with
different columns).  Ideas?

lord.zol***@gmail.com wrote:
Show quote
> Arthur Dent wrote:
> > I want to query that table calculating a new column, call it
> > "metric_value_ytd_sum".  It would add up the metric_value numbers each
> > month starting at the beginning of the given year up to the month of
> > the current row.  Using the example output above, the output might be
> > as follows:
> >
> > generic_id  data_date               metric_value  metric_value_ytd_sum
> > ----------- ----------------------- ------------- --------------------
> > 1           2005-01-01 00:00:00     82            82
> > 2           2005-02-01 00:00:00     95            177
> > 3           2005-03-01 00:00:00     79            256
> > .
> > .
> > .
> >
> > Also it's important to note that as the data moves from one year to the
> > next the YTD calculation needs to be sensative to this (it shouldn't do
> > a running sum on the whole dataset, probably some group by year
> > somewhere?)
> >
> > Anyway, I get the feeling this is the type of thing I'll feel silly
> > about asking once I understand a solution, but just the same I've been
> > unable to figure it out so far and would appreciate any assistance.
> >
> > Thanks!
>
>
> hmm...
>
> I don't have access to a system to test on right now, maybe this will
> work (lol! I'm not even 100% sure the syntax is legit, but it's an
> interesting problem so I'm posting anyway! ;) ) ?
>
> SELECT generic_id AS generic_id_0,
>     data_date AS data_date_0,
>     metric_value AS metric_value_0,
>     metric_value_ytd_sum AS SUM(tmp.metric_value_1)
> FROM tableWhoseNameIForgot,
>     (SELECT generic_id AS generic_id_1,
>         data_date AS data_date_1,
>         metric_value AS metric_value_1
>     FROM tableWhoseNameIForgot
>     WHERE data_date_1 <= data_date_0
>     ) AS tmp
>
> I've seen this type of thing before in reporting tools, but those had
> been implemented with loops in code that calculated the ytd sums and
> then stored those in a different table which was used to generate
> reports, the source tables were not used directly for reports. I was
> told it was faster that way, but I'm doubtful that is true for much of
> that tool in question...

AddThis Social Bookmark Button