Home All Groups Group Topic Archive Search About

Somewhat complicated GROUP behavior that has me stumped

Author
2 Sep 2005 4:16 AM
Stefan Wrobel
I'm trying to group results in a database in a way that I've never done
before and it's got me stumped.  Let me first say that I cannot redesign this
database at all, it is an off-the-shelf product that my company has
purchased, and I have to deal with the structure.  That said, I'm trying to
do something along the folowing lines:

Assuming that I have a table that looks like this:
AccountID    TransData    TransID
1            10            101
1            12            102

Basically the table stores a series of transactions that can be revised, and
if a revision is entered, the old data stays and a new entry for the same
AccountID is logged, but with a higher TransID.  Please let me know if you're
confused about what's going on.  Basically, I need a query that returns the
TransData for the highest TransID for each AccountID.  If this makes any
sense, I need to group by AccountID, and return the TransData column for the
row that has the highest TransID.  I've come up with something that
concatenates the TransData and TransID and takes the Max of that, and then
takes a substring of that to get back to just the TransData, but that seems
like a goofy, overcomplicated, and inefficient way to do it.  Anyone out
there know how to do this elegantly?  Thanks a ton!

Author
2 Sep 2005 4:36 AM
Steve Kass
Stefan,

This is a pretty common question, and the answer isn't obvious
until you've seen it.  At least for most people...  Your MAX
solution is an excellent idea, and in fact it can be the best
solution. (Be sure to use COALESCE on any nullable columns
in the mix.)  I think it can even be optimized by putting an index on the
concatenation (as a computed column), but it's late and I'm too
tired to check now.

A more common solution to your question  is this:

select
  AccountID, TransData
from T
where TransID = (
  select max(TransID) from T as Tcopy
  where Tcopy.AccountID = T.AccountID
)

or similarly,

select
  AccountID, TransData
from T
where not exists (
  select * from T as Tcopy
  where Tcopy.AccountID = T.AccountID
  and Tcopy.TransID > T.TransID
)

The SQL Server engine usually optimizes these well if
there are supporting indexes, but another thing to try if
you find otherwise is

select
  AccountID, TransData
from T
where TransID = (
  select top 1 TransID from T as Tcopy
  where Tcopy.AccountID = T.AccountID
  order by TransID desc
)

I suspect the nullability of TransID affects potential
query plans.

The nice thing about your "carry along" solution is that
it is expressed more clearly as an aggregate, which is
the natural way to think of the problem.

SK

Stefan Wrobel wrote:

Show quote
>I'm trying to group results in a database in a way that I've never done
>before and it's got me stumped.  Let me first say that I cannot redesign this
>database at all, it is an off-the-shelf product that my company has
>purchased, and I have to deal with the structure.  That said, I'm trying to
>do something along the folowing lines:
>
>Assuming that I have a table that looks like this:
>AccountID    TransData    TransID
>1            10            101
>1            12            102
>
>Basically the table stores a series of transactions that can be revised, and
>if a revision is entered, the old data stays and a new entry for the same
>AccountID is logged, but with a higher TransID.  Please let me know if you're
>confused about what's going on.  Basically, I need a query that returns the
>TransData for the highest TransID for each AccountID.  If this makes any
>sense, I need to group by AccountID, and return the TransData column for the
>row that has the highest TransID.  I've come up with something that
>concatenates the TransData and TransID and takes the Max of that, and then
>takes a substring of that to get back to just the TransData, but that seems
>like a goofy, overcomplicated, and inefficient way to do it.  Anyone out
>there know how to do this elegantly?  Thanks a ton!

>
Author
2 Sep 2005 4:40 AM
Steve Kass
Stefan,

I just noticed your "can't redesign" comment, so I'll add that
the optimization I mentioned might still be possible via an indexed
view.  If you create a view over this table with one extra (computed)
column defined by the concatenation, then put a clustered index
on (AccountID, newColumn), you might get lucky and have
the optimizer see how to use the index, even if your query is
against the table, in the case of SQL Server Enterprise Edition.
For Standard, you would have write the query directly against
the view, and perhaps even add the NOEXPAND query hint.
You also might get unlucky and have this materialized view cause
too much extra overhead to be worth anything.

SK

Stefan Wrobel wrote:

Show quote
>I'm trying to group results in a database in a way that I've never done
>before and it's got me stumped.  Let me first say that I cannot redesign this
>database at all, it is an off-the-shelf product that my company has
>purchased, and I have to deal with the structure.  That said, I'm trying to
>do something along the folowing lines:
>
>Assuming that I have a table that looks like this:
>AccountID    TransData    TransID
>1            10            101
>1            12            102
>
>Basically the table stores a series of transactions that can be revised, and
>if a revision is entered, the old data stays and a new entry for the same
>AccountID is logged, but with a higher TransID.  Please let me know if you're
>confused about what's going on.  Basically, I need a query that returns the
>TransData for the highest TransID for each AccountID.  If this makes any
>sense, I need to group by AccountID, and return the TransData column for the
>row that has the highest TransID.  I've come up with something that
>concatenates the TransData and TransID and takes the Max of that, and then
>takes a substring of that to get back to just the TransData, but that seems
>like a goofy, overcomplicated, and inefficient way to do it.  Anyone out
>there know how to do this elegantly?  Thanks a ton!

>
Author
2 Sep 2005 5:03 AM
R.D
Hi Stefan
Try this
SELECT A.AccountID,A.TransData,A.TransID  FROM TRANS A where A.TransID IN
(SELECT Max(TransID) FROM TRANS GROUP BY AccountId)

trans is your table name here

Regards
R.D


Show quote
"Stefan Wrobel" wrote:

> I'm trying to group results in a database in a way that I've never done
> before and it's got me stumped.  Let me first say that I cannot redesign this
> database at all, it is an off-the-shelf product that my company has
> purchased, and I have to deal with the structure.  That said, I'm trying to
> do something along the folowing lines:
>
> Assuming that I have a table that looks like this:
> AccountID    TransData    TransID
> 1            10            101
> 1            12            102
>
> Basically the table stores a series of transactions that can be revised, and
> if a revision is entered, the old data stays and a new entry for the same
> AccountID is logged, but with a higher TransID.  Please let me know if you're
> confused about what's going on.  Basically, I need a query that returns the
> TransData for the highest TransID for each AccountID.  If this makes any
> sense, I need to group by AccountID, and return the TransData column for the
> row that has the highest TransID.  I've come up with something that
> concatenates the TransData and TransID and takes the Max of that, and then
> takes a substring of that to get back to just the TransData, but that seems
> like a goofy, overcomplicated, and inefficient way to do it.  Anyone out
> there know how to do this elegantly?  Thanks a ton!
Author
2 Sep 2005 7:24 PM
Stefan Wrobel
Thanks to both of you for your help solving this.  I ended up using the
solution that Steve had proposed and R.D. also proposed.  The execution time
was basically equal for that and my aggregate method, but the query is just
much more readable without all of the crazy concatenation and substring
stuff.  It seems like a straightforward thing that would have a more elegant
solution, but there are obviously a million ways to get the job done!  Thanks
a ton to both of you!

Show quote
"R.D" wrote:

> Hi Stefan
> Try this
> SELECT A.AccountID,A.TransData,A.TransID  FROM TRANS A where A.TransID IN
> (SELECT Max(TransID) FROM TRANS GROUP BY AccountId)
>
> trans is your table name here
>
> Regards
> R.D
>
>
> "Stefan Wrobel" wrote:
>
> > I'm trying to group results in a database in a way that I've never done
> > before and it's got me stumped.  Let me first say that I cannot redesign this
> > database at all, it is an off-the-shelf product that my company has
> > purchased, and I have to deal with the structure.  That said, I'm trying to
> > do something along the folowing lines:
> >
> > Assuming that I have a table that looks like this:
> > AccountID    TransData    TransID
> > 1            10            101
> > 1            12            102
> >
> > Basically the table stores a series of transactions that can be revised, and
> > if a revision is entered, the old data stays and a new entry for the same
> > AccountID is logged, but with a higher TransID.  Please let me know if you're
> > confused about what's going on.  Basically, I need a query that returns the
> > TransData for the highest TransID for each AccountID.  If this makes any
> > sense, I need to group by AccountID, and return the TransData column for the
> > row that has the highest TransID.  I've come up with something that
> > concatenates the TransData and TransID and takes the Max of that, and then
> > takes a substring of that to get back to just the TransData, but that seems
> > like a goofy, overcomplicated, and inefficient way to do it.  Anyone out
> > there know how to do this elegantly?  Thanks a ton!
Author
2 Sep 2005 8:30 PM
Steve Kass
Stefan,

In SQL Server 2005, and on products that support analytic
functions, you can do this:

select AccountID, TransData, TransID from (
  select
    AccountID,
    TransData,
    TransID,
    rank() over (partition by AccountID order by TransID desc) as rk
  from T
) as T
where rk = 1

(Windowed functions cannot appear in the WHERE clause in SQL
Server 2005, but I don't know if that's a standard limitation.)

There may be a cleaner solution, too, but I'm new to these
features.

SK


Stefan Wrobel wrote:

Show quote
>Thanks to both of you for your help solving this.  I ended up using the
>solution that Steve had proposed and R.D. also proposed.  The execution time
>was basically equal for that and my aggregate method, but the query is just
>much more readable without all of the crazy concatenation and substring
>stuff.  It seems like a straightforward thing that would have a more elegant
>solution, but there are obviously a million ways to get the job done!  Thanks
>a ton to both of you!
>
>"R.D" wrote:
>

>
>>Hi Stefan
>>Try this
>>SELECT A.AccountID,A.TransData,A.TransID  FROM TRANS A where A.TransID IN
>>(SELECT Max(TransID) FROM TRANS GROUP BY AccountId)
>>
>>trans is your table name here
>>
>>Regards
>>R.D
>>
>>
>>"Stefan Wrobel" wrote:
>>
>>   
>>
>>>I'm trying to group results in a database in a way that I've never done
>>>before and it's got me stumped.  Let me first say that I cannot redesign this
>>>database at all, it is an off-the-shelf product that my company has
>>>purchased, and I have to deal with the structure.  That said, I'm trying to
>>>do something along the folowing lines:
>>>
>>>Assuming that I have a table that looks like this:
>>>AccountID    TransData    TransID
>>>1            10            101
>>>1            12            102
>>>
>>>Basically the table stores a series of transactions that can be revised, and
>>>if a revision is entered, the old data stays and a new entry for the same
>>>AccountID is logged, but with a higher TransID.  Please let me know if you're
>>>confused about what's going on.  Basically, I need a query that returns the
>>>TransData for the highest TransID for each AccountID.  If this makes any
>>>sense, I need to group by AccountID, and return the TransData column for the
>>>row that has the highest TransID.  I've come up with something that
>>>concatenates the TransData and TransID and takes the Max of that, and then
>>>takes a substring of that to get back to just the TransData, but that seems
>>>like a goofy, overcomplicated, and inefficient way to do it.  Anyone out
>>>there know how to do this elegantly?  Thanks a ton!
>>>     
>>>

AddThis Social Bookmark Button