Home All Groups Group Topic Archive Search About

SQL Join question - getting most recent value in a history table

Author
27 Jul 2006 8:34 AM
Chris Strug
Hi,

My question concerns three tables:

- tblHeader - containing general information such as product ID, description
etc
    HeaderID
    HeaderDescription
    HeaderSerialNumber

lstLocations - lookup table containing location IDs
    LocID
    LocDescription

tblLocationHistory - a transaction table containing the location history of
each header and the date of their installation.
    HistoryID
    HeaderID
    LocID
    InstallDate

What I'm after is a select which shows each line in tblHeader and its most
recent location. However my first attempt (as you would expect from SQL)
returns every product and every location they have been in - e.g. Product X
has 7 locations - thus appears 7 times in the results from the below query.
Unfortunately I'm after the most recent location of each Header *only*.

I feel like this is something I should be able to do with my eyes closed but
I can't seem to crack it. Please could someone point me in the right
direction! Should I be looking at using UDF's for example?

Cheers

Chris.

*******************

SELECT
    dbo.tblHeader.HeaderID,
    dbo.tblHeader.Type,
    dbo.tblHeader.SerialNumber,
    dbo.lstLocations.Location_Name,
    dbo.tblLocationHistory.InstallDate
FROM         dbo.lstLocations INNER JOIN
                      dbo.tblLocationHistory ON dbo.lstLocations.Location_ID
= dbo.tblLocationHistory.LocationID INNER JOIN
                      dbo.tblHeader ON dbo.tblLocationHistory.HeaderID =
dbo.tblHeader.HeaderID

Author
27 Jul 2006 10:33 AM
Chris Lim
Chris Strug wrote:
> What I'm after is a select which shows each line in tblHeader and its most
> recent location.

You need a derived table that holds the most recent location for each
header. Something like the following, though there are probably better
ways (it's getting late..!):

SELECT  h.HeaderID,
        h.Type,
        h.SerialNumber,
        l.Location_Name,
        lh.InstallDate
FROM dbo.tblHeader h
INNER JOIN (    SELECT  l.HeaderID,
                        LocID = MAX(l.LocID)
                FROM    lstLocations l
                WHERE   l.InstallDate = (   SELECT  MAX(l2.InstallDate)
                                            FROM    lstLocations l2
                                            WHERE   l2.HeaderID =
l.HeaderID)
                GROUP BY
                        l.Header ID) lastInstall
    ON lastInstall.HeaderID = h.HeaderID
INNER JOIN tblLocationHistory lh
    ON  lh.HeaderID = lastInstall.HeaderID
    AND lh.LocID = lastInstall.LocID
INNER JOIN lstLocations l
    ON  l.LocID = lh.LocID

Chris
Author
27 Jul 2006 10:38 AM
Chris Lim
Chris Lim wrote:
> SELECT  h.HeaderID,

Correction:

SELECT  h.HeaderID,
        h.Type,
        h.SerialNumber,
        l.Location_Name,
        lh.InstallDate
FROM dbo.tblHeader h
INNER JOIN (    SELECT  l.HeaderID,
                        LocID = MAX(l.LocID)
                FROM    dbo.tblLocationHistory l
                WHERE   l.InstallDate = (   SELECT  MAX(l2.InstallDate)
                                            FROM
dbo.tblLocationHistory l2
                                            WHERE   l2.HeaderID =
l.HeaderID)
                GROUP BY
                        l.Header ID) lastInstall
    ON lastInstall.HeaderID = h.HeaderID
INNER JOIN dbo.tblLocationHistory lh
    ON  lh.HeaderID = lastInstall.HeaderID
    AND lh.LocID = lastInstall.LocID
INNER JOIN dbo.lstLocations l
    ON  l.LocID = lh.LocID
Author
27 Jul 2006 10:50 AM
Chris Strug
Chris,

Many thanks for your help - from my limited understanding of SQL, I think I
see what you're doing.

I was just wondering, would it be possible (even recommended!) to abstract
the "derived" select into another view?

Anyway, thank you once again for your kind help.

Cheers

Chris


Show quote
"Chris Lim" <blackca***@hotmail.com> wrote in message
news:1153996695.893580.87360@75g2000cwc.googlegroups.com...
> Chris Lim wrote:
>> SELECT  h.HeaderID,
>
> Correction:
>
> SELECT  h.HeaderID,
>        h.Type,
>        h.SerialNumber,
>        l.Location_Name,
>        lh.InstallDate
> FROM dbo.tblHeader h
> INNER JOIN (    SELECT  l.HeaderID,
>                        LocID = MAX(l.LocID)
>                FROM    dbo.tblLocationHistory l
>                WHERE   l.InstallDate = (   SELECT  MAX(l2.InstallDate)
>                                            FROM
> dbo.tblLocationHistory l2
>                                            WHERE   l2.HeaderID =
> l.HeaderID)
>                GROUP BY
>                        l.Header ID) lastInstall
>    ON lastInstall.HeaderID = h.HeaderID
> INNER JOIN dbo.tblLocationHistory lh
>    ON  lh.HeaderID = lastInstall.HeaderID
>    AND lh.LocID = lastInstall.LocID
> INNER JOIN dbo.lstLocations l
>    ON  l.LocID = lh.LocID
>
Author
27 Jul 2006 10:59 AM
Chris Lim
Chris Strug wrote:
> Many thanks for your help - from my limited understanding of SQL, I think I
> see what you're doing.
>
> I was just wondering, would it be possible (even recommended!) to abstract
> the "derived" select into another view?

Putting that derived table into a view could be useful. It really
depends if you'd use that bit of information elsewhere in your system.
If this is the only place where you're likely to use it, then creating
a view would be an unnecessary overhead IMO.

Chris
Author
27 Jul 2006 5:06 PM
Steve Kass
Chris,

If you happen to be using SQL Server 2005, queries like this
are very easy to write.  The most recent location is the one
that would be ranked #1 in decending InstallDate order for
a particular HeaderID, so you can do this:

SELECT
    HeaderID,
    Type,
    SerialNumber,
    Location_Name,
    InstallDate
FROM (
  SELECT
      dbo.tblHeader.HeaderID,
      dbo.tblHeader.Type,
      dbo.tblHeader.SerialNumber,
      dbo.lstLocations.Location_Name,
      dbo.tblLocationHistory.InstallDate,
      ROW_NUMBER() OVER (
        PARTITION BY dbo.tblHeader.HeaderID
        ORDER BY dbo.tblLocationHistory.InstallDate DESC
      ) AS rk
  FROM dbo.lstLocations INNER JOIN dbo.tblLocationHistory
  ON dbo.lstLocations.Location_ID = dbo.tblLocationHistory.LocationID
  INNER JOIN dbo.tblHeader
  ON dbo.tblLocationHistory.HeaderID = dbo.tblHeader.HeaderID
) AS RankedResults
WHERE rk = 1

Steve Kass
Drew University
www.stevekass.com


Chris Strug wrote:

Show quote
>Hi,
>
>My question concerns three tables:
>
>- tblHeader - containing general information such as product ID, description
>etc
>    HeaderID
>    HeaderDescription
>    HeaderSerialNumber
>
>lstLocations - lookup table containing location IDs
>    LocID
>    LocDescription
>
>tblLocationHistory - a transaction table containing the location history of
>each header and the date of their installation.
>    HistoryID
>    HeaderID
>    LocID
>    InstallDate
>
>What I'm after is a select which shows each line in tblHeader and its most
>recent location. However my first attempt (as you would expect from SQL)
>returns every product and every location they have been in - e.g. Product X
>has 7 locations - thus appears 7 times in the results from the below query.
>Unfortunately I'm after the most recent location of each Header *only*.
>
>I feel like this is something I should be able to do with my eyes closed but
>I can't seem to crack it. Please could someone point me in the right
>direction! Should I be looking at using UDF's for example?
>
>Cheers
>
>Chris.
>
>*******************
>
>SELECT
>    dbo.tblHeader.HeaderID,
>    dbo.tblHeader.Type,
>    dbo.tblHeader.SerialNumber,
>    dbo.lstLocations.Location_Name,
>    dbo.tblLocationHistory.InstallDate
>FROM         dbo.lstLocations INNER JOIN
>                      dbo.tblLocationHistory ON dbo.lstLocations.Location_ID
>= dbo.tblLocationHistory.LocationID INNER JOIN
>                      dbo.tblHeader ON dbo.tblLocationHistory.HeaderID =
>dbo.tblHeader.HeaderID
>
>
>

>
Author
27 Jul 2006 9:24 PM
Chris Lim
Steve Kass wrote:
> Chris,
>
> If you happen to be using SQL Server 2005, queries like this
> are very easy to write.  The most recent location is the one
> that would be ranked #1 in decending InstallDate order for
> a particular HeaderID, so you can do this:

Hmm, very useful to know! Currently still using 2000, but will keep in
mind for 2005.
Author
27 Jul 2006 9:27 PM
Chris Lim
Steve Kass wrote:
> FROM (
>   SELECT
>       dbo.tblHeader.HeaderID,
>       dbo.tblHeader.Type,
>       dbo.tblHeader.SerialNumber,
>       dbo.lstLocations.Location_Name,
>       dbo.tblLocationHistory.InstallDate,
>       ROW_NUMBER() OVER (
>         PARTITION BY dbo.tblHeader.HeaderID
>         ORDER BY dbo.tblLocationHistory.InstallDate DESC
>       ) AS rk

I'm trying to understand how this works. Do you not even need a FROM
clause?
Author
27 Jul 2006 9:38 PM
Steve Kass
You do need a from clause, but you snipped it out here.  The query I
posted was

SELECT
   HeaderID,
   Type,
   SerialNumber,
   Location_Name,
   InstallDate
FROM (
SELECT
     dbo.tblHeader.HeaderID,
     dbo.tblHeader.Type,
     dbo.tblHeader.SerialNumber,
     dbo.lstLocations.Location_Name,
     dbo.tblLocationHistory.InstallDate,
     ROW_NUMBER() OVER (
       PARTITION BY dbo.tblHeader.HeaderID
       ORDER BY dbo.tblLocationHistory.InstallDate DESC
     ) AS rk
FROM dbo.lstLocations INNER JOIN dbo.tblLocationHistory
ON dbo.lstLocations.Location_ID = dbo.tblLocationHistory.LocationID
INNER JOIN dbo.tblHeader
ON dbo.tblLocationHistory.HeaderID = dbo.tblHeader.HeaderID
) AS RankedResults
WHERE rk = 1

The outermost SELECT selects from a derived table, and the derived
table selects from the tables of the original query.

What would be more readable, but which the syntax of the language
doesn't allow, is

SELECT
     dbo.tblHeader.HeaderID,
     dbo.tblHeader.Type,
     dbo.tblHeader.SerialNumber,
     dbo.lstLocations.Location_Name,
     dbo.tblLocationHistory.InstallDate
FROM dbo.lstLocations INNER JOIN dbo.tblLocationHistory
ON dbo.lstLocations.Location_ID = dbo.tblLocationHistory.LocationID
INNER JOIN dbo.tblHeader
ON dbo.tblLocationHistory.HeaderID = dbo.tblHeader.HeaderID
WHERE ROW_NUMBER() OVER (
       PARTITION BY dbo.tblHeader.HeaderID
       ORDER BY dbo.tblLocationHistory.InstallDate DESC
     ) = 1

If allowed, this would be the same thing, but the ROW_NUMBER()
calculation can only appear in SELECT and ORDER BY clauses.

There is a sneaky version that you can write with less typing:

SELECT TOP 1 WITH TIES
     dbo.tblHeader.HeaderID,
     dbo.tblHeader.Type,
     dbo.tblHeader.SerialNumber,
     dbo.lstLocations.Location_Name,
     dbo.tblLocationHistory.InstallDate
FROM dbo.lstLocations INNER JOIN dbo.tblLocationHistory
ON dbo.lstLocations.Location_ID = dbo.tblLocationHistory.LocationID
INNER JOIN dbo.tblHeader
ON dbo.tblLocationHistory.HeaderID = dbo.tblHeader.HeaderID
ORDER BY ROW_NUMBER() OVER (
       PARTITION BY dbo.tblHeader.HeaderID
       ORDER BY dbo.tblLocationHistory.InstallDate DESC
     )

This is less readable, I think.

SK


Chris Lim wrote:

Show quote
>Steve Kass wrote:

>
>>FROM (
>>  SELECT
>>      dbo.tblHeader.HeaderID,
>>      dbo.tblHeader.Type,
>>      dbo.tblHeader.SerialNumber,
>>      dbo.lstLocations.Location_Name,
>>      dbo.tblLocationHistory.InstallDate,
>>      ROW_NUMBER() OVER (
>>        PARTITION BY dbo.tblHeader.HeaderID
>>        ORDER BY dbo.tblLocationHistory.InstallDate DESC
>>      ) AS rk
>>   
>>
>
>I'm trying to understand how this works. Do you not even need a FROM
>clause?
>

>
Author
27 Jul 2006 9:51 PM
Chris Lim
Steve Kass wrote:
> The outermost SELECT selects from a derived table, and the derived
> table selects from the tables of the original query.

Oh I see. I mismatched the brackets when I was reading which is why I
got myself confused!

So the bit that SQL Server 2005 provides is:

>      ROW_NUMBER() OVER (
>        PARTITION BY dbo.tblHeader.HeaderID
>        ORDER BY dbo.tblLocationHistory.InstallDate DESC
>      ) AS rk

I can see myself using that a bit. Saves having to do sub-queries or
additional dervied tables to do the ranking!

Chris
Author
27 Jul 2006 5:35 PM
Alexander Kuznetsov
Chris,

In cases like this, you might go for a little denormalization and add a
column NumberOfPreviousVersions. Inserting a history row, setnit to 0
and up it for all the previous versions. That done, you can quickly
search on

NumberOfPreviousVersions = 0,

which is usually much faster and simpler than subqueries or OLAP
functions.
Author
27 Jul 2006 11:43 PM
--CELKO--
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. Sample data is also a good idea, along with clear
specifications.  It is very hard to debug code when you do not let us
see it.

You also need to learn ISO-11179 rules so you will stop using those
silly redundant affixes that describe physical implementations in an
RDBMS.  It makes you look like a BASIC programmer.  Also, tables are
sets which usually have more than one element, so you need to use
collecitve or plural names.

The proper way to model temporal data is with duration, not chronons
(google it).  Try this:

CREATE TABLE LocationHistory
(header_id INTEGER NOT NULL
  REFERENCES Headers (header_id),
loc_id INTEGER NOT NULL
  REFERENCES Locations (loc_id),
start_date DATETIME NOT NULL.
end_date DATETIME,  -- null is current location
PRIMARY KEY (header_id,  loc_id, start_date),
etc.);

>> What I'm after is a select which shows each line in Headers and its most recent location. <<

Look for "end_date IS NULL" in your query.  Proper DDL means much
easier coding.
Author
28 Jul 2006 9:48 AM
Chris Strug
All,

Just a note to say thank you for everyone's help - its certainly cleared up
my question and given me a to consider for future work.

Thanks again.

Chris.

AddThis Social Bookmark Button