Home All Groups Group Topic Archive Search About

merge two date ranges with overlap from two tables

Author
20 Jul 2006 3:28 PM
Jason
Hi everyone:

   I have two tables:

pat01 record patient status1 from date to date.

patientID       startdate             enddate          status1
1                2000-01-01           2000-01-25      Good
1               2000-01-26           2000-02-25       not so Good
1               2000-02-26            2000-03-01      BAD
2               2000-01-01            2000-04-01      BAD


and another table pat02 record patient status2 from date to date



patientID       startdate2        enddate2          status2
1               2000-01-01         2000-01-10      CHECKED
1               2000-01-11        2000-02-01       UNCHECKED


Please note that there are overlap between date range in two tables for
the same patient.Now I want to merge those two tables into one:



PATIENTID  STARTDATE      ENDDATE       STATUS1        STATUS2
1           2000-01-01    2000-01-10                 Good
    CHECKED
1           2000-01-11    2000-01-25                 Good
    UNCHECKED
1           2000-01-26    2000-02-01             NOT SO GOOD
unchecked
1           2000-02-02    2000-02-25                not so good
null
1           2000-02-26    2000-03-01                   BAD
null
2           2000-01-01    2000-04-01                     BAD
null


Both table have about 1 million rows, more than 10 years data. I know I
can use number table  to create row for each patient each day then
merge them but I am worry about the performance. Does anyone have other
ideas?


Thanks


DDL:


create table pat01
(PatientID int,
StartDate datetime,
EndDate datetime,
Status1 varchar(20)
)

insert into pat01 values (1, '2000-01-01','2000-01-25', 'Good')
insert into pat01 values (1, '2000-01-26','2000-02-25', 'not so Good')
insert into pat01 values (1, '2000-02-26','2000-03-01', 'BAD')
insert into pat01 values (2, '2000-01-01','2000-04-01', 'BAD')

create table pat02
(PatientID int,
StartDate2 datetime,
EndDate2 datetime,
Status2 varchar(20)
)

insert into pat01 values (1, '2000-01-01','2000-01-10', 'CHECKED')
insert into pat01 values (1, '2000-01-11','2000-02-01', 'UNCHECKED')

Author
20 Jul 2006 3:56 PM
Alexander Kuznetsov
In SQL Puzzles, Puzzle #3, Anesthesia, might give you a clue.
Author
20 Jul 2006 4:01 PM
Roy Harvey
Thanks for the table definitions and test data.  I did have to fix the
second set of INSERTS to point to the second table.

The use of a view is optional, but I thought it made it easier to read
if nothing else.  My first version used a view to gather the start
dates and another to gather the end dates, but I rolled them into the
code of the view that remained.

CREATE VIEW PatientRanges
AS
SELECT PatientID, StartDate,
       (select min(EndDate)
          from (select PatientID, EndDate from pat01
                union
                select PatientID, EndDate2 FROM pat02) as E
         where S.StartDate < E.EndDate) as EndDate
  FROM (select PatientID, StartDate from pat01
        union
        select PatientID, StartDate2 from pat02) as S
GO

SELECT R.PatientId, R.StartDate, R.EndDate,
       A.Status1, B.Status2
  FROM PatientRanges as R
  LEFT OUTER --???
  JOIN pat01 as A
    on R.PatientID = A.PatientID
   and R.StartDate BETWEEN A.StartDate and A.EndDate
  LEFT OUTER --???
  JOIN pat02 as B
    on R.PatientID = B.PatientID
   and R.StartDate BETWEEN B.StartDate2 and B.EndDate2

Roy Harvey
Beacon Falls, CT

Show quote
On 20 Jul 2006 08:28:14 -0700, "Jason" <mailtoja***@gmail.com> wrote:

>Hi everyone:
>
>   I have two tables:
>
>pat01 record patient status1 from date to date.
>
>patientID       startdate             enddate          status1
>1                2000-01-01           2000-01-25      Good
>1               2000-01-26           2000-02-25       not so Good
>1               2000-02-26            2000-03-01      BAD
>2               2000-01-01            2000-04-01      BAD
>
>
>and another table pat02 record patient status2 from date to date
>
>
>
>patientID       startdate2        enddate2          status2
>1               2000-01-01         2000-01-10      CHECKED
>1               2000-01-11        2000-02-01       UNCHECKED
>
>
>Please note that there are overlap between date range in two tables for
>the same patient.Now I want to merge those two tables into one:
>
>
>
>PATIENTID  STARTDATE      ENDDATE       STATUS1        STATUS2
>1           2000-01-01    2000-01-10                 Good
>    CHECKED
>1           2000-01-11    2000-01-25                 Good
>    UNCHECKED
>1           2000-01-26    2000-02-01             NOT SO GOOD
>unchecked
>1           2000-02-02    2000-02-25                not so good
>null
>1           2000-02-26    2000-03-01                   BAD
> null
>2           2000-01-01    2000-04-01                     BAD
> null
>
>
>Both table have about 1 million rows, more than 10 years data. I know I
>can use number table  to create row for each patient each day then
>merge them but I am worry about the performance. Does anyone have other
>ideas?
>
>
>Thanks
>
>
>DDL:
>
>
>create table pat01
>(PatientID int,
> StartDate datetime,
> EndDate datetime,
> Status1 varchar(20)
>)
>
>insert into pat01 values (1, '2000-01-01','2000-01-25', 'Good')
>insert into pat01 values (1, '2000-01-26','2000-02-25', 'not so Good')
>insert into pat01 values (1, '2000-02-26','2000-03-01', 'BAD')
>insert into pat01 values (2, '2000-01-01','2000-04-01', 'BAD')
>
>create table pat02
>(PatientID int,
> StartDate2 datetime,
> EndDate2 datetime,
> Status2 varchar(20)
>)
>
>insert into pat01 values (1, '2000-01-01','2000-01-10', 'CHECKED')
>insert into pat01 values (1, '2000-01-11','2000-02-01', 'UNCHECKED')
Author
20 Jul 2006 4:29 PM
Jason
Roy:
      Thanks for the quick reply. You solution missed one row:

1           2000-02-02    2000-02-25    not so good     null


However, I can fix that by add row into pat02

insert into pat02 values (1, '2000-02-02','2078-06-06', null)

Thanks a lot for your help.

Jason



Roy Harvey wrote:
Show quote
> Thanks for the table definitions and test data.  I did have to fix the
> second set of INSERTS to point to the second table.
>
> The use of a view is optional, but I thought it made it easier to read
> if nothing else.  My first version used a view to gather the start
> dates and another to gather the end dates, but I rolled them into the
> code of the view that remained.
>
> CREATE VIEW PatientRanges
> AS
> SELECT PatientID, StartDate,
>        (select min(EndDate)
>           from (select PatientID, EndDate from pat01
>                 union
>                 select PatientID, EndDate2 FROM pat02) as E
>          where S.StartDate < E.EndDate) as EndDate
>   FROM (select PatientID, StartDate from pat01
>         union
>         select PatientID, StartDate2 from pat02) as S
> GO
>
> SELECT R.PatientId, R.StartDate, R.EndDate,
>        A.Status1, B.Status2
>   FROM PatientRanges as R
>   LEFT OUTER --???
>   JOIN pat01 as A
>     on R.PatientID = A.PatientID
>    and R.StartDate BETWEEN A.StartDate and A.EndDate
>   LEFT OUTER --???
>   JOIN pat02 as B
>     on R.PatientID = B.PatientID
>    and R.StartDate BETWEEN B.StartDate2 and B.EndDate2
>
> Roy Harvey
> Beacon Falls, CT
>
> On 20 Jul 2006 08:28:14 -0700, "Jason" <mailtoja***@gmail.com> wrote:
>
> >Hi everyone:
> >
> >   I have two tables:
> >
> >pat01 record patient status1 from date to date.
> >
> >patientID       startdate             enddate          status1
> >1                2000-01-01           2000-01-25      Good
> >1               2000-01-26           2000-02-25       not so Good
> >1               2000-02-26            2000-03-01      BAD
> >2               2000-01-01            2000-04-01      BAD
> >
> >
> >and another table pat02 record patient status2 from date to date
> >
> >
> >
> >patientID       startdate2        enddate2          status2
> >1               2000-01-01         2000-01-10      CHECKED
> >1               2000-01-11        2000-02-01       UNCHECKED
> >
> >
> >Please note that there are overlap between date range in two tables for
> >the same patient.Now I want to merge those two tables into one:
> >
> >
> >
> >PATIENTID  STARTDATE      ENDDATE       STATUS1        STATUS2
> >1           2000-01-01    2000-01-10                 Good
> >    CHECKED
> >1           2000-01-11    2000-01-25                 Good
> >    UNCHECKED
> >1           2000-01-26    2000-02-01             NOT SO GOOD
> >unchecked
> >1           2000-02-02    2000-02-25                not so good
> >null
> >1           2000-02-26    2000-03-01                   BAD
> > null
> >2           2000-01-01    2000-04-01                     BAD
> > null
> >
> >
> >Both table have about 1 million rows, more than 10 years data. I know I
> >can use number table  to create row for each patient each day then
> >merge them but I am worry about the performance. Does anyone have other
> >ideas?
> >
> >
> >Thanks
> >
> >
> >DDL:
> >
> >
> >create table pat01
> >(PatientID int,
> > StartDate datetime,
> > EndDate datetime,
> > Status1 varchar(20)
> >)
> >
> >insert into pat01 values (1, '2000-01-01','2000-01-25', 'Good')
> >insert into pat01 values (1, '2000-01-26','2000-02-25', 'not so Good')
> >insert into pat01 values (1, '2000-02-26','2000-03-01', 'BAD')
> >insert into pat01 values (2, '2000-01-01','2000-04-01', 'BAD')
> >
> >create table pat02
> >(PatientID int,
> > StartDate2 datetime,
> > EndDate2 datetime,
> > Status2 varchar(20)
> >)
> >
> >insert into pat01 values (1, '2000-01-01','2000-01-10', 'CHECKED')
> >insert into pat01 values (1, '2000-01-11','2000-02-01', 'UNCHECKED')
Author
20 Jul 2006 4:59 PM
Roy Harvey
The problem with my code is bigger than that.  It has problems with
non-continuous ranges that partially overlap.  The problem is in the
view, I will give it more thought.

Sorry about that.

Roy

Show quote
On 20 Jul 2006 09:29:43 -0700, "Jason" <mailtoja***@gmail.com> wrote:

>Roy:
>      Thanks for the quick reply. You solution missed one row:
>
>1           2000-02-02    2000-02-25    not so good     null
>
>
>However, I can fix that by add row into pat02
>
>insert into pat02 values (1, '2000-02-02','2078-06-06', null)
>
>Thanks a lot for your help.
>
>Jason
>
>
>
>Roy Harvey wrote:
>> Thanks for the table definitions and test data.  I did have to fix the
>> second set of INSERTS to point to the second table.
>>
>> The use of a view is optional, but I thought it made it easier to read
>> if nothing else.  My first version used a view to gather the start
>> dates and another to gather the end dates, but I rolled them into the
>> code of the view that remained.
>>
>> CREATE VIEW PatientRanges
>> AS
>> SELECT PatientID, StartDate,
>>        (select min(EndDate)
>>           from (select PatientID, EndDate from pat01
>>                 union
>>                 select PatientID, EndDate2 FROM pat02) as E
>>          where S.StartDate < E.EndDate) as EndDate
>>   FROM (select PatientID, StartDate from pat01
>>         union
>>         select PatientID, StartDate2 from pat02) as S
>> GO
>>
>> SELECT R.PatientId, R.StartDate, R.EndDate,
>>        A.Status1, B.Status2
>>   FROM PatientRanges as R
>>   LEFT OUTER --???
>>   JOIN pat01 as A
>>     on R.PatientID = A.PatientID
>>    and R.StartDate BETWEEN A.StartDate and A.EndDate
>>   LEFT OUTER --???
>>   JOIN pat02 as B
>>     on R.PatientID = B.PatientID
>>    and R.StartDate BETWEEN B.StartDate2 and B.EndDate2
>>
>> Roy Harvey
>> Beacon Falls, CT
>>
>> On 20 Jul 2006 08:28:14 -0700, "Jason" <mailtoja***@gmail.com> wrote:
>>
>> >Hi everyone:
>> >
>> >   I have two tables:
>> >
>> >pat01 record patient status1 from date to date.
>> >
>> >patientID       startdate             enddate          status1
>> >1                2000-01-01           2000-01-25      Good
>> >1               2000-01-26           2000-02-25       not so Good
>> >1               2000-02-26            2000-03-01      BAD
>> >2               2000-01-01            2000-04-01      BAD
>> >
>> >
>> >and another table pat02 record patient status2 from date to date
>> >
>> >
>> >
>> >patientID       startdate2        enddate2          status2
>> >1               2000-01-01         2000-01-10      CHECKED
>> >1               2000-01-11        2000-02-01       UNCHECKED
>> >
>> >
>> >Please note that there are overlap between date range in two tables for
>> >the same patient.Now I want to merge those two tables into one:
>> >
>> >
>> >
>> >PATIENTID  STARTDATE      ENDDATE       STATUS1        STATUS2
>> >1           2000-01-01    2000-01-10                 Good
>> >    CHECKED
>> >1           2000-01-11    2000-01-25                 Good
>> >    UNCHECKED
>> >1           2000-01-26    2000-02-01             NOT SO GOOD
>> >unchecked
>> >1           2000-02-02    2000-02-25                not so good
>> >null
>> >1           2000-02-26    2000-03-01                   BAD
>> > null
>> >2           2000-01-01    2000-04-01                     BAD
>> > null
>> >
>> >
>> >Both table have about 1 million rows, more than 10 years data. I know I
>> >can use number table  to create row for each patient each day then
>> >merge them but I am worry about the performance. Does anyone have other
>> >ideas?
>> >
>> >
>> >Thanks
>> >
>> >
>> >DDL:
>> >
>> >
>> >create table pat01
>> >(PatientID int,
>> > StartDate datetime,
>> > EndDate datetime,
>> > Status1 varchar(20)
>> >)
>> >
>> >insert into pat01 values (1, '2000-01-01','2000-01-25', 'Good')
>> >insert into pat01 values (1, '2000-01-26','2000-02-25', 'not so Good')
>> >insert into pat01 values (1, '2000-02-26','2000-03-01', 'BAD')
>> >insert into pat01 values (2, '2000-01-01','2000-04-01', 'BAD')
>> >
>> >create table pat02
>> >(PatientID int,
>> > StartDate2 datetime,
>> > EndDate2 datetime,
>> > Status2 varchar(20)
>> >)
>> >
>> >insert into pat01 values (1, '2000-01-01','2000-01-10', 'CHECKED')
>> >insert into pat01 values (1, '2000-01-11','2000-02-01', 'UNCHECKED')
Author
20 Jul 2006 5:13 PM
Alexander Kuznetsov
How about this:

CREATE VIEW PatientRanges
AS
SELECT PatientID, StartDate,
       (select min(EndDate)
          from (select PatientID, EndDate from pat01
                union
                select PatientID, EndDate2 FROM pat02
                union
                select PatientID, dateadd(d,-1, StartDate) FROM pat01 p

                union
                select PatientID, dateadd(d,-1, StartDate2) FROM pat02
p
        ) as E
         where S.StartDate <= E.EndDate and s.PatientID = e.PatientID)
as EndDate
  FROM (select PatientID, StartDate from pat01
        union
        select PatientID, StartDate2 from pat02
        union
        select PatientID, dateadd(d,1, EndDate) FROM pat01 p
--        where not exists(select 1 from pat01 p1 where p1.PatientID =
p.PatientID and p1.)
        union
        select PatientID, dateadd(d,1, EndDate2) FROM pat02 p
    ) as S
GO

SELECT R.PatientId, R.StartDate, R.EndDate,
       A.Status1, B.Status2
  FROM PatientRanges as R
  LEFT OUTER --???
  JOIN pat01 as A
    on R.PatientID = A.PatientID
   and R.StartDate BETWEEN A.StartDate and A.EndDate
  LEFT OUTER --???
  JOIN pat02 as B
    on R.PatientID = B.PatientID
   and R.StartDate BETWEEN B.StartDate2 and B.EndDate2
where R.EndDate is not null
Author
20 Jul 2006 5:49 PM
Roy Harvey
Yes, that is essentially what I came up with the second time around. I
just ended up going back to the three-view version as I fiddled
around, but that is not significant.

Roy

On 20 Jul 2006 10:13:26 -0700, "Alexander Kuznetsov"
<AK_TIREDOFSPAM@hotmail.COM> wrote:

Show quote
>How about this:
>
>CREATE VIEW PatientRanges
>AS
>SELECT PatientID, StartDate,
>       (select min(EndDate)
>          from (select PatientID, EndDate from pat01
>                union
>                select PatientID, EndDate2 FROM pat02
>                union
>                select PatientID, dateadd(d,-1, StartDate) FROM pat01 p
>
>                union
>                select PatientID, dateadd(d,-1, StartDate2) FROM pat02
>p
>        ) as E
>         where S.StartDate <= E.EndDate and s.PatientID = e.PatientID)
>as EndDate
>  FROM (select PatientID, StartDate from pat01
>        union
>        select PatientID, StartDate2 from pat02
>        union
>        select PatientID, dateadd(d,1, EndDate) FROM pat01 p
>--        where not exists(select 1 from pat01 p1 where p1.PatientID =
>p.PatientID and p1.)
>        union
>        select PatientID, dateadd(d,1, EndDate2) FROM pat02 p
>    ) as S
>GO
>
>SELECT R.PatientId, R.StartDate, R.EndDate,
>       A.Status1, B.Status2
>  FROM PatientRanges as R
>  LEFT OUTER --???
>  JOIN pat01 as A
>    on R.PatientID = A.PatientID
>   and R.StartDate BETWEEN A.StartDate and A.EndDate
>  LEFT OUTER --???
>  JOIN pat02 as B
>    on R.PatientID = B.PatientID
>   and R.StartDate BETWEEN B.StartDate2 and B.EndDate2
>where R.EndDate is not null
Author
20 Jul 2006 5:46 PM
Roy Harvey
OK, lets start over.  New views, three this time, and a revised query.

CREATE VIEW PatientStarts_V
AS
SELECT PatientID, StartDate
  FROM pat01
UNION
SELECT PatientID, StartDate2
  FROM pat02
UNION
SELECT PatientID, dateadd(day,+1,EndDate)
  FROM pat01
UNION
SELECT PatientID, dateadd(day,+1,EndDate2)
  FROM pat02
GO

CREATE VIEW PatientEnds_V
AS
SELECT PatientID, EndDate
  FROM pat01
UNION
SELECT PatientID, EndDate2
  FROM pat02
UNION
SELECT PatientID, dateadd(day,-1,StartDate)
  FROM pat01
UNION
SELECT PatientID, dateadd(day,-1,StartDate2)
  FROM pat02
GO

CREATE VIEW PatientRanges_V
AS
SELECT PatientID, StartDate,
       (select min(EndDate)
          from PatientEnds_V as E
         where S.PatientID = E.PatientID
           and S.StartDate <= E.EndDate) as EndDate
  FROM PatientStarts_V as S
GO

SELECT R.PatientId, R.StartDate, R.EndDate,
       A.Status1, B.Status2
  FROM PatientRanges_V as R
  LEFT OUTER --???
  JOIN pat01 as A
    on R.PatientID = A.PatientID
   and R.StartDate BETWEEN A.StartDate and A.EndDate
  LEFT OUTER --???
  JOIN pat02 as B
    on R.PatientID = B.PatientID
   and R.StartDate BETWEEN B.StartDate2 and B.EndDate2
WHERE R.EndDate IS NOT NULL
-- WHERE (A.PatientID IS NOT NULL
--    OR  B.PatientID IS NOT NULL)
ORDER BY 1, 2, 3

The commented WHERE clause would get rid of ranges where there was no
status in either table, probably a good idea.

PatientId   StartDate   EndDate     Status1              Status2
----------- ----------- ----------- -------------------- ----------
          1 2000-01-01  2000-01-10  Good                 CHECKED
          1 2000-01-11  2000-01-25  Good                 UNCHECKED
          1 2000-01-26  2000-02-01  not so Good          UNCHECKED
          1 2000-02-02  2000-02-25  not so Good          NULL
          1 2000-02-26  2000-03-01  BAD                  NULL
          2 2000-01-01  2000-04-01  BAD                  NULL

Roy Harvey
Beacon Falls, CT
Author
20 Jul 2006 5:01 PM
Alexander Kuznetsov
Roy,

Consider this slightly changed data:

insert into pat01 values (1, '2000-01-01','2000-01-25', 'Good')
insert into pat01 values (1, '2000-01-26','2000-02-25', 'not so Good')
insert into pat01 values (1, '2000-02-26','2000-03-01', 'BAD')
insert into pat01 values (2, '2000-01-01','2000-04-01', 'BAD')

--- AK: starts on 01-02 rather than on 01-01
insert into pat02 values (1, '2000-01-02','2000-01-10', 'CHECKED')
insert into pat02 values (1, '2000-01-11','2000-02-01', 'UNCHECKED')

The result:

PatientId   StartDate
EndDate                                                Status1
    Status2
----------- ------------------------------------------------------
------------------------------------------------------
-------------------- --------------------
1           2000-01-01 00:00:00.000
2000-01-10 00:00:00.000                                Good
    NULL
1           2000-01-02 00:00:00.000
2000-01-10 00:00:00.000                                Good
    CHECKED
1           2000-01-11 00:00:00.000
2000-01-25 00:00:00.000                                Good
    UNCHECKED
1           2000-01-26 00:00:00.000
2000-02-01 00:00:00.000                                not so Good
    UNCHECKED
1           2000-02-26 00:00:00.000
2000-03-01 00:00:00.000                                BAD
    NULL
2           2000-01-01 00:00:00.000
2000-01-10 00:00:00.000                                BAD
    NULL

is incorrect. the first row should not have Enddate 2000-01-10, it
should be 2000-01-01.
Also the interval 02-01 to 02-25 is missing
Author
21 Jul 2006 2:55 PM
--CELKO--
>> Both table have about 1 million rows, more than 10 years data. I know I can use number table  to create row for each patient each day then merge them but I am worry about the performance. <<

So basically something like this:

SELECT Pat01.patient_id, C1.cal_date,  Pat01.status1,  Pat02.status2
   FROM  Pat01, Pat02, Calendar AS C1
WHERE Pat01.patient_id = Pat02.patient_id
  AND C1.cal_date BETWEEN  Pat01.start_date AND Pat01.end_date
  AND C1.cal_date BETWEEN  Pat02.start_date AND Pat02.end_date
  AND C1.cal_date BETWEEN  @my_start_date AND @my_end_date;

so figure we have 3652+ rows per patient_id, assuming they all stay in
the study for 10 years.    How many patients?  Could be ugly.  But I
also assume this is a one-shot data warehousing kind of thing, so this
might not be a bad way to go.

I cannot think of an answer right now, but I have the feeling that you
can use the new OLAP functions to build the ranges..

AddThis Social Bookmark Button