|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
merge two date ranges with overlap from two tablesI have two tables: pat01 record patient status1 from date to date. patientID startdate enddate status1 1 2000-01-01 2000-01-25 Good 1 2000-01-26 2000-02-25 not so Good 1 2000-02-26 2000-03-01 BAD 2 2000-01-01 2000-04-01 BAD and another table pat02 record patient status2 from date to date patientID startdate2 enddate2 status2 1 2000-01-01 2000-01-10 CHECKED 1 2000-01-11 2000-02-01 UNCHECKED Please note that there are overlap between date range in two tables for the same patient.Now I want to merge those two tables into one: PATIENTID STARTDATE ENDDATE STATUS1 STATUS2 1 2000-01-01 2000-01-10 Good CHECKED 1 2000-01-11 2000-01-25 Good UNCHECKED 1 2000-01-26 2000-02-01 NOT SO GOOD unchecked 1 2000-02-02 2000-02-25 not so good null 1 2000-02-26 2000-03-01 BAD null 2 2000-01-01 2000-04-01 BAD null Both table have about 1 million rows, more than 10 years data. I know I can use number table to create row for each patient each day then merge them but I am worry about the performance. Does anyone have other ideas? Thanks DDL: create table pat01 (PatientID int, StartDate datetime, EndDate datetime, Status1 varchar(20) ) insert into pat01 values (1, '2000-01-01','2000-01-25', 'Good') insert into pat01 values (1, '2000-01-26','2000-02-25', 'not so Good') insert into pat01 values (1, '2000-02-26','2000-03-01', 'BAD') insert into pat01 values (2, '2000-01-01','2000-04-01', 'BAD') create table pat02 (PatientID int, StartDate2 datetime, EndDate2 datetime, Status2 varchar(20) ) insert into pat01 values (1, '2000-01-01','2000-01-10', 'CHECKED') insert into pat01 values (1, '2000-01-11','2000-02-01', 'UNCHECKED') In SQL Puzzles, Puzzle #3, Anesthesia, might give you a clue.
Thanks for the table definitions and test data. I did have to fix the
second set of INSERTS to point to the second table. The use of a view is optional, but I thought it made it easier to read if nothing else. My first version used a view to gather the start dates and another to gather the end dates, but I rolled them into the code of the view that remained. CREATE VIEW PatientRanges AS SELECT PatientID, StartDate, (select min(EndDate) from (select PatientID, EndDate from pat01 union select PatientID, EndDate2 FROM pat02) as E where S.StartDate < E.EndDate) as EndDate FROM (select PatientID, StartDate from pat01 union select PatientID, StartDate2 from pat02) as S GO SELECT R.PatientId, R.StartDate, R.EndDate, A.Status1, B.Status2 FROM PatientRanges as R LEFT OUTER --??? JOIN pat01 as A on R.PatientID = A.PatientID and R.StartDate BETWEEN A.StartDate and A.EndDate LEFT OUTER --??? JOIN pat02 as B on R.PatientID = B.PatientID and R.StartDate BETWEEN B.StartDate2 and B.EndDate2 Roy Harvey Beacon Falls, CT Show quote On 20 Jul 2006 08:28:14 -0700, "Jason" <mailtoja***@gmail.com> wrote: >Hi everyone: > > I have two tables: > >pat01 record patient status1 from date to date. > >patientID startdate enddate status1 >1 2000-01-01 2000-01-25 Good >1 2000-01-26 2000-02-25 not so Good >1 2000-02-26 2000-03-01 BAD >2 2000-01-01 2000-04-01 BAD > > >and another table pat02 record patient status2 from date to date > > > >patientID startdate2 enddate2 status2 >1 2000-01-01 2000-01-10 CHECKED >1 2000-01-11 2000-02-01 UNCHECKED > > >Please note that there are overlap between date range in two tables for >the same patient.Now I want to merge those two tables into one: > > > >PATIENTID STARTDATE ENDDATE STATUS1 STATUS2 >1 2000-01-01 2000-01-10 Good > CHECKED >1 2000-01-11 2000-01-25 Good > UNCHECKED >1 2000-01-26 2000-02-01 NOT SO GOOD >unchecked >1 2000-02-02 2000-02-25 not so good >null >1 2000-02-26 2000-03-01 BAD > null >2 2000-01-01 2000-04-01 BAD > null > > >Both table have about 1 million rows, more than 10 years data. I know I >can use number table to create row for each patient each day then >merge them but I am worry about the performance. Does anyone have other >ideas? > > >Thanks > > >DDL: > > >create table pat01 >(PatientID int, > StartDate datetime, > EndDate datetime, > Status1 varchar(20) >) > >insert into pat01 values (1, '2000-01-01','2000-01-25', 'Good') >insert into pat01 values (1, '2000-01-26','2000-02-25', 'not so Good') >insert into pat01 values (1, '2000-02-26','2000-03-01', 'BAD') >insert into pat01 values (2, '2000-01-01','2000-04-01', 'BAD') > >create table pat02 >(PatientID int, > StartDate2 datetime, > EndDate2 datetime, > Status2 varchar(20) >) > >insert into pat01 values (1, '2000-01-01','2000-01-10', 'CHECKED') >insert into pat01 values (1, '2000-01-11','2000-02-01', 'UNCHECKED') Roy:
Thanks for the quick reply. You solution missed one row: 1 2000-02-02 2000-02-25 not so good null However, I can fix that by add row into pat02 insert into pat02 values (1, '2000-02-02','2078-06-06', null) Thanks a lot for your help. Jason Roy Harvey wrote: Show quote > Thanks for the table definitions and test data. I did have to fix the > second set of INSERTS to point to the second table. > > The use of a view is optional, but I thought it made it easier to read > if nothing else. My first version used a view to gather the start > dates and another to gather the end dates, but I rolled them into the > code of the view that remained. > > CREATE VIEW PatientRanges > AS > SELECT PatientID, StartDate, > (select min(EndDate) > from (select PatientID, EndDate from pat01 > union > select PatientID, EndDate2 FROM pat02) as E > where S.StartDate < E.EndDate) as EndDate > FROM (select PatientID, StartDate from pat01 > union > select PatientID, StartDate2 from pat02) as S > GO > > SELECT R.PatientId, R.StartDate, R.EndDate, > A.Status1, B.Status2 > FROM PatientRanges as R > LEFT OUTER --??? > JOIN pat01 as A > on R.PatientID = A.PatientID > and R.StartDate BETWEEN A.StartDate and A.EndDate > LEFT OUTER --??? > JOIN pat02 as B > on R.PatientID = B.PatientID > and R.StartDate BETWEEN B.StartDate2 and B.EndDate2 > > Roy Harvey > Beacon Falls, CT > > On 20 Jul 2006 08:28:14 -0700, "Jason" <mailtoja***@gmail.com> wrote: > > >Hi everyone: > > > > I have two tables: > > > >pat01 record patient status1 from date to date. > > > >patientID startdate enddate status1 > >1 2000-01-01 2000-01-25 Good > >1 2000-01-26 2000-02-25 not so Good > >1 2000-02-26 2000-03-01 BAD > >2 2000-01-01 2000-04-01 BAD > > > > > >and another table pat02 record patient status2 from date to date > > > > > > > >patientID startdate2 enddate2 status2 > >1 2000-01-01 2000-01-10 CHECKED > >1 2000-01-11 2000-02-01 UNCHECKED > > > > > >Please note that there are overlap between date range in two tables for > >the same patient.Now I want to merge those two tables into one: > > > > > > > >PATIENTID STARTDATE ENDDATE STATUS1 STATUS2 > >1 2000-01-01 2000-01-10 Good > > CHECKED > >1 2000-01-11 2000-01-25 Good > > UNCHECKED > >1 2000-01-26 2000-02-01 NOT SO GOOD > >unchecked > >1 2000-02-02 2000-02-25 not so good > >null > >1 2000-02-26 2000-03-01 BAD > > null > >2 2000-01-01 2000-04-01 BAD > > null > > > > > >Both table have about 1 million rows, more than 10 years data. I know I > >can use number table to create row for each patient each day then > >merge them but I am worry about the performance. Does anyone have other > >ideas? > > > > > >Thanks > > > > > >DDL: > > > > > >create table pat01 > >(PatientID int, > > StartDate datetime, > > EndDate datetime, > > Status1 varchar(20) > >) > > > >insert into pat01 values (1, '2000-01-01','2000-01-25', 'Good') > >insert into pat01 values (1, '2000-01-26','2000-02-25', 'not so Good') > >insert into pat01 values (1, '2000-02-26','2000-03-01', 'BAD') > >insert into pat01 values (2, '2000-01-01','2000-04-01', 'BAD') > > > >create table pat02 > >(PatientID int, > > StartDate2 datetime, > > EndDate2 datetime, > > Status2 varchar(20) > >) > > > >insert into pat01 values (1, '2000-01-01','2000-01-10', 'CHECKED') > >insert into pat01 values (1, '2000-01-11','2000-02-01', 'UNCHECKED') The problem with my code is bigger than that. It has problems with
non-continuous ranges that partially overlap. The problem is in the view, I will give it more thought. Sorry about that. Roy Show quote On 20 Jul 2006 09:29:43 -0700, "Jason" <mailtoja***@gmail.com> wrote: >Roy: > Thanks for the quick reply. You solution missed one row: > >1 2000-02-02 2000-02-25 not so good null > > >However, I can fix that by add row into pat02 > >insert into pat02 values (1, '2000-02-02','2078-06-06', null) > >Thanks a lot for your help. > >Jason > > > >Roy Harvey wrote: >> Thanks for the table definitions and test data. I did have to fix the >> second set of INSERTS to point to the second table. >> >> The use of a view is optional, but I thought it made it easier to read >> if nothing else. My first version used a view to gather the start >> dates and another to gather the end dates, but I rolled them into the >> code of the view that remained. >> >> CREATE VIEW PatientRanges >> AS >> SELECT PatientID, StartDate, >> (select min(EndDate) >> from (select PatientID, EndDate from pat01 >> union >> select PatientID, EndDate2 FROM pat02) as E >> where S.StartDate < E.EndDate) as EndDate >> FROM (select PatientID, StartDate from pat01 >> union >> select PatientID, StartDate2 from pat02) as S >> GO >> >> SELECT R.PatientId, R.StartDate, R.EndDate, >> A.Status1, B.Status2 >> FROM PatientRanges as R >> LEFT OUTER --??? >> JOIN pat01 as A >> on R.PatientID = A.PatientID >> and R.StartDate BETWEEN A.StartDate and A.EndDate >> LEFT OUTER --??? >> JOIN pat02 as B >> on R.PatientID = B.PatientID >> and R.StartDate BETWEEN B.StartDate2 and B.EndDate2 >> >> Roy Harvey >> Beacon Falls, CT >> >> On 20 Jul 2006 08:28:14 -0700, "Jason" <mailtoja***@gmail.com> wrote: >> >> >Hi everyone: >> > >> > I have two tables: >> > >> >pat01 record patient status1 from date to date. >> > >> >patientID startdate enddate status1 >> >1 2000-01-01 2000-01-25 Good >> >1 2000-01-26 2000-02-25 not so Good >> >1 2000-02-26 2000-03-01 BAD >> >2 2000-01-01 2000-04-01 BAD >> > >> > >> >and another table pat02 record patient status2 from date to date >> > >> > >> > >> >patientID startdate2 enddate2 status2 >> >1 2000-01-01 2000-01-10 CHECKED >> >1 2000-01-11 2000-02-01 UNCHECKED >> > >> > >> >Please note that there are overlap between date range in two tables for >> >the same patient.Now I want to merge those two tables into one: >> > >> > >> > >> >PATIENTID STARTDATE ENDDATE STATUS1 STATUS2 >> >1 2000-01-01 2000-01-10 Good >> > CHECKED >> >1 2000-01-11 2000-01-25 Good >> > UNCHECKED >> >1 2000-01-26 2000-02-01 NOT SO GOOD >> >unchecked >> >1 2000-02-02 2000-02-25 not so good >> >null >> >1 2000-02-26 2000-03-01 BAD >> > null >> >2 2000-01-01 2000-04-01 BAD >> > null >> > >> > >> >Both table have about 1 million rows, more than 10 years data. I know I >> >can use number table to create row for each patient each day then >> >merge them but I am worry about the performance. Does anyone have other >> >ideas? >> > >> > >> >Thanks >> > >> > >> >DDL: >> > >> > >> >create table pat01 >> >(PatientID int, >> > StartDate datetime, >> > EndDate datetime, >> > Status1 varchar(20) >> >) >> > >> >insert into pat01 values (1, '2000-01-01','2000-01-25', 'Good') >> >insert into pat01 values (1, '2000-01-26','2000-02-25', 'not so Good') >> >insert into pat01 values (1, '2000-02-26','2000-03-01', 'BAD') >> >insert into pat01 values (2, '2000-01-01','2000-04-01', 'BAD') >> > >> >create table pat02 >> >(PatientID int, >> > StartDate2 datetime, >> > EndDate2 datetime, >> > Status2 varchar(20) >> >) >> > >> >insert into pat01 values (1, '2000-01-01','2000-01-10', 'CHECKED') >> >insert into pat01 values (1, '2000-01-11','2000-02-01', 'UNCHECKED') How about this:
CREATE VIEW PatientRanges AS SELECT PatientID, StartDate, (select min(EndDate) from (select PatientID, EndDate from pat01 union select PatientID, EndDate2 FROM pat02 union select PatientID, dateadd(d,-1, StartDate) FROM pat01 p union select PatientID, dateadd(d,-1, StartDate2) FROM pat02 p ) as E where S.StartDate <= E.EndDate and s.PatientID = e.PatientID) as EndDate FROM (select PatientID, StartDate from pat01 union select PatientID, StartDate2 from pat02 union select PatientID, dateadd(d,1, EndDate) FROM pat01 p -- where not exists(select 1 from pat01 p1 where p1.PatientID = p.PatientID and p1.) union select PatientID, dateadd(d,1, EndDate2) FROM pat02 p ) as S GO SELECT R.PatientId, R.StartDate, R.EndDate, A.Status1, B.Status2 FROM PatientRanges as R LEFT OUTER --??? JOIN pat01 as A on R.PatientID = A.PatientID and R.StartDate BETWEEN A.StartDate and A.EndDate LEFT OUTER --??? JOIN pat02 as B on R.PatientID = B.PatientID and R.StartDate BETWEEN B.StartDate2 and B.EndDate2 where R.EndDate is not null Yes, that is essentially what I came up with the second time around. I
just ended up going back to the three-view version as I fiddled around, but that is not significant. Roy On 20 Jul 2006 10:13:26 -0700, "Alexander Kuznetsov" <AK_TIREDOFSPAM@hotmail.COM> wrote: Show quote >How about this: > >CREATE VIEW PatientRanges >AS >SELECT PatientID, StartDate, > (select min(EndDate) > from (select PatientID, EndDate from pat01 > union > select PatientID, EndDate2 FROM pat02 > union > select PatientID, dateadd(d,-1, StartDate) FROM pat01 p > > union > select PatientID, dateadd(d,-1, StartDate2) FROM pat02 >p > ) as E > where S.StartDate <= E.EndDate and s.PatientID = e.PatientID) >as EndDate > FROM (select PatientID, StartDate from pat01 > union > select PatientID, StartDate2 from pat02 > union > select PatientID, dateadd(d,1, EndDate) FROM pat01 p >-- where not exists(select 1 from pat01 p1 where p1.PatientID = >p.PatientID and p1.) > union > select PatientID, dateadd(d,1, EndDate2) FROM pat02 p > ) as S >GO > >SELECT R.PatientId, R.StartDate, R.EndDate, > A.Status1, B.Status2 > FROM PatientRanges as R > LEFT OUTER --??? > JOIN pat01 as A > on R.PatientID = A.PatientID > and R.StartDate BETWEEN A.StartDate and A.EndDate > LEFT OUTER --??? > JOIN pat02 as B > on R.PatientID = B.PatientID > and R.StartDate BETWEEN B.StartDate2 and B.EndDate2 >where R.EndDate is not null OK, lets start over. New views, three this time, and a revised query.
CREATE VIEW PatientStarts_V AS SELECT PatientID, StartDate FROM pat01 UNION SELECT PatientID, StartDate2 FROM pat02 UNION SELECT PatientID, dateadd(day,+1,EndDate) FROM pat01 UNION SELECT PatientID, dateadd(day,+1,EndDate2) FROM pat02 GO CREATE VIEW PatientEnds_V AS SELECT PatientID, EndDate FROM pat01 UNION SELECT PatientID, EndDate2 FROM pat02 UNION SELECT PatientID, dateadd(day,-1,StartDate) FROM pat01 UNION SELECT PatientID, dateadd(day,-1,StartDate2) FROM pat02 GO CREATE VIEW PatientRanges_V AS SELECT PatientID, StartDate, (select min(EndDate) from PatientEnds_V as E where S.PatientID = E.PatientID and S.StartDate <= E.EndDate) as EndDate FROM PatientStarts_V as S GO SELECT R.PatientId, R.StartDate, R.EndDate, A.Status1, B.Status2 FROM PatientRanges_V as R LEFT OUTER --??? JOIN pat01 as A on R.PatientID = A.PatientID and R.StartDate BETWEEN A.StartDate and A.EndDate LEFT OUTER --??? JOIN pat02 as B on R.PatientID = B.PatientID and R.StartDate BETWEEN B.StartDate2 and B.EndDate2 WHERE R.EndDate IS NOT NULL -- WHERE (A.PatientID IS NOT NULL -- OR B.PatientID IS NOT NULL) ORDER BY 1, 2, 3 The commented WHERE clause would get rid of ranges where there was no status in either table, probably a good idea. PatientId StartDate EndDate Status1 Status2 ----------- ----------- ----------- -------------------- ---------- 1 2000-01-01 2000-01-10 Good CHECKED 1 2000-01-11 2000-01-25 Good UNCHECKED 1 2000-01-26 2000-02-01 not so Good UNCHECKED 1 2000-02-02 2000-02-25 not so Good NULL 1 2000-02-26 2000-03-01 BAD NULL 2 2000-01-01 2000-04-01 BAD NULL Roy Harvey Beacon Falls, CT Roy,
Consider this slightly changed data: insert into pat01 values (1, '2000-01-01','2000-01-25', 'Good') insert into pat01 values (1, '2000-01-26','2000-02-25', 'not so Good') insert into pat01 values (1, '2000-02-26','2000-03-01', 'BAD') insert into pat01 values (2, '2000-01-01','2000-04-01', 'BAD') --- AK: starts on 01-02 rather than on 01-01 insert into pat02 values (1, '2000-01-02','2000-01-10', 'CHECKED') insert into pat02 values (1, '2000-01-11','2000-02-01', 'UNCHECKED') The result: PatientId StartDate EndDate Status1 Status2 ----------- ------------------------------------------------------ ------------------------------------------------------ -------------------- -------------------- 1 2000-01-01 00:00:00.000 2000-01-10 00:00:00.000 Good NULL 1 2000-01-02 00:00:00.000 2000-01-10 00:00:00.000 Good CHECKED 1 2000-01-11 00:00:00.000 2000-01-25 00:00:00.000 Good UNCHECKED 1 2000-01-26 00:00:00.000 2000-02-01 00:00:00.000 not so Good UNCHECKED 1 2000-02-26 00:00:00.000 2000-03-01 00:00:00.000 BAD NULL 2 2000-01-01 00:00:00.000 2000-01-10 00:00:00.000 BAD NULL is incorrect. the first row should not have Enddate 2000-01-10, it should be 2000-01-01. Also the interval 02-01 to 02-25 is missing >> Both table have about 1 million rows, more than 10 years data. I know I can use number table to create row for each patient each day then merge them but I am worry about the performance. << So basically something like this:SELECT Pat01.patient_id, C1.cal_date, Pat01.status1, Pat02.status2 FROM Pat01, Pat02, Calendar AS C1 WHERE Pat01.patient_id = Pat02.patient_id AND C1.cal_date BETWEEN Pat01.start_date AND Pat01.end_date AND C1.cal_date BETWEEN Pat02.start_date AND Pat02.end_date AND C1.cal_date BETWEEN @my_start_date AND @my_end_date; so figure we have 3652+ rows per patient_id, assuming they all stay in the study for 10 years. How many patients? Could be ugly. But I also assume this is a one-shot data warehousing kind of thing, so this might not be a bad way to go. I cannot think of an answer right now, but I have the feeling that you can use the new OLAP functions to build the ranges.. |
|||||||||||||||||||||||