SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Monday, July 24, 2006

merge two date ranges with overlap from two tables

SQL Apprentice Question
I have two tables:


pat01 record patient status1 from date to date.


patientID startdate enddate status1
1 2000-01-01 2000-01-25 Good
1 2000-01-26 2000-02-25 not so Good
1 2000-02-26 2000-03-01 BAD
2 2000-01-01 2000-04-01 BAD


and another table pat02 record patient status2 from date to date


patientID startdate2 enddate2 status2
1 2000-01-01 2000-01-10 CHECKED
1 2000-01-11 2000-02-01 UNCHECKED


Please note that there are overlap between date range in two tables for
the same patient.Now I want to merge those two tables into one:


PATIENTID STARTDATE ENDDATE STATUS1 STATUS2
1 2000-01-01 2000-01-10 Good
CHECKED
1 2000-01-11 2000-01-25 Good
UNCHECKED
1 2000-01-26 2000-02-01 NOT SO GOOD
unchecked
1 2000-02-02 2000-02-25 not so good
null
1 2000-02-26 2000-03-01 BAD
null
2 2000-01-01 2000-04-01 BAD
null


Both table have about 1 million rows, more than 10 years data. I know I
can use number table to create row for each patient each day then
merge them but I am worry about the performance. Does anyone have other
ideas?


Thanks


DDL:


create table pat01
(PatientID int,
StartDate datetime,
EndDate datetime,
Status1 varchar(20)
)


insert into pat01 values (1, '2000-01-01','2000-01-25', 'Good')
insert into pat01 values (1, '2000-01-26','2000-02-25', 'not so Good')
insert into pat01 values (1, '2000-02-26','2000-03-01', 'BAD')
insert into pat01 values (2, '2000-01-01','2000-04-01', 'BAD')


create table pat02
(PatientID int,
StartDate2 datetime,
EndDate2 datetime,
Status2 varchar(20)
)


insert into pat01 values (1, '2000-01-01','2000-01-10', 'CHECKED')
insert into pat01 values (1, '2000-01-11','2000-02-01', 'UNCHECKED')


Celko Answers

>> Both table have about 1 million rows, more than 10 years data. I know I can use number table to create row for each patient each day then merge them but I am worry about the performance. <<


So basically something like this:

SELECT Pat01.patient_id, C1.cal_date, Pat01.status1, Pat02.status2
FROM Pat01, Pat02, Calendar AS C1
WHERE Pat01.patient_id = Pat02.patient_id
AND C1.cal_date BETWEEN Pat01.start_date AND Pat01.end_date
AND C1.cal_date BETWEEN Pat02.start_date AND Pat02.end_date
AND C1.cal_date BETWEEN @my_start_date AND @my_end_date;


so figure we have 3652+ rows per patient_id, assuming they all stay in
the study for 10 years. How many patients? Could be ugly. But I
also assume this is a one-shot data warehousing kind of thing, so this
might not be a bad way to go.


I cannot think of an answer right now, but I have the feeling that you
can use the new OLAP functions to build the ranges..

No comments: