SQL for Smarties | SQL Programming Style | Trees and Hierarchies in SQL | SQL Puzzles and Answers | Data and Databases


Sunday, September 10, 2006

working with gaps in date series

SQL Apprentice Question
I've seen some pretty creative SQL statements that locate
first/last/missing elements in a series but I haven't been able to
adapt any of them to work speedily with my data set.

Here's the problem:


We have a table of services for our clients (about 2 million rows).
The rows are simply the client's ID and the date of service. Rows may
dupe as clients may have more than one service per date.


We need to be able to define a case start and end date for each client
ID. The end date is defined as the last service date with no further
activity within some number (usually 180) of days. Furthermore, I'd
need to enumerate the cases per client.


So using the example below, I'd be looking for results such as:


Client Start End Case
55577 2/01/2004 5/11/2004 1
55577 1/09/2005 1/09/2005 2
55577 3/04/2006 OPEN 3
72395 4/04/2006 OPEN 1


In these cases, the OPEN dates indicate that there has not been a 180
day period of inactivity since the most recent date.


I currently do this in VB, looping through the ServList dataset and
populating a CaseList recordset. It takes about 30mins to run the job.
However, I'd prefer doing it all in a stored procedure and I'd prefer
to do it without use of cursors. Possible?


Thanks for any help/thoughts,
Steve


CREATE TABLE #ServList (
ClientID int
, ServDate smalldatetime)


INSERT INTO #Servlist VALUES (55577, '2/01/2004')
INSERT INTO #Servlist VALUES (55577, '2/01/2004')
INSERT INTO #Servlist VALUES (55577, '5/11/2004')
INSERT INTO #Servlist VALUES (55577, '1/09/2005')
INSERT INTO #Servlist VALUES (55577, '3/04/2006')
INSERT INTO #Servlist VALUES (55577, '5/17/2006')
INSERT INTO #Servlist VALUES (72395, '4/04/2006')
INSERT INTO #Servlist VALUES (72395, '4/05/2006')
INSERT INTO #Servlist VALUES (72395, '4/06/2006')


Celko Answers
>> We have a table of services for our clients (about 2 million rows). The rows are simply the client's ID and the date of service. Rows may dupe as clients may have more than one service per date. <<


Clear specs, thank you!! But weak DDL and you do not seem to know that
SQL uses ISO-8601 date formats, like all other ISO standards do, in the
sample data. Here is my guess at the real DDL:

CREATE TABLE ServiceTickets
(client_id CHAR(5) NOT NULL
REFERENCES Clients(client_id)
ON UPDATE CASCADE
ON DELETE CASCADE,
service_code CHAR(5) NOT NULL
REFERENCES Services(service_code)
ON UPDATE CASCADE,
service_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL,
service_period INTEGER DEFAULT 0 NOT NULL,
PRIMARY KEY (client_id, service_code, service_date));



>> We need to be able to define a case start and end date for each client ID. The end date is defined as the last service date with no further activity within some number (usually 180) of days. Furthermore, I'd need to enumerate the cases per client. <<


Change the way you think for a minute. Data and declarations, not
procedures and computations. SQL, not VB.


>> I currently do this in VB, looping through the ServList dataset and populating a CaseList recordset. It takes about 30 mins to run the job. <<


Build a table with (n) years of these reporting periods of 180 (or
whatever) days. A spreadsheet is great for this kind of thing.

CREATE TABLE CasePeriods -- needs better name
(service_period INTEGER NOT NULL PRIMARY KEY
CHECK (case_period_nbr > 0),
start_date DATETIME NOT NULL,
end_date DATETIME NOT NULL);


Now we can find the first service_period that a client has.


SELECT T.client_id, MIN(C.service_period)
FROM ServiceTickets AS T, CasePeriods AS C
WHERE T.service_date BETWEEN C.start_date AND C.end_date
AND end_date < CURRENT_TIMESTAMP -- completed periods only
AND service_period = 0 -- unassigned periods
AND ??
GROUP BY T.client_id
HAVING COUNT(*) > 0;


I have a question about the rules. If the guy comes in on day 1, then
comes back on day 180 and 181, are we in the same case or not? If the
guy comes in on day 1, then comes back on day 180 and 182, are we in
the same case or not?


Test this, then use it in an UPDATE statement to put a value in the
service_period column for those rows that are within your (n) day
range. We could hide this in a VIEW, but it looks complex enough to
materialize the service_period values.



>> In these cases, the OPEN dates indicate that there has not been a 180 day period of inactivity since the most recent date. <<


Ther ain't no such date as "Open"; this is why I used a zero reporting
period for the things in process.

No comments: