|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
working with gaps in date seriesfirst/last/missing elements in a series but I haven't been able to adapt any of them to work speedily with my data set. Here's the problem: We have a table of services for our clients (about 2 million rows). The rows are simply the client's ID and the date of service. Rows may dupe as clients may have more than one service per date. We need to be able to define a case start and end date for each client ID. The end date is defined as the last service date with no further activity within some number (usually 180) of days. Furthermore, I'd need to enumerate the cases per client. So using the example below, I'd be looking for results such as: Client Start End Case 55577 2/01/2004 5/11/2004 1 55577 1/09/2005 1/09/2005 2 55577 3/04/2006 OPEN 3 72395 4/04/2006 OPEN 1 In these cases, the OPEN dates indicate that there has not been a 180 day period of inactivity since the most recent date. I currently do this in VB, looping through the ServList dataset and populating a CaseList recordset. It takes about 30mins to run the job. However, I'd prefer doing it all in a stored procedure and I'd prefer to do it without use of cursors. Possible? Thanks for any help/thoughts, Steve CREATE TABLE #ServList ( ClientID int , ServDate smalldatetime) INSERT INTO #Servlist VALUES (55577, '2/01/2004') INSERT INTO #Servlist VALUES (55577, '2/01/2004') INSERT INTO #Servlist VALUES (55577, '5/11/2004') INSERT INTO #Servlist VALUES (55577, '1/09/2005') INSERT INTO #Servlist VALUES (55577, '3/04/2006') INSERT INTO #Servlist VALUES (55577, '5/17/2006') INSERT INTO #Servlist VALUES (72395, '4/04/2006') INSERT INTO #Servlist VALUES (72395, '4/05/2006') INSERT INTO #Servlist VALUES (72395, '4/06/2006') Steve,
start from this: SELECT DISTINCT ClientID, ServDate AS Start , ( SELECT MIN(s1.ServDate) FROM #ServList s1 WHERE s1.ClientID = s.ClientID AND s1.ServDate >= s.ServDate ---------- instead of getdate(), use last midnight AND s1.ServDate < DATEADD(DAY, -179, GETDATE()) AND NOT EXISTS(SELECT 1 FROM #ServList s2 WHERE s2.ClientID = s1.ClientID AND s2.ServDate > s1.ServDate AND s2.ServDate <= DATEADD(DAY, 179, s1.ServDate)) ) FROM #ServList s WHERE NOT EXISTS(SELECT 1 FROM #ServList s1 WHERE s1.ClientID = s.ClientID AND s1.ServDate >= DATEADD(DAY, -179, s.ServDate) AND s1.ServDate < s.ServDate) ClientID Start ----------- ------------------------------------------------------ ------------------------------------------------------ 55577 2004-02-01 00:00:00 2004-05-11 00:00:00 55577 2005-01-09 00:00:00 2005-01-09 00:00:00 55577 2006-03-04 00:00:00 NULL 72395 2006-04-04 00:00:00 NULL (4 row(s) affected) Enumerating should be easy. Alexander,
Perfect! Against 2 million rows, yours ran in just over 3 minutes. I'd been trying something similar but clearly missed the boat as mine was 45 mins on only 100,000 rows. Thanks for setting me straight. -Steve Alexander Kuznetsov wrote: Show quote > Steve, > > start from this: > > SELECT DISTINCT ClientID, ServDate AS Start > , ( > SELECT MIN(s1.ServDate) > FROM #ServList s1 > WHERE s1.ClientID = s.ClientID > AND s1.ServDate >= s.ServDate > ---------- instead of getdate(), use last midnight > AND s1.ServDate < DATEADD(DAY, -179, GETDATE()) > AND NOT EXISTS(SELECT 1 FROM #ServList s2 WHERE s2.ClientID = > s1.ClientID AND s2.ServDate > s1.ServDate AND s2.ServDate <= > DATEADD(DAY, 179, s1.ServDate)) > ) > FROM #ServList s > WHERE NOT EXISTS(SELECT 1 FROM #ServList s1 WHERE s1.ClientID = > s.ClientID AND s1.ServDate >= DATEADD(DAY, -179, s.ServDate) AND > s1.ServDate < s.ServDate) > > ClientID Start > > ----------- ------------------------------------------------------ > ------------------------------------------------------ > 55577 2004-02-01 00:00:00 > 2004-05-11 00:00:00 > 55577 2005-01-09 00:00:00 > 2005-01-09 00:00:00 > 55577 2006-03-04 00:00:00 NULL > 72395 2006-04-04 00:00:00 NULL > > (4 row(s) affected) > > Enumerating should be easy. In SQL 2005, you can use the ROW_NUMBER() function to derive the case
column. For the remainder of the query, you can try: SELECT clientid, CONVERT( VARCHAR, MIN( start_dt ), 101 ) AS "start", CASE WHEN DATEDIFF( d, end_dt, CURRENT_TIMESTAMP ) > 180 THEN CONVERT( VARCHAR, end_dt, 101 ) ELSE 'OPEN' END AS "end" FROM ( SELECT s1.ClientId, s1.ServDate, MAX( s2.ServDate ) FROM #ServList s1 JOIN #ServList s2 ON s2.ClientID = s1.ClientID AND s2.ServDate <= DATEADD( d, 180, s1.ServDate ) GROUP BY s1.ClientId, s1.ServDate ) D ( clientid, start_dt, end_dt ) GROUP BY clientid, end_dt ORDER BY clientid, end_dt ; -- Anith Anith,
because of this line in your query: AND s2.ServDate <= DATEADD( d, 180, s1.ServDate ) your intervals cannot be longer than 181 days. Consider this sample data, where service dates are 5 months apart: INSERT INTO #Servlist VALUES (55577, '1/01/2004') INSERT INTO #Servlist VALUES (55577, '6/01/2004') INSERT INTO #Servlist VALUES (55577, '11/01/2004') INSERT INTO #Servlist VALUES (55577, '4/01/2005') INSERT INTO #Servlist VALUES (55577, '9/01/2005') INSERT INTO #Servlist VALUES (55577, '2/01/2006') Your query returns a set of intervals with common boundaries clientid start end ----------- ------------------------------ ------------------------------ 55577 01/01/2004 06/01/2004 55577 06/01/2004 11/01/2004 55577 11/01/2004 04/01/2005 55577 04/01/2005 09/01/2005 55577 09/01/2005 02/01/2006 (5 row(s) affected) I believe it should return just one interval: ClientID Start ----------- ------------------------------------------------------ ------------------------------------------------------ 55577 2004-01-01 00:00:00 2006-02-01 00:00:00 (1 row(s) affected) Steve, did I get the requirements right? >> your intervals cannot be longer than 181 days. I thought that is the requirement when he meant "with no further activity within some number (usually 180) of days.". Perhaps, I might have misread it, but your response seems to satisfy his needs. -- Anith Alexander,
Yes, you've got it exactly. -Thanks again! Alexander Kuznetsov wrote: Show quote > Anith, > > because of this line in your query: > > AND s2.ServDate <= DATEADD( d, 180, s1.ServDate ) > > your intervals cannot be longer than 181 days. > Consider this sample data, where service dates are 5 months apart: > > > INSERT INTO #Servlist VALUES (55577, '1/01/2004') > INSERT INTO #Servlist VALUES (55577, '6/01/2004') > INSERT INTO #Servlist VALUES (55577, '11/01/2004') > INSERT INTO #Servlist VALUES (55577, '4/01/2005') > INSERT INTO #Servlist VALUES (55577, '9/01/2005') > INSERT INTO #Servlist VALUES (55577, '2/01/2006') > > Your query returns a set of intervals with common boundaries > > clientid start end > > ----------- ------------------------------ > ------------------------------ > 55577 01/01/2004 06/01/2004 > 55577 06/01/2004 11/01/2004 > 55577 11/01/2004 04/01/2005 > 55577 04/01/2005 09/01/2005 > 55577 09/01/2005 02/01/2006 > > (5 row(s) affected) > > I believe it should return just one interval: > > ClientID Start > > ----------- ------------------------------------------------------ > ------------------------------------------------------ > 55577 2004-01-01 00:00:00 > 2006-02-01 00:00:00 > > (1 row(s) affected) > > Steve, did I get the requirements right? >> We have a table of services for our clients (about 2 million rows). The rows are simply the client's ID and the date of service. Rows may dupe as clients may have more than one service per date. << Clear specs, thank you!! But weak DDL and you do not seem to know thatSQL uses ISO-8601 date formats, like all other ISO standards do, in the sample data. Here is my guess at the real DDL: CREATE TABLE ServiceTickets (client_id CHAR(5) NOT NULL REFERENCES Clients(client_id) ON UPDATE CASCADE ON DELETE CASCADE, service_code CHAR(5) NOT NULL REFERENCES Services(service_code) ON UPDATE CASCADE, service_date DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL, service_period INTEGER DEFAULT 0 NOT NULL, PRIMARY KEY (client_id, service_code, service_date)); >> We need to be able to define a case start and end date for each client ID. The end date is defined as the last service date with no further activity within some number (usually 180) of days. Furthermore, I'd need to enumerate the cases per client. << Change the way you think for a minute. Data and declarations, notprocedures and computations. SQL, not VB. >> I currently do this in VB, looping through the ServList dataset and populating a CaseList recordset. It takes about 30 mins to run the job. << Build a table with (n) years of these reporting periods of 180 (orwhatever) days. A spreadsheet is great for this kind of thing. CREATE TABLE CasePeriods -- needs better name (service_period INTEGER NOT NULL PRIMARY KEY CHECK (case_period_nbr > 0), start_date DATETIME NOT NULL, end_date DATETIME NOT NULL); Now we can find the first service_period that a client has. SELECT T.client_id, MIN(C.service_period) FROM ServiceTickets AS T, CasePeriods AS C WHERE T.service_date BETWEEN C.start_date AND C.end_date AND end_date < CURRENT_TIMESTAMP -- completed periods only AND service_period = 0 -- unassigned periods AND ?? GROUP BY T.client_id HAVING COUNT(*) > 0; I have a question about the rules. If the guy comes in on day 1, then comes back on day 180 and 181, are we in the same case or not? If the guy comes in on day 1, then comes back on day 180 and 182, are we in the same case or not? Test this, then use it in an UPDATE statement to put a value in the service_period column for those rows that are within your (n) day range. We could hide this in a VIEW, but it looks complex enough to materialize the service_period values. >> In these cases, the OPEN dates indicate that there has not been a 180 day period of inactivity since the most recent date. << Ther ain't no such date as "Open"; this is why I used a zero reportingperiod for the things in process. Joe,
Thanks for the response - it looks like you've guessed that we're talking about medical claims data. However, in my case I think that it may be irrelevant as to the type of service_code. In our DDL, the primary key would actually be based on (client_id, service_code, service_code_modifier_1, servdate, provider_id). For simplicity, we use a view that's keyed on distinct client_id, servdate. What we're needing to do is look for periods of inactivity and declare episodes of care. So, as an example, If the guy comes in on day 1 and the next time is day 180; that's in the same case. And then if he show's up on day 181, that's still in the same case (because he showed up on day 180). If a guy shows on day 1 and then the next day is 181, those are 2 different cases. What will break the series into the next case is when a servdate (S2) is more than 180 days from the most recent date (S1). Case 1 then is defined as the client_id, the first date (whatever that was, as long as it was within the past 180 days), and S1 as the closing date. Case 2 will be defined as the client_id, S2 as the startdate, and some new close date (or open if most recent date within the past 6 months). And, you are most correct in my need for a shift in thinking from procedural-based execution vs set-based!! Learning ... Thanks very much for taking the time to assist! -Steve Lord --CELKO-- wrote: -- snip -- Show quote > > I have a question about the rules. If the guy comes in on day 1, then > comes back on day 180 and 181, are we in the same case or not? If the > guy comes in on day 1, then comes back on day 180 and 182, are we in > the same case or not? > > Test this, then use it in an UPDATE statement to put a value in the > service_period column for those rows that are within your (n) day > range. We could hide this in a VIEW, but it looks complex enough to > materialize the service_period values. > > >> In these cases, the OPEN dates indicate that there has not been a 180 day period of inactivity since the most recent date. << > > Ther ain't no such date as "Open"; this is why I used a zero reporting > period for the things in process. |
|||||||||||||||||||||||