|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Self-join based on datetimefrom a table alongside the previous record (based on a timestamp) for each employee. I created a small mock-up table and wrote a query that almost works, but not quite. Here are some details: Table: CREATE TABLE #TMPSALES ( TID INT IDENTITY(1,1), EMPLOYEE VARCHAR(30), TSTAMP DATETIME, SALES INT ) Sample table data: TID EMPLOYEE TSTAMP SALES 1 bob 2006-07-01 09:15:08 100 2 ed 2006-07-01 09:32:19 200 3 jack 2006-07-01 09:41:30 300 4 bob 2006-07-01 11:08:51 400 5 jack 2006-07-01 11:10:12 500 6 ed 2006-07-01 11:15:41 600 7 bob 2006-07-02 10:03:14 700 8 jack 2006-07-02 10:05:19 800 9 ed 2006-07-02 10:34:46 900 10 bob 2006-07-03 09:24:13 700 11 jack 2006-07-03 10:02:09 800 12 ed 2006-07-03 10:06:20 900 Desired query output: TID EMP TSTAMP_CUR SALES_CUR TID_PREV TSTAMP_PREV SALES_PREV 4 bob 2006-07-01 11:08:51 400 1 2006-07-01 09:15:08 100 5 jack 2006-07-01 11:10:12 500 3 2006-07-01 09:41:30 300 6 ed 2006-07-01 11:15:41 600 2 2006-07-01 09:32:19 200 7 bob 2006-07-02 10:03:14 700 4 2006-07-01 11:08:51 400 8 jack 2006-07-02 10:05:19 800 5 2006-07-01 11:10:12 500 9 ed 2006-07-02 10:34:46 900 6 2006-07-01 11:15:41 600 10 bob 2006-07-03 09:24:13 700 7 2006-07-02 10:03:14 700 11 jack 2006-07-03 10:02:09 800 8 2006-07-02 10:05:19 800 12 ed 2006-07-03 10:06:20 900 9 2006-07-02 10:34:46 900 My failed query attempt: ------- select p1.*, p2.* from #tmpsales p1 join #tmpsales p2 on p1.employee = p2.employee and p1.tstamp = ( select max(tstamp) from #tmpsales where tstamp < p2.tstamp ) order by p1.employee, p1.tstamp -------- (My query's column selection and order doesn't match the desired output, but you get the idea.) For some reason the query has trouble with the datetime comparisons, unless I do something like use CONVERT to lop off part of the time portion, or make sure the sample data is in even seconds. Also, I suspect this is going to perform very poorly over a huge table, with hundreds of employees joining over thousands of days. Is there a better approach I can use here? stav***@mailinator.com wrote:
> For some reason the query has trouble with the datetime comparisons, Try this. It's untested as I don't have time right now to try it> unless I do something like use CONVERT to lop off part of the time > portion, or make sure the sample data is in even seconds. Also, I against your sample data (thanks for providing that by the way) but I think it should solve your datetime comparison problem. select p1.*, p2.* from #tmpsales p1 join #tmpsales p2 on p2.employee = p1.employee and p2.tstamp = ( select max(tstamp) from #tmpsales where employee = p2.employee and tstamp < p1.tstamp ) order by p1.employee, p1.tstamp Make sure you have an unique index on employee and tstamp since you are joining on them. > suspect this is going to perform very poorly over a huge table, with If your TID's are in chronological order, at least within an employee,> hundreds of employees joining over thousands of days. > > Is there a better approach I can use here? you could do this. SELECT p1.*, p2.* FROM #tmpsales p1 INNER JOIN ( SELECT p1.TID, prevTID = MAX(p2.TID) FROM #tmpsales p1 INNER JOIN #tmpsales p2 ON p2.employee = p1.employee AND p2.tstamp < p1.tstamp GROUP BY p1.TID ) tmp ON tmp.TID = p1.TID INNER JOIN #tmpsales p2 ON p2.TID = tmp.prevTID ORDER BY p1.employee, p1.tstamp This avoids having a correlated sub-query which should improve performance when your table is large. In SQL Server 2005 you can use the OVER() clause to simplify the queries. Chris Thanks, that worked great!
Chris Lim wrote: Show quote > stav***@mailinator.com wrote: > > For some reason the query has trouble with the datetime comparisons, > > unless I do something like use CONVERT to lop off part of the time > > portion, or make sure the sample data is in even seconds. Also, I > > Try this. It's untested as I don't have time right now to try it > against your sample data (thanks for providing that by the way) but I > think it should solve your datetime comparison problem. > > select p1.*, p2.* > from #tmpsales p1 > join #tmpsales p2 > on p2.employee = p1.employee > and p2.tstamp = ( > select max(tstamp) > from #tmpsales > where employee = p2.employee > and tstamp < p1.tstamp > ) > order by p1.employee, p1.tstamp > > Make sure you have an unique index on employee and tstamp since you are > joining on them. > > > suspect this is going to perform very poorly over a huge table, with > > hundreds of employees joining over thousands of days. > > > > Is there a better approach I can use here? > > If your TID's are in chronological order, at least within an employee, > you could do this. > > SELECT p1.*, p2.* > FROM #tmpsales p1 > INNER JOIN ( SELECT p1.TID, prevTID = MAX(p2.TID) > FROM #tmpsales p1 > INNER JOIN #tmpsales p2 > ON p2.employee = p1.employee > AND p2.tstamp < p1.tstamp > GROUP BY p1.TID > ) tmp > ON tmp.TID = p1.TID > INNER JOIN #tmpsales p2 > ON p2.TID = tmp.prevTID > ORDER BY p1.employee, p1.tstamp > > This avoids having a correlated sub-query which should improve > performance when your table is large. > > In SQL Server 2005 you can use the OVER() clause to simplify the > queries. > > Chris You were very close...
SELECT t1.TID , t1.Employee , TStamp_Cur = t1.TStamp , Sales_Cur = t1.Sales , TID_Prev = t2.TID , TStamp_Prev = t2.TStamp , Sales_Prev = t2.Sales FROM #TmpSales t1 JOIN #TmpSales t2 ON ( t1.Employee = t2.Employee AND t2.TStamp = ( SELECT max( TStamp ) FROM #TmpSales WHERE ( Employee = t1.Employee AND TStamp < t1.TStamp ) ) ) ORDER BY t1.Employee , t1.TStamp -- Show quoteArnie Rowland, Ph.D. Westwood Consulting, Inc Most good judgment comes from experience. Most experience comes from bad judgment. - Anonymous <stav***@mailinator.com> wrote in message news:1154730289.226075.308300@p79g2000cwp.googlegroups.com... > I'm trying to write a self-joining query that simply returns records > from a table alongside the previous record (based on a timestamp) for > each employee. I created a small mock-up table and wrote a query that > almost works, but not quite. Here are some details: > > Table: > > CREATE TABLE #TMPSALES ( > TID INT IDENTITY(1,1), > EMPLOYEE VARCHAR(30), > TSTAMP DATETIME, > SALES INT > ) > > Sample table data: > > TID EMPLOYEE TSTAMP SALES > 1 bob 2006-07-01 09:15:08 100 > 2 ed 2006-07-01 09:32:19 200 > 3 jack 2006-07-01 09:41:30 300 > 4 bob 2006-07-01 11:08:51 400 > 5 jack 2006-07-01 11:10:12 500 > 6 ed 2006-07-01 11:15:41 600 > 7 bob 2006-07-02 10:03:14 700 > 8 jack 2006-07-02 10:05:19 800 > 9 ed 2006-07-02 10:34:46 900 > 10 bob 2006-07-03 09:24:13 700 > 11 jack 2006-07-03 10:02:09 800 > 12 ed 2006-07-03 10:06:20 900 > > > Desired query output: > > TID EMP TSTAMP_CUR SALES_CUR TID_PREV TSTAMP_PREV > SALES_PREV > 4 bob 2006-07-01 11:08:51 400 1 2006-07-01 > 09:15:08 100 > 5 jack 2006-07-01 11:10:12 500 3 2006-07-01 > 09:41:30 300 > 6 ed 2006-07-01 11:15:41 600 2 2006-07-01 > 09:32:19 200 > 7 bob 2006-07-02 10:03:14 700 4 2006-07-01 > 11:08:51 400 > 8 jack 2006-07-02 10:05:19 800 5 2006-07-01 > 11:10:12 500 > 9 ed 2006-07-02 10:34:46 900 6 2006-07-01 > 11:15:41 600 > 10 bob 2006-07-03 09:24:13 700 7 2006-07-02 > 10:03:14 700 > 11 jack 2006-07-03 10:02:09 800 8 2006-07-02 > 10:05:19 800 > 12 ed 2006-07-03 10:06:20 900 9 2006-07-02 > 10:34:46 900 > > My failed query attempt: > > ------- > select p1.*, p2.* > from #tmpsales p1 > join #tmpsales p2 > on p1.employee = p2.employee > and p1.tstamp = ( > select max(tstamp) > from #tmpsales > where tstamp < p2.tstamp > ) > order by p1.employee, p1.tstamp > -------- > > (My query's column selection and order doesn't match the desired > output, but you get the idea.) > > For some reason the query has trouble with the datetime comparisons, > unless I do something like use CONVERT to lop off part of the time > portion, or make sure the sample data is in even seconds. Also, I > suspect this is going to perform very poorly over a huge table, with > hundreds of employees joining over thousands of days. > > Is there a better approach I can use here? > >> Is there a better approach I can use here? << Your real problem is that your DDL stinks. You have no idea that rowsare not records, how to name a data element, what a relational key is (never IDENTITY by definition!!) or how to model time in SQL. Or even what a NULL means. Your table mimics a paper list on a clipboard and not an RDBMS. Let's get this right: CREATE TABLE Sales (emp_name VARCHAR(30) NOT NULL, sales_start_time DATETIME DEFAULT CURRENT_TIMESTAMP NOT NULL, sales_end_time DATETIME, -- null means currently open CHECK (sales_start_time, sales_end_time), sales_cnt INTEGER NOT NULL CHECK (sales_cnt > 0), PRIMARY KEY (emp_name, sales_time)); 1) a relational key is never, never, never IDENTITY by definition!! That is how newbies mimic record number or the line numbers on a clipboard -- your case. 2) Remember Zeno and Einstein? Time is always in durations modeled as half-open intervals, so you should have that atomic fact in two scalar (start, end) columns. When your event is not finished, use (<start value>, NULL) 3) Rows and columns are nothing like records and fields. We have constraints, data types and things that to guarantee the data is good. We do not use temp table; we use VIEWs on base tables, derived tables or CTEs. What you have done is called attribute splitting. You took the single fact about an event and split the (start, end) pair across two rows. You could also screw up by splitting the values of an attribute into tables, tables into separate DBs, etc. |
|||||||||||||||||||||||