|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
help with querywant to get only one row based on a date. I tried this: select distinct T.numCompanyId, (select top 1 tblCompanyFranchise.numProgramId ProgramId, tblCompanyFranchise.numCompanyStatusId StatusId, numTaskId, dtTaskCreationDate from tblTasks, tblCompanyFranchise Where tblTasks.numCompanyId = tblCompanyFranchise.numCompanyId AND tblTasks.numCompanyId = T.numCompanyId order by tblCompanyFranchise.ProgramId, tblCompanyFranchise.numCompanyId, dtTaskCreationDate desc) from tblTasks T but I get this error: "Only one expression can be specified in the select list when the subquery is not introduced with EXISTS." I tried with "case when exists" but couldn't figure out how to get that to work either. Any ideas? Thanks, -- Dan D. Please post DDL for your tables, and sample data and show the result you
would want for the sample data you give us. Otherwise we are just guessing at what your table structure is and the result you want. See www.aspfaq.com/5006 for help on how to provide that. Tom Show quote "Dan D." <D***@discussions.microsoft.com> wrote in message news:0A323244-D18A-4D38-B65B-5601A2F3CD6F@microsoft.com... > Using SS2000 SP4. I have a table with many instances of each company id. I > want to get only one row based on a date. I tried this: > select distinct T.numCompanyId, > (select top 1 tblCompanyFranchise.numProgramId ProgramId, > tblCompanyFranchise.numCompanyStatusId StatusId, numTaskId, > dtTaskCreationDate > from tblTasks, tblCompanyFranchise Where tblTasks.numCompanyId = > tblCompanyFranchise.numCompanyId > AND tblTasks.numCompanyId = T.numCompanyId > order by tblCompanyFranchise.ProgramId, tblCompanyFranchise.numCompanyId, > dtTaskCreationDate desc) > from tblTasks T > > but I get this error: "Only one expression can be specified in the select > list when the subquery is not introduced with EXISTS." I tried with "case > when exists" but couldn't figure out how to get that to work either. > > Any ideas? > > Thanks, > -- > Dan D. CREATE TABLE [tblTasks] (
[numTaskId] [decimal](18, 0) NOT NULL , [numUserId] [decimal](18, 0) NULL , [numCompanyId] [decimal](18, 0) NULL , [numProgramId] [decimal](18, 0) NULL , [numFranchiseId] [decimal](18, 0) NULL , [numCorporateId] [decimal](18, 0) NULL , [dtTaskCreationDate] [datetime] NOT NULL , [dtTaskCreationTime] [datetime] NOT NULL , [Priority] [int] NULL ) ON [PRIMARY] GO CREATE TABLE [tblCompanyFranchise] ( [ID] [numeric](18, 0) NOT NULL , [numFranchiseId] [numeric](18, 0) NOT NULL , [numCompanyId] [numeric](18, 0) NOT NULL , [numCompanyStatusId] [numeric](18, 0) NOT NULL , [numProgramId] [numeric](18, 0) NOT NULL , [pinId] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , CONSTRAINT [PK_tblCompanyFranchise] PRIMARY KEY CLUSTERED ( [ID] ) WITH FILLFACTOR = 90 ON [PRIMARY] , CONSTRAINT [CK_tblCompanyFranchise_Pinid] UNIQUE NONCLUSTERED ( [ID] ) WITH FILLFACTOR = 90 ON [PRIMARY] ) ON [PRIMARY] GO In my original email I should have said that I need one row for each company. This is what I came up with. It seems to work. select distinct T.numCompanyId CompanyId, (select top 1 tblTasks.numProgramId from tblTasks where tblTasks.numCompanyId = T.numCompanyId order by dtTaskCreationDate desc) AS ProgramId, (select top 1 tblCompanyFranchise.numCompanyStatusId from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = T.numCompanyId ) AS StatusId, (select top 1 dtTaskCreationDate from tblTasks where tblTasks.numCompanyId = T.numCompanyId order by dtTaskCreationDate desc) AS dtTaskCreationDate, (select top 1 numTaskId from tblTasks where tblTasks.numCompanyId = T.numCompanyId order by dtTaskCreationDate desc) AS numTaskId, (select top 1 tblCompanyFranchise.numFranchiseId from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = T.numCompanyId ) AS StatusId from tblTasks T order by tblCompanyFranchise.ProgramId, tblCompanyFranchise.CompanyId, dtTaskCreationDate desc -- Show quoteDan D. "Tom Cooper" wrote: > Please post DDL for your tables, and sample data and show the result you > would want for the sample data you give us. Otherwise we are just guessing > at what your table structure is and the result you want. See > www.aspfaq.com/5006 for help on how to provide that. > > Tom > > "Dan D." <D***@discussions.microsoft.com> wrote in message > news:0A323244-D18A-4D38-B65B-5601A2F3CD6F@microsoft.com... > > Using SS2000 SP4. I have a table with many instances of each company id. I > > want to get only one row based on a date. I tried this: > > select distinct T.numCompanyId, > > (select top 1 tblCompanyFranchise.numProgramId ProgramId, > > tblCompanyFranchise.numCompanyStatusId StatusId, numTaskId, > > dtTaskCreationDate > > from tblTasks, tblCompanyFranchise Where tblTasks.numCompanyId = > > tblCompanyFranchise.numCompanyId > > AND tblTasks.numCompanyId = T.numCompanyId > > order by tblCompanyFranchise.ProgramId, tblCompanyFranchise.numCompanyId, > > dtTaskCreationDate desc) > > from tblTasks T > > > > but I get this error: "Only one expression can be specified in the select > > list when the subquery is not introduced with EXISTS." I tried with "case > > when exists" but couldn't figure out how to get that to work either. > > > > Any ideas? > > > > Thanks, > > -- > > Dan D. > > > A subquery in the SELECT list can only return one column. The one in
the example tries to return four. Not that it would work as you wish even if there were only one, as there is no correlation between the outer query and the inner query. Without the DDL for the tables it isn't even practical to guess how it can be rewritten. In particular there is no indication what table dtTaskCreationDate comes from, and the whole query is supposed to depend on that column. Roy Harvey Beacon Falls, CT On Tue, 29 Aug 2006 12:57:02 -0700, Dan D. <D***@discussions.microsoft.com> wrote: Show quote >Using SS2000 SP4. I have a table with many instances of each company id. I >want to get only one row based on a date. I tried this: > select distinct T.numCompanyId, > (select top 1 tblCompanyFranchise.numProgramId ProgramId, >tblCompanyFranchise.numCompanyStatusId StatusId, numTaskId, >dtTaskCreationDate >from tblTasks, tblCompanyFranchise Where tblTasks.numCompanyId = >tblCompanyFranchise.numCompanyId >AND tblTasks.numCompanyId = T.numCompanyId >order by tblCompanyFranchise.ProgramId, tblCompanyFranchise.numCompanyId, >dtTaskCreationDate desc) >from tblTasks T > >but I get this error: "Only one expression can be specified in the select >list when the subquery is not introduced with EXISTS." I tried with "case >when exists" but couldn't figure out how to get that to work either. > >Any ideas? > >Thanks, See other post for schemas. I think this works.
select distinct T.numCompanyId CompanyId, (select top 1 tblTasks.numProgramId from tblTasks where tblTasks.numCompanyId = T.numCompanyId order by dtTaskCreationDate desc) AS ProgramId, (select top 1 tblCompanyFranchise.numCompanyStatusId from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = T.numCompanyId ) AS StatusId, (select top 1 dtTaskCreationDate from tblTasks where tblTasks.numCompanyId = T.numCompanyId order by dtTaskCreationDate desc) AS dtTaskCreationDate, (select top 1 numTaskId from tblTasks where tblTasks.numCompanyId = T.numCompanyId order by dtTaskCreationDate desc) AS numTaskId, (select top 1 tblCompanyFranchise.numFranchiseId from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = T.numCompanyId ) AS StatusId from tblTasks T order by tblCompanyFranchise.ProgramId, tblCompanyFranchise.CompanyId, dtTaskCreationDate desc -- Show quoteDan D. "Roy Harvey" wrote: > A subquery in the SELECT list can only return one column. The one in > the example tries to return four. Not that it would work as you wish > even if there were only one, as there is no correlation between the > outer query and the inner query. > > Without the DDL for the tables it isn't even practical to guess how it > can be rewritten. In particular there is no indication what table > dtTaskCreationDate comes from, and the whole query is supposed to > depend on that column. > > Roy Harvey > Beacon Falls, CT > > On Tue, 29 Aug 2006 12:57:02 -0700, Dan D. > <D***@discussions.microsoft.com> wrote: > > >Using SS2000 SP4. I have a table with many instances of each company id. I > >want to get only one row based on a date. I tried this: > > select distinct T.numCompanyId, > > (select top 1 tblCompanyFranchise.numProgramId ProgramId, > >tblCompanyFranchise.numCompanyStatusId StatusId, numTaskId, > >dtTaskCreationDate > >from tblTasks, tblCompanyFranchise Where tblTasks.numCompanyId = > >tblCompanyFranchise.numCompanyId > >AND tblTasks.numCompanyId = T.numCompanyId > >order by tblCompanyFranchise.ProgramId, tblCompanyFranchise.numCompanyId, > >dtTaskCreationDate desc) > >from tblTasks T > > > >but I get this error: "Only one expression can be specified in the select > >list when the subquery is not introduced with EXISTS." I tried with "case > >when exists" but couldn't figure out how to get that to work either. > > > >Any ideas? > > > >Thanks, > Three subqueriesin the SELECT list, all against the same table, all
with the same criteria, generally means that a different approach can be used with less redundancy. Also, using TOP without any ORDER BY, as in two other subqueries, gives a random result from among the set of possible results. I would expect something more along these lines: SELECT T.numCompanyId as CompanyId, T.numProgramId as ProgramId, C.numCompanyStatusId as StatusId, T.dtTaskCreationDate as TaskCreationDate, T.numTaskId as TaskId, C.numFranchiseId as FranchiseId FROM tblCompanyFranchise as C JOIN tblTasks as T ON C.numCompanyId = T.numCompanyId WHERE T.dtTaskCreationDate = (select max(dtTaskCreationDate) from tblTasks as T2 where C.numCompanyId = T2.numCompanyId) This may not be what you need - it is still unclear to me what constitutes a unique row in the input data. Hopefully you will be able to get the idea in any case. One possible change, if this provides more than one row per company: SELECT T.numCompanyId as CompanyId, T.numProgramId as ProgramId, MAX(C.numCompanyStatusId) as StatusId, T.dtTaskCreationDate as TaskCreationDate, T.numTaskId as TaskId, MAX(C.numFranchiseId) as FranchiseId FROM tblCompanyFranchise as C JOIN tblTasks as T ON C.numCompanyId = T.numCompanyId WHERE T.dtTaskCreationDate = (select max(dtTaskCreationDate) from tblTasks as T2 where C.numCompanyId = T2.numCompanyId) GROUP BY T.numCompanyId, T.numProgramId, T.dtTaskCreationDate, T.numTaskId A couple of notes about the table definitions are in order. First, it is generally considered poor practice to prefix column names with their data types, and the same goes for prefixing table names with tbl. Second, looking at the definition of tblCompanyFranchise, the key is the ID column, but I find it hard to believe that there is not a natural key among the other columns, either a single column or combination of columns. In that case the use of the ID column seems redundant, but in any case IF there is a natural key that too should have a unique constraint. Roy Harvey Beacon Falls, CT On Wed, 30 Aug 2006 07:35:02 -0700, Dan D. <D***@discussions.microsoft.com> wrote: Show quote >See other post for schemas. I think this works. > >select distinct T.numCompanyId CompanyId, > (select top 1 tblTasks.numProgramId > from tblTasks where tblTasks.numCompanyId = T.numCompanyId > order by dtTaskCreationDate desc) >AS ProgramId, > (select top 1 tblCompanyFranchise.numCompanyStatusId > from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = >T.numCompanyId ) >AS StatusId, > (select top 1 dtTaskCreationDate > from tblTasks where tblTasks.numCompanyId = T.numCompanyId > order by dtTaskCreationDate desc) >AS dtTaskCreationDate, > (select top 1 numTaskId > from tblTasks where tblTasks.numCompanyId = T.numCompanyId > order by dtTaskCreationDate desc) >AS numTaskId, > (select top 1 tblCompanyFranchise.numFranchiseId > from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = >T.numCompanyId ) >AS StatusId >from tblTasks T >order by tblCompanyFranchise.ProgramId, tblCompanyFranchise.CompanyId, >dtTaskCreationDate desc I'm all for doing it more efficiently if possible. Your queries ran a lot
faster than mine (8 to 10 secs vs 37 secs). They didn't give too many rows. They gave too few rows. Your queries both returned 125269 rows and mine returned 128803. If I just run "select distinct numcompanid from tbltasks" I get 128803 rows. I'm trying to figure out why. There is a one to many relationship between tblCompanyFranchise and tblTasks. The data comes from a telemarketing center. For each contact with a company a row gets added to tblTasks. The purpose of the query is to only get the data for the last contact with the company. Hope that helps. Thanks. -- Show quoteDan D. "Roy Harvey" wrote: > Three subqueriesin the SELECT list, all against the same table, all > with the same criteria, generally means that a different approach can > be used with less redundancy. > > Also, using TOP without any ORDER BY, as in two other subqueries, > gives a random result from among the set of possible results. > > I would expect something more along these lines: > > SELECT T.numCompanyId as CompanyId, > T.numProgramId as ProgramId, > C.numCompanyStatusId as StatusId, > T.dtTaskCreationDate as TaskCreationDate, > T.numTaskId as TaskId, > C.numFranchiseId as FranchiseId > FROM tblCompanyFranchise as C > JOIN tblTasks as T > ON C.numCompanyId = T.numCompanyId > WHERE T.dtTaskCreationDate = > (select max(dtTaskCreationDate) from tblTasks as T2 > where C.numCompanyId = T2.numCompanyId) > > This may not be what you need - it is still unclear to me what > constitutes a unique row in the input data. Hopefully you will be > able to get the idea in any case. One possible change, if this > provides more than one row per company: > > SELECT T.numCompanyId as CompanyId, > T.numProgramId as ProgramId, > MAX(C.numCompanyStatusId) as StatusId, > T.dtTaskCreationDate as TaskCreationDate, > T.numTaskId as TaskId, > MAX(C.numFranchiseId) as FranchiseId > FROM tblCompanyFranchise as C > JOIN tblTasks as T > ON C.numCompanyId = T.numCompanyId > WHERE T.dtTaskCreationDate = > (select max(dtTaskCreationDate) from tblTasks as T2 > where C.numCompanyId = T2.numCompanyId) > GROUP BY T.numCompanyId, T.numProgramId, T.dtTaskCreationDate, > T.numTaskId > > A couple of notes about the table definitions are in order. First, it > is generally considered poor practice to prefix column names with > their data types, and the same goes for prefixing table names with > tbl. Second, looking at the definition of tblCompanyFranchise, the > key is the ID column, but I find it hard to believe that there is not > a natural key among the other columns, either a single column or > combination of columns. In that case the use of the ID column seems > redundant, but in any case IF there is a natural key that too should > have a unique constraint. > > Roy Harvey > Beacon Falls, CT > > On Wed, 30 Aug 2006 07:35:02 -0700, Dan D. > <D***@discussions.microsoft.com> wrote: > > >See other post for schemas. I think this works. > > > >select distinct T.numCompanyId CompanyId, > > (select top 1 tblTasks.numProgramId > > from tblTasks where tblTasks.numCompanyId = T.numCompanyId > > order by dtTaskCreationDate desc) > >AS ProgramId, > > (select top 1 tblCompanyFranchise.numCompanyStatusId > > from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = > >T.numCompanyId ) > >AS StatusId, > > (select top 1 dtTaskCreationDate > > from tblTasks where tblTasks.numCompanyId = T.numCompanyId > > order by dtTaskCreationDate desc) > >AS dtTaskCreationDate, > > (select top 1 numTaskId > > from tblTasks where tblTasks.numCompanyId = T.numCompanyId > > order by dtTaskCreationDate desc) > >AS numTaskId, > > (select top 1 tblCompanyFranchise.numFranchiseId > > from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = > >T.numCompanyId ) > >AS StatusId > >from tblTasks T > >order by tblCompanyFranchise.ProgramId, tblCompanyFranchise.CompanyId, > >dtTaskCreationDate desc > Do you have null values in dtTaskCreationDate or numCompanyId?
Show quote "Dan D." <D***@discussions.microsoft.com> wrote in message news:A339228E-87C2-456F-9921-9BF6DCD30C87@microsoft.com... > I'm all for doing it more efficiently if possible. Your queries ran a lot > faster than mine (8 to 10 secs vs 37 secs). They didn't give too many rows. > They gave too few rows. Your queries both returned 125269 rows and mine > returned 128803. If I just run "select distinct numcompanid from tbltasks" I > get 128803 rows. I'm trying to figure out why. > > There is a one to many relationship between tblCompanyFranchise and > tblTasks. The data comes from a telemarketing center. For each contact with a > company a row gets added to tblTasks. The purpose of the query is to only get > the data for the last contact with the company. Hope that helps. Thanks. > -- > Dan D. > > > "Roy Harvey" wrote: > > > Three subqueriesin the SELECT list, all against the same table, all > > with the same criteria, generally means that a different approach can > > be used with less redundancy. > > > > Also, using TOP without any ORDER BY, as in two other subqueries, > > gives a random result from among the set of possible results. > > > > I would expect something more along these lines: > > > > SELECT T.numCompanyId as CompanyId, > > T.numProgramId as ProgramId, > > C.numCompanyStatusId as StatusId, > > T.dtTaskCreationDate as TaskCreationDate, > > T.numTaskId as TaskId, > > C.numFranchiseId as FranchiseId > > FROM tblCompanyFranchise as C > > JOIN tblTasks as T > > ON C.numCompanyId = T.numCompanyId > > WHERE T.dtTaskCreationDate = > > (select max(dtTaskCreationDate) from tblTasks as T2 > > where C.numCompanyId = T2.numCompanyId) > > > > This may not be what you need - it is still unclear to me what > > constitutes a unique row in the input data. Hopefully you will be > > able to get the idea in any case. One possible change, if this > > provides more than one row per company: > > > > SELECT T.numCompanyId as CompanyId, > > T.numProgramId as ProgramId, > > MAX(C.numCompanyStatusId) as StatusId, > > T.dtTaskCreationDate as TaskCreationDate, > > T.numTaskId as TaskId, > > MAX(C.numFranchiseId) as FranchiseId > > FROM tblCompanyFranchise as C > > JOIN tblTasks as T > > ON C.numCompanyId = T.numCompanyId > > WHERE T.dtTaskCreationDate = > > (select max(dtTaskCreationDate) from tblTasks as T2 > > where C.numCompanyId = T2.numCompanyId) > > GROUP BY T.numCompanyId, T.numProgramId, T.dtTaskCreationDate, > > T.numTaskId > > > > A couple of notes about the table definitions are in order. First, it > > is generally considered poor practice to prefix column names with > > their data types, and the same goes for prefixing table names with > > tbl. Second, looking at the definition of tblCompanyFranchise, the > > key is the ID column, but I find it hard to believe that there is not > > a natural key among the other columns, either a single column or > > combination of columns. In that case the use of the ID column seems > > redundant, but in any case IF there is a natural key that too should > > have a unique constraint. > > > > Roy Harvey > > Beacon Falls, CT > > > > On Wed, 30 Aug 2006 07:35:02 -0700, Dan D. > > <D***@discussions.microsoft.com> wrote: > > > > >See other post for schemas. I think this works. > > > > > >select distinct T.numCompanyId CompanyId, > > > (select top 1 tblTasks.numProgramId > > > from tblTasks where tblTasks.numCompanyId = T.numCompanyId > > > order by dtTaskCreationDate desc) > > >AS ProgramId, > > > (select top 1 tblCompanyFranchise.numCompanyStatusId > > > from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = > > >T.numCompanyId ) > > >AS StatusId, > > > (select top 1 dtTaskCreationDate > > > from tblTasks where tblTasks.numCompanyId = T.numCompanyId > > > order by dtTaskCreationDate desc) > > >AS dtTaskCreationDate, > > > (select top 1 numTaskId > > > from tblTasks where tblTasks.numCompanyId = T.numCompanyId > > > order by dtTaskCreationDate desc) > > >AS numTaskId, > > > (select top 1 tblCompanyFranchise.numFranchiseId > > > from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = > > >T.numCompanyId ) > > >AS StatusId > > >from tblTasks T > > >order by tblCompanyFranchise.ProgramId, tblCompanyFranchise.CompanyId, > > >dtTaskCreationDate desc > > Do you have duplicate values in dtTaskCreationDate for the same company?
And why do you have a dtTaskCreationDate and a dtTaskCreationTime. If you are storing date in one column and time in the other, it is usually better to store both date and time in the same column. Also, in your schema you have a unique constraint name CK_tblCompanyFranchise_Pinid which sounds like you wanted that constraint to ensure that PinId is unique. But, in fact, that constraint is ensuring that ID is unique, and you already know that is true since you have your primary key on ID. Finally, do you have a primary key and a clustered index on tblTasks? You'll want to choose the clustered index very carefully (possibly the primary key, but probably on some other column(s) like numCompanyId) if you have multiple tasks for 125,000+ companies. Tom Show quote "Jim Underwood" <james.underwoodATfallonclinic.com> wrote in message news:eCnkm0EzGHA.2516@TK2MSFTNGP06.phx.gbl... > Do you have null values in dtTaskCreationDate or numCompanyId? > > "Dan D." <D***@discussions.microsoft.com> wrote in message > news:A339228E-87C2-456F-9921-9BF6DCD30C87@microsoft.com... >> I'm all for doing it more efficiently if possible. Your queries ran a lot >> faster than mine (8 to 10 secs vs 37 secs). They didn't give too many > rows. >> They gave too few rows. Your queries both returned 125269 rows and mine >> returned 128803. If I just run "select distinct numcompanid from >> tbltasks" > I >> get 128803 rows. I'm trying to figure out why. >> >> There is a one to many relationship between tblCompanyFranchise and >> tblTasks. The data comes from a telemarketing center. For each contact > with a >> company a row gets added to tblTasks. The purpose of the query is to only > get >> the data for the last contact with the company. Hope that helps. Thanks. >> -- >> Dan D. >> >> >> "Roy Harvey" wrote: >> >> > Three subqueriesin the SELECT list, all against the same table, all >> > with the same criteria, generally means that a different approach can >> > be used with less redundancy. >> > >> > Also, using TOP without any ORDER BY, as in two other subqueries, >> > gives a random result from among the set of possible results. >> > >> > I would expect something more along these lines: >> > >> > SELECT T.numCompanyId as CompanyId, >> > T.numProgramId as ProgramId, >> > C.numCompanyStatusId as StatusId, >> > T.dtTaskCreationDate as TaskCreationDate, >> > T.numTaskId as TaskId, >> > C.numFranchiseId as FranchiseId >> > FROM tblCompanyFranchise as C >> > JOIN tblTasks as T >> > ON C.numCompanyId = T.numCompanyId >> > WHERE T.dtTaskCreationDate = >> > (select max(dtTaskCreationDate) from tblTasks as T2 >> > where C.numCompanyId = T2.numCompanyId) >> > >> > This may not be what you need - it is still unclear to me what >> > constitutes a unique row in the input data. Hopefully you will be >> > able to get the idea in any case. One possible change, if this >> > provides more than one row per company: >> > >> > SELECT T.numCompanyId as CompanyId, >> > T.numProgramId as ProgramId, >> > MAX(C.numCompanyStatusId) as StatusId, >> > T.dtTaskCreationDate as TaskCreationDate, >> > T.numTaskId as TaskId, >> > MAX(C.numFranchiseId) as FranchiseId >> > FROM tblCompanyFranchise as C >> > JOIN tblTasks as T >> > ON C.numCompanyId = T.numCompanyId >> > WHERE T.dtTaskCreationDate = >> > (select max(dtTaskCreationDate) from tblTasks as T2 >> > where C.numCompanyId = T2.numCompanyId) >> > GROUP BY T.numCompanyId, T.numProgramId, T.dtTaskCreationDate, >> > T.numTaskId >> > >> > A couple of notes about the table definitions are in order. First, it >> > is generally considered poor practice to prefix column names with >> > their data types, and the same goes for prefixing table names with >> > tbl. Second, looking at the definition of tblCompanyFranchise, the >> > key is the ID column, but I find it hard to believe that there is not >> > a natural key among the other columns, either a single column or >> > combination of columns. In that case the use of the ID column seems >> > redundant, but in any case IF there is a natural key that too should >> > have a unique constraint. >> > >> > Roy Harvey >> > Beacon Falls, CT >> > >> > On Wed, 30 Aug 2006 07:35:02 -0700, Dan D. >> > <D***@discussions.microsoft.com> wrote: >> > >> > >See other post for schemas. I think this works. >> > > >> > >select distinct T.numCompanyId CompanyId, >> > > (select top 1 tblTasks.numProgramId >> > > from tblTasks where tblTasks.numCompanyId = T.numCompanyId >> > > order by dtTaskCreationDate desc) >> > >AS ProgramId, >> > > (select top 1 tblCompanyFranchise.numCompanyStatusId >> > > from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = >> > >T.numCompanyId ) >> > >AS StatusId, >> > > (select top 1 dtTaskCreationDate >> > > from tblTasks where tblTasks.numCompanyId = T.numCompanyId >> > > order by dtTaskCreationDate desc) >> > >AS dtTaskCreationDate, >> > > (select top 1 numTaskId >> > > from tblTasks where tblTasks.numCompanyId = T.numCompanyId >> > > order by dtTaskCreationDate desc) >> > >AS numTaskId, >> > > (select top 1 tblCompanyFranchise.numFranchiseId >> > > from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = >> > >T.numCompanyId ) >> > >AS StatusId >> > >from tblTasks T >> > >order by tblCompanyFranchise.ProgramId, tblCompanyFranchise.CompanyId, >> > >dtTaskCreationDate desc >> > > > Yes there are duplicate values in dtTaskCreationDate for the same company. I
wouldn't have thought so but its true. As to why a dtTaskCreationDate and a dtTaskCreationTime column, I don't know. We inherited this app and I don't know what the intended purpose of those two columns was supposed to be. The CK_tblCompanyFranchise_Pinid problem is on the list to be fixed. There is a clustered index on numTaskId and numCompanyId (separate indexes) in tblTasks but no primary key. Another thing that will change in the future. -- Show quoteDan D. "Tom Cooper" wrote: > Do you have duplicate values in dtTaskCreationDate for the same company? > And why do you have a dtTaskCreationDate and a dtTaskCreationTime. If you > are storing date in one column and time in the other, it is usually better > to store both date and time in the same column. > > Also, in your schema you have a unique constraint name > CK_tblCompanyFranchise_Pinid which sounds like you wanted that constraint to > ensure that PinId is unique. But, in fact, that constraint is ensuring that > ID is unique, and you already know that is true since you have your primary > key on ID. > > Finally, do you have a primary key and a clustered index on tblTasks? > You'll want to choose the clustered index very carefully (possibly the > primary key, but probably on some other column(s) like numCompanyId) if you > have multiple tasks for 125,000+ companies. > > Tom > > "Jim Underwood" <james.underwoodATfallonclinic.com> wrote in message > news:eCnkm0EzGHA.2516@TK2MSFTNGP06.phx.gbl... > > Do you have null values in dtTaskCreationDate or numCompanyId? > > > > "Dan D." <D***@discussions.microsoft.com> wrote in message > > news:A339228E-87C2-456F-9921-9BF6DCD30C87@microsoft.com... > >> I'm all for doing it more efficiently if possible. Your queries ran a lot > >> faster than mine (8 to 10 secs vs 37 secs). They didn't give too many > > rows. > >> They gave too few rows. Your queries both returned 125269 rows and mine > >> returned 128803. If I just run "select distinct numcompanid from > >> tbltasks" > > I > >> get 128803 rows. I'm trying to figure out why. > >> > >> There is a one to many relationship between tblCompanyFranchise and > >> tblTasks. The data comes from a telemarketing center. For each contact > > with a > >> company a row gets added to tblTasks. The purpose of the query is to only > > get > >> the data for the last contact with the company. Hope that helps. Thanks. > >> -- > >> Dan D. > >> > >> > >> "Roy Harvey" wrote: > >> > >> > Three subqueriesin the SELECT list, all against the same table, all > >> > with the same criteria, generally means that a different approach can > >> > be used with less redundancy. > >> > > >> > Also, using TOP without any ORDER BY, as in two other subqueries, > >> > gives a random result from among the set of possible results. > >> > > >> > I would expect something more along these lines: > >> > > >> > SELECT T.numCompanyId as CompanyId, > >> > T.numProgramId as ProgramId, > >> > C.numCompanyStatusId as StatusId, > >> > T.dtTaskCreationDate as TaskCreationDate, > >> > T.numTaskId as TaskId, > >> > C.numFranchiseId as FranchiseId > >> > FROM tblCompanyFranchise as C > >> > JOIN tblTasks as T > >> > ON C.numCompanyId = T.numCompanyId > >> > WHERE T.dtTaskCreationDate = > >> > (select max(dtTaskCreationDate) from tblTasks as T2 > >> > where C.numCompanyId = T2.numCompanyId) > >> > > >> > This may not be what you need - it is still unclear to me what > >> > constitutes a unique row in the input data. Hopefully you will be > >> > able to get the idea in any case. One possible change, if this > >> > provides more than one row per company: > >> > > >> > SELECT T.numCompanyId as CompanyId, > >> > T.numProgramId as ProgramId, > >> > MAX(C.numCompanyStatusId) as StatusId, > >> > T.dtTaskCreationDate as TaskCreationDate, > >> > T.numTaskId as TaskId, > >> > MAX(C.numFranchiseId) as FranchiseId > >> > FROM tblCompanyFranchise as C > >> > JOIN tblTasks as T > >> > ON C.numCompanyId = T.numCompanyId > >> > WHERE T.dtTaskCreationDate = > >> > (select max(dtTaskCreationDate) from tblTasks as T2 > >> > where C.numCompanyId = T2.numCompanyId) > >> > GROUP BY T.numCompanyId, T.numProgramId, T.dtTaskCreationDate, > >> > T.numTaskId > >> > > >> > A couple of notes about the table definitions are in order. First, it > >> > is generally considered poor practice to prefix column names with > >> > their data types, and the same goes for prefixing table names with > >> > tbl. Second, looking at the definition of tblCompanyFranchise, the > >> > key is the ID column, but I find it hard to believe that there is not > >> > a natural key among the other columns, either a single column or > >> > combination of columns. In that case the use of the ID column seems > >> > redundant, but in any case IF there is a natural key that too should > >> > have a unique constraint. > >> > > >> > Roy Harvey > >> > Beacon Falls, CT > >> > > >> > On Wed, 30 Aug 2006 07:35:02 -0700, Dan D. > >> > <D***@discussions.microsoft.com> wrote: > >> > > >> > >See other post for schemas. I think this works. > >> > > > >> > >select distinct T.numCompanyId CompanyId, > >> > > (select top 1 tblTasks.numProgramId > >> > > from tblTasks where tblTasks.numCompanyId = T.numCompanyId > >> > > order by dtTaskCreationDate desc) > >> > >AS ProgramId, > >> > > (select top 1 tblCompanyFranchise.numCompanyStatusId > >> > > from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = > >> > >T.numCompanyId ) > >> > >AS StatusId, > >> > > (select top 1 dtTaskCreationDate > >> > > from tblTasks where tblTasks.numCompanyId = T.numCompanyId > >> > > order by dtTaskCreationDate desc) > >> > >AS dtTaskCreationDate, > >> > > (select top 1 numTaskId > >> > > from tblTasks where tblTasks.numCompanyId = T.numCompanyId > >> > > order by dtTaskCreationDate desc) > >> > >AS numTaskId, > >> > > (select top 1 tblCompanyFranchise.numFranchiseId > >> > > from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = > >> > >T.numCompanyId ) > >> > >AS StatusId > >> > >from tblTasks T > >> > >order by tblCompanyFranchise.ProgramId, tblCompanyFranchise.CompanyId, > >> > >dtTaskCreationDate desc > >> > > > > > > > > There are no nulls in either column.
-- Show quoteDan D. "Jim Underwood" wrote: > Do you have null values in dtTaskCreationDate or numCompanyId? > > "Dan D." <D***@discussions.microsoft.com> wrote in message > news:A339228E-87C2-456F-9921-9BF6DCD30C87@microsoft.com... > > I'm all for doing it more efficiently if possible. Your queries ran a lot > > faster than mine (8 to 10 secs vs 37 secs). They didn't give too many > rows. > > They gave too few rows. Your queries both returned 125269 rows and mine > > returned 128803. If I just run "select distinct numcompanid from tbltasks" > I > > get 128803 rows. I'm trying to figure out why. > > > > There is a one to many relationship between tblCompanyFranchise and > > tblTasks. The data comes from a telemarketing center. For each contact > with a > > company a row gets added to tblTasks. The purpose of the query is to only > get > > the data for the last contact with the company. Hope that helps. Thanks. > > -- > > Dan D. > > > > > > "Roy Harvey" wrote: > > > > > Three subqueriesin the SELECT list, all against the same table, all > > > with the same criteria, generally means that a different approach can > > > be used with less redundancy. > > > > > > Also, using TOP without any ORDER BY, as in two other subqueries, > > > gives a random result from among the set of possible results. > > > > > > I would expect something more along these lines: > > > > > > SELECT T.numCompanyId as CompanyId, > > > T.numProgramId as ProgramId, > > > C.numCompanyStatusId as StatusId, > > > T.dtTaskCreationDate as TaskCreationDate, > > > T.numTaskId as TaskId, > > > C.numFranchiseId as FranchiseId > > > FROM tblCompanyFranchise as C > > > JOIN tblTasks as T > > > ON C.numCompanyId = T.numCompanyId > > > WHERE T.dtTaskCreationDate = > > > (select max(dtTaskCreationDate) from tblTasks as T2 > > > where C.numCompanyId = T2.numCompanyId) > > > > > > This may not be what you need - it is still unclear to me what > > > constitutes a unique row in the input data. Hopefully you will be > > > able to get the idea in any case. One possible change, if this > > > provides more than one row per company: > > > > > > SELECT T.numCompanyId as CompanyId, > > > T.numProgramId as ProgramId, > > > MAX(C.numCompanyStatusId) as StatusId, > > > T.dtTaskCreationDate as TaskCreationDate, > > > T.numTaskId as TaskId, > > > MAX(C.numFranchiseId) as FranchiseId > > > FROM tblCompanyFranchise as C > > > JOIN tblTasks as T > > > ON C.numCompanyId = T.numCompanyId > > > WHERE T.dtTaskCreationDate = > > > (select max(dtTaskCreationDate) from tblTasks as T2 > > > where C.numCompanyId = T2.numCompanyId) > > > GROUP BY T.numCompanyId, T.numProgramId, T.dtTaskCreationDate, > > > T.numTaskId > > > > > > A couple of notes about the table definitions are in order. First, it > > > is generally considered poor practice to prefix column names with > > > their data types, and the same goes for prefixing table names with > > > tbl. Second, looking at the definition of tblCompanyFranchise, the > > > key is the ID column, but I find it hard to believe that there is not > > > a natural key among the other columns, either a single column or > > > combination of columns. In that case the use of the ID column seems > > > redundant, but in any case IF there is a natural key that too should > > > have a unique constraint. > > > > > > Roy Harvey > > > Beacon Falls, CT > > > > > > On Wed, 30 Aug 2006 07:35:02 -0700, Dan D. > > > <D***@discussions.microsoft.com> wrote: > > > > > > >See other post for schemas. I think this works. > > > > > > > >select distinct T.numCompanyId CompanyId, > > > > (select top 1 tblTasks.numProgramId > > > > from tblTasks where tblTasks.numCompanyId = T.numCompanyId > > > > order by dtTaskCreationDate desc) > > > >AS ProgramId, > > > > (select top 1 tblCompanyFranchise.numCompanyStatusId > > > > from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = > > > >T.numCompanyId ) > > > >AS StatusId, > > > > (select top 1 dtTaskCreationDate > > > > from tblTasks where tblTasks.numCompanyId = T.numCompanyId > > > > order by dtTaskCreationDate desc) > > > >AS dtTaskCreationDate, > > > > (select top 1 numTaskId > > > > from tblTasks where tblTasks.numCompanyId = T.numCompanyId > > > > order by dtTaskCreationDate desc) > > > >AS numTaskId, > > > > (select top 1 tblCompanyFranchise.numFranchiseId > > > > from tblCompanyFranchise where tblCompanyFranchise.numCompanyId = > > > >T.numCompanyId ) > > > >AS StatusId > > > >from tblTasks T > > > >order by tblCompanyFranchise.ProgramId, tblCompanyFranchise.CompanyId, > > > >dtTaskCreationDate desc > > > > > > Try a LEFT OUTER JOIN:
SELECT T.numCompanyId as CompanyId, T.numProgramId as ProgramId, C.numCompanyStatusId as StatusId, T.dtTaskCreationDate as TaskCreationDate, T.numTaskId as TaskId, C.numFranchiseId as FranchiseId FROM tblCompanyFranchise as C LEFT OUTER JOIN tblTasks as T ON C.numCompanyId = T.numCompanyId WHERE T.dtTaskCreationDate = (select max(dtTaskCreationDate) from tblTasks as T2 where C.numCompanyId = T2.numCompanyId) Roy Harvey Beacon Falls, CT On Wed, 30 Aug 2006 08:51:01 -0700, Dan D. <D***@discussions.microsoft.com> wrote: Show quote >I'm all for doing it more efficiently if possible. Your queries ran a lot >faster than mine (8 to 10 secs vs 37 secs). They didn't give too many rows. >They gave too few rows. Your queries both returned 125269 rows and mine >returned 128803. If I just run "select distinct numcompanid from tbltasks" I >get 128803 rows. I'm trying to figure out why. > >There is a one to many relationship between tblCompanyFranchise and >tblTasks. The data comes from a telemarketing center. For each contact with a >company a row gets added to tblTasks. The purpose of the query is to only get >the data for the last contact with the company. Hope that helps. Thanks. >-- >Dan D. Roy,
Won't the where clause negate the left outer join? Shouldn't it be: LEFT OUTER JOIN tblTasks as T ON C.numCompanyId = T.numCompanyId AND T.dtTaskCreationDate = (select max(dtTaskCreationDate) from tblTasks as T2 where C.numCompanyId = T2.numCompanyId) Show quote "Roy Harvey" <roy_har***@snet.net> wrote in message news:ntdbf29uh5p1k42gelm4l3b10echojm8fk@4ax.com... > Try a LEFT OUTER JOIN: > > SELECT T.numCompanyId as CompanyId, > T.numProgramId as ProgramId, > C.numCompanyStatusId as StatusId, > T.dtTaskCreationDate as TaskCreationDate, > T.numTaskId as TaskId, > C.numFranchiseId as FranchiseId > FROM tblCompanyFranchise as C > LEFT OUTER > JOIN tblTasks as T > ON C.numCompanyId = T.numCompanyId > WHERE T.dtTaskCreationDate = > (select max(dtTaskCreationDate) from tblTasks as T2 > where C.numCompanyId = T2.numCompanyId) > > Roy Harvey > Beacon Falls, CT > > On Wed, 30 Aug 2006 08:51:01 -0700, Dan D. > <D***@discussions.microsoft.com> wrote: > > >I'm all for doing it more efficiently if possible. Your queries ran a lot > >faster than mine (8 to 10 secs vs 37 secs). They didn't give too many rows. > >They gave too few rows. Your queries both returned 125269 rows and mine > >returned 128803. If I just run "select distinct numcompanid from tbltasks" I > >get 128803 rows. I'm trying to figure out why. > > > >There is a one to many relationship between tblCompanyFranchise and > >tblTasks. The data comes from a telemarketing center. For each contact with a > >company a row gets added to tblTasks. The purpose of the query is to only get > >the data for the last contact with the company. Hope that helps. Thanks. > >-- > >Dan D. On Wed, 30 Aug 2006 12:32:28 -0400, "Jim Underwood"
<james.underwoodATfallonclinic.com> wrote: >Roy, Yes, you are correct that I blew the WHERE clause. Thanks for>Won't the where clause negate the left outer join? Shouldn't it be: > > LEFT OUTER > JOIN tblTasks as T > ON C.numCompanyId = T.numCompanyId > AND T.dtTaskCreationDate = > (select max(dtTaskCreationDate) from tblTasks as T2 > where C.numCompanyId = T2.numCompanyId) catching that! Roy That returns too many rows - 143380.
-- Show quoteDan D. "Jim Underwood" wrote: > Roy, > Won't the where clause negate the left outer join? Shouldn't it be: > > LEFT OUTER > JOIN tblTasks as T > ON C.numCompanyId = T.numCompanyId > AND T.dtTaskCreationDate = > (select max(dtTaskCreationDate) from tblTasks as T2 > where C.numCompanyId = T2.numCompanyId) > > "Roy Harvey" <roy_har***@snet.net> wrote in message > news:ntdbf29uh5p1k42gelm4l3b10echojm8fk@4ax.com... > > Try a LEFT OUTER JOIN: > > > > SELECT T.numCompanyId as CompanyId, > > T.numProgramId as ProgramId, > > C.numCompanyStatusId as StatusId, > > T.dtTaskCreationDate as TaskCreationDate, > > T.numTaskId as TaskId, > > C.numFranchiseId as FranchiseId > > FROM tblCompanyFranchise as C > > LEFT OUTER > > JOIN tblTasks as T > > ON C.numCompanyId = T.numCompanyId > > WHERE T.dtTaskCreationDate = > > (select max(dtTaskCreationDate) from tblTasks as T2 > > where C.numCompanyId = T2.numCompanyId) > > > > Roy Harvey > > Beacon Falls, CT > > > > On Wed, 30 Aug 2006 08:51:01 -0700, Dan D. > > <D***@discussions.microsoft.com> wrote: > > > > >I'm all for doing it more efficiently if possible. Your queries ran a lot > > >faster than mine (8 to 10 secs vs 37 secs). They didn't give too many > rows. > > >They gave too few rows. Your queries both returned 125269 rows and mine > > >returned 128803. If I just run "select distinct numcompanid from > tbltasks" I > > >get 128803 rows. I'm trying to figure out why. > > > > > >There is a one to many relationship between tblCompanyFranchise and > > >tblTasks. The data comes from a telemarketing center. For each contact > with a > > >company a row gets added to tblTasks. The purpose of the query is to only > get > > >the data for the last contact with the company. Hope that helps. Thanks. > > >-- > > >Dan D. > > > Try these two queries to see where you have duplicates...
SELECT T.numCompanyId as CompanyId, T.numProgramId as ProgramId, T.dtTaskCreationDate as TaskCreationDate, T.numTaskId as TaskId FROM tblTasks as T where exists ( select 1 from tblTasks as T1 where T.numCompanyId = T1.numCompanyId and T.dtTaskCreationDate = T1.dtTaskCreationDate having count(1) > 1 ); SELECT C.numCompanyStatusId as StatusId, C.numFranchiseId as FranchiseId FROM tblCompanyFranchise as C where exists ( select 1 from tblCompanyFranchise as C1 where C.numCompanyId = C1.numCompanyId having count(1) > 1 ); Also, try adding dtTaskCreationTime to the join and see if it gets the right number of rows... SELECT T.numCompanyId as CompanyId, T.numProgramId as ProgramId, C.numCompanyStatusId as StatusId, T.dtTaskCreationDate as TaskCreationDate, T.numTaskId as TaskId, C.numFranchiseId as FranchiseId FROM tblCompanyFranchise as C LEFT OUTER JOIN tblTasks as T ON C.numCompanyId = T.numCompanyId AND T.dtTaskCreationDate = (select max(dtTaskCreationDate) from tblTasks as T2 where C.numCompanyId = T2.numCompanyId) AND T.dtTaskCreationTime = (select max(dtTaskCreationTime) from tblTasks as T2 where C.numCompanyId = T2.numCompanyId and T2.dtTaskCreationDate = T.dtTaskCreationDate ) Show quote "Dan D." <D***@discussions.microsoft.com> wrote in message news:5ED843C2-9CC1-4C02-99DD-AF2B6C8AA8A9@microsoft.com... > That returns too many rows - 143380. > -- > Dan D. > > > "Jim Underwood" wrote: > > > Roy, > > Won't the where clause negate the left outer join? Shouldn't it be: > > > > LEFT OUTER > > JOIN tblTasks as T > > ON C.numCompanyId = T.numCompanyId > > AND T.dtTaskCreationDate = > > (select max(dtTaskCreationDate) from tblTasks as T2 > > where C.numCompanyId = T2.numCompanyId) > > > > "Roy Harvey" <roy_har***@snet.net> wrote in message > > news:ntdbf29uh5p1k42gelm4l3b10echojm8fk@4ax.com... > > > Try a LEFT OUTER JOIN: > > > > > > SELECT T.numCompanyId as CompanyId, > > > T.numProgramId as ProgramId, > > > C.numCompanyStatusId as StatusId, > > > T.dtTaskCreationDate as TaskCreationDate, > > > T.numTaskId as TaskId, > > > C.numFranchiseId as FranchiseId > > > FROM tblCompanyFranchise as C > > > LEFT OUTER > > > JOIN tblTasks as T > > > ON C.numCompanyId = T.numCompanyId > > > WHERE T.dtTaskCreationDate = > > > (select max(dtTaskCreationDate) from tblTasks as T2 > > > where C.numCompanyId = T2.numCompanyId) > > > > > > Roy Harvey > > > Beacon Falls, CT > > > > > > On Wed, 30 Aug 2006 08:51:01 -0700, Dan D. > > > <D***@discussions.microsoft.com> wrote: > > > > > > >I'm all for doing it more efficiently if possible. Your queries ran a lot > > > >faster than mine (8 to 10 secs vs 37 secs). They didn't give too many > > rows. > > > >They gave too few rows. Your queries both returned 125269 rows and mine > > > >returned 128803. If I just run "select distinct numcompanid from > > tbltasks" I > > > >get 128803 rows. I'm trying to figure out why. > > > > > > > >There is a one to many relationship between tblCompanyFranchise and > > > >tblTasks. The data comes from a telemarketing center. For each contact > > with a > > > >company a row gets added to tblTasks. The purpose of the query is to only > > get > > > >the data for the last contact with the company. Hope that helps. Thanks. > > > >-- > > > >Dan D. > > > > > > The first query returns 5046 and the second one zero. I had done a slightly
different query earlier and I think there were 1600+ companies that had at least one dupe. We discovered a batch process that was timing out sometimes and the user would try again and apparently some of the rows would get inserted before it timed out and then they would get inserted again. We're working on fixing that process now and then we'll have to clean up the data. The third query returned 143380 rows. -- Show quoteDan D. "Jim Underwood" wrote: > Try these two queries to see where you have duplicates... > > SELECT T.numCompanyId as CompanyId, > T.numProgramId as ProgramId, > T.dtTaskCreationDate as TaskCreationDate, > T.numTaskId as TaskId > FROM tblTasks as T > where exists > ( > select 1 > from tblTasks as T1 > where T.numCompanyId = T1.numCompanyId > and T.dtTaskCreationDate = T1.dtTaskCreationDate > having count(1) > 1 > ); > > SELECT C.numCompanyStatusId as StatusId, > C.numFranchiseId as FranchiseId > FROM tblCompanyFranchise as C > where exists > ( > select 1 > from tblCompanyFranchise as C1 > where C.numCompanyId = C1.numCompanyId > having count(1) > 1 > ); > > Also, try adding dtTaskCreationTime to the join and see if it gets the right > number of rows... > > SELECT T.numCompanyId as CompanyId, > T.numProgramId as ProgramId, > C.numCompanyStatusId as StatusId, > T.dtTaskCreationDate as TaskCreationDate, > T.numTaskId as TaskId, > C.numFranchiseId as FranchiseId > FROM tblCompanyFranchise as C > LEFT OUTER > JOIN tblTasks as T > ON C.numCompanyId = T.numCompanyId > AND T.dtTaskCreationDate = > (select max(dtTaskCreationDate) from tblTasks as T2 > where C.numCompanyId = T2.numCompanyId) > > AND T.dtTaskCreationTime = > (select max(dtTaskCreationTime) from tblTasks as T2 > where C.numCompanyId = T2.numCompanyId > and T2.dtTaskCreationDate = T.dtTaskCreationDate ) > > "Dan D." <D***@discussions.microsoft.com> wrote in message > news:5ED843C2-9CC1-4C02-99DD-AF2B6C8AA8A9@microsoft.com... > > That returns too many rows - 143380. > > -- > > Dan D. > > > > > > "Jim Underwood" wrote: > > > > > Roy, > > > Won't the where clause negate the left outer join? Shouldn't it be: > > > > > > LEFT OUTER > > > JOIN tblTasks as T > > > ON C.numCompanyId = T.numCompanyId > > > AND T.dtTaskCreationDate = > > > (select max(dtTaskCreationDate) from tblTasks as T2 > > > where C.numCompanyId = T2.numCompanyId) > > > > > > "Roy Harvey" <roy_har***@snet.net> wrote in message > > > news:ntdbf29uh5p1k42gelm4l3b10echojm8fk@4ax.com... > > > > Try a LEFT OUTER JOIN: > > > > > > > > SELECT T.numCompanyId as CompanyId, > > > > T.numProgramId as ProgramId, > > > > C.numCompanyStatusId as StatusId, > > > > T.dtTaskCreationDate as TaskCreationDate, > > > > T.numTaskId as TaskId, > > > > C.numFranchiseId as FranchiseId > > > > FROM tblCompanyFranchise as C > > > > LEFT OUTER > > > > JOIN tblTasks as T > > > > ON C.numCompanyId = T.numCompanyId > > > > WHERE T.dtTaskCreationDate = > > > > (select max(dtTaskCreationDate) from tblTasks as T2 > > > > where C.numCompanyId = T2.numCompanyId) > > > > > > > > Roy Harvey > > > > Beacon Falls, CT > > > > > > > > On Wed, 30 Aug 2006 08:51:01 -0700, Dan D. > > > > <D***@discussions.microsoft.com> wrote: > > > > > > > > >I'm all for doing it more efficiently if possible. Your queries ran a > lot > > > > >faster than mine (8 to 10 secs vs 37 secs). They didn't give too many > > > rows. > > > > >They gave too few rows. Your queries both returned 125269 rows and > mine > > > > >returned 128803. If I just run "select distinct numcompanid from > > > tbltasks" I > > > > >get 128803 rows. I'm trying to figure out why. > > > > > > > > > >There is a one to many relationship between tblCompanyFranchise and > > > > >tblTasks. The data comes from a telemarketing center. For each > contact > > > with a > > > > >company a row gets added to tblTasks. The purpose of the query is to > only > > > get > > > > >the data for the last contact with the company. Hope that helps. > Thanks. > > > > >-- > > > > >Dan D. > > > > > > > > > > > > A primary key / unique constraint on the proper columns will make it
impossible for duplicates to be inserted. Of course the data needs to be cleaned up first, and you will want to build error handling into your application so the users get a friendly message when this happens (assuming you don't code the application to prevent it as well).. Out of curiosity, do you have any rows in tblTasks where TaskCreationDate <> TaskCreationTime? Show quote "Dan D." <D***@discussions.microsoft.com> wrote in message news:479B64B7-FFF7-47B3-A0D5-2E17D24AEB10@microsoft.com... > The first query returns 5046 and the second one zero. I had done a slightly > different query earlier and I think there were 1600+ companies that had at > least one dupe. We discovered a batch process that was timing out sometimes > and the user would try again and apparently some of the rows would get > inserted before it timed out and then they would get inserted again. We're > working on fixing that process now and then we'll have to clean up the data. > > The third query returned 143380 rows. > -- > Dan D. > > > "Jim Underwood" wrote: > > > Try these two queries to see where you have duplicates... > > > > SELECT T.numCompanyId as CompanyId, > > T.numProgramId as ProgramId, > > T.dtTaskCreationDate as TaskCreationDate, > > T.numTaskId as TaskId > > FROM tblTasks as T > > where exists > > ( > > select 1 > > from tblTasks as T1 > > where T.numCompanyId = T1.numCompanyId > > and T.dtTaskCreationDate = T1.dtTaskCreationDate > > having count(1) > 1 > > ); > > > > SELECT C.numCompanyStatusId as StatusId, > > C.numFranchiseId as FranchiseId > > FROM tblCompanyFranchise as C > > where exists > > ( > > select 1 > > from tblCompanyFranchise as C1 > > where C.numCompanyId = C1.numCompanyId > > having count(1) > 1 > > ); > > > > Also, try adding dtTaskCreationTime to the join and see if it gets the right > > number of rows... > > > > SELECT T.numCompanyId as CompanyId, > > T.numProgramId as ProgramId, > > C.numCompanyStatusId as StatusId, > > T.dtTaskCreationDate as TaskCreationDate, > > T.numTaskId as TaskId, > > C.numFranchiseId as FranchiseId > > FROM tblCompanyFranchise as C > > LEFT OUTER > > JOIN tblTasks as T > > ON C.numCompanyId = T.numCompanyId > > AND T.dtTaskCreationDate = > > (select max(dtTaskCreationDate) from tblTasks as T2 > > where C.numCompanyId = T2.numCompanyId) > > > > AND T.dtTaskCreationTime = > > (select max(dtTaskCreationTime) from tblTasks as T2 > > where C.numCompanyId = T2.numCompanyId > > and T2.dtTaskCreationDate = T.dtTaskCreationDate ) > > > > "Dan D." <D***@discussions.microsoft.com> wrote in message > > news:5ED843C2-9CC1-4C02-99DD-AF2B6C8AA8A9@microsoft.com... > > > That returns too many rows - 143380. > > > -- > > > Dan D. > > > > > > > > > "Jim Underwood" wrote: > > > > > > > Roy, > > > > Won't the where clause negate the left outer join? Shouldn't it be: > > > > > > > > LEFT OUTER > > > > JOIN tblTasks as T > > > > ON C.numCompanyId = T.numCompanyId > > > > AND T.dtTaskCreationDate = > > > > (select max(dtTaskCreationDate) from tblTasks as T2 > > > > where C.numCompanyId = T2.numCompanyId) > > > > > > > > "Roy Harvey" <roy_har***@snet.net> wrote in message > > > > news:ntdbf29uh5p1k42gelm4l3b10echojm8fk@4ax.com... > > > > > Try a LEFT OUTER JOIN: > > > > > > > > > > SELECT T.numCompanyId as CompanyId, > > > > > T.numProgramId as ProgramId, > > > > > C.numCompanyStatusId as StatusId, > > > > > T.dtTaskCreationDate as TaskCreationDate, > > > > > T.numTaskId as TaskId, > > > > > C.numFranchiseId as FranchiseId > > > > > FROM tblCompanyFranchise as C > > > > > LEFT OUTER > > > > > JOIN tblTasks as T > > > > > ON C.numCompanyId = T.numCompanyId > > > > > WHERE T.dtTaskCreationDate = > > > > > (select max(dtTaskCreationDate) from tblTasks as T2 > > > > > where C.numCompanyId = T2.numCompanyId) > > > > > > > > > > Roy Harvey > > > > > Beacon Falls, CT > > > > > > > > > > On Wed, 30 Aug 2006 08:51:01 -0700, Dan D. > > > > > <D***@discussions.microsoft.com> wrote: > > > > > > > > > > >I'm all for doing it more efficiently if possible. Your queries ran a > > lot > > > > > >faster than mine (8 to 10 secs vs 37 secs). They didn't give too many > > > > rows. > > > > > >They gave too few rows. Your queries both returned 125269 rows and > > mine > > > > > >returned 128803. If I just run "select distinct numcompanid from > > > > tbltasks" I > > > > > >get 128803 rows. I'm trying to figure out why. > > > > > > > > > > > >There is a one to many relationship between tblCompanyFranchise and > > > > > >tblTasks. The data comes from a telemarketing center. For each > > contact > > > > with a > > > > > >company a row gets added to tblTasks. The purpose of the query is to > > only > > > > get > > > > > >the data for the last contact with the company. Hope that helps. > > Thanks. > > > > > >-- > > > > > >Dan D. > > > > > > > > > > > > > > > > > > That's as fast as your other queries but still returns 125269 rows.
-- Show quoteDan D. "Roy Harvey" wrote: > Try a LEFT OUTER JOIN: > > SELECT T.numCompanyId as CompanyId, > T.numProgramId as ProgramId, > C.numCompanyStatusId as StatusId, > T.dtTaskCreationDate as TaskCreationDate, > T.numTaskId as TaskId, > C.numFranchiseId as FranchiseId > FROM tblCompanyFranchise as C > LEFT OUTER > JOIN tblTasks as T > ON C.numCompanyId = T.numCompanyId > WHERE T.dtTaskCreationDate = > (select max(dtTaskCreationDate) from tblTasks as T2 > where C.numCompanyId = T2.numCompanyId) > > Roy Harvey > Beacon Falls, CT > > On Wed, 30 Aug 2006 08:51:01 -0700, Dan D. > <D***@discussions.microsoft.com> wrote: > > >I'm all for doing it more efficiently if possible. Your queries ran a lot > >faster than mine (8 to 10 secs vs 37 secs). They didn't give too many rows. > >They gave too few rows. Your queries both returned 125269 rows and mine > >returned 128803. If I just run "select distinct numcompanid from tbltasks" I > >get 128803 rows. I'm trying to figure out why. > > > >There is a one to many relationship between tblCompanyFranchise and > >tblTasks. The data comes from a telemarketing center. For each contact with a > >company a row gets added to tblTasks. The purpose of the query is to only get > >the data for the last contact with the company. Hope that helps. Thanks. > >-- > >Dan D. > On Wed, 30 Aug 2006 10:45:01 -0700, Dan D.
<D***@discussions.microsoft.com> wrote: >That's as fast as your other queries but still returns 125269 rows. Please run:SELECT COUNT(distinct numcompanid) FROM tblCompanyFranchise SELECT COUNT(distinct numcompanid) FROM tbltasks SELECT numcompanid FROM tbltasks as T WHERE NOT EXISTS (select * from tblCompanyFranchise as C where T.numcompanid = C.numcompanid) and post the results. Roy Harvey Beacon Falls, CT 143366
128803 8092 - 3548 unique companies So there are quite a few companies with no rows in tblTasks and quite a few companies that don't even exist.:) -- Show quoteDan D. "Roy Harvey" wrote: > On Wed, 30 Aug 2006 10:45:01 -0700, Dan D. > <D***@discussions.microsoft.com> wrote: > > >That's as fast as your other queries but still returns 125269 rows. > > Please run: > > SELECT COUNT(distinct numcompanid) FROM tblCompanyFranchise > SELECT COUNT(distinct numcompanid) FROM tbltasks > > SELECT numcompanid > FROM tbltasks as T > WHERE NOT EXISTS > (select * from tblCompanyFranchise as C > where T.numcompanid = C.numcompanid) > > and post the results. > > Roy Harvey > Beacon Falls, CT > On Wed, 30 Aug 2006 11:56:01 -0700, Dan D.
<D***@discussions.microsoft.com> wrote: >143366 Sounds like you have some decisions to make about just how the>128803 >8092 - 3548 unique companies > >So there are quite a few companies with no rows in tblTasks and quite a few >companies that don't even exist.:) contradictions in the data need to be treated in your query. Roy Yes we do. Thanks for your help Roy.
-- Show quoteDan D. "Roy Harvey" wrote: > On Wed, 30 Aug 2006 11:56:01 -0700, Dan D. > <D***@discussions.microsoft.com> wrote: > > >143366 > >128803 > >8092 - 3548 unique companies > > > >So there are quite a few companies with no rows in tblTasks and quite a few > >companies that don't even exist.:) > > Sounds like you have some decisions to make about just how the > contradictions in the data need to be treated in your query. > > Roy > Someone who knows that data better than I do, said that even though the
company is removed from tblCompanyFranchise, which means the company is no longer associated with a franchise, the company still exists in tblCompany and is shown as inactive. -- Show quoteDan D. "Roy Harvey" wrote: > On Wed, 30 Aug 2006 10:45:01 -0700, Dan D. > <D***@discussions.microsoft.com> wrote: > > >That's as fast as your other queries but still returns 125269 rows. > > Please run: > > SELECT COUNT(distinct numcompanid) FROM tblCompanyFranchise > SELECT COUNT(distinct numcompanid) FROM tbltasks > > SELECT numcompanid > FROM tbltasks as T > WHERE NOT EXISTS > (select * from tblCompanyFranchise as C > where T.numcompanid = C.numcompanid) > > and post the results. > > Roy Harvey > Beacon Falls, CT > So I guess that could explain the difference between my count of 128803 and
your count of 125269. Your count is only for those companies that exist in tblCompanyFranchise and my count would include companies that are not in tblCompanyFranchise. -- Show quoteDan D. "Roy Harvey" wrote: > On Wed, 30 Aug 2006 10:45:01 -0700, Dan D. > <D***@discussions.microsoft.com> wrote: > > >That's as fast as your other queries but still returns 125269 rows. > > Please run: > > SELECT COUNT(distinct numcompanid) FROM tblCompanyFranchise > SELECT COUNT(distinct numcompanid) FROM tbltasks > > SELECT numcompanid > FROM tbltasks as T > WHERE NOT EXISTS > (select * from tblCompanyFranchise as C > where T.numcompanid = C.numcompanid) > > and post the results. > > Roy Harvey > Beacon Falls, CT > I double checked the StatusId column in my query and there are 3548 nulls. So
those are the companies that don't exist in tblCompanyFranchise. I'll use your query. It's also faster. Thanks Roy. -- Show quoteDan D. "Roy Harvey" wrote: > On Wed, 30 Aug 2006 10:45:01 -0700, Dan D. > <D***@discussions.microsoft.com> wrote: > > >That's as fast as your other queries but still returns 125269 rows. > > Please run: > > SELECT COUNT(distinct numcompanid) FROM tblCompanyFranchise > SELECT COUNT(distinct numcompanid) FROM tbltasks > > SELECT numcompanid > FROM tbltasks as T > WHERE NOT EXISTS > (select * from tblCompanyFranchise as C > where T.numcompanid = C.numcompanid) > > and post the results. > > Roy Harvey > Beacon Falls, CT > |
|||||||||||||||||||||||