|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
300 choose 10 - doing math with SQL Server, need some help...I have a question that may be more Math related than SQL Server related. But maybe it's a simple thing & I'm missing it 'cause I'm trying too hard... Here's the scenario: I have 300 products in a table. I also have last year's sales figures, and this year's to-date sales figures. I want to figure out which group of 10 products have the highest year-to-date sales AND last year's sales combined to less than 10,000. I have written a program that basically uses Where statements to iterate through the 300 items, 10 times, calculating the best sales for this year where last years sales are less than 10,000. The problem is that there are 300 * 299 * 298 * 297 * 296 * 295 * 294 * 293 * 292 * 291 different combinations of products. That's a big freakin' number! I've put in some minor checks to help speed things up slightly - things like breaking where last years total is more than 10,000, etc. But she's still gonna take about 42,000 years to get my result. So what am I missing? What's the best way to accomplish what I'm trying to do? Any help would be appreciated... Thanks, Greg Greg,
Could you post some minimal DDL for us to see? It could be a simple solution .... > I've put in some minor checks to help speed things up slightly - things like As long as your billing for the processing time, I'm not sure I see the > breaking where last years total is more than 10,000, etc. But she's still > gonna take about 42,000 years to get my result. problem with that ;-). -- Alex OK - here's what I've got... I've taken out half the code to reduce the post
- this assumes I'm only looking for the best THREE products. In fact, I'm looking for ten... I have a three user-defined functions that do basically the same thing - look up a value in a table & return it. CREATE FUNCTION LookupLastYear(@ProdID int) RETURNS money AS BEGIN DECLARE @out money SET @out = (SELECT LastYear FROM tblCurrentStats WHERE Prod=@ProdID) RETURN @out END The code as it stands: (& if I screwed up with the flow by removing a huge chunk of code, forgive me...) (As an aside, I'm sure this could have been done using recursion, but I'm not sure how... Any tips would be appreciated...) Thanks in advance, Greg The Code: CREATE PROCEDURE USP_Figure_Out_Best_Team AS DECLARE @Prod1 int DECLARE @Prod2 int DECLARE @Prod3 int DECLARE @LastYear1 money DECLARE @LastYear2 money DECLARE @LastYear3 money DECLARE @ThisYear1 money DECLARE @ThisYear2 money DECLARE @ThisYear3 money DECLARE @BestProd1 int DECLARE @BestProd2 int DECLARE @BestProd3 int DECLARE @BestTotal money DECLARE @Best2004 money DECLARE @TempTotal money DECLARE @TempLastYear money SET @BestTotal = 0 SET @Best2004 = 0 SET @TempTotal = 0 SET @TempLastYear = 0 DECLARE @Counter1 int DECLARE @Counter2 int DECLARE @Counter3 int set @Counter1 = 1 while @Counter1 <=297 begin SET @ThisYear1 = dbo.lookupthisyear(@counter1) SET @LastYear1 = dbo.lookuplastyear(@counter1) SET @TempTotal = @ThisYear1 SET @TempLastYear = @LastYear1 PRINT '1 ' + CONVERT(varchar, @Counter1) IF @TempLastYear <=10000 BEGIN SET @Counter2 = @Counter1 + 1 WHILE @Counter2 <=297 BEGIN SET @ThisYear2 = dbo.lookupthisyear(@counter2) SET @LastYear2 = dbo.lookuplastyear(@counter2) SET @TempTotal = @TempTotal + @ThisYear2 SET @TempLastYear = @TempLastYear + @LastYear2 PRINT '2: ' + CONVERT(varchar, @Counter2) IF @TempLastYear <=10000 BEGIN SET @Counter3 = @Counter2 + 1 WHILE @Counter3 <=297 BEGIN SET @ThisYear3 = dbo.lookupthisyear(@counter3) SET @LastYear3 = dbo.lookuplastyear(@counter3) SET @TempTotal = @TempTotal + @ThisYear3 SET @TempLastYear = @TempLastYear + @LastYear3 PRINT '3: ' + CONVERT(varchar, @Counter3) IF @TempLastYear <=10000 BEGIN if @temptotal > @BestTotal AND @TempLastYear <=10000 BEGIN PRINT '' PRINT 'New Best' PRINT dbo.lookupProd(@Counter1) PRINT dbo.lookupProd(@Counter2) PRINT dbo.lookupProd(@Counter3) PRINT '2004 Total:' + CONVERT(varchar,@TempLastYear) PRINT '2005 Total:' + CONVERT(varchar,@TempTotal) SET @BestProd1 = @Counter1 SET @BestProd2 = @Counter2 SET @BestProd3 = @Counter3 SET @BestTotal = @TempTotal SET @Best2004 = @TempLastYear INSERT INTO tblOptimal VALUES (GETDATE(), dbo.lookupProd(@Counter1), dbo.lookupProd(@Counter2), dbo.lookupProd(@Counter3), @Best2004, @BestTotal) END SET @Counter3 = @Counter3 + 1 SET @TempTotal = @TempTotal - @ThisYear3 SET @TempLastYear = @TempLastYear - @LastYear3 END END SET @Counter2 = @Counter2 + 1 SET @TempTotal = @TempTotal - @ThisYear2 SET @TempLastYear = @TempLastYear - @LastYear2 end END SET @Counter1 = @Counter1 + 1 end PRINT 'Optimal:' PRINT dbo.lookupProd(@BestProd1) PRINT dbo.lookupProd(@BestProd2) PRINT dbo.lookupProd(@BestProd3) PRINT '2004: ' + CONVERT(varchar, @Best2004) PRINT '2005: ' + CONVERT(varchar,@BestTotal) GO Show quote "Alex Papadimoulis" wrote: > Greg, > > Could you post some minimal DDL for us to see? It could be a simple solution > ... > > > I've put in some minor checks to help speed things up slightly - things like > > breaking where last years total is more than 10,000, etc. But she's still > > gonna take about 42,000 years to get my result. > > As long as your billing for the processing time, I'm not sure I see the > problem with that ;-). > > -- Alex I'm still having a hard time understanding your schema ... post DDL for the
tables. BUT, from what I can tell so far, it looks selecting a single value at a time, comparing, and moving to the next row -- three levels deep. Have you tried doing a JOIN? It would seem like a join would solve this and let the DB optimize ... -- Alex Show quote "Alex Papadimoulis" wrote: > Greg, > > Could you post some minimal DDL for us to see? It could be a simple solution > ... > > > I've put in some minor checks to help speed things up slightly - things like > > breaking where last years total is more than 10,000, etc. But she's still > > gonna take about 42,000 years to get my result. > > As long as your billing for the processing time, I'm not sure I see the > problem with that ;-). > > -- Alex If I understand you correctly, why not just
Select Top 10 ProductID From (Select ProductID, Sum(SalesAmt) TotalSales From SalesTable Where SalesDate >= '20040101' Group By ProductID) Where TotalSales < 10000 Order By TotalSales Desc Show quote "Greg Toronto" wrote: > Hey gang, > > I have a question that may be more Math related than SQL Server related. > But maybe it's a simple thing & I'm missing it 'cause I'm trying too hard... > > Here's the scenario: I have 300 products in a table. I also have last > year's sales figures, and this year's to-date sales figures. I want to > figure out which group of 10 products have the highest year-to-date sales AND > last year's sales combined to less than 10,000. > > I have written a program that basically uses Where statements to iterate > through the 300 items, 10 times, calculating the best sales for this year > where last years sales are less than 10,000. > > The problem is that there are 300 * 299 * 298 * 297 * 296 * 295 * 294 * 293 > * 292 * 291 different combinations of products. That's a big freakin' number! > > I've put in some minor checks to help speed things up slightly - things like > breaking where last years total is more than 10,000, etc. But she's still > gonna take about 42,000 years to get my result. > > So what am I missing? What's the best way to accomplish what I'm trying to > do? > > Any help would be appreciated... > > Thanks, > > Greg Greg,
Please post DDL, sample data and expected results, as documented in http://www.aspfaq.com/etiquette.asp?id=5006 I will assume some simple DDL and sample data: CREATE TABLE SalesTotals ( ProductID int, TheYear smallint, Total money NOT NULL, PRIMARY KEY (TheYear, ProductID) ) INSERT INTO SalesTotals VALUES (1, 2004, 1000) INSERT INTO SalesTotals VALUES (2, 2004, 2000) INSERT INTO SalesTotals VALUES (3, 2004, 3000) INSERT INTO SalesTotals VALUES (4, 2004, 4000) INSERT INTO SalesTotals VALUES (5, 2004, 5000) INSERT INTO SalesTotals VALUES (6, 2004, 500) INSERT INTO SalesTotals VALUES (7, 2004, 3500) INSERT INTO SalesTotals VALUES (8, 2004, 4000) INSERT INTO SalesTotals VALUES (9, 2004, 5500) INSERT INTO SalesTotals VALUES (10,2004, 5000) INSERT INTO SalesTotals VALUES (1, 2005,11000) INSERT INTO SalesTotals VALUES (2, 2005, 7500) INSERT INTO SalesTotals VALUES (3, 2005, 5000) INSERT INTO SalesTotals VALUES (4, 2005,16000) INSERT INTO SalesTotals VALUES (5, 2005,17000) INSERT INTO SalesTotals VALUES (6, 2005, 2500) INSERT INTO SalesTotals VALUES (7, 2005,23500) INSERT INTO SalesTotals VALUES (8, 2005,25500) INSERT INTO SalesTotals VALUES (9, 2005,11500) INSERT INTO SalesTotals VALUES (10,2005,27500) One solution is: SELECT TOP 1 A1.ProductID, A2.ProductID, A3.ProductID, A1.Total+A2.Total+A3.Total as Total2005, B1.Total+B2.Total+B3.Total as Total2004 FROM SalesTotals A1, SalesTotals A2, SalesTotals A3, SalesTotals B1, SalesTotals B2, SalesTotals B3 WHERE A1.ProductID=B1.ProductID AND A2.ProductID=B2.ProductID AND A3.ProductID=B3.ProductID AND A1.ProductID<A2.ProductID AND A2.ProductID<A3.ProductID AND A1.TheYear=2005 AND A2.TheYear=2005 AND A3.TheYear=2005 AND B1.TheYear=2004 AND B2.TheYear=2004 AND B3.TheYear=2004 AND B1.Total+B2.Total+B3.Total<10000 ORDER BY Total2005 DESC For 3 products out of 10, this works instantaneous, but I would not dare to think how much time and memory it would require for 10 products out of 300. The way you put it, this is the kind of problem that cannot be computed in a reasonable time (this class of problems is called "NP-complete", and this problem is called "knapsack problem"). The best thing you could do, is change the requirements, for example: a) say "last year's sales for each product are less than 2,000" instead of "the combined last year's sales are less than 10,000" b) say "among the highest" instead of "highest" For the second option, one algorithm that works in a reasonable time is: DECLARE @Temp1 TABLE ( RowNum int NOT NULL IDENTITY UNIQUE, ProductID int PRIMARY KEY, Value2004 money NOT NULL, Value2005 money NOT NULL ) DECLARE @Temp2 TABLE ( ProductID int PRIMARY KEY ) INSERT INTO @Temp1 (ProductID, Value2004, Value2005) SELECT A.ProductID, B.Total, A.Total FROM SalesTotals A INNER JOIN SalesTotals B ON A.ProductID=B.ProductID AND A.TheYear=2005 AND B.TheYear=2004 ORDER BY A.Total-B.Total DESC, B.Total ASC DECLARE @P int, @N tinyint, @S money SET @N=0 SET @S=0 WHILE @N<3 BEGIN SELECT TOP 1 @P=ProductID, @N=@N+1, @S=@S+Value2004 FROM @Temp1 WHERE @S+Value2004<10000 AND ProductID NOT IN (SELECT ProductID FROM @Temp2) ORDER BY RowNum INSERT INTO @Temp2 VALUES (@P) END SELECT ProductID, Value2004, Value2005 FROM @Temp1 WHERE ProductID IN (SELECT ProductID FROM @Temp2) For this sample data, this algorithm returns the set {6, 8, 10}, which has a total of 55,500, but the best set would have been {1, 7, 10} with a total of 62,000. However, for other input data, this algorithm may happen to guess the best set. Razvan PS. Again, I have written non-portable code, using Microsoft-specific extensions... I apologize for doing this and I invite other posters to write these queries in ANSI-SQL. At this time, I could not find a ANSI-SQL solution that has the same or better performance in SQL Server 2000 as this non-portable solution. I'm not saying that it doesn't exist, I just wasn't able to find it now. Greg,
To add to what Razvan said, this also sounds to me like a variation of a knapsack problem, and while such problems have no known efficient algorithms, a technique that can be useful is called "dynamic programming". One optimistic detail is that you want the highest year-to-date sales. If the secondary condition were not in place, this would be easy (find the 10 products with the highest year-to-date sales). You might get lucky if you do something like this, which tries finding the answer within groups of 12 products with high YTD sales but low last-year sales, adjusting what "low" means until something is found. declare @criterion int set @criterion = 5000 -- something <= 10000 to start while 1=1 begin select top 1 subset of size 10 from ( select top 12 ProductID where last-year-sales < @criterion order by YTD-sales desc ) as Lucky having sum(last-year-sales) < 10000 order by sum(YTD-sales) desc if @@rowcount = 0 set @criterion = @criterion*0.95 else return what you found if @criterion < 1 break end Steve Kass Drew University Greg Toronto wrote: Show quote >Hey gang, > >I have a question that may be more Math related than SQL Server related. >But maybe it's a simple thing & I'm missing it 'cause I'm trying too hard... > >Here's the scenario: I have 300 products in a table. I also have last >year's sales figures, and this year's to-date sales figures. I want to >figure out which group of 10 products have the highest year-to-date sales AND >last year's sales combined to less than 10,000. > >I have written a program that basically uses Where statements to iterate >through the 300 items, 10 times, calculating the best sales for this year >where last years sales are less than 10,000. > >The problem is that there are 300 * 299 * 298 * 297 * 296 * 295 * 294 * 293 >* 292 * 291 different combinations of products. That's a big freakin' number! > >I've put in some minor checks to help speed things up slightly - things like >breaking where last years total is more than 10,000, etc. But she's still >gonna take about 42,000 years to get my result. > >So what am I missing? What's the best way to accomplish what I'm trying to >do? > >Any help would be appreciated... > >Thanks, > >Greg > > Steve, why not just create an aggregate SubQuery, that calculates the
PriorYears Sales amnd the current Years sales, grouped on Product ID, Select ProductID, Sum(Case When SalesDate >= '20040101' And SalesDate < '20050101' ThenSalesAmt End) As PriorYearSales, Sum(Case When SalesDate >= '20050101' ThenSalesAmt End) As CurrYearSales From SalesTable Where SalesDate >= '20040101' -- Not even necessary Group By ProductID And Then Select from this subquery the top 10 Records after filtering on PriorYearSales < 10000, and ordering By CurrYearSales -- ******************* Select Top 10 ProductID From (Select ProductID, Sum(Case When SalesDate >= '20040101' And SalesDate < '20050101' ThenSalesAmt End) As PriorYearSales, Sum(Case When SalesDate >= '20050101' ThenSalesAmt End) As CurrYearSales From SalesTable Where SalesDate >= '20040101' Group By ProductID) As Sales Where PriorYearSales < 10000 Order By CurrYearSales Desc Show quote "Steve Kass" wrote: > Greg, > > To add to what Razvan said, this also sounds to me like a variation of > a knapsack problem, and while such problems have no known efficient > algorithms, a technique that can be useful is called "dynamic programming". > > One optimistic detail is that you want the highest year-to-date sales. If > the secondary condition were not in place, this would be easy (find the 10 > products with the highest year-to-date sales). You might get lucky if you > do something like this, which tries finding the answer within groups of 12 > products with high YTD sales but low last-year sales, adjusting what > "low" means until something is found. > > declare @criterion int > set @criterion = 5000 -- something <= 10000 to start > > while 1=1 begin > select top 1 subset of size 10 from ( > select top 12 ProductID > where last-year-sales < @criterion > order by YTD-sales desc > ) as Lucky > having sum(last-year-sales) < 10000 > order by sum(YTD-sales) desc > if @@rowcount = 0 set @criterion = @criterion*0.95 > else return what you found > if @criterion < 1 break > end > > Steve Kass > Drew University > > > Greg Toronto wrote: > > >Hey gang, > > > >I have a question that may be more Math related than SQL Server related. > >But maybe it's a simple thing & I'm missing it 'cause I'm trying too hard... > > > >Here's the scenario: I have 300 products in a table. I also have last > >year's sales figures, and this year's to-date sales figures. I want to > >figure out which group of 10 products have the highest year-to-date sales AND > >last year's sales combined to less than 10,000. > > > >I have written a program that basically uses Where statements to iterate > >through the 300 items, 10 times, calculating the best sales for this year > >where last years sales are less than 10,000. > > > >The problem is that there are 300 * 299 * 298 * 297 * 296 * 295 * 294 * 293 > >* 292 * 291 different combinations of products. That's a big freakin' number! > > > >I've put in some minor checks to help speed things up slightly - things like > >breaking where last years total is more than 10,000, etc. But she's still > >gonna take about 42,000 years to get my result. > > > >So what am I missing? What's the best way to accomplish what I'm trying to > >do? > > > >Any help would be appreciated... > > > >Thanks, > > > >Greg > > > > > If you want the Products with the Highest CurrentYear Sales, which had
PriorYearSales < 10000, then this should work: Select Top 10 ProductID From (Select ProductID, Sum(Case When SalesDate >= '20040101' And SalesDate < '20050101' ThenSalesAmt End) As PriorYearSales, Sum(Case When SalesDate >= '20050101' ThenSalesAmt End) As CurrYearSales From SalesTable Where SalesDate >= '20040101' Group By ProductID) As Sales Where PriorYearSales < 10000 Order By CurrYearSales Desc Show quote "Greg Toronto" wrote: > Hey gang, > > I have a question that may be more Math related than SQL Server related. > But maybe it's a simple thing & I'm missing it 'cause I'm trying too hard... > > Here's the scenario: I have 300 products in a table. I also have last > year's sales figures, and this year's to-date sales figures. I want to > figure out which group of 10 products have the highest year-to-date sales AND > last year's sales combined to less than 10,000. > > I have written a program that basically uses Where statements to iterate > through the 300 items, 10 times, calculating the best sales for this year > where last years sales are less than 10,000. > > The problem is that there are 300 * 299 * 298 * 297 * 296 * 295 * 294 * 293 > * 292 * 291 different combinations of products. That's a big freakin' number! > > I've put in some minor checks to help speed things up slightly - things like > breaking where last years total is more than 10,000, etc. But she's still > gonna take about 42,000 years to get my result. > > So what am I missing? What's the best way to accomplish what I'm trying to > do? > > Any help would be appreciated... > > Thanks, > > Greg >> I have 300 products in a table. I also have last year's sales figures, and this year's to-date sales figures. I want to figure outwhich group of 10 products have the highest year-to-date sales and last year's sales combined to less than 10,000. << I am little confused on this one. One way to read this is that each of the products in the basket of ten items had less than $10,000 in sales last year; the other way is that the entire basket had less than $10,000 in sales last year. First way is fairly easy: SELECT X.product_id, X.this_yr_amt FROM (SELECT product_id SUM (CASE WHEN sales_date BETWEEN '2004-01-01'AND '2004-12-31' THEN sales_amt ELSE 0.00 END), SUM (CASE WHEN sales_date > '2004-12-31' THEN sales_amt ELSE 0.00 END) FROM Sales GROUP BY product_id) AS X(product_id, last_yr_amt, this_yr_amt) WHERE X.last_yr_amt < 10000.00; The second way is a screaming pain because it is a "Bin Packing Problem", which are known to be NP-complete. Google it. Then we can have ties for two baskets to consider. Then we can have to handle baskets with less than ten items. What if no such baskets exist in the data? It gets ugly fast, and I do mean NP fast. In SQL because it is a set-oriented language, you tend to get the set of ALL answers, which mean you have to try ALL combinations. In the real world, getting the first workable answer and quitting is often good enough. I would put my X() derived table into a VIEW, use a Cursor and a back tracking algorithm in a procedural language. 1) Grab the top ten sellers for this year; if two products are tied, then use the one with the lowest sale for last year. Make a Greedy start. 2) Check the basket total for last year and stop if it is < $10000. 3) If not, replace one of the basket items with the item on the list which has the highest this_year_amt and a last_year_amt lower than the item it is replacing. 4) Repeat until all 300 items have been inspected or until you get a halt condition. This is probably not optimal and it depends on step #3, rule for replacement. |
|||||||||||||||||||||||