|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Table design (search with ranking)I am trying to design some tables to solve this problem. The main goal is to have a very good performance, but still have a good design. I am trying to build a search function with a ranking system. The search function can be used to search for cars/hotels or whatever. Each car/hotel have x numbers of attributes and each attribute can have a value. For example Attributes for a car can be: make, year, type. Values for the attribute make can be: Audi, BMW, Ford Values for the attribute type can be: Cab, Coupé, SUV It should be possible to make a decision of how important a attribute is. These values can be. - Not at all - Some - A lot - Required If the choice for a specific attribute is ‘not at all’ it should not affect the points at all. If the choice for a specific attribute is ‘some’ it should give 50% points if the car has that attribute value. If the choice for a specific attribute is ‘A lot’ it should give 100% points if the car has that attribute value. If the choice for a specific attribute is ‘Required‘ it should vive 100% points and it will require that the car has this attribute value to even show in the result list. If we have 3 attributes and the one that makes the search do something like this: Make: Audi, Ford (a lot) Year: 01,02,03 (some) Type: (not at all) The max points in the result should be 100. And since it is more important (for the searcher) that the make is correct than that the year is, that will generate 50% more points. That will give us that if the make is a audi or a ford it will give that post 75 points. If the year is right (01,02,03) it will give the post 25 points. So if the car is a audi or a ford and is created in either 01,02 or 03 it will give that car 100 points. Since the one that makes the search has set the type to ‘not at all’, that will not affect the points at all. This point calculation also means that this: Make: Audi (a lot) – 33.3 points Year: 01,02 (a lot) – 33.3 points Type: SUV (a lot) – 33.3 points Is the same thing as Make: Audi (some) – 33.3 points Year: 01,02 (some) – 33.3 points Type: SUV (some) – 33.3 points Since we will always return 100 points. But if we change the importance of any of the attributes, it will also affect the possible points. Make: Audi (some) – 25 points Year: 01,02 (a lot) – 50 points Type: SUV (some) – 25 points Anyone have any ideas of how to design the tables? Remember that it should be simple to add new attributes and attribute values (prefer that this is done with DML instead of DDL, but I can probably accept the requirement of DDL). The performance is very very important (more important than DML to add attributes/attribute values). From a database design standpoint, your tables would be pretty straight
forward. CREATE TABLE cars ( vin char(17) NOT NULL PRIMARY KEY, manufacturerid int REFERENCES manufacturers (manufacturerid), model ... color ... ) Since the ranking and weighting is something that is decided by a (ephemeral) user, it's not something that you would want to store in the database itself, but rather dynamically calculate with your application. Show quote "Patrik" wrote: > Hello! > > I am trying to design some tables to solve this problem. The main goal is to > have a very good performance, but still have a good design. > > I am trying to build a search function with a ranking system. The search > function can be used to search for cars/hotels or whatever. Each car/hotel > have x numbers of attributes and each attribute can have a value. > > For example > > Attributes for a car can be: make, year, type. > Values for the attribute make can be: Audi, BMW, Ford > Values for the attribute type can be: Cab, Coupé, SUV > > It should be possible to make a decision of how important a attribute is. > These values can be. > - Not at all > - Some > - A lot > - Required > > If the choice for a specific attribute is ‘not at all’ it should not affect > the points at all. > If the choice for a specific attribute is ‘some’ it should give 50% points > if the car has that attribute value. > If the choice for a specific attribute is ‘A lot’ it should give 100% points > if the car has that attribute value. > If the choice for a specific attribute is ‘Required‘ it should vive 100% > points and it will require that the car has this attribute value to even show > in the result list. > > If we have 3 attributes and the one that makes the search do something like > this: > > Make: Audi, Ford (a lot) > Year: 01,02,03 (some) > Type: (not at all) > > The max points in the result should be 100. And since it is more important > (for the searcher) that the make is correct than that the year is, that will > generate 50% more points. > That will give us that if the make is a audi or a ford it will give that > post 75 points. If the year is right (01,02,03) it will give the post 25 > points. So if the car is a audi or a ford and is created in either 01,02 or > 03 it will give that car 100 points. > Since the one that makes the search has set the type to ‘not at all’, that > will not affect the points at all. > > This point calculation also means that this: > > Make: Audi (a lot) – 33.3 points > Year: 01,02 (a lot) – 33.3 points > Type: SUV (a lot) – 33.3 points > > Is the same thing as > Make: Audi (some) – 33.3 points > Year: 01,02 (some) – 33.3 points > Type: SUV (some) – 33.3 points > > Since we will always return 100 points. > > But if we change the importance of any of the attributes, it will also > affect the possible points. > > Make: Audi (some) – 25 points > Year: 01,02 (a lot) – 50 points > Type: SUV (some) – 25 points > > Anyone have any ideas of how to design the tables? Remember that it should > be simple to add new attributes and attribute values (prefer that this is > done with DML instead of DDL, but I can probably accept the requirement of > DDL). The performance is very very important (more important than DML to add > attributes/attribute values). > Yes that is a solution for the car table. Now we need a table or two for the
searchdata and the weighting. Or maybe we can store this in variables, but we need to calculate the score in some way since we may want to order the result by score and only return, for example top 20. Show quote "Mark Williams" wrote: > From a database design standpoint, your tables would be pretty straight > forward. > > CREATE TABLE cars ( > vin char(17) NOT NULL PRIMARY KEY, > manufacturerid int REFERENCES manufacturers (manufacturerid), > model ... > color ... > ) > > Since the ranking and weighting is something that is decided by a > (ephemeral) user, it's not something that you would want to store in the > database itself, but rather dynamically calculate with your application. > > "Patrik" wrote: > > > Hello! > > > > I am trying to design some tables to solve this problem. The main goal is to > > have a very good performance, but still have a good design. > > > > I am trying to build a search function with a ranking system. The search > > function can be used to search for cars/hotels or whatever. Each car/hotel > > have x numbers of attributes and each attribute can have a value. > > > > For example > > > > Attributes for a car can be: make, year, type. > > Values for the attribute make can be: Audi, BMW, Ford > > Values for the attribute type can be: Cab, Coupé, SUV > > > > It should be possible to make a decision of how important a attribute is. > > These values can be. > > - Not at all > > - Some > > - A lot > > - Required > > > > If the choice for a specific attribute is ‘not at all’ it should not affect > > the points at all. > > If the choice for a specific attribute is ‘some’ it should give 50% points > > if the car has that attribute value. > > If the choice for a specific attribute is ‘A lot’ it should give 100% points > > if the car has that attribute value. > > If the choice for a specific attribute is ‘Required‘ it should vive 100% > > points and it will require that the car has this attribute value to even show > > in the result list. > > > > If we have 3 attributes and the one that makes the search do something like > > this: > > > > Make: Audi, Ford (a lot) > > Year: 01,02,03 (some) > > Type: (not at all) > > > > The max points in the result should be 100. And since it is more important > > (for the searcher) that the make is correct than that the year is, that will > > generate 50% more points. > > That will give us that if the make is a audi or a ford it will give that > > post 75 points. If the year is right (01,02,03) it will give the post 25 > > points. So if the car is a audi or a ford and is created in either 01,02 or > > 03 it will give that car 100 points. > > Since the one that makes the search has set the type to ‘not at all’, that > > will not affect the points at all. > > > > This point calculation also means that this: > > > > Make: Audi (a lot) – 33.3 points > > Year: 01,02 (a lot) – 33.3 points > > Type: SUV (a lot) – 33.3 points > > > > Is the same thing as > > Make: Audi (some) – 33.3 points > > Year: 01,02 (some) – 33.3 points > > Type: SUV (some) – 33.3 points > > > > Since we will always return 100 points. > > > > But if we change the importance of any of the attributes, it will also > > affect the possible points. > > > > Make: Audi (some) – 25 points > > Year: 01,02 (a lot) – 50 points > > Type: SUV (some) – 25 points > > > > Anyone have any ideas of how to design the tables? Remember that it should > > be simple to add new attributes and attribute values (prefer that this is > > done with DML instead of DDL, but I can probably accept the requirement of > > DDL). The performance is very very important (more important than DML to add > > attributes/attribute values). > > The requitrements seem to be very usual, so I would design the tables
the usual way. Writing queries with ranking is very easy: create table #car(vin char(3), /*to save typing*/ make char(5), type char(5), year smallint) insert into #car select '111', 'Ford', 'Sedan', 2003 union all select '112', 'Ford', 'SUV', 2004 union all select '113', 'Ford', 'Truck', 2004 union all select '114', 'Audi', 'Sedan', 2002 go select * from (select vin, make, type, year, /* the one below is for the mandatory condition in the where clause*/ 1 + /* somewhat important */ case when year in(2002,2003) then 0.5 else 0 end + /* important */ case when type in('Sedan','SUV') then 1 else 0 end rank from #car where make in('Ford') ) t order by rank desc go drop table #car vin make type year rank ---- ----- ----- ------ -------------- 111 Ford Sedan 2003 2.5 112 Ford SUV 2004 2.0 113 Ford Truck 2004 1.0 (3 row(s) affected) Thank you!
If you had to store the search data, how could that table be designed? If you create a table like the car table you will have alot of null values. Another way is to store all values in tables and then create the table to store the search like. (searchId int, valueId int) Thanks! Show quote "Alexander Kuznetsov" wrote: > The requitrements seem to be very usual, so I would design the tables > the usual way. > Writing queries with ranking is very easy: > > create table #car(vin char(3), /*to save typing*/ > make char(5), type char(5), year smallint) > insert into #car > select '111', 'Ford', 'Sedan', 2003 > union all > select '112', 'Ford', 'SUV', 2004 > union all > select '113', 'Ford', 'Truck', 2004 > union all > select '114', 'Audi', 'Sedan', 2002 > go > select * from > (select vin, make, type, year, > /* the one below is for the mandatory condition in the where clause*/ > 1 + > /* somewhat important */ > case when year in(2002,2003) then 0.5 else 0 end + > /* important */ > case when type in('Sedan','SUV') then 1 else 0 end > rank > from #car > where make in('Ford') > ) t > order by rank desc > go > drop table #car > > vin make type year rank > ---- ----- ----- ------ -------------- > 111 Ford Sedan 2003 2.5 > 112 Ford SUV 2004 2.0 > 113 Ford Truck 2004 1.0 > > (3 row(s) affected) > > On Thu, 26 Jan 2006 10:51:04 -0800, Patrik wrote:
>Thank you! Hi Patrik,> >If you had to store the search data, how could that table be designed? You might have a Searches table, with a SearchID (primary key), the importance of the criteria and possibly some other descriptive information. The search criteria have to go in seperate tables, since a search might be for zero, one or more than one year; zero, one or more than one make, etc. CREATE TABLE Searches (SearchID int NOT NULL PRIMARY KEY ,MakeImportance char(1) NOT NULL CHECK (MakeImportance IN ('N', 'S', 'A', 'R') -- 'N'ot at all / 'S'ome / 'A' lot / 'R'equired ,YearImportance char(1) NOT NULL CHECK (YearImportance IN ('N', 'S', 'A', 'R') .... ) CREATE TABLE MakeSearches (SearchID int NOT NULL REFERENCES Searches(SearchID) ,Make char(5) NOT NULL -- REFERENCES Makes(Make) -- if you have a Makes table ,PRIMARY KEY(SearchID, Make) CREATE TABLE YearSearches (SearchID int NOT NULL REFERENCES Searches(SearchID) ,Make char(5) NOT NULL -- REFERENCES Makes(Make) -- if you have a Makes table ,PRIMARY KEY(SearchID, Make) Of course, you'll have to adapt Alexander's query to use this table. Something like below (using only Year and Make): SELECT TOP 20 c.VIN, c.Make, c.Type, c.Year, CASE WHEN c.Year IN (SELECT Year FROM YearSearches WHERE SearchID = @SearchID) THEN CASE YearImportance WHEN 'N' THEN 0 WHEN 'S' THEN 0.5 ELSE 1 END ELSE 0 END + CASE WHEN c.Make IN (SELECT Make FROM MakeSearches WHERE SearchID = @SearchID) THEN CASE MakeImportance WHEN 'N' THEN 0 WHEN 'S' THEN 0.5 ELSE 1 END ELSE 0 END AS Rank FROM Cars AS c, Searches AS s WHERE s.SearchID = @SearchID AND CASE WHEN s.YearImportance = 'R' THEN CASE WHEN c.Year IN (SELECT Year FROM YearSearches WHERE SearchID = @SearchID) THEN 1 ELSE 0 END ELSE 1 END = 1 AND CASE WHEN s.MakeImportance = 'R' THEN CASE WHEN c.Make IN (SELECT Make FROM MakeSearches WHERE SearchID = @SearchID) THEN 1 ELSE 0 END ELSE 1 END = 1 ORDER BY Rank DESC (untested) -- Hugo Kornelis, SQL Server MVP Hi Hugo,
I have a feeling that your query will most likely always scan the whole table. I think in this case generating a specific query with a really simple WHERE clause dynamically might be a better option for most situations I can think of, at least if there are indexes on the columns you usually search on. What do you think? On 26 Jan 2006 14:01:54 -0800, Alexander Kuznetsov wrote:
>Hi Hugo, Hi Alexander,> >I have a feeling that your query will most likely always scan the whole >table. I think in this case generating a specific query with a really >simple WHERE clause dynamically might be a better option for most >situations I can think of, at least if there are indexes on the columns >you usually search on. > >What do you think? Hmmm. Yes, you're probably right. For a much faster solution, the WHERE clause has to be custom-made for the 'R'equired search terms. Best to use several stored procedures for that. Then use an IF ELSE tree to call the correct stored procedure, based on whatever was chosen to be 'R'equired. For instance, if Make was required and Year not, then a stored procedure would be called with this body: SELECT TOP 20 c.VIN, c.Make, c.Type, c.Year, CASE WHEN c.Year IN (SELECT Year FROM YearSearches WHERE SearchID = @SearchID) THEN CASE YearImportance WHEN 'N' THEN 0 WHEN 'S' THEN 0.5 ELSE 1 END ELSE 0 END AS Rank FROM Cars AS c, Searches AS s WHERE s.SearchID = @SearchID AND c.Make IN (SELECT Make FROM MakeSearches WHERE SearchID = @SearchID) ORDER BY Rank DESC Or, even better - pass YearImportance from the Searches table as a parameter, after transforming it from 'N'/'S'/'A' to 0/0.5/1, and optimize the WHERE clause some more: SELECT TOP 20 c.VIN, c.Make, c.Type, c.Year, CASE WHEN c.Year IN (SELECT Year FROM YearSearches WHERE SearchID = @SearchID) THEN @YearImportance ELSE 0 END AS Rank FROM Cars AS c INNER JOIN MakeSearches AS ms ON ms.SearchID = @SearchID AND ms.Make = c.make ORDER BY Rank DESC Of course, with just Make and Year there is already a potential for 4 stored procedures if all combinations are allowed. Add type to get a total of 8 stored procedures; add a fourth argument and you're already at 16 stored procedures. I hate to say it, but this might be a case where dynamic SQL could be the better choice. Of course, do read http://www.sommarskog.se/dynamic_sql.html and http://www.sommarskog.se/dyn-search.html. -- Hugo Kornelis, SQL Server MVP > I hate to say it, but this might be a case where dynamic SQL could be the better choice. Hi Hugo, I'd concur. Besides number of possible combinations, one needsto consider possible data skew, like more Fords than Ferraris Will the IN()-statement really perform well?
And all those tables for each attribute. Is there no other way? I know that it is possible to create a generic table (EAV), but that could be hard to maintain and to add constraints to. On 26 Jan 2006 14:19:01 -0800, Patrik wrote:
>Will the IN()-statement really perform well? Hi Patrik,Both EXISTS and JOIN are in most cases faster. But see my reply to Alexander as well. The WHERE clause has to be less complex to allow the optimizer to pick a good index to use. > And it would probably reduce your performance to below zero.>And all those tables for each attribute. Is there no other way? I know >that it is possible to create a generic table (EAV), but that could be >hard to maintain and to add constraints to. -- Hugo Kornelis, SQL Server MVP |
|||||||||||||||||||||||