Home All Groups Group Topic Archive Search About

Table design (search with ranking)

Author
26 Jan 2006 5:50 PM
Patrik
Hello!

I am trying to design some tables to solve this problem. The main goal is to
have a very good performance, but still have a good design.

I am trying to build a search function with a ranking system. The search
function can be used to search for cars/hotels or whatever. Each car/hotel
have x numbers of attributes and each attribute can have a value.

For example

Attributes for a car can be: make, year, type.
Values for the attribute make can be: Audi, BMW, Ford
Values for the attribute type can be: Cab, Coupé, SUV

It should be possible to make a decision of how important a attribute is.
These values can be.
- Not at all
- Some
- A lot
- Required

If the choice for a specific attribute is ‘not at all’ it should not affect
the points at all.
If the choice for a specific attribute is ‘some’ it should give 50% points
if the car has that attribute value.
If the choice for a specific attribute is ‘A lot’ it should give 100% points
if the car has that attribute value.
If the choice for a specific attribute is ‘Required‘ it should vive 100%
points and it will require that the car has this attribute value to even show
in the result list.

If we have 3 attributes and the one that makes the search do something like
this:

Make: Audi, Ford (a lot)
Year: 01,02,03 (some)
Type: (not at all)

The max points in the result should be 100. And since it is more important
(for the searcher) that the make is correct than that the year is, that will
generate 50% more points.
That will give us that if the make is a audi or a ford it will give that
post 75 points. If the year is right (01,02,03) it will give the post 25
points. So if the car is a audi or a ford and is created in either 01,02 or
03 it will give that car 100 points.
Since the one that makes the search has set the type to ‘not at all’, that
will not affect the points at all.

This point calculation also means that this:

Make: Audi (a lot) – 33.3 points
Year: 01,02 (a lot) – 33.3 points
Type: SUV (a lot) – 33.3 points

Is the same thing as
Make: Audi (some) – 33.3 points
Year: 01,02 (some) – 33.3 points
Type: SUV (some) – 33.3 points

Since we will always return 100 points.

But if we change the importance of any of the attributes, it will also
affect the possible points.

Make: Audi (some) – 25 points
Year: 01,02 (a lot) – 50 points
Type: SUV (some) – 25 points

Anyone have any ideas of how to design the tables? Remember that it should
be simple to add new attributes and attribute values (prefer that this is
done with DML instead of DDL, but I can probably accept the requirement of
DDL). The performance is very very important (more important than DML to add
attributes/attribute values).

Author
26 Jan 2006 6:10 PM
Mark Williams
From a database design standpoint, your tables would be pretty straight
forward.

CREATE TABLE cars (
  vin char(17) NOT NULL PRIMARY KEY,
  manufacturerid int REFERENCES manufacturers (manufacturerid),
  model ...
  color ...
)

Since the ranking and weighting is something that is decided by a
(ephemeral) user, it's not something that you would want to store in the
database itself, but rather dynamically calculate with your application.

Show quote
"Patrik" wrote:

> Hello!
>
> I am trying to design some tables to solve this problem. The main goal is to
> have a very good performance, but still have a good design.
>
> I am trying to build a search function with a ranking system. The search
> function can be used to search for cars/hotels or whatever. Each car/hotel
> have x numbers of attributes and each attribute can have a value.
>
> For example
>
> Attributes for a car can be: make, year, type.
> Values for the attribute make can be: Audi, BMW, Ford
> Values for the attribute type can be: Cab, Coupé, SUV
>
> It should be possible to make a decision of how important a attribute is.
> These values can be.
> - Not at all
> - Some
> - A lot
> - Required
>
> If the choice for a specific attribute is ‘not at all’ it should not affect
> the points at all.
> If the choice for a specific attribute is ‘some’ it should give 50% points
> if the car has that attribute value.
> If the choice for a specific attribute is ‘A lot’ it should give 100% points
> if the car has that attribute value.
> If the choice for a specific attribute is ‘Required‘ it should vive 100%
> points and it will require that the car has this attribute value to even show
> in the result list.
>
> If we have 3 attributes and the one that makes the search do something like
> this:
>
> Make: Audi, Ford (a lot)
> Year: 01,02,03 (some)
> Type: (not at all)
>
> The max points in the result should be 100. And since it is more important
> (for the searcher) that the make is correct than that the year is, that will
> generate 50% more points.
> That will give us that if the make is a audi or a ford it will give that
> post 75 points. If the year is right (01,02,03) it will give the post 25
> points. So if the car is a audi or a ford and is created in either 01,02 or
> 03 it will give that car 100 points.
> Since the one that makes the search has set the type to ‘not at all’, that
> will not affect the points at all.
>
> This point calculation also means that this:
>
> Make: Audi (a lot) – 33.3 points
> Year: 01,02 (a lot) – 33.3 points
> Type: SUV (a lot) – 33.3 points

> Is the same thing as
> Make: Audi (some) – 33.3 points
> Year: 01,02 (some) – 33.3 points
> Type: SUV (some) – 33.3 points
>
> Since we will always return 100 points.
>
> But if we change the importance of any of the attributes, it will also
> affect the possible points.
>
> Make: Audi (some) – 25 points
> Year: 01,02 (a lot) – 50 points
> Type: SUV (some) – 25 points
>
> Anyone have any ideas of how to design the tables? Remember that it should
> be simple to add new attributes and attribute values (prefer that this is
> done with DML instead of DDL, but I can probably accept the requirement of
> DDL). The performance is very very important (more important than DML to add
> attributes/attribute values).
>
Author
26 Jan 2006 6:31 PM
Patrik
Yes that is a solution for the car table. Now we need a table or two for the
searchdata and the weighting. Or maybe we can store this in variables, but we
need to calculate the score in some way since we may want to order the result
by score and only return, for example top 20.

Show quote
"Mark Williams" wrote:

> From a database design standpoint, your tables would be pretty straight
> forward.
>
> CREATE TABLE cars (
>   vin char(17) NOT NULL PRIMARY KEY,
>   manufacturerid int REFERENCES manufacturers (manufacturerid),
>   model ...
>   color ...
> )
>
> Since the ranking and weighting is something that is decided by a
> (ephemeral) user, it's not something that you would want to store in the
> database itself, but rather dynamically calculate with your application.
>
> "Patrik" wrote:
>
> > Hello!
> >
> > I am trying to design some tables to solve this problem. The main goal is to
> > have a very good performance, but still have a good design.
> >
> > I am trying to build a search function with a ranking system. The search
> > function can be used to search for cars/hotels or whatever. Each car/hotel
> > have x numbers of attributes and each attribute can have a value.
> >
> > For example
> >
> > Attributes for a car can be: make, year, type.
> > Values for the attribute make can be: Audi, BMW, Ford
> > Values for the attribute type can be: Cab, Coupé, SUV
> >
> > It should be possible to make a decision of how important a attribute is.
> > These values can be.
> > - Not at all
> > - Some
> > - A lot
> > - Required
> >
> > If the choice for a specific attribute is ‘not at all’ it should not affect
> > the points at all.
> > If the choice for a specific attribute is ‘some’ it should give 50% points
> > if the car has that attribute value.
> > If the choice for a specific attribute is ‘A lot’ it should give 100% points
> > if the car has that attribute value.
> > If the choice for a specific attribute is ‘Required‘ it should vive 100%
> > points and it will require that the car has this attribute value to even show
> > in the result list.
> >
> > If we have 3 attributes and the one that makes the search do something like
> > this:
> >
> > Make: Audi, Ford (a lot)
> > Year: 01,02,03 (some)
> > Type: (not at all)
> >
> > The max points in the result should be 100. And since it is more important
> > (for the searcher) that the make is correct than that the year is, that will
> > generate 50% more points.
> > That will give us that if the make is a audi or a ford it will give that
> > post 75 points. If the year is right (01,02,03) it will give the post 25
> > points. So if the car is a audi or a ford and is created in either 01,02 or
> > 03 it will give that car 100 points.
> > Since the one that makes the search has set the type to ‘not at all’, that
> > will not affect the points at all.
> >
> > This point calculation also means that this:
> >
> > Make: Audi (a lot) – 33.3 points
> > Year: 01,02 (a lot) – 33.3 points
> > Type: SUV (a lot) – 33.3 points
> > 
> > Is the same thing as
> > Make: Audi (some) – 33.3 points
> > Year: 01,02 (some) – 33.3 points
> > Type: SUV (some) – 33.3 points
> >
> > Since we will always return 100 points.
> >
> > But if we change the importance of any of the attributes, it will also
> > affect the possible points.
> >
> > Make: Audi (some) – 25 points
> > Year: 01,02 (a lot) – 50 points
> > Type: SUV (some) – 25 points
> >
> > Anyone have any ideas of how to design the tables? Remember that it should
> > be simple to add new attributes and attribute values (prefer that this is
> > done with DML instead of DDL, but I can probably accept the requirement of
> > DDL). The performance is very very important (more important than DML to add
> > attributes/attribute values).
> >
Author
26 Jan 2006 6:40 PM
Alexander Kuznetsov
The requitrements seem to be very usual, so I would design the tables
the usual way.
Writing queries with ranking is very easy:

create table #car(vin char(3), /*to save typing*/
make char(5), type char(5), year smallint)
insert into #car
select '111', 'Ford', 'Sedan', 2003
union all
select '112', 'Ford', 'SUV', 2004
union all
select '113', 'Ford', 'Truck', 2004
union all
select '114', 'Audi', 'Sedan', 2002
go
select * from
(select vin, make, type, year,
/* the one below is for the mandatory condition in the where clause*/
1 +
/* somewhat important */
case when year in(2002,2003) then 0.5 else 0 end +
/* important */
case when type in('Sedan','SUV') then 1 else 0 end
rank
from #car
where make in('Ford')
) t
order by rank desc
go
drop table #car

vin  make  type  year   rank
---- ----- ----- ------ --------------
111  Ford  Sedan 2003   2.5
112  Ford  SUV   2004   2.0
113  Ford  Truck 2004   1.0

(3 row(s) affected)
Author
26 Jan 2006 6:51 PM
Patrik
Thank you!

If you had to store the search data, how could that table be designed?

If you create a table like the car table you will have alot of null values.
Another way is to store all values in tables and then create the table to
store the search like. (searchId int, valueId int)


Thanks!

Show quote
"Alexander Kuznetsov" wrote:

> The requitrements seem to be very usual, so I would design the tables
> the usual way.
> Writing queries with ranking is very easy:
>
> create table #car(vin char(3), /*to save typing*/
> make char(5), type char(5), year smallint)
> insert into #car
> select '111', 'Ford', 'Sedan', 2003
> union all
> select '112', 'Ford', 'SUV', 2004
> union all
> select '113', 'Ford', 'Truck', 2004
> union all
> select '114', 'Audi', 'Sedan', 2002
> go
> select * from
> (select vin, make, type, year,
> /* the one below is for the mandatory condition in the where clause*/
> 1 +
> /* somewhat important */
> case when year in(2002,2003) then 0.5 else 0 end +
> /* important */
> case when type in('Sedan','SUV') then 1 else 0 end
> rank
> from #car
> where make in('Ford')
> ) t
> order by rank desc
> go
> drop table #car
>
> vin  make  type  year   rank
> ---- ----- ----- ------ --------------
> 111  Ford  Sedan 2003   2.5
> 112  Ford  SUV   2004   2.0
> 113  Ford  Truck 2004   1.0
>
> (3 row(s) affected)
>
>
Author
26 Jan 2006 9:52 PM
Hugo Kornelis
On Thu, 26 Jan 2006 10:51:04 -0800, Patrik wrote:

>Thank you!
>
>If you had to store the search data, how could that table be designed?

Hi Patrik,

You might have a Searches table, with a SearchID (primary key), the
importance of the criteria and possibly some other descriptive
information. The search criteria have to go in seperate tables, since a
search might be for zero, one or more than one year; zero, one or more
than one make, etc.

CREATE TABLE Searches
       (SearchID int NOT NULL PRIMARY KEY
       ,MakeImportance char(1) NOT NULL
           CHECK (MakeImportance IN ('N', 'S', 'A', 'R')
           -- 'N'ot at all / 'S'ome / 'A' lot / 'R'equired
       ,YearImportance char(1) NOT NULL
           CHECK (YearImportance IN ('N', 'S', 'A', 'R')
        ....
       )

CREATE TABLE MakeSearches
       (SearchID int NOT NULL REFERENCES Searches(SearchID)
       ,Make char(5) NOT NULL -- REFERENCES Makes(Make)  -- if you have
a Makes table
       ,PRIMARY KEY(SearchID, Make)

CREATE TABLE YearSearches
       (SearchID int NOT NULL REFERENCES Searches(SearchID)
       ,Make char(5) NOT NULL -- REFERENCES Makes(Make)  -- if you have
a Makes table
       ,PRIMARY KEY(SearchID, Make)

Of course, you'll have to adapt Alexander's query to use this table.
Something like below (using only Year and Make):

SELECT TOP 20
         c.VIN, c.Make, c.Type, c.Year,
         CASE WHEN c.Year IN (SELECT Year
                              FROM   YearSearches
                              WHERE  SearchID = @SearchID)
              THEN CASE YearImportance
                   WHEN 'N' THEN 0
                   WHEN 'S' THEN 0.5
                   ELSE 1
                   END
              ELSE 0
         END +
         CASE WHEN c.Make IN (SELECT Make
                              FROM   MakeSearches
                              WHERE  SearchID = @SearchID)
              THEN CASE MakeImportance
                   WHEN 'N' THEN 0
                   WHEN 'S' THEN 0.5
                   ELSE 1
                   END
              ELSE 0
         END AS Rank
FROM     Cars AS c, Searches AS s
WHERE    s.SearchID = @SearchID
AND      CASE WHEN s.YearImportance = 'R'
              THEN CASE WHEN c.Year IN (SELECT Year
                                        FROM   YearSearches
                                        WHERE  SearchID = @SearchID)
                        THEN 1
                        ELSE 0
                   END
              ELSE 1
         END = 1
AND      CASE WHEN s.MakeImportance = 'R'
              THEN CASE WHEN c.Make IN (SELECT Make
                                        FROM   MakeSearches
                                        WHERE  SearchID = @SearchID)
                        THEN 1
                        ELSE 0
                   END
              ELSE 1
         END = 1
ORDER BY Rank DESC

(untested)

--
Hugo Kornelis, SQL Server MVP
Author
26 Jan 2006 10:01 PM
Alexander Kuznetsov
Hi Hugo,

I have a feeling that your query will most likely always scan the whole
table. I think in this case generating a specific query with a really
simple WHERE clause dynamically might be a better option for most
situations I can think of, at least if there are indexes on the columns
you usually search on.

What do you think?
Author
27 Jan 2006 10:26 AM
Patrik
Any more ideas?
Author
27 Jan 2006 10:04 PM
Hugo Kornelis
On 26 Jan 2006 14:01:54 -0800, Alexander Kuznetsov wrote:

>Hi Hugo,
>
>I have a feeling that your query will most likely always scan the whole
>table. I think in this case generating a specific query with a really
>simple WHERE clause dynamically might be a better option for most
>situations I can think of, at least if there are indexes on the columns
>you usually search on.
>
>What do you think?

Hi Alexander,

Hmmm. Yes, you're probably right.

For a much faster solution, the WHERE clause has to be custom-made for
the 'R'equired search terms. Best to use several stored procedures for
that. Then use an IF ELSE tree to call the correct stored procedure,
based on whatever was chosen to be 'R'equired.

For instance, if Make was required and Year not, then a stored procedure
would be called with this body:

SELECT TOP 20
         c.VIN, c.Make, c.Type, c.Year,
         CASE WHEN c.Year IN (SELECT Year
                              FROM   YearSearches
                              WHERE  SearchID = @SearchID)
              THEN CASE YearImportance
                   WHEN 'N' THEN 0
                   WHEN 'S' THEN 0.5
                   ELSE 1
                   END
              ELSE 0
         END AS Rank
FROM     Cars AS c, Searches AS s
WHERE    s.SearchID = @SearchID
AND      c.Make IN (SELECT Make
                    FROM   MakeSearches
                    WHERE  SearchID = @SearchID)
ORDER BY Rank DESC

Or, even better - pass YearImportance from the Searches table as a
parameter, after transforming it from 'N'/'S'/'A' to 0/0.5/1, and
optimize the WHERE clause some more:

SELECT TOP 20
           c.VIN, c.Make, c.Type, c.Year,
           CASE WHEN c.Year IN (SELECT Year
                                FROM   YearSearches
                                WHERE  SearchID = @SearchID)
                THEN @YearImportance
                ELSE 0
           END AS Rank
FROM       Cars AS c
INNER JOIN MakeSearches AS ms
      ON   ms.SearchID = @SearchID
      AND  ms.Make = c.make
ORDER BY   Rank DESC

Of course, with just Make and Year there is already a potential for 4
stored procedures if all combinations are allowed. Add type to get a
total of 8 stored procedures; add a fourth argument and you're already
at 16 stored procedures.

I hate to say it, but this might be a case where dynamic SQL could be
the better choice.
Of course, do read http://www.sommarskog.se/dynamic_sql.html and
http://www.sommarskog.se/dyn-search.html.

--
Hugo Kornelis, SQL Server MVP
Author
27 Jan 2006 11:12 PM
Alexander Kuznetsov
> I hate to say it, but this might be a case where dynamic SQL could be the better choice.

Hi Hugo, I'd concur. Besides number of possible combinations, one needs
to consider possible data skew, like more Fords than Ferraris
Author
26 Jan 2006 10:19 PM
Patrik
Will the IN()-statement really perform well?

And all those tables for each attribute. Is there no other way? I know
that it is possible to create a generic table (EAV), but that could be
hard to maintain and to add constraints to.
Author
27 Jan 2006 10:05 PM
Hugo Kornelis
On 26 Jan 2006 14:19:01 -0800, Patrik wrote:

>Will the IN()-statement really perform well?

Hi Patrik,

Both EXISTS and JOIN are in most cases faster. But see my reply to
Alexander as well. The WHERE clause has to be less complex to allow the
optimizer to pick a good index to use.

>
>And all those tables for each attribute. Is there no other way? I know
>that it is possible to create a generic table (EAV), but that could be
>hard to maintain and to add constraints to.

And it would probably reduce your performance to below zero.

--
Hugo Kornelis, SQL Server MVP

AddThis Social Bookmark Button