Home All Groups Group Topic Archive Search About

Adv. T-SQL: Finding closest match across multiple columns

Author
28 Jan 2006 11:31 PM
Todd Brooks
Without resorting to OLAP, I would like a way to query a table of multiple
columns to find a match within a specified tolerance level.  For example, if
I have a table like below:

CREATE TABLE MyTable (
    [ID] [int] IDENTITY (1, 1) NOT NULL ,
    [Col1] [int] NOT NULL ,
    [Col2] [int] NOT NULL ,
    [Col3] [int] NOT NULL ,
    [Col4] [int] NOT NULL
)

Now, let's say that I have a string made up of  4 integers, ex. 1-2-3-4.  I
have a tolerance level of 1, which means that I would like to search MyTable
for Col1 = 1, Col2 = 2, Col3 = 3 and Col4 = 4, but if there isn't an exact
match, then I will accept a row where a single column doesn't match.  In the
latter case, an example would be a row where Col1 = 1, Col2 = 2, Col3 = 3
and Col4 = 1 (this is the column that wouldn't match).

I'm NOT dead set on the columns being INTs, if it makes it easier for them
to be CHARs, that is fine.
It is also fine if there isn't an exact match, that I get back multiple rows
with a tolerance of 1.

Is this possible with T-SQL?

Regards,

todd

Author
29 Jan 2006 12:04 AM
Erland Sommarskog
Todd Brooks (truistic@newsgroups.nospam) writes:
Show quote
> Without resorting to OLAP, I would like a way to query a table of
> multiple columns to find a match within a specified tolerance level.
> For example, if I have a table like below:
>
> CREATE TABLE MyTable (
>     [ID] [int] IDENTITY (1, 1) NOT NULL ,
>     [Col1] [int] NOT NULL ,
>     [Col2] [int] NOT NULL ,
>     [Col3] [int] NOT NULL ,
>     [Col4] [int] NOT NULL
> )
>
> Now, let's say that I have a string made up of  4 integers, ex. 1-2-3-4.
>  I have a tolerance level of 1, which means that I would like to search
> MyTable for Col1 = 1, Col2 = 2, Col3 = 3 and Col4 = 4, but if there
> isn't an exact match, then I will accept a row where a single column
> doesn't match.  In the latter case, an example would be a row where Col1
> = 1, Col2 = 2, Col3 = 3 and Col4 = 1 (this is the column that wouldn't
> match).

This could work:

   SELECT TOP 1 WITH TIES *
   FROM   MyTable
   WHERE  CASE WHEN Col1 = @val1 THEN 0 ELSE 1 END +
          CASE WHEN Col2 = @val2 THEN 0 ELSE 1 END +
          CASE WHEN Col3 = @val3 THEN 0 ELSE 1 END +
          CASE WHEN Col4 = @val4 THEN 0 ELSE 1 END <= @tolerancelvl


--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
29 Jan 2006 5:29 AM
brooks.todd
Limited testing this evening shows that your solution will work.

Thank you VERY much!

Regards,

todd

AddThis Social Bookmark Button