Home All Groups Group Topic Archive Search About
Author
28 Aug 2006 4:32 PM
Jon Spivey
Hi,
Using SQL Server 2000 on Win2003

I've got a table like this
Models
ModelID int identity pk
Model varchar(50)

In a stored procedure I need to retrieve the modelid given a model. The
problem is that sometimes the input to the sp will match an existing model
but the words will be in a different order, eg
U6 PEBL Green
is a model in the database an input to the stored procedure might be PEBL U6
Green. Obviously they are the same model so I need to match them. I've
thought about splitting the stored proedure input into words and checking
each word exists in the model field, something like#
WHERE CHARINDEX(Word1, Model) > 0 AND CHARINDEX(Word2, Model) > 0 AND
CHARINDEX(Word3, Model) > 0

Would this be the best way? If so how would I go about splitting the input
into words and checking each word exists - input and model names could be
between 1 and 5 words. This is for a script that will run once a day on a
schedule so I'm not overly concerned with efficiency. Any suggestions would
be appreciated.

Thanks very much,
Jon

Author
28 Aug 2006 4:39 PM
lord.zoltar
Jon Spivey wrote:
Show quoteHide quote
> Hi,
> Using SQL Server 2000 on Win2003
>
> I've got a table like this
> Models
> ModelID int identity pk
> Model varchar(50)
>
> In a stored procedure I need to retrieve the modelid given a model. The
> problem is that sometimes the input to the sp will match an existing model
> but the words will be in a different order, eg
> U6 PEBL Green
> is a model in the database an input to the stored procedure might be PEBL U6
> Green. Obviously they are the same model so I need to match them. I've
> thought about splitting the stored proedure input into words and checking
> each word exists in the model field, something like#
> WHERE CHARINDEX(Word1, Model) > 0 AND CHARINDEX(Word2, Model) > 0 AND
> CHARINDEX(Word3, Model) > 0
>
> Would this be the best way? If so how would I go about splitting the input
> into words and checking each word exists - input and model names could be
> between 1 and 5 words. This is for a script that will run once a day on a
> schedule so I'm not overly concerned with efficiency. Any suggestions would
> be appreciated.
>
> Thanks very much,
> Jon

Hmm it depends on what PEBL and U6 mean. Myself, I'd seperate the
information into 3 columns, one for colour, one for "u6", and one for
"Pebl". Will there *always* be 3 words in a Model? If so, and the 3
columns isn't an option (if the database already exists, and is too
much work to change), then maybe just splitting into 3 words will
work...
Author
28 Aug 2006 4:47 PM
Jon Spivey
<lord.zol***@gmail.com> wrote in message
Show quoteHide quote
news:1156783178.879090.32770@m79g2000cwm.googlegroups.com...
>
> Jon Spivey wrote:
>> Hi,
>> Using SQL Server 2000 on Win2003
>>
>> I've got a table like this
>> Models
>> ModelID int identity pk
>> Model varchar(50)
>>
>> In a stored procedure I need to retrieve the modelid given a model. The
>> problem is that sometimes the input to the sp will match an existing
>> model
>> but the words will be in a different order, eg
>> U6 PEBL Green
>> is a model in the database an input to the stored procedure might be PEBL
>> U6
>> Green. Obviously they are the same model so I need to match them. I've
>> thought about splitting the stored proedure input into words and checking
>> each word exists in the model field, something like#
>> WHERE CHARINDEX(Word1, Model) > 0 AND CHARINDEX(Word2, Model) > 0 AND
>> CHARINDEX(Word3, Model) > 0
>>
>> Would this be the best way? If so how would I go about splitting the
>> input
>> into words and checking each word exists - input and model names could be
>> between 1 and 5 words. This is for a script that will run once a day on a
>> schedule so I'm not overly concerned with efficiency. Any suggestions
>> would
>> be appreciated.
>>
>> Thanks very much,
>> Jon
>
> Hmm it depends on what PEBL and U6 mean. Myself, I'd seperate the
> information into 3 columns, one for colour, one for "u6", and one for
> "Pebl". Will there *always* be 3 words in a Model? If so, and the 3
> columns isn't an option (if the database already exists, and is too
> much work to change), then maybe just splitting into 3 words will
> work...
>

Hi,
The models don't always follow that format, they're mobile phones. Sometimes
it will be 1 word sometimes 2 sometimes 3.it wouldn't be safe to assume
anything about the format of model and I don't have any control over the
input to the stored procedure that needs to match a model. Would it be
possible to split an input of >1 words into words within the SP and if so
how?

Thanks,
Jon
Author
28 Aug 2006 5:24 PM
lord.zoltar
> Hi,
> The models don't always follow that format, they're mobile phones. Sometimes
> it will be 1 word sometimes 2 sometimes 3.it wouldn't be safe to assume
> anything about the format of model and I don't have any control over the
> input to the stored procedure that needs to match a model. Would it be
> possible to split an input of >1 words into words within the SP and if so
> how?
>
> Thanks,
> Jon

so you have to split a single string into 3?
Here's an example some (very very) quick (and ugly... but working ;) )
code I threw together that can do this:

DECLARE @input varchar(50)
SET @input='this is word'
DECLARE @w1 varchar(10)
declare @w2 varchar(10)
declare @w3 varchar(10)
declare @remainder varchar(10)

declare @space1 integer
declare @space2 integer

SET @space1 = CHARINDEX(' ',@input);
set @w1 = left(@input,@space1);
set @remainder = right(@input, len(@input)-@space1);
set @space2 = charindex(' ',@remainder);
set @w2 = left(@remainder, @space2);
set @w3 = right(@remainder, len(@remainder)-@space2);
print 'remainder:'+@remainder;
print 'w1:'+@w1;
print 'w2:'+@w2;
print 'w3:'+@w3;


You'll probably want to do major clean up on it before using though
(too many variables, prints not needed in SP, etc...)

more on SQL String functions here:
http://msdn2.microsoft.com/en-us/library/ms181984.aspx
Author
28 Aug 2006 5:23 PM
Alexander Kuznetsov
Jon,

I used UDF iter_charlist_to_table from Erland Sommarskog's article on
arrays and lists
and used another UDF named NumTokens.

SELECT 'PEBL U6 Green' Model INTO #t
UNION ALL
SELECT 'PBL U6 Green'
UNION ALL
SELECT 'PEBL U6 Green Q1'
UNION ALL
SELECT 'PBL U6'
UNION ALL
SELECT 'U6 Green'
UNION ALL
SELECT 'PBL Green'
go
SELECT * FROM #t WHERE (SELECT COUNT(*) FROM
dbo.iter_charlist_to_table('PEBL Green U6',' ')) =
(SELECT COUNT(*) FROM dbo.iter_charlist_to_table('PEBL Green U6',' ')
  JOIN (SELECT #t.Model) t ON ' '+Model+' ' LIKE '% '+[str]+' %')
AND
dbo.NumTokens('PEBL Green U6',' ') = dbo.NumTokens(Model, ' ')

Model           
----------------
PEBL U6 Green

(1 row(s) affected)
Author
28 Aug 2006 5:31 PM
Jim Underwood
What happens if you have two different values in your table with the same
words in different order?

i.e.
'PBL U6 Green' and 'U6 PBL Green'

Unless you have some way of preventing this from happening, you will end up
with input that matches multiple entries.

Show quoteHide quote
"Alexander Kuznetsov" <AK_TIREDOFSPAM@hotmail.COM> wrote in message
news:1156785803.518776.303370@74g2000cwt.googlegroups.com...
> Jon,
>
> I used UDF iter_charlist_to_table from Erland Sommarskog's article on
> arrays and lists
> and used another UDF named NumTokens.
>
> SELECT 'PEBL U6 Green' Model INTO #t
> UNION ALL
> SELECT 'PBL U6 Green'
> UNION ALL
> SELECT 'PEBL U6 Green Q1'
> UNION ALL
> SELECT 'PBL U6'
> UNION ALL
> SELECT 'U6 Green'
> UNION ALL
> SELECT 'PBL Green'
> go
> SELECT * FROM #t WHERE (SELECT COUNT(*) FROM
> dbo.iter_charlist_to_table('PEBL Green U6',' ')) =
> (SELECT COUNT(*) FROM dbo.iter_charlist_to_table('PEBL Green U6',' ')
>   JOIN (SELECT #t.Model) t ON ' '+Model+' ' LIKE '% '+[str]+' %')
> AND
> dbo.NumTokens('PEBL Green U6',' ') = dbo.NumTokens(Model, ' ')
>
> Model
> ----------------
> PEBL U6 Green
>
> (1 row(s) affected)
>
Author
28 Aug 2006 5:48 PM
Alexander Kuznetsov
Good point. I would develop a scalar UDF named  ordered_charlist
so that it returns tokens sorted alphabetically:

SELECT dbo.ordered_charlist( 'PBL U6 Green' )
---------------
Green PBL U6

That's even easier to query

WHERE dbo.ordered_charlist(Model) = dbo.ordered_charlist( 'PBL U6
Green' )

I think you could create a unique index on computed column defined as
dbo.ordered_charlist(Model) to prevent your scenario:

Jim Underwood wrote:
Show quoteHide quote
> What happens if you have two different values in your table with the same
> words in different order?
>
> i.e.
> 'PBL U6 Green' and 'U6 PBL Green'
>
Author
28 Aug 2006 6:07 PM
Jon Spivey
Hi,
This looks very promising. I've googled and found iter_charlist_to_table but
I can't find the other UDF you mention NumTokens. Do you have the code or a
link please?

Thanks,
Jon

Show quoteHide quote
"Alexander Kuznetsov" <AK_TIREDOFSPAM@hotmail.COM> wrote in message
news:1156785803.518776.303370@74g2000cwt.googlegroups.com...
> Jon,
>
> I used UDF iter_charlist_to_table from Erland Sommarskog's article on
> arrays and lists
> and used another UDF named NumTokens.
>
> SELECT 'PEBL U6 Green' Model INTO #t
> UNION ALL
> SELECT 'PBL U6 Green'
> UNION ALL
> SELECT 'PEBL U6 Green Q1'
> UNION ALL
> SELECT 'PBL U6'
> UNION ALL
> SELECT 'U6 Green'
> UNION ALL
> SELECT 'PBL Green'
> go
> SELECT * FROM #t WHERE (SELECT COUNT(*) FROM
> dbo.iter_charlist_to_table('PEBL Green U6',' ')) =
> (SELECT COUNT(*) FROM dbo.iter_charlist_to_table('PEBL Green U6',' ')
>  JOIN (SELECT #t.Model) t ON ' '+Model+' ' LIKE '% '+[str]+' %')
> AND
> dbo.NumTokens('PEBL Green U6',' ') = dbo.NumTokens(Model, ' ')
>
> Model
> ----------------
> PEBL U6 Green
>
> (1 row(s) affected)
>