Home All Groups Group Topic Archive Search About

Casing a name like McKinley

Author
19 Jul 2006 3:26 PM
Walter Mallon
I'm writing script to properly case names that we import in all caps.  The
challenge is dealing with names with a capital third letter like McKinley.
Is anyone willing to share some script examples or suggestions for this.  I
know all the syntax and what not, I'm looking for ideas for when to
capitalize the third letter, e.g., rules such as when the first two
characters are "mc" always capitalize third letter, etc.

Thanks in Advance,

Walter

Author
19 Jul 2006 3:56 PM
Arnie Rowland
O'Connor -Second Letter (or third character -does the apostophe come
through?)
or MacDonald -forth letter
There's a few names with fifth letter.

That's going to be an interesting function. Please share your results.

I've investigated and used third party products that are adept at name
standardization. Especially necessary with foreign names.


--
Arnie Rowland
Most good judgment comes from experience.
Most experience comes from bad judgment.
- Anonymous


Show quote
"Walter Mallon" <waltmal***@yahoo.com> wrote in message
news:%23PMl6e0qGHA.2180@TK2MSFTNGP05.phx.gbl...
> I'm writing script to properly case names that we import in all caps.  The
> challenge is dealing with names with a capital third letter like McKinley.
> Is anyone willing to share some script examples or suggestions for this.
> I know all the syntax and what not, I'm looking for ideas for when to
> capitalize the third letter, e.g., rules such as when the first two
> characters are "mc" always capitalize third letter, etc.
>
> Thanks in Advance,
>
> Walter
>
Author
19 Jul 2006 4:00 PM
Aaron Bertrand [SQL Server MVP]
How about last names like "Van der Wal" and "di Paulo"?  Fun, fun, indeed!



Show quote
"Arnie Rowland" <ar***@1568.com> wrote in message
news:u7DPWv0qGHA.148@TK2MSFTNGP04.phx.gbl...
> O'Connor -Second Letter (or third character -does the apostophe come
> through?)
> or MacDonald -forth letter
> There's a few names with fifth letter.
>
> That's going to be an interesting function. Please share your results.
>
> I've investigated and used third party products that are adept at name
> standardization. Especially necessary with foreign names.
>
>
> --
> Arnie Rowland
> Most good judgment comes from experience.
> Most experience comes from bad judgment.
> - Anonymous
>
>
> "Walter Mallon" <waltmal***@yahoo.com> wrote in message
> news:%23PMl6e0qGHA.2180@TK2MSFTNGP05.phx.gbl...
>> I'm writing script to properly case names that we import in all caps.
>> The challenge is dealing with names with a capital third letter like
>> McKinley. Is anyone willing to share some script examples or suggestions
>> for this. I know all the syntax and what not, I'm looking for ideas for
>> when to capitalize the third letter, e.g., rules such as when the first
>> two characters are "mc" always capitalize third letter, etc.
>>
>> Thanks in Advance,
>>
>> Walter
>>
>
>
Author
19 Jul 2006 4:21 PM
Arnie Rowland
I thought that was "van der Wal"

--
Arnie Rowland
Most good judgment comes from experience.
Most experience comes from bad judgment.
- Anonymous


Show quote
"Aaron Bertrand [SQL Server MVP]" <ten.xoc@dnartreb.noraa> wrote in message
news:%23rDNZx0qGHA.4864@TK2MSFTNGP03.phx.gbl...
> How about last names like "Van der Wal" and "di Paulo"?  Fun, fun, indeed!
>
>
>
> "Arnie Rowland" <ar***@1568.com> wrote in message
> news:u7DPWv0qGHA.148@TK2MSFTNGP04.phx.gbl...
>> O'Connor -Second Letter (or third character -does the apostophe come
>> through?)
>> or MacDonald -forth letter
>> There's a few names with fifth letter.
>>
>> That's going to be an interesting function. Please share your results.
>>
>> I've investigated and used third party products that are adept at name
>> standardization. Especially necessary with foreign names.
>>
>>
>> --
>> Arnie Rowland
>> Most good judgment comes from experience.
>> Most experience comes from bad judgment.
>> - Anonymous
>>
>>
>> "Walter Mallon" <waltmal***@yahoo.com> wrote in message
>> news:%23PMl6e0qGHA.2180@TK2MSFTNGP05.phx.gbl...
>>> I'm writing script to properly case names that we import in all caps.
>>> The challenge is dealing with names with a capital third letter like
>>> McKinley. Is anyone willing to share some script examples or suggestions
>>> for this. I know all the syntax and what not, I'm looking for ideas for
>>> when to capitalize the third letter, e.g., rules such as when the first
>>> two characters are "mc" always capitalize third letter, etc.
>>>
>>> Thanks in Advance,
>>>
>>> Walter
>>>
>>
>>
>
>
Author
19 Jul 2006 4:29 PM
Aaron Bertrand [SQL Server MVP]
I dated a Van der Wal with a capital V once, so I think it depends.  (Like
MacDonald vs. Macdonald vs. McDonald -- I've seen all three.)


Show quote
>I thought that was "van der Wal"
Author
19 Jul 2006 6:55 PM
Walter Mallon
Yes it is quite a quandry.  Maybe some one out there has dealt with this
before.


Walter


Show quote
"Aaron Bertrand [SQL Server MVP]" <ten.xoc@dnartreb.noraa> wrote in message
news:%23rDNZx0qGHA.4864@TK2MSFTNGP03.phx.gbl...
> How about last names like "Van der Wal" and "di Paulo"?  Fun, fun, indeed!
>
>
>
> "Arnie Rowland" <ar***@1568.com> wrote in message
> news:u7DPWv0qGHA.148@TK2MSFTNGP04.phx.gbl...
>> O'Connor -Second Letter (or third character -does the apostophe come
>> through?)
>> or MacDonald -forth letter
>> There's a few names with fifth letter.
>>
>> That's going to be an interesting function. Please share your results.
>>
>> I've investigated and used third party products that are adept at name
>> standardization. Especially necessary with foreign names.
>>
>>
>> --
>> Arnie Rowland
>> Most good judgment comes from experience.
>> Most experience comes from bad judgment.
>> - Anonymous
>>
>>
>> "Walter Mallon" <waltmal***@yahoo.com> wrote in message
>> news:%23PMl6e0qGHA.2180@TK2MSFTNGP05.phx.gbl...
>>> I'm writing script to properly case names that we import in all caps.
>>> The challenge is dealing with names with a capital third letter like
>>> McKinley. Is anyone willing to share some script examples or suggestions
>>> for this. I know all the syntax and what not, I'm looking for ideas for
>>> when to capitalize the third letter, e.g., rules such as when the first
>>> two characters are "mc" always capitalize third letter, etc.
>>>
>>> Thanks in Advance,
>>>
>>> Walter
>>>
>>
>>
>
>
Author
19 Jul 2006 4:37 PM
--CELKO--
Google over to Melissa Data and get some of their software for
scrubbing address and name data.  Also SSA and Group One has some good
stuff.
Author
19 Jul 2006 4:59 PM
Tony Rogerson
Hi Walter,

If you are using SQL Server 2005 you can adapt the Camel Case logic I've
written and described in my blog...

http://sqlblogcasts.com/blogs/tonyrogerson/search.aspx?q=camel+case&p=1

set nocount on

declare @seq table (
   seq int not null primary key

)

declare @i int

set @i = 1

while @i <= 50
begin
   insert @seq values( @i )
   set @i = @i + 1
end

declare @names table (
   word varchar(50) not null

)

declare @breaks table (
   break_on_character char(1) not null

)

declare @exclude table (
   subtext varchar(20) not null

)

insert @names ( word ) values( 'colin leversuch-roberts being a pain lol' )
insert @names ( word ) values( 'jim von trapp' )

insert @breaks ( break_on_character ) values( ' ' )
insert @breaks ( break_on_character ) values( '-' )

insert @exclude ( subtext ) values( ' von' )

select CamelCase =
   (
      select Camel
      from ( select case when seq = 1
                           or ( substring( n.word, seq-1, 1 ) IN ( select
break_on_character from @breaks )
                           and not exists ( select *
                                            from @exclude e
                                            where subtext = substring(
n.word, seq-1, len( subtext ) )
                                    ) )
                         then upper( substring( n.word, seq, 1 ) )
                         else lower( substring( n.word, seq, 1 ) )

                     end as [text()]
              from @seq
              where seq <= len( n.word )
              order by seq
              for xml path( '' ), type ) AS c ( Camel )
         for xml raw, type ).value( '/row[1]/Camel[1]', 'varchar(max)' )

from @names n


--
Tony Rogerson
SQL Server MVP
http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL
Server Consultant
http://sqlserverfaq.com - free video tutorials


Show quote
"Walter Mallon" <waltmal***@yahoo.com> wrote in message
news:%23PMl6e0qGHA.2180@TK2MSFTNGP05.phx.gbl...
> I'm writing script to properly case names that we import in all caps.  The
> challenge is dealing with names with a capital third letter like McKinley.
> Is anyone willing to share some script examples or suggestions for this.
> I know all the syntax and what not, I'm looking for ideas for when to
> capitalize the third letter, e.g., rules such as when the first two
> characters are "mc" always capitalize third letter, etc.
>
> Thanks in Advance,
>
> Walter
>
Author
19 Jul 2006 6:01 PM
Carl Imthurn
Walter --

A couple of questions and observations:

Approximately how many names are you dealing with?
Is this a one-time import or an ongoing process?
I guess for me the kicker would be one person who wanted to be McDonald, someone else who wanted to be Mcdonald . . . in the past I remember dealing with Joann and JoAnn and Julieann and JulieAnn . . .
You could get close, I believe, but cases like the above would have to be done manually, and if the process is indeed an ongoing import, perhaps storing the name in a separate column from the import and comparing it to incoming data would get you about as
close as you could get.

Carl

Walter Mallon wrote:
Show quote
> I'm writing script to properly case names that we import in all caps.  The
> challenge is dealing with names with a capital third letter like McKinley.
> Is anyone willing to share some script examples or suggestions for this.  I
> know all the syntax and what not, I'm looking for ideas for when to
> capitalize the third letter, e.g., rules such as when the first two
> characters are "mc" always capitalize third letter, etc.
>
> Thanks in Advance,
>
> Walter
>
>
Author
19 Jul 2006 6:58 PM
Walter Mallon
We get files from clients of upwards of 8 million names at a time for
ongoing import processes.  We currenlty have a Perl functionality that we
use to case names but are investigating replacing it with SQL in order to
remove a component from our servers.  It may or may not be feasable so I'm
just looking for suggestions at this point.

Thanks!


Show quote
"Carl Imthurn" <nospam@all.thanks> wrote in message
news:Oz4b1z1qGHA.1140@TK2MSFTNGP05.phx.gbl...
> Walter --
>
> A couple of questions and observations:
>
> Approximately how many names are you dealing with?
> Is this a one-time import or an ongoing process?
> I guess for me the kicker would be one person who wanted to be McDonald,
> someone else who wanted to be Mcdonald . . . in the past I remember
> dealing with Joann and JoAnn and Julieann and JulieAnn . . .
> You could get close, I believe, but cases like the above would have to be
> done manually, and if the process is indeed an ongoing import, perhaps
> storing the name in a separate column from the import and comparing it to
> incoming data would get you about as close as you could get.
>
> Carl
>
> Walter Mallon wrote:
>> I'm writing script to properly case names that we import in all caps.
>> The challenge is dealing with names with a capital third letter like
>> McKinley. Is anyone willing to share some script examples or suggestions
>> for this.  I know all the syntax and what not, I'm looking for ideas for
>> when to capitalize the third letter, e.g., rules such as when the first
>> two characters are "mc" always capitalize third letter, etc.
>>
>> Thanks in Advance,
>>
>> Walter
Author
19 Jul 2006 7:07 PM
Aaron Bertrand [SQL Server MVP]
> We get files from clients of upwards of 8 million names at a time for
> ongoing import processes.  We currenlty have a Perl functionality that we
> use to case names but are investigating replacing it with SQL in order to
> remove a component from our servers.

I'm sure you make T-SQL duplicate and perform whatever rules Perl is
applying.

The problem is still that you can't program for cases (no pun intended) like
those mentioned before: McDonald vs. Mcdonald.

Personally, I think it is safer to either (a) display the data exactly as
the user entered it; or, (b) just UPPER CASE it.

Because, if you guess wrong, you could frustrate or even offend people who
are very sensitive to the way their name is spelled.

A
Author
19 Jul 2006 7:39 PM
Arnie Rowland
I tend to recommend the upper() option for both names and addresses for the
simple reason that USPS prefers uppercase on any mailing piece.

However, I recommend that if you need to 'standardize' names and/or
addresses, you may find one of these vendors with a cost effective product.
I have installed and trained some of them for clients. Sometimes,
standardization is required in order by search effectively -think law
enforcement and the various ways someone may present their name in an
attempt to avoid having their previous records 'discovered'.

http://www.identitysystems.com/products/indexT.htm
http://www.name-searching.com/Name_Search_SQL_Server.html
http://www.intelligentsearch.com
http://www.softwarecompany.com/activex/activegender.htm
http://www.donaldfish.com/name_standardization.htm

--
Arnie Rowland
Most good judgment comes from experience.
Most experience comes from bad judgment.
- Anonymous


Show quote
"Aaron Bertrand [SQL Server MVP]" <ten.xoc@dnartreb.noraa> wrote in message
news:ug$H6Z2qGHA.3776@TK2MSFTNGP03.phx.gbl...
>> We get files from clients of upwards of 8 million names at a time for
>> ongoing import processes.  We currenlty have a Perl functionality that we
>> use to case names but are investigating replacing it with SQL in order to
>> remove a component from our servers.
>
> I'm sure you make T-SQL duplicate and perform whatever rules Perl is
> applying.
>
> The problem is still that you can't program for cases (no pun intended)
> like those mentioned before: McDonald vs. Mcdonald.
>
> Personally, I think it is safer to either (a) display the data exactly as
> the user entered it; or, (b) just UPPER CASE it.
>
> Because, if you guess wrong, you could frustrate or even offend people who
> are very sensitive to the way their name is spelled.
>
> A
>
Author
19 Jul 2006 8:33 PM
Aaron Bertrand [SQL Server MVP]
> think law enforcement and the various ways someone may present their name
> in an attempt to avoid having their previous records 'discovered'.

But that would require mis-spelling, not improper casing.  If I put a
warrant out for John Smith, john smith better not get away scot-free!
Author
19 Jul 2006 8:58 PM
Arnie Rowland
Was that for Jon Smith, or John Smythe, or John Smitth, or John Smeeth or
....

I had actually segued the topic into name standardization for searching-at
least in my mind (that is, what little mind I still have.)

--
Arnie Rowland
Most good judgment comes from experience.
Most experience comes from bad judgment.
- Anonymous


Show quote
"Aaron Bertrand [SQL Server MVP]" <ten.xoc@dnartreb.noraa> wrote in message
news:OQgrAK3qGHA.1140@TK2MSFTNGP05.phx.gbl...
>> think law enforcement and the various ways someone may present their name
>> in an attempt to avoid having their previous records 'discovered'.
>
> But that would require mis-spelling, not improper casing.  If I put a
> warrant out for John Smith, john smith better not get away scot-free!
>
Author
20 Jul 2006 2:26 PM
Walter Mallon
The Perl script calls a compiled dll type object to do the actual casing.  I
don't have access to that part of the code or I would just duplicate it.
That would make it a lot easier :)

Walt


Show quote
"Aaron Bertrand [SQL Server MVP]" <ten.xoc@dnartreb.noraa> wrote in message
news:ug$H6Z2qGHA.3776@TK2MSFTNGP03.phx.gbl...
>> We get files from clients of upwards of 8 million names at a time for
>> ongoing import processes.  We currenlty have a Perl functionality that we
>> use to case names but are investigating replacing it with SQL in order to
>> remove a component from our servers.
>
> I'm sure you make T-SQL duplicate and perform whatever rules Perl is
> applying.
>
> The problem is still that you can't program for cases (no pun intended)
> like those mentioned before: McDonald vs. Mcdonald.
>
> Personally, I think it is safer to either (a) display the data exactly as
> the user entered it; or, (b) just UPPER CASE it.
>
> Because, if you guess wrong, you could frustrate or even offend people who
> are very sensitive to the way their name is spelled.
>
> A
>
Author
19 Jul 2006 7:19 PM
Carl Imthurn
Here's one (of many) possibilities:
Design a ProperCaseNames table. You'll have to make some executive decisions, i.e. 'MCDONALD' will become 'McDonald' regardless of the possibility of someone wanting it to be 'Mcdonald'.
This table will have to be populated over time, but ultimately I believe it would contain 99.99% of possible non-standard proper-cased names (at least in the Western world; don't know much about non-Western names). One issue that would arise is how you
would discover that a particular name needed to be added.
Then, a simple UPDATE statement would suffice.

Not a perfect solution, but then again, there is none . . .

Carl

Walter Mallon wrote:
Show quote
> We get files from clients of upwards of 8 million names at a time for
> ongoing import processes.  We currenlty have a Perl functionality that we
> use to case names but are investigating replacing it with SQL in order to
> remove a component from our servers.  It may or may not be feasable so I'm
> just looking for suggestions at this point.
>
> Thanks!
>
Author
20 Jul 2006 2:25 PM
Walter Mallon
This is a good idea.  Thanks.

Walter


Show quote
"Carl Imthurn" <nospam@all.thanks> wrote in message
news:%23h$lYf2qGHA.3920@TK2MSFTNGP04.phx.gbl...
> Here's one (of many) possibilities:
> Design a ProperCaseNames table. You'll have to make some executive
> decisions, i.e. 'MCDONALD' will become 'McDonald' regardless of the
> possibility of someone wanting it to be 'Mcdonald'.
> This table will have to be populated over time, but ultimately I believe
> it would contain 99.99% of possible non-standard proper-cased names (at
> least in the Western world; don't know much about non-Western names). One
> issue that would arise is how you would discover that a particular name
> needed to be added.
> Then, a simple UPDATE statement would suffice.
>
> Not a perfect solution, but then again, there is none . . .
>
> Carl
>
> Walter Mallon wrote:
>> We get files from clients of upwards of 8 million names at a time for
>> ongoing import processes.  We currenlty have a Perl functionality that we
>> use to case names but are investigating replacing it with SQL in order to
>> remove a component from our servers.  It may or may not be feasable so
>> I'm just looking for suggestions at this point.
>>
>> Thanks!
>>

AddThis Social Bookmark Button