|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Casing a name like McKinleyI'm writing script to properly case names that we import in all caps. The
challenge is dealing with names with a capital third letter like McKinley. Is anyone willing to share some script examples or suggestions for this. I know all the syntax and what not, I'm looking for ideas for when to capitalize the third letter, e.g., rules such as when the first two characters are "mc" always capitalize third letter, etc. Thanks in Advance, Walter O'Connor -Second Letter (or third character -does the apostophe come
through?) or MacDonald -forth letter There's a few names with fifth letter. That's going to be an interesting function. Please share your results. I've investigated and used third party products that are adept at name standardization. Especially necessary with foreign names. -- Show quoteArnie Rowland Most good judgment comes from experience. Most experience comes from bad judgment. - Anonymous "Walter Mallon" <waltmal***@yahoo.com> wrote in message news:%23PMl6e0qGHA.2180@TK2MSFTNGP05.phx.gbl... > I'm writing script to properly case names that we import in all caps. The > challenge is dealing with names with a capital third letter like McKinley. > Is anyone willing to share some script examples or suggestions for this. > I know all the syntax and what not, I'm looking for ideas for when to > capitalize the third letter, e.g., rules such as when the first two > characters are "mc" always capitalize third letter, etc. > > Thanks in Advance, > > Walter > How about last names like "Van der Wal" and "di Paulo"? Fun, fun, indeed!
Show quote "Arnie Rowland" <ar***@1568.com> wrote in message news:u7DPWv0qGHA.148@TK2MSFTNGP04.phx.gbl... > O'Connor -Second Letter (or third character -does the apostophe come > through?) > or MacDonald -forth letter > There's a few names with fifth letter. > > That's going to be an interesting function. Please share your results. > > I've investigated and used third party products that are adept at name > standardization. Especially necessary with foreign names. > > > -- > Arnie Rowland > Most good judgment comes from experience. > Most experience comes from bad judgment. > - Anonymous > > > "Walter Mallon" <waltmal***@yahoo.com> wrote in message > news:%23PMl6e0qGHA.2180@TK2MSFTNGP05.phx.gbl... >> I'm writing script to properly case names that we import in all caps. >> The challenge is dealing with names with a capital third letter like >> McKinley. Is anyone willing to share some script examples or suggestions >> for this. I know all the syntax and what not, I'm looking for ideas for >> when to capitalize the third letter, e.g., rules such as when the first >> two characters are "mc" always capitalize third letter, etc. >> >> Thanks in Advance, >> >> Walter >> > > I thought that was "van der Wal"
-- Show quoteArnie Rowland Most good judgment comes from experience. Most experience comes from bad judgment. - Anonymous "Aaron Bertrand [SQL Server MVP]" <ten.xoc@dnartreb.noraa> wrote in message news:%23rDNZx0qGHA.4864@TK2MSFTNGP03.phx.gbl... > How about last names like "Van der Wal" and "di Paulo"? Fun, fun, indeed! > > > > "Arnie Rowland" <ar***@1568.com> wrote in message > news:u7DPWv0qGHA.148@TK2MSFTNGP04.phx.gbl... >> O'Connor -Second Letter (or third character -does the apostophe come >> through?) >> or MacDonald -forth letter >> There's a few names with fifth letter. >> >> That's going to be an interesting function. Please share your results. >> >> I've investigated and used third party products that are adept at name >> standardization. Especially necessary with foreign names. >> >> >> -- >> Arnie Rowland >> Most good judgment comes from experience. >> Most experience comes from bad judgment. >> - Anonymous >> >> >> "Walter Mallon" <waltmal***@yahoo.com> wrote in message >> news:%23PMl6e0qGHA.2180@TK2MSFTNGP05.phx.gbl... >>> I'm writing script to properly case names that we import in all caps. >>> The challenge is dealing with names with a capital third letter like >>> McKinley. Is anyone willing to share some script examples or suggestions >>> for this. I know all the syntax and what not, I'm looking for ideas for >>> when to capitalize the third letter, e.g., rules such as when the first >>> two characters are "mc" always capitalize third letter, etc. >>> >>> Thanks in Advance, >>> >>> Walter >>> >> >> > > I dated a Van der Wal with a capital V once, so I think it depends. (Like
MacDonald vs. Macdonald vs. McDonald -- I've seen all three.) Show quote >I thought that was "van der Wal" Yes it is quite a quandry. Maybe some one out there has dealt with this
before. Walter Show quote "Aaron Bertrand [SQL Server MVP]" <ten.xoc@dnartreb.noraa> wrote in message news:%23rDNZx0qGHA.4864@TK2MSFTNGP03.phx.gbl... > How about last names like "Van der Wal" and "di Paulo"? Fun, fun, indeed! > > > > "Arnie Rowland" <ar***@1568.com> wrote in message > news:u7DPWv0qGHA.148@TK2MSFTNGP04.phx.gbl... >> O'Connor -Second Letter (or third character -does the apostophe come >> through?) >> or MacDonald -forth letter >> There's a few names with fifth letter. >> >> That's going to be an interesting function. Please share your results. >> >> I've investigated and used third party products that are adept at name >> standardization. Especially necessary with foreign names. >> >> >> -- >> Arnie Rowland >> Most good judgment comes from experience. >> Most experience comes from bad judgment. >> - Anonymous >> >> >> "Walter Mallon" <waltmal***@yahoo.com> wrote in message >> news:%23PMl6e0qGHA.2180@TK2MSFTNGP05.phx.gbl... >>> I'm writing script to properly case names that we import in all caps. >>> The challenge is dealing with names with a capital third letter like >>> McKinley. Is anyone willing to share some script examples or suggestions >>> for this. I know all the syntax and what not, I'm looking for ideas for >>> when to capitalize the third letter, e.g., rules such as when the first >>> two characters are "mc" always capitalize third letter, etc. >>> >>> Thanks in Advance, >>> >>> Walter >>> >> >> > > Google over to Melissa Data and get some of their software for
scrubbing address and name data. Also SSA and Group One has some good stuff. Hi Walter,
If you are using SQL Server 2005 you can adapt the Camel Case logic I've written and described in my blog... http://sqlblogcasts.com/blogs/tonyrogerson/search.aspx?q=camel+case&p=1 set nocount on declare @seq table ( seq int not null primary key ) declare @i int set @i = 1 while @i <= 50 begin insert @seq values( @i ) set @i = @i + 1 end declare @names table ( word varchar(50) not null ) declare @breaks table ( break_on_character char(1) not null ) declare @exclude table ( subtext varchar(20) not null ) insert @names ( word ) values( 'colin leversuch-roberts being a pain lol' ) insert @names ( word ) values( 'jim von trapp' ) insert @breaks ( break_on_character ) values( ' ' ) insert @breaks ( break_on_character ) values( '-' ) insert @exclude ( subtext ) values( ' von' ) select CamelCase = ( select Camel from ( select case when seq = 1 or ( substring( n.word, seq-1, 1 ) IN ( select break_on_character from @breaks ) and not exists ( select * from @exclude e where subtext = substring( n.word, seq-1, len( subtext ) ) ) ) then upper( substring( n.word, seq, 1 ) ) else lower( substring( n.word, seq, 1 ) ) end as [text()] from @seq where seq <= len( n.word ) order by seq for xml path( '' ), type ) AS c ( Camel ) for xml raw, type ).value( '/row[1]/Camel[1]', 'varchar(max)' ) from @names n -- Show quoteTony Rogerson SQL Server MVP http://sqlblogcasts.com/blogs/tonyrogerson - technical commentary from a SQL Server Consultant http://sqlserverfaq.com - free video tutorials "Walter Mallon" <waltmal***@yahoo.com> wrote in message news:%23PMl6e0qGHA.2180@TK2MSFTNGP05.phx.gbl... > I'm writing script to properly case names that we import in all caps. The > challenge is dealing with names with a capital third letter like McKinley. > Is anyone willing to share some script examples or suggestions for this. > I know all the syntax and what not, I'm looking for ideas for when to > capitalize the third letter, e.g., rules such as when the first two > characters are "mc" always capitalize third letter, etc. > > Thanks in Advance, > > Walter > Walter --
A couple of questions and observations: Approximately how many names are you dealing with? Is this a one-time import or an ongoing process? I guess for me the kicker would be one person who wanted to be McDonald, someone else who wanted to be Mcdonald . . . in the past I remember dealing with Joann and JoAnn and Julieann and JulieAnn . . . You could get close, I believe, but cases like the above would have to be done manually, and if the process is indeed an ongoing import, perhaps storing the name in a separate column from the import and comparing it to incoming data would get you about as close as you could get. Carl Walter Mallon wrote: Show quote > I'm writing script to properly case names that we import in all caps. The > challenge is dealing with names with a capital third letter like McKinley. > Is anyone willing to share some script examples or suggestions for this. I > know all the syntax and what not, I'm looking for ideas for when to > capitalize the third letter, e.g., rules such as when the first two > characters are "mc" always capitalize third letter, etc. > > Thanks in Advance, > > Walter > > We get files from clients of upwards of 8 million names at a time for
ongoing import processes. We currenlty have a Perl functionality that we use to case names but are investigating replacing it with SQL in order to remove a component from our servers. It may or may not be feasable so I'm just looking for suggestions at this point. Thanks! Show quote "Carl Imthurn" <nospam@all.thanks> wrote in message news:Oz4b1z1qGHA.1140@TK2MSFTNGP05.phx.gbl... > Walter -- > > A couple of questions and observations: > > Approximately how many names are you dealing with? > Is this a one-time import or an ongoing process? > I guess for me the kicker would be one person who wanted to be McDonald, > someone else who wanted to be Mcdonald . . . in the past I remember > dealing with Joann and JoAnn and Julieann and JulieAnn . . . > You could get close, I believe, but cases like the above would have to be > done manually, and if the process is indeed an ongoing import, perhaps > storing the name in a separate column from the import and comparing it to > incoming data would get you about as close as you could get. > > Carl > > Walter Mallon wrote: >> I'm writing script to properly case names that we import in all caps. >> The challenge is dealing with names with a capital third letter like >> McKinley. Is anyone willing to share some script examples or suggestions >> for this. I know all the syntax and what not, I'm looking for ideas for >> when to capitalize the third letter, e.g., rules such as when the first >> two characters are "mc" always capitalize third letter, etc. >> >> Thanks in Advance, >> >> Walter > We get files from clients of upwards of 8 million names at a time for I'm sure you make T-SQL duplicate and perform whatever rules Perl is > ongoing import processes. We currenlty have a Perl functionality that we > use to case names but are investigating replacing it with SQL in order to > remove a component from our servers. applying. The problem is still that you can't program for cases (no pun intended) like those mentioned before: McDonald vs. Mcdonald. Personally, I think it is safer to either (a) display the data exactly as the user entered it; or, (b) just UPPER CASE it. Because, if you guess wrong, you could frustrate or even offend people who are very sensitive to the way their name is spelled. A I tend to recommend the upper() option for both names and addresses for the
simple reason that USPS prefers uppercase on any mailing piece. However, I recommend that if you need to 'standardize' names and/or addresses, you may find one of these vendors with a cost effective product. I have installed and trained some of them for clients. Sometimes, standardization is required in order by search effectively -think law enforcement and the various ways someone may present their name in an attempt to avoid having their previous records 'discovered'. http://www.identitysystems.com/products/indexT.htm http://www.name-searching.com/Name_Search_SQL_Server.html http://www.intelligentsearch.com http://www.softwarecompany.com/activex/activegender.htm http://www.donaldfish.com/name_standardization.htm -- Show quoteArnie Rowland Most good judgment comes from experience. Most experience comes from bad judgment. - Anonymous "Aaron Bertrand [SQL Server MVP]" <ten.xoc@dnartreb.noraa> wrote in message news:ug$H6Z2qGHA.3776@TK2MSFTNGP03.phx.gbl... >> We get files from clients of upwards of 8 million names at a time for >> ongoing import processes. We currenlty have a Perl functionality that we >> use to case names but are investigating replacing it with SQL in order to >> remove a component from our servers. > > I'm sure you make T-SQL duplicate and perform whatever rules Perl is > applying. > > The problem is still that you can't program for cases (no pun intended) > like those mentioned before: McDonald vs. Mcdonald. > > Personally, I think it is safer to either (a) display the data exactly as > the user entered it; or, (b) just UPPER CASE it. > > Because, if you guess wrong, you could frustrate or even offend people who > are very sensitive to the way their name is spelled. > > A > > think law enforcement and the various ways someone may present their name But that would require mis-spelling, not improper casing. If I put a > in an attempt to avoid having their previous records 'discovered'. warrant out for John Smith, john smith better not get away scot-free! Was that for Jon Smith, or John Smythe, or John Smitth, or John Smeeth or
.... I had actually segued the topic into name standardization for searching-at least in my mind (that is, what little mind I still have.) -- Show quoteArnie Rowland Most good judgment comes from experience. Most experience comes from bad judgment. - Anonymous "Aaron Bertrand [SQL Server MVP]" <ten.xoc@dnartreb.noraa> wrote in message news:OQgrAK3qGHA.1140@TK2MSFTNGP05.phx.gbl... >> think law enforcement and the various ways someone may present their name >> in an attempt to avoid having their previous records 'discovered'. > > But that would require mis-spelling, not improper casing. If I put a > warrant out for John Smith, john smith better not get away scot-free! > The Perl script calls a compiled dll type object to do the actual casing. I
don't have access to that part of the code or I would just duplicate it. That would make it a lot easier :) Walt Show quote "Aaron Bertrand [SQL Server MVP]" <ten.xoc@dnartreb.noraa> wrote in message news:ug$H6Z2qGHA.3776@TK2MSFTNGP03.phx.gbl... >> We get files from clients of upwards of 8 million names at a time for >> ongoing import processes. We currenlty have a Perl functionality that we >> use to case names but are investigating replacing it with SQL in order to >> remove a component from our servers. > > I'm sure you make T-SQL duplicate and perform whatever rules Perl is > applying. > > The problem is still that you can't program for cases (no pun intended) > like those mentioned before: McDonald vs. Mcdonald. > > Personally, I think it is safer to either (a) display the data exactly as > the user entered it; or, (b) just UPPER CASE it. > > Because, if you guess wrong, you could frustrate or even offend people who > are very sensitive to the way their name is spelled. > > A > Here's one (of many) possibilities:
Design a ProperCaseNames table. You'll have to make some executive decisions, i.e. 'MCDONALD' will become 'McDonald' regardless of the possibility of someone wanting it to be 'Mcdonald'. This table will have to be populated over time, but ultimately I believe it would contain 99.99% of possible non-standard proper-cased names (at least in the Western world; don't know much about non-Western names). One issue that would arise is how you would discover that a particular name needed to be added. Then, a simple UPDATE statement would suffice. Not a perfect solution, but then again, there is none . . . Carl Walter Mallon wrote: Show quote > We get files from clients of upwards of 8 million names at a time for > ongoing import processes. We currenlty have a Perl functionality that we > use to case names but are investigating replacing it with SQL in order to > remove a component from our servers. It may or may not be feasable so I'm > just looking for suggestions at this point. > > Thanks! > This is a good idea. Thanks.
Walter Show quote "Carl Imthurn" <nospam@all.thanks> wrote in message news:%23h$lYf2qGHA.3920@TK2MSFTNGP04.phx.gbl... > Here's one (of many) possibilities: > Design a ProperCaseNames table. You'll have to make some executive > decisions, i.e. 'MCDONALD' will become 'McDonald' regardless of the > possibility of someone wanting it to be 'Mcdonald'. > This table will have to be populated over time, but ultimately I believe > it would contain 99.99% of possible non-standard proper-cased names (at > least in the Western world; don't know much about non-Western names). One > issue that would arise is how you would discover that a particular name > needed to be added. > Then, a simple UPDATE statement would suffice. > > Not a perfect solution, but then again, there is none . . . > > Carl > > Walter Mallon wrote: >> We get files from clients of upwards of 8 million names at a time for >> ongoing import processes. We currenlty have a Perl functionality that we >> use to case names but are investigating replacing it with SQL in order to >> remove a component from our servers. It may or may not be feasable so >> I'm just looking for suggestions at this point. >> >> Thanks! >> |
|||||||||||||||||||||||