Home All Groups Group Topic Archive Search About

Sort Numbers in Nvarchar As Numbers

Author
28 Jul 2006 6:10 AM
Andrew Hayes
I've read several posts and articles about this, but they all see to revolve
around replacements and substrings and other paraphernalia. In my case the
nvarchar field can contain any type of numbers/letters in any combination,
but strings that are numbers should be sorted as such.

For example:

1
7
100
20
A8
G57X
91b17
B32

should be sorted as:

1
7
20
100
91b17
A8
B32
G57X

I believe that COLLATE can be used, but I want to only use it when I SELECT
since modifying the collation of the column in the table design gives me the
usual "Cannot resolve collation conflict" error.

I tried

    ORDER BY CAST(ProductCode AS nvarchar(20)) COLLATE Latin1_General_BIN

but that had no effect on the sort order. Nor did putting any other
collation in there seem to affect the output.

Is there any way to do this without using temporary tables or custom types?

Author
28 Jul 2006 6:33 AM
Uri Dimant
Andrew
create table #t (c varchar(20))

insert into #t values ('1')
insert into #t values ('7')
insert into #t values ('100')
insert into #t values ('20')
insert into #t values ('A8')
insert into #t values ('G57X')
insert into #t values ('91b17')
insert into #t values ('B32')

SELECT * FROM #t ORDER BY
substring(c, 0,patindex('%[0-9]%',c))+right ('00000' + substring(c,
patindex('%[0-9]%',c) , len(c)),5)



Show quote
"Andrew Hayes" <AndrewHa***@discussions.microsoft.com> wrote in message
news:ehj6mygsGHA.1596@TK2MSFTNGP05.phx.gbl...
> I've read several posts and articles about this, but they all see to
> revolve around replacements and substrings and other paraphernalia. In my
> case the nvarchar field can contain any type of numbers/letters in any
> combination, but strings that are numbers should be sorted as such.
>
> For example:
>
> 1
> 7
> 100
> 20
> A8
> G57X
> 91b17
> B32
>
> should be sorted as:
>
> 1
> 7
> 20
> 100
> 91b17
> A8
> B32
> G57X
>
> I believe that COLLATE can be used, but I want to only use it when I
> SELECT since modifying the collation of the column in the table design
> gives me the usual "Cannot resolve collation conflict" error.
>
> I tried
>
>    ORDER BY CAST(ProductCode AS nvarchar(20)) COLLATE Latin1_General_BIN
>
> but that had no effect on the sort order. Nor did putting any other
> collation in there seem to affect the output.
>
> Is there any way to do this without using temporary tables or custom
> types?
>
Author
28 Jul 2006 7:05 AM
Andrew Hayes
Thanks Uri. I modified the ORDER BY to be like you mentioned and it seemed
to give me the right result, but then I realised it's not exactly what I was
looking for.

That is, if the table contains:

91
91b17
91g
91fz64
91g8

It would sort those as:

91
91b17
91fz64
91g
91g8

I'll try messing about with patindex and substrings to see if I can get the
results I need.
Author
28 Jul 2006 7:16 AM
Uri Dimant
Andrew

Well ,  I can see two posible soultions .
1) Using a SORT opeartion on the CLIENT side istead of T-SQL
2) If you persist using T-SQL , try loop thru the cursor to get what you
want



Show quote
"Andrew Hayes" <AndrewHa***@discussions.microsoft.com> wrote in message
news:OsIQaRhsGHA.1304@TK2MSFTNGP06.phx.gbl...
> Thanks Uri. I modified the ORDER BY to be like you mentioned and it seemed
> to give me the right result, but then I realised it's not exactly what I
> was looking for.
>
> That is, if the table contains:
>
> 91
> 91b17
> 91g
> 91fz64
> 91g8
>
> It would sort those as:
>
> 91
> 91b17
> 91fz64
> 91g
> 91g8
>
> I'll try messing about with patindex and substrings to see if I can get
> the results I need.
>
Author
28 Jul 2006 7:18 AM
Andrew Hayes
Yeah. It would be simple if it was a fixed format, but it's a general column
used for multiple customers with different methods, so I have to be generic
about it.

Will see how it goes. Thanks for your help though. It's given me a few new
ideas to try.

Show quote
"Uri Dimant" <u***@iscar.co.il> wrote in message
news:uokUQWhsGHA.3264@TK2MSFTNGP03.phx.gbl...
> Andrew
>
> Well ,  I can see two posible soultions .
> 1) Using a SORT opeartion on the CLIENT side istead of T-SQL
> 2) If you persist using T-SQL , try loop thru the cursor to get what you
> want
>
>
>
> "Andrew Hayes" <AndrewHa***@discussions.microsoft.com> wrote in message
> news:OsIQaRhsGHA.1304@TK2MSFTNGP06.phx.gbl...
>> Thanks Uri. I modified the ORDER BY to be like you mentioned and it
>> seemed to give me the right result, but then I realised it's not exactly
>> what I was looking for.
>>
>> That is, if the table contains:
>>
>> 91
>> 91b17
>> 91g
>> 91fz64
>> 91g8
>>
>> It would sort those as:
>>
>> 91
>> 91b17
>> 91fz64
>> 91g
>> 91g8
>>
>> I'll try messing about with patindex and substrings to see if I can get
>> the results I need.
>>
>
>
Author
28 Jul 2006 11:19 AM
Steve Kass
Andrew,

Here are two solutions that allow you to sort, if the rule
is as follows: Separate each identifier into its contiguous
alphabetic and numeric "runs".  Sort the identifiers by
first "run", then second "run", then third "run", etc.  Sort
numeric values before alphabetic ones, and sort numeric
values by number, alphabetic values lexicographically.

Before giving the examples, I'll make one more suggestion -
if each customer's [id]s sort correctly among themselves,
can you sort by customer first, or is there a real business
reason to define an ordering that tells you whether one
of customer X's names comes before or after one of
customer Y's?

Anyway, using Uri's setup, here are the two solutions.
Each has limitations, and the UDF might better be written
using CLR routines, if you are using SQL Server 2005.
The CTE solution can also be rewritten using techniques
from SQL Server 2000, though it's a bit messier - the two
solutions are not that much different, really - they both split
the identifiers up into their pieces, then sort the pieces,
right-justifying numeric segments and left-justifying string
ones.

create function Stretch (
  @c varchar(20)
) returns varchar(200) as begin
  declare @return varchar(200) set @return = ''
  declare @chunk varchar(10) set @chunk = ''
  declare @pos int set @pos = 0
  declare @ten int set @ten = 10
  while @pos <= len(@c) begin
    set @chunk = @chunk + substring(@c,@pos,1)
    while substring(@c,@pos,2)+'0' like '[0-9][0-9]%'
    or substring(@c,@pos,2)+'a' like '[^0-9][^0-9]%' begin
      set @pos = @pos + 1
      set @chunk = @chunk + substring(@c,@pos,1)
    end
    set @pos = @pos + 1
    if @chunk like '[0-9]%'
      set @return = @return + space(10-len(@chunk)) + @chunk
    else
      set @return = @return + @chunk + space(10-len(@chunk))
    set @chunk = ''
  end
  return @return
end
go


create table #t (c varchar(20))

insert into #t values ('1')
insert into #t values ('7')
insert into #t values ('100')
insert into #t values ('20')
insert into #t values ('A8')
insert into #t values ('G57X')
insert into #t values ('91a17')
insert into #t values ('91ac7')
insert into #t values ('91b17')
insert into #t values ('B32')
go

with Nbr(n) as (
  select distinct number
  from master..spt_values
  where number between 0 and 50
), Tsplits(c,n,rk) as (
  select c, n, row_number() over (partition by c order by n) as rk
  from #t, Nbr
  where n <= len(c)+1
  and (
    n = len(c)+1
    or substring('a'+c,n,2) like '[^0-9][0-9]'
    or substring('0'+c,n,2) like '[0-9][^0-9]'
  )
), Tchunks(c,s,p) as (
  select
    c, substring(c,min(n),max(n)-min(n)), min(rk)
  from Tsplits
  cross join (
    select 0 as i union all select 1
  ) Two
  group by c,rk+i
  having count(n) = 2
), TchunksSlid(c,s,p) as (
  select
    c,
    case when s like '[0-9]%'
      then right('0000000000'+s,10)
    else s end,
    p
  from Tchunks
)
  select c from TchunksSlid
  group by c
  order by
    max(case when p = 1 then s else '' end),
    max(case when p = 2 then s else '' end),
    max(case when p = 3 then s else '' end),
    max(case when p = 4 then s else '' end)

go

select c from #t
order by dbo.Stretch(c)
go

drop table #t
drop function Stretch

-- Steve Kass
-- Drew University
-- http://www.stevekass.com
-- 8F2E85D3-2C21-4855-94F3-9A600F4BC58B


Andrew Hayes wrote:

Show quote
>Yeah. It would be simple if it was a fixed format, but it's a general column
>used for multiple customers with different methods, so I have to be generic
>about it.
>
>Will see how it goes. Thanks for your help though. It's given me a few new
>ideas to try.
>
>"Uri Dimant" <u***@iscar.co.il> wrote in message
>news:uokUQWhsGHA.3264@TK2MSFTNGP03.phx.gbl...

>
>>Andrew
>>
>>Well ,  I can see two posible soultions .
>>1) Using a SORT opeartion on the CLIENT side istead of T-SQL
>>2) If you persist using T-SQL , try loop thru the cursor to get what you
>>want
>>
>>
>>
>>"Andrew Hayes" <AndrewHa***@discussions.microsoft.com> wrote in message
>>news:OsIQaRhsGHA.1304@TK2MSFTNGP06.phx.gbl...
>>   
>>
>>>Thanks Uri. I modified the ORDER BY to be like you mentioned and it
>>>seemed to give me the right result, but then I realised it's not exactly
>>>what I was looking for.
>>>
>>>That is, if the table contains:
>>>
>>>91
>>>91b17
>>>91g
>>>91fz64
>>>91g8
>>>
>>>It would sort those as:
>>>
>>>91
>>>91b17
>>>91fz64
>>>91g
>>>91g8
>>>
>>>I'll try messing about with patindex and substrings to see if I can get
>>>the results I need.
>>>
>>>     
>>>
>>   
>>
>
>

>
Author
29 Jul 2006 1:43 AM
Mike Labosh
Show quote
> That is, if the table contains:
>
> 91
> 91b17
> 91g
> 91fz64
> 91g8
>
> It would sort those as:
>
> 91
> 91b17
> 91fz64
> 91g
> 91g8

It would be so cool, if you could get away with something outrageous like
this:

SELECT c, ISNUMERIC(c) AS ItsANumber
FROM #t
CASE
WHEN ISNUMERIC(c) THEN ORDER BY CAST(c AS INT)
ELSE ORDER BY c
END

And then leave it buried way down deep in the system, without even a single
comment, just waiting for the poor guy that comes after you.
--

Peace & happy computing,

Mike Labosh, MCSD MCT
Owner, vbSensei.Com
"y = (-b ± (b^2 - 4 * a * c)^.5) / 2 * a" -- Dr. Houser
Author
28 Jul 2006 6:36 AM
Chris Lim
Andrew Hayes wrote:
> I've read several posts and articles about this, but they all see to revolve
> around replacements and substrings and other paraphernalia. In my case the
> nvarchar field can contain any type of numbers/letters in any combination,
> but strings that are numbers should be sorted as such.

Have you tried something like:

ORDER BY
            CASE WHEN ISNUMERIC(ProductCode) = 1
                            THEN CAST(ProductCode AS INT)
                      ELSE <2billion,or whatever max INT value can be>
            END,
            ProductCode

I don't have access to SQL Server at the moment so not sure if that
will work or not, but give it a go. Note that ISNUMERIC returns 1 for
decimals as well, so if PATINDEX might be better if you only want
integers (e.g. PATINDEX('%[^0-9]%') = 0).
Author
28 Jul 2006 6:48 AM
Uri Dimant
Chris
if you add insert into #t values ('1.') to my DDL ,sql server will throw an
error. In some sitations it is not reliable to use ISNUMERIC()
Aaron has a great article on his web www.aspfaq.com  site about the subject





Show quote
"Chris Lim" <blackca***@hotmail.com> wrote in message
news:1154068610.636318.52980@75g2000cwc.googlegroups.com...
> Andrew Hayes wrote:
>> I've read several posts and articles about this, but they all see to
>> revolve
>> around replacements and substrings and other paraphernalia. In my case
>> the
>> nvarchar field can contain any type of numbers/letters in any
>> combination,
>> but strings that are numbers should be sorted as such.
>
> Have you tried something like:
>
> ORDER BY
>            CASE WHEN ISNUMERIC(ProductCode) = 1
>                            THEN CAST(ProductCode AS INT)
>                      ELSE <2billion,or whatever max INT value can be>
>            END,
>            ProductCode
>
> I don't have access to SQL Server at the moment so not sure if that
> will work or not, but give it a go. Note that ISNUMERIC returns 1 for
> decimals as well, so if PATINDEX might be better if you only want
> integers (e.g. PATINDEX('%[^0-9]%') = 0).
>
Author
29 Jul 2006 3:04 AM
Mike Labosh
> if you add insert into #t values ('1.') to my DDL ,sql server will throw
> an error. In some sitations it is not reliable to use ISNUMERIC()
> Aaron has a great article on his web www.aspfaq.com  site about the
> subject

There is a similar evil behavior for currencies:

"$1, 234.56" is clearly not a numeric string, but on a system configured for
USA Dollars (via Control Panel -> Regional Settings) says this:

IF ISNUMERIC("$1,234.56"), it succeeds, but then on the same server, if you
execute this:

IF ISNUMERIC("EU1,234.56") (for Euros) it fails.

Then on that same server, you can visit the Regional Settings and tell it to
use EUROS instead of US Dollars, and you will get opposite behavior:

ISNUMERIC("$1,234.56") is not, but ISNUMERIC("EU1,234.56") will be.

What a painful afternoon THAT was!

Ditto on dates:  When a user types "5/6/70", is that May 6th or June 5th?
Depending on how the Regional Settings are set, it will be interpreted as
one or the other.

And it gets even better.  Your server has a regional settings configuration,
and so does the client workstation, and if they are not the same, both
computers can interpret the same value differently.  For both numeric
currency AND dates.

GRR!!
--

Peace & happy computing,

Mike Labosh, MCSD MCT
Owner, vbSensei.Com
"y = (-b ± (b^2 - 4 * a * c)^.5) / 2 * a" -- Dr. Houser
Author
29 Jul 2006 3:21 AM
Steve Kass
All of which shows how important it is to leave as little
as possible to chance.  Either use culture-independent
literals (For dates, 'YYYYMMDD', and for currency,
decimal literals with no comma or currency symbol.),
or explicity convert.

If you think the date and number settings are a mess,
try troubleshooting non-Unicode code page incompatibilities.

SK


Mike Labosh wrote:

Show quote
>>if you add insert into #t values ('1.') to my DDL ,sql server will throw
>>an error. In some sitations it is not reliable to use ISNUMERIC()
>>Aaron has a great article on his web www.aspfaq.com  site about the
>>subject
>>   
>>
>
>There is a similar evil behavior for currencies:
>
>"$1, 234.56" is clearly not a numeric string, but on a system configured for
>USA Dollars (via Control Panel -> Regional Settings) says this:
>
>IF ISNUMERIC("$1,234.56"), it succeeds, but then on the same server, if you
>execute this:
>
>IF ISNUMERIC("EU1,234.56") (for Euros) it fails.
>
>Then on that same server, you can visit the Regional Settings and tell it to
>use EUROS instead of US Dollars, and you will get opposite behavior:
>
>ISNUMERIC("$1,234.56") is not, but ISNUMERIC("EU1,234.56") will be.
>
>What a painful afternoon THAT was!
>
>Ditto on dates:  When a user types "5/6/70", is that May 6th or June 5th?
>Depending on how the Regional Settings are set, it will be interpreted as
>one or the other.
>
>And it gets even better.  Your server has a regional settings configuration,
>and so does the client workstation, and if they are not the same, both
>computers can interpret the same value differently.  For both numeric
>currency AND dates.
>
>GRR!!

>
Author
29 Jul 2006 4:05 AM
Mike Labosh
> All of which shows how important it is to leave as little
> as possible to chance.  Either use culture-independent
> literals (For dates, 'YYYYMMDD', and for currency,
> decimal literals with no comma or currency symbol.),
> or explicity convert.

OMG!  I have been talking to databases since I was 9 years old, and NOW the
light just clicked on:  This is precisely why Mainframers do bizarre COBOL
nonsense that stores it as X8 data type as yyyymmdd, so you can cast it to a
number, and it has place value that helps you do date-diffs..  And more
importantly, it is unmistakably deterministic.

I always thought it was because 1,000 years ago, disk capacity cost so much
that the ancient Romans had to save bytes.  (or something like that)

But the catistrophic bug is built in:   When the date value of 31-Dec-9999
rolls over to the next day, all our code is guaranteed to break because of
that pesky "Y-10-K" bug.

Dammit, back to the drawing board.
--

Peace & happy computing,

Mike Labosh, MCSD MCT
Owner, vbSensei.Com
"y = (-b ± (b^2 - 4 * a * c)^.5) / 2 * a" -- Dr. Houser

And if it really takes 6.02E+23 protons to make a "mole", is that why those
critters are so little?
Author
29 Jul 2006 3:56 AM
Mike Labosh
After having carefully studied this entire thread, and hacking around inside
QA for 4 hours or something, a buddy of mine got you *ALMOST* there:

--drop table #t
create table #t(f varchar(10))
insert #t values('10')
insert #t values('1')
insert #t values('100')
insert #t values('91')
insert #t values('91b17')
insert #t values('91g')
insert #t values('91fz64')
insert #t values('91g8')

select f2, f from
(
select f, f2 =
  cast(
   case when isnumeric(substring(f,1,1))= 1 then substring(f,1,1) else ''
end +
   case when isnumeric(substring(f,2,1))= 1 then substring(f,2,1) else ''
end +
   case when isnumeric(substring(f,3,1))= 1 then substring(f,3,1) else ''
end +
   case when isnumeric(substring(f,4,1))= 1 then substring(f,4,1) else ''
end +
   case when isnumeric(substring(f,5,1))= 1 then substring(f,5,1) else ''
end +
   case when isnumeric(substring(f,6,1))= 1 then substring(f,6,1) else ''
end
  as int)
from #t
) t
order by f2

The only problem is that (1) is the filthiest hack I have ever seen and (2)
it does not sort the 100 after the 91b.

But I'm still chewing on it some more.  I dig questions of this bizarre
nature because they stimulate my brain, and pull my (married) mind away from
the totally hot babes sitting in my class this week.

So congradulations:  My wife will hate you forever because your interesting
problem kept me sitting at my computer for two days straight.  :)


--

Peace & happy computing,

Mike Labosh, MCSD MCT
Owner, vbSensei.Com
"y = (-b ± (b^2 - 4 * a * c)^.5) / 2 * a" -- Dr. Houser
Author
29 Jul 2006 4:18 AM
Steve Kass
Did sorting by my Stretch() function get the order wrong?

SK

Mike Labosh wrote:

Show quote
>After having carefully studied this entire thread, and hacking around inside
>QA for 4 hours or something, a buddy of mine got you *ALMOST* there:
>
>--drop table #t
>create table #t(f varchar(10))
>insert #t values('10')
>insert #t values('1')
>insert #t values('100')
>insert #t values('91')
>insert #t values('91b17')
>insert #t values('91g')
>insert #t values('91fz64')
>insert #t values('91g8')
>
>select f2, f from
> (
> select f, f2 =
>  cast(
>   case when isnumeric(substring(f,1,1))= 1 then substring(f,1,1) else ''
>end +
>   case when isnumeric(substring(f,2,1))= 1 then substring(f,2,1) else ''
>end +
>   case when isnumeric(substring(f,3,1))= 1 then substring(f,3,1) else ''
>end +
>   case when isnumeric(substring(f,4,1))= 1 then substring(f,4,1) else ''
>end +
>   case when isnumeric(substring(f,5,1))= 1 then substring(f,5,1) else ''
>end +
>   case when isnumeric(substring(f,6,1))= 1 then substring(f,6,1) else ''
>end
>  as int)
> from #t
> ) t
>order by f2
>
>The only problem is that (1) is the filthiest hack I have ever seen and (2)
>it does not sort the 100 after the 91b.
>
>But I'm still chewing on it some more.  I dig questions of this bizarre
>nature because they stimulate my brain, and pull my (married) mind away from
>the totally hot babes sitting in my class this week.
>
>So congradulations:  My wife will hate you forever because your interesting
>problem kept me sitting at my computer for two days straight.  :)
>
>

>
Author
29 Jul 2006 7:49 AM
Chris Lim
Mike Labosh wrote:
> The only problem is that (1) is the filthiest hack I have ever seen and (2)
> it does not sort the 100 after the 91b.

I thought the OP only wanted strings that were entirely numbers to be
treated as numbers, and strings that were were alphanumeric to be
sorted as strings? The required order of his sample data was:

1
7
20
100
91b17
A8
B32
G57X

Have I totally missed the point? (I think I have, given that everyone
else has interpreted the requirements differently to me!)

Chris

AddThis Social Bookmark Button