Home All Groups Group Topic Archive Search About

Wish List: String Concatenation Aggregate Operator

Author
31 Mar 2006 7:43 PM
Ian Boyd
i wish SQL defined a new aggregate operator (e.g. SUM, AVG, COUNT, STDEV,
etc) that would concatenate column values as strings.

Consider

Transactions Table

LCTID    TransactionDate    Amount    CurrencyCode
=====    ===============    ======    ============
1    3/12/2006    450.00    CAD
1    3/13/2006    300.00    USD
1    3/11/2006    100.00    CAD
2    3/15/2006    30.00    USD

i could do

SELECT
    LCTID,
    MAX(TransactionDate) AS LastTransactionDate,
    SUM(Amount) AS TotalAmount,
    CurrencyCode
FROM Tranactions
GROUP BY LCTID, CurrencyCode

and get

LCTID    LastTransactionDate    TotalAmount    CurrencyCode
=====    ===================    ===========    ============
1    3/12/2006    550.00    CAD
1    3/13/2006    300.00    USD
2    3/15/2006    30.00    USD

But what i really want to do is:

SELECT
    LCTID,
    MAX(TransactionDate) AS LastTransactionDate,
    CONCAT(CAST(Amount AS varchar(50))+' '+CurrencyCode, ', ') AS Amounts
FROM Tranactions
GROUP BY LCTID

and get
LCTID    LastTransactionDate    Amounts
===== =================== =======
1    3/13/2006    550.00 CAD, 300.00 USD
2    3/15/2006    30.00 USD

Where the 2nd parameter would be a "concatenation separator".

Maybe SQL Server X will let us write our own user-defined aggregate function
Show quote
:)

Author
31 Mar 2006 7:51 PM
Ian Boyd
Now that i think about it, having a UDF where SQL Server hands me a table
variable that contains the rows that needs to be aggregated into a single
scalar value would be great.

CREATE AGGREGATE myStringConcatenationAggregate
    (@rowsToAggregate TABLE)
RETURNS varchar
AS
BEGIN
    ...
END
Author
31 Mar 2006 10:26 PM
Erland Sommarskog
Ian Boyd (ian.msnews***@avatopia.com) writes:
> i wish SQL defined a new aggregate operator (e.g. SUM, AVG, COUNT, STDEV,
> etc) that would concatenate column values as strings.

Santa comes early this year! There is such an operator in SQL 2005,
although it's not in the corner you would look for it. From one of the
devs in the XML team, I got this query:

    select CustomerID,
            substring(OrdIdList, 1, datalength(OrdIdList)/2 - 1)
            -- strip the last ',' from the list
     from
       Customers c cross apply
       (select convert(nvarchar(30), OrderID) + ',' as [text()]
        from Orders o
        where o.CustomerID = c.CustomerID
        order by o.OrderID
        for xml path('')) as Dummy(OrdIdList)
     go


--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
Author
2 Apr 2006 10:08 PM
Hugo Kornelis
On Fri, 31 Mar 2006 14:43:02 -0500, Ian Boyd wrote:

>i wish SQL defined a new aggregate operator (e.g. SUM, AVG, COUNT, STDEV,
>etc) that would concatenate column values as strings.
(snip)
>Maybe SQL Server X will let us write our own user-defined aggregate function
>:)

Hi Ian,

Though you won't need it for this one (see Erland's post), you already
have the ability to write your own aggregate functions in SQL Server
2005, but you'll have to use the CLR.

Hyperlink to relevant content in SQL Server 2005 Books Online:

ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/denet9/html/bad9b7e8-5967-4afa-8dc8-6d840faf9372.htm

Hyperlink to the same content on the web:
  http://msdn2.microsoft.com/en-us/library/ms131057(SQL.90).aspx

--
Hugo Kornelis, SQL Server MVP
Author
3 Apr 2006 2:41 AM
Kent Tegels
Hello Hugo,

> Though you won't need it for this one (see Erland's post), you already
> have the ability to write your own aggregate functions in SQL Server
> 2005, but you'll have to use the CLR.

And here's an example:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text;
using System.IO;
[Serializable]
[SqlUserDefinedAggregate(Format.UserDefined,MaxByteSize=8000)]
public class CSVLister : IBinarySerialize {
    string sb;
    public void Read(BinaryReader r) {
        sb = r.ReadString();
    }
    public void Write(BinaryWriter w) {
        w.Write(sb);
    }
    public void Init() {
        sb = string.Empty;
    }
    public void Accumulate(SqlString Value) {
        sb += Value.Value;
        sb += ", ";
    }
    public void Merge(CSVLister Group) {
    }
    public SqlString Terminate() {
        return new SqlString(sb);
    }
}

This isn't intented for use on Multi-proc boxes for obvious reasons.

Thank you,
Kent Tegels
DevelopMentor
http://staff.develop.com/ktegels/
Author
3 Apr 2006 3:45 AM
Steve Kass
Kent,

Unfortunately, it's not only multi-processor boxes that
use the Merge() method in a user-defined aggregate.
Your aggregate fails on this query, which uses partial
aggregation in the plan.  (The 8000-byte limit is not
exceeded here.)

select
  count(CustomerID) as Custs,
  sum(len(agg)) as totalAggLen,
  dbo.CSVLister(agg) as OrderList
from (
  select CustomerID, dbo.CSVLister(OrderID) as agg
  from Northwind..Orders
  where EmployeeID = 2
  group by CustomerID
  union all
  select CustomerID, dbo.CSVLister(OrderID)
  from Northwind..Orders
  where EmployeeID = 3
  group by CustomerID
) T

-- output
Custs       totalAggLen OrderList  
----------- ----------- -------------
122          1439      

-- Query plan
StmtText                                                                                                                                                                                                                                       

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

  |--Compute
Scalar(DEFINE:([Expr1010]=CONVERT_IMPLICIT(int,[globalagg1020],0),
[Expr1011]=CASE WHEN [globalagg1022]=(0) THEN NULL ELSE [globalagg1024]
END, [Expr1012]=dbo.CSVLister([globalagg1026])))
       |--Stream
Aggregate(DEFINE:([globalagg1020]=SUM([partialagg1019]),
[globalagg1022]=SUM([partialagg1021]),
[globalagg1024]=SUM([partialagg1023]),
[globalagg1026]=dbo.CSVLister([partialagg1025])))
            |--Concatenation
                 |--Stream
Aggregate(DEFINE:([partialagg1019]=COUNT([Northwind].[dbo].[Orders].[CustomerID]),
[partialagg1021]=COUNT_BIG(len([Expr1003])),
[partialagg1023]=SUM(len([Expr1003])),
[partialagg1025]=dbo.CSVLister([Expr1003])))
                 |    |--Compute
Scalar(DEFINE:([Expr1003]=dbo.CSVLister([Expr1028])))
                 |         |--Stream Aggregate(GROUP
BY:([Northwind].[dbo].[Orders].[CustomerID])
DEFINE:([Expr1028]=dbo.CSVLister([Expr1013])))
                 |              |--Sort(ORDER
BY:([Northwind].[dbo].[Orders].[CustomerID] ASC))
                 |                   |--Compute
Scalar(DEFINE:([Expr1013]=CONVERT_IMPLICIT(nvarchar(4000),[Northwind].[dbo].[Orders].[OrderID],0)))
                 |                        |--Clustered Index
Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]),
WHERE:([Northwind].[dbo].[Orders].[EmployeeID]=(2)))
                 |--Stream
Aggregate(DEFINE:([partialagg1019]=COUNT([Northwind].[dbo].[Orders].[CustomerID]),
[partialagg1021]=COUNT_BIG(len([Expr1007])),
[partialagg1023]=SUM(len([Expr1007])),
[partialagg1025]=dbo.CSVLister([Expr1007])))
                      |--Compute
Scalar(DEFINE:([Expr1007]=dbo.CSVLister([Expr1029])))
                           |--Stream Aggregate(GROUP
BY:([Northwind].[dbo].[Orders].[CustomerID])
DEFINE:([Expr1029]=dbo.CSVLister([Expr1014])))
                                |--Sort(ORDER
BY:([Northwind].[dbo].[Orders].[CustomerID] ASC))
                                     |--Compute
Scalar(DEFINE:([Expr1014]=CONVERT_IMPLICIT(nvarchar(4000),[Northwind].[dbo].[Orders].[OrderID],0)))
                                          |--Clustered Index
Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]),
WHERE:([Northwind].[dbo].[Orders].[EmployeeID]=(3)))



The CLR is useful for a lot of things, but it's a lot
harder to write a correct CLR routine than you
might think.

Steve Kass
Drew University

Kent Tegels wrote:

Show quote
> Hello Hugo,
>
>> Though you won't need it for this one (see Erland's post), you already
>> have the ability to write your own aggregate functions in SQL Server
>> 2005, but you'll have to use the CLR.
>
>
> And here's an example:
>
> using System;
> using System.Data;
> using System.Data.SqlClient;
> using System.Data.SqlTypes;
> using Microsoft.SqlServer.Server;
> using System.Text;
> using System.IO;
> [Serializable]
> [SqlUserDefinedAggregate(Format.UserDefined,MaxByteSize=8000)]
> public class CSVLister : IBinarySerialize {
>    string sb;
>    public void Read(BinaryReader r) {
>        sb = r.ReadString();
>    }
>    public void Write(BinaryWriter w) {
>        w.Write(sb);
>    }
>    public void Init() {
>        sb = string.Empty;
>    }
>    public void Accumulate(SqlString Value) {
>        sb += Value.Value;
>        sb += ", ";
>    }
>    public void Merge(CSVLister Group) {
>    }
>    public SqlString Terminate() {
>        return new SqlString(sb);
>    }
> }
>
> This isn't intented for use on Multi-proc boxes for obvious reasons.
>
> Thank you,
> Kent Tegels
> DevelopMentor
> http://staff.develop.com/ktegels/
>
>
Author
3 Apr 2006 12:54 AM
--CELKO--
Again, look at what 1NF means and get enough math behind you to
understand why Dr. Codd does it that way.
Author
3 Apr 2006 7:27 AM
walter
In what way is this question much different than your SQL puzzle, like
the one of friday? It is just about munging the data into a form that
is directly usable by the client. SQL in itself is not very good at
that, (though SQL 2005 definately has some very usefull improvements).

I've got enough math to understand Dr. Codd (and Date) in his writings
about the RM. But remember that SQL is not an implementation of a RM at
all. Both Codd and Date (and you as well) often totally ignore the
practical problems the user is faced with. SQL is an implementation of
a model that tries to be a bridge between two worlds: The relational
world from a mathematical pov, and second a usable tool for solving
practical problems.

I'd rather had seen split a database into two layers, one purely
relational. Second a layer designed to mung your data into a form the
user wishes. We would not have those discussions where one confuses the
writings of 35 years ago with the current implementation of SQL. SQL is
not capable of handling the RM well, and certainly does not well in the
data munging area as well. But I accept it is the standard, and I have
to live with it, but please do not ever confuse the RM with SQL.

So please, mr Celko, please dive into the books again (I could
recommend 'An introduction to relational database systems' by C.J.DATE)
to see the differences :)

Walter,
Author
4 Apr 2006 9:04 PM
--CELKO--
>> what way is this question much different than your SQL puzzle, like the one of friday? <<

The puzzle was more specific as to the data required and not a general
function -- only two items, not (n) unspecific things.  The ordering
was well-defined, not vague.  There was no need of separators since the
data was in columns.

In the real world, I would have put each each job history in a row by
itself, so the code would port bertter and let the front end or middle
tier assemble the current and prior jobs into a single line for
display.

But do not confuse puzzles with good practices.  Back in the 1960's we
used to have Commerical FORTRAN packages on small machines (DEC, DG,
HP, and a ton of other companies that are loooong gone) that did
character math a la mock-COBOL.

"I told him I had been that morning at a meeting of the people called
Quakers, where I had heard a woman preach. Johnson: "Sir, a woman's
preaching is like a dog's walking on his hind legs. It is not done
well; but you are surprised to find it done at all."

--Boswell on Dr. Samuel Johnson --

Commercial FORTRAN libraries quite fast on the old 16-bit
mini-computers in their day.  But it was a kludge.  It was not good
programming.  I know this.

I can write crappy code with anyone. Hell, I did that at the start of
my career until I found S.P..  Confession: I separated my math from my
programming the first years when I started.  Then along came Structured
Programming, Dijkstra, Wirth, Myers, Yourdon, et al.  Programming went
from art to engineering.  All my math kicked in.

SQL is almost at the point that procedural code was before Structured
Programming.  We have a few tricks (Calendar tables, nested sets, etc.)
but we do not have an orderly 
approach to code yet.
Author
3 Apr 2006 7:28 AM
walter
In what way is this question much different than your SQL puzzle, like
the one of friday? It is just about munging the data into a form that
is directly usable by the client. SQL in itself is not very good at
that, (though SQL 2005 definately has some very usefull improvements).

I've got enough math to understand Dr. Codd (and Date) in his writings
about the RM. But remember that SQL is not an implementation of a RM at
all. Both Codd and Date (and you as well) often totally ignore the
practical problems the user is faced with. SQL is an implementation of
a model that tries to be a bridge between two worlds: The relational
world from a mathematical pov, and second a usable tool for solving
practical problems.

I'd rather had seen split a database into two layers, one purely
relational. Second a layer designed to mung your data into a form the
user wishes. We would not have those discussions where one confuses the
writings of 35 years ago with the current implementation of SQL. SQL is
not capable of handling the RM well, and certainly does not well in the
data munging area as well. But I accept it is the standard, and I have
to live with it, but please do not ever confuse the RM with SQL.

So please, mr Celko, please dive into the books again (I could
recommend 'An introduction to relational database systems' by C.J.DATE)
to see the differences :)

Walter,
Author
4 Apr 2006 9:29 PM
--CELKO--
>> I've got enough math to understand Dr. Codd (and Date) in his writings about the RM.<<

I have two separate Master's -- one in Math and one in Comp Sci and I
still get surprised ; Codd and Date are not the same RM model; no NULL
(Date), one NULL (RM #1 and SQL), and two NULLs (RM#2 and parts of
SQL-99 OLAP).

>> But remember that SQL is not an implementation of a RM at all. Both Codd and Date (and you as well) often totally ignore the practical problems the user is faced with. SQL is an implementation of a model that tries to be a bridge between two worlds: <<

Hey, I am the programmer's friend, they guy that publishes code and
consults with product developers and helps old ladies with RDBMS
problems.  But you are right about RM versus SQL.  I can add Algebra
versus FORTRAN to list, ICON versus Post production strings, LISP
versus lists, erc. Any programming language based on a math model will
not match.

>> I'd rather had seen split a database into two layers, one purely relational. Second a layer designed to mung your data into a form the user wishes <<

Me, too!  Tiered architecture at its best!  Raise you one:  put
busienss rules in a separate place that can be used by all the tiers.

>> We would not have those discussions where one confuses the writings of 35 years ago with the current implementation of SQL. <<

Gee, better not use the math of 8000 years ago when I am balancign my
books :)

>> SQL is not capable of handling the RM well, and certainly does not well in the data munging area as well. <<

Sorry, but it does RM very well, if youy learn it; it was never, never
meant to munge data.  We defined it without any real I/O on purpose.

>> But I accept it is the standard, and I have to live with it, but please do not ever confuse the RM with SQL. <<

Hey, I have never said I was the RM; I am the SQL guy.

>> So please, mr Celko, please dive into the books again (I could recommend 'An introduction to relational database systems' by C.J.DATE)to see the differences )  <<

I have all the editions and have read them cover to cover, summited
errata and been asked to review one edition.  I like the book, I
recommend it, but would prefer that peopel read Meyers THEORY for the
foundations.  If you thought freshman calculius was bad, this will kill
you....

I disagree with Date, and mostly agree with Codd on the RM model.  I
like NULLs and 3VL.  I also know when SQL should not be used. 
..
Author
6 Apr 2006 1:19 PM
walter
have two separate Master's -- one in Math and one in Comp Sci and I
still get surprised ; Codd and Date are not the same RM model; no NULL
(Date), one NULL (RM #1 and SQL), and two NULLs (RM#2 and parts of
SQL-99 OLAP).

Comp SCI/Software development with (non)relational databases specialty
here, not that I think it is relevant here. I'm well aware of all the
quarrels between CODD and DATE and I have to take sides here as I
simply do not agree with DATE in most of those issues here.

My point is however on that your referal to CODD in the other message
does not make much sense here. CODD did write the initial papers
leading the the RM. SQL in itself is a product that is developped with
the RM in mind, but is really a poor implementation of it, esspecially
as the RM was designed with the 'less is more' phrase in mind.
Author
6 Apr 2006 1:19 PM
walter
have two separate Master's -- one in Math and one in Comp Sci and I
still get surprised ; Codd and Date are not the same RM model; no NULL
(Date), one NULL (RM #1 and SQL), and two NULLs (RM#2 and parts of
SQL-99 OLAP).

Comp SCI/Software development with (non)relational databases specialty
here, not that I think it is relevant here. I'm well aware of all the
quarrels between CODD and DATE and I have to take sides here as I
simply do not agree with DATE in most of those issues here.

My point is however on that your referal to CODD in the other message
does not make much sense here. CODD did write the initial papers
leading the the RM. SQL in itself is a product that is developped with
the RM in mind, but is really a poor implementation of it, esspecially
as the RM was designed with the 'less is more' phrase in mind.

AddThis Social Bookmark Button