|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Wish List: String Concatenation Aggregate Operatoretc) that would concatenate column values as strings. Consider Transactions Table LCTID TransactionDate Amount CurrencyCode ===== =============== ====== ============ 1 3/12/2006 450.00 CAD 1 3/13/2006 300.00 USD 1 3/11/2006 100.00 CAD 2 3/15/2006 30.00 USD i could do SELECT LCTID, MAX(TransactionDate) AS LastTransactionDate, SUM(Amount) AS TotalAmount, CurrencyCode FROM Tranactions GROUP BY LCTID, CurrencyCode and get LCTID LastTransactionDate TotalAmount CurrencyCode ===== =================== =========== ============ 1 3/12/2006 550.00 CAD 1 3/13/2006 300.00 USD 2 3/15/2006 30.00 USD But what i really want to do is: SELECT LCTID, MAX(TransactionDate) AS LastTransactionDate, CONCAT(CAST(Amount AS varchar(50))+' '+CurrencyCode, ', ') AS Amounts FROM Tranactions GROUP BY LCTID and get LCTID LastTransactionDate Amounts ===== =================== ======= 1 3/13/2006 550.00 CAD, 300.00 USD 2 3/15/2006 30.00 USD Where the 2nd parameter would be a "concatenation separator". Maybe SQL Server X will let us write our own user-defined aggregate function Show quote :) Now that i think about it, having a UDF where SQL Server hands me a table
variable that contains the rows that needs to be aggregated into a single scalar value would be great. CREATE AGGREGATE myStringConcatenationAggregate (@rowsToAggregate TABLE) RETURNS varchar AS BEGIN ... END Ian Boyd (ian.msnews***@avatopia.com) writes:
> i wish SQL defined a new aggregate operator (e.g. SUM, AVG, COUNT, STDEV, Santa comes early this year! There is such an operator in SQL 2005, > etc) that would concatenate column values as strings. although it's not in the corner you would look for it. From one of the devs in the XML team, I got this query: select CustomerID, substring(OrdIdList, 1, datalength(OrdIdList)/2 - 1) -- strip the last ',' from the list from Customers c cross apply (select convert(nvarchar(30), OrderID) + ',' as [text()] from Orders o where o.CustomerID = c.CustomerID order by o.OrderID for xml path('')) as Dummy(OrdIdList) go -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx On Fri, 31 Mar 2006 14:43:02 -0500, Ian Boyd wrote:
>i wish SQL defined a new aggregate operator (e.g. SUM, AVG, COUNT, STDEV, Hi Ian,>etc) that would concatenate column values as strings. (snip) >Maybe SQL Server X will let us write our own user-defined aggregate function >:) Though you won't need it for this one (see Erland's post), you already have the ability to write your own aggregate functions in SQL Server 2005, but you'll have to use the CLR. Hyperlink to relevant content in SQL Server 2005 Books Online: ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/denet9/html/bad9b7e8-5967-4afa-8dc8-6d840faf9372.htm Hyperlink to the same content on the web: http://msdn2.microsoft.com/en-us/library/ms131057(SQL.90).aspx -- Hugo Kornelis, SQL Server MVP Hello Hugo,
> Though you won't need it for this one (see Erland's post), you already And here's an example:> have the ability to write your own aggregate functions in SQL Server > 2005, but you'll have to use the CLR. using System; using System.Data; using System.Data.SqlClient; using System.Data.SqlTypes; using Microsoft.SqlServer.Server; using System.Text; using System.IO; [Serializable] [SqlUserDefinedAggregate(Format.UserDefined,MaxByteSize=8000)] public class CSVLister : IBinarySerialize { string sb; public void Read(BinaryReader r) { sb = r.ReadString(); } public void Write(BinaryWriter w) { w.Write(sb); } public void Init() { sb = string.Empty; } public void Accumulate(SqlString Value) { sb += Value.Value; sb += ", "; } public void Merge(CSVLister Group) { } public SqlString Terminate() { return new SqlString(sb); } } This isn't intented for use on Multi-proc boxes for obvious reasons. Thank you, Kent Tegels DevelopMentor http://staff.develop.com/ktegels/ Kent,
Unfortunately, it's not only multi-processor boxes that use the Merge() method in a user-defined aggregate. Your aggregate fails on this query, which uses partial aggregation in the plan. (The 8000-byte limit is not exceeded here.) select count(CustomerID) as Custs, sum(len(agg)) as totalAggLen, dbo.CSVLister(agg) as OrderList from ( select CustomerID, dbo.CSVLister(OrderID) as agg from Northwind..Orders where EmployeeID = 2 group by CustomerID union all select CustomerID, dbo.CSVLister(OrderID) from Northwind..Orders where EmployeeID = 3 group by CustomerID ) T -- output Custs totalAggLen OrderList ----------- ----------- ------------- 122 1439 -- Query plan StmtText ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |--Compute Scalar(DEFINE:([Expr1010]=CONVERT_IMPLICIT(int,[globalagg1020],0), [Expr1011]=CASE WHEN [globalagg1022]=(0) THEN NULL ELSE [globalagg1024] END, [Expr1012]=dbo.CSVLister([globalagg1026]))) |--Stream Aggregate(DEFINE:([globalagg1020]=SUM([partialagg1019]), [globalagg1022]=SUM([partialagg1021]), [globalagg1024]=SUM([partialagg1023]), [globalagg1026]=dbo.CSVLister([partialagg1025]))) |--Concatenation Aggregate(DEFINE:([partialagg1019]=COUNT([Northwind].[dbo].[Orders].[CustomerID]), |--Stream [partialagg1021]=COUNT_BIG(len([Expr1003])), [partialagg1023]=SUM(len([Expr1003])), [partialagg1025]=dbo.CSVLister([Expr1003]))) | |--Compute Scalar(DEFINE:([Expr1003]=dbo.CSVLister([Expr1028]))) | |--Stream Aggregate(GROUP BY:([Northwind].[dbo].[Orders].[CustomerID]) DEFINE:([Expr1028]=dbo.CSVLister([Expr1013]))) | |--Sort(ORDER BY:([Northwind].[dbo].[Orders].[CustomerID] ASC)) | |--Compute Scalar(DEFINE:([Expr1013]=CONVERT_IMPLICIT(nvarchar(4000),[Northwind].[dbo].[Orders].[OrderID],0))) | |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]), WHERE:([Northwind].[dbo].[Orders].[EmployeeID]=(2))) |--Stream Aggregate(DEFINE:([partialagg1019]=COUNT([Northwind].[dbo].[Orders].[CustomerID]), [partialagg1021]=COUNT_BIG(len([Expr1007])), [partialagg1023]=SUM(len([Expr1007])), [partialagg1025]=dbo.CSVLister([Expr1007]))) |--Compute Scalar(DEFINE:([Expr1007]=dbo.CSVLister([Expr1029]))) |--Stream Aggregate(GROUP BY:([Northwind].[dbo].[Orders].[CustomerID]) DEFINE:([Expr1029]=dbo.CSVLister([Expr1014]))) |--Sort(ORDER BY:([Northwind].[dbo].[Orders].[CustomerID] ASC)) |--Compute Scalar(DEFINE:([Expr1014]=CONVERT_IMPLICIT(nvarchar(4000),[Northwind].[dbo].[Orders].[OrderID],0))) |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]), WHERE:([Northwind].[dbo].[Orders].[EmployeeID]=(3))) The CLR is useful for a lot of things, but it's a lot harder to write a correct CLR routine than you might think. Steve Kass Drew University Kent Tegels wrote: Show quote > Hello Hugo, > >> Though you won't need it for this one (see Erland's post), you already >> have the ability to write your own aggregate functions in SQL Server >> 2005, but you'll have to use the CLR. > > > And here's an example: > > using System; > using System.Data; > using System.Data.SqlClient; > using System.Data.SqlTypes; > using Microsoft.SqlServer.Server; > using System.Text; > using System.IO; > [Serializable] > [SqlUserDefinedAggregate(Format.UserDefined,MaxByteSize=8000)] > public class CSVLister : IBinarySerialize { > string sb; > public void Read(BinaryReader r) { > sb = r.ReadString(); > } > public void Write(BinaryWriter w) { > w.Write(sb); > } > public void Init() { > sb = string.Empty; > } > public void Accumulate(SqlString Value) { > sb += Value.Value; > sb += ", "; > } > public void Merge(CSVLister Group) { > } > public SqlString Terminate() { > return new SqlString(sb); > } > } > > This isn't intented for use on Multi-proc boxes for obvious reasons. > > Thank you, > Kent Tegels > DevelopMentor > http://staff.develop.com/ktegels/ > > Again, look at what 1NF means and get enough math behind you to
understand why Dr. Codd does it that way. In what way is this question much different than your SQL puzzle, like
the one of friday? It is just about munging the data into a form that is directly usable by the client. SQL in itself is not very good at that, (though SQL 2005 definately has some very usefull improvements). I've got enough math to understand Dr. Codd (and Date) in his writings about the RM. But remember that SQL is not an implementation of a RM at all. Both Codd and Date (and you as well) often totally ignore the practical problems the user is faced with. SQL is an implementation of a model that tries to be a bridge between two worlds: The relational world from a mathematical pov, and second a usable tool for solving practical problems. I'd rather had seen split a database into two layers, one purely relational. Second a layer designed to mung your data into a form the user wishes. We would not have those discussions where one confuses the writings of 35 years ago with the current implementation of SQL. SQL is not capable of handling the RM well, and certainly does not well in the data munging area as well. But I accept it is the standard, and I have to live with it, but please do not ever confuse the RM with SQL. So please, mr Celko, please dive into the books again (I could recommend 'An introduction to relational database systems' by C.J.DATE) to see the differences :) Walter, >> what way is this question much different than your SQL puzzle, like the one of friday? << The puzzle was more specific as to the data required and not a generalfunction -- only two items, not (n) unspecific things. The ordering was well-defined, not vague. There was no need of separators since the data was in columns. In the real world, I would have put each each job history in a row by itself, so the code would port bertter and let the front end or middle tier assemble the current and prior jobs into a single line for display. But do not confuse puzzles with good practices. Back in the 1960's we used to have Commerical FORTRAN packages on small machines (DEC, DG, HP, and a ton of other companies that are loooong gone) that did character math a la mock-COBOL. "I told him I had been that morning at a meeting of the people called Quakers, where I had heard a woman preach. Johnson: "Sir, a woman's preaching is like a dog's walking on his hind legs. It is not done well; but you are surprised to find it done at all." --Boswell on Dr. Samuel Johnson -- Commercial FORTRAN libraries quite fast on the old 16-bit mini-computers in their day. But it was a kludge. It was not good programming. I know this. I can write crappy code with anyone. Hell, I did that at the start of my career until I found S.P.. Confession: I separated my math from my programming the first years when I started. Then along came Structured Programming, Dijkstra, Wirth, Myers, Yourdon, et al. Programming went from art to engineering. All my math kicked in. SQL is almost at the point that procedural code was before Structured Programming. We have a few tricks (Calendar tables, nested sets, etc.) but we do not have an orderly approach to code yet. In what way is this question much different than your SQL puzzle, like
the one of friday? It is just about munging the data into a form that is directly usable by the client. SQL in itself is not very good at that, (though SQL 2005 definately has some very usefull improvements). I've got enough math to understand Dr. Codd (and Date) in his writings about the RM. But remember that SQL is not an implementation of a RM at all. Both Codd and Date (and you as well) often totally ignore the practical problems the user is faced with. SQL is an implementation of a model that tries to be a bridge between two worlds: The relational world from a mathematical pov, and second a usable tool for solving practical problems. I'd rather had seen split a database into two layers, one purely relational. Second a layer designed to mung your data into a form the user wishes. We would not have those discussions where one confuses the writings of 35 years ago with the current implementation of SQL. SQL is not capable of handling the RM well, and certainly does not well in the data munging area as well. But I accept it is the standard, and I have to live with it, but please do not ever confuse the RM with SQL. So please, mr Celko, please dive into the books again (I could recommend 'An introduction to relational database systems' by C.J.DATE) to see the differences :) Walter, >> I've got enough math to understand Dr. Codd (and Date) in his writings about the RM.<< I have two separate Master's -- one in Math and one in Comp Sci and Istill get surprised ; Codd and Date are not the same RM model; no NULL (Date), one NULL (RM #1 and SQL), and two NULLs (RM#2 and parts of SQL-99 OLAP). >> But remember that SQL is not an implementation of a RM at all. Both Codd and Date (and you as well) often totally ignore the practical problems the user is faced with. SQL is an implementation of a model that tries to be a bridge between two worlds: << Hey, I am the programmer's friend, they guy that publishes code andconsults with product developers and helps old ladies with RDBMS problems. But you are right about RM versus SQL. I can add Algebra versus FORTRAN to list, ICON versus Post production strings, LISP versus lists, erc. Any programming language based on a math model will not match. >> I'd rather had seen split a database into two layers, one purely relational. Second a layer designed to mung your data into a form the user wishes << Me, too! Tiered architecture at its best! Raise you one: putbusienss rules in a separate place that can be used by all the tiers. >> We would not have those discussions where one confuses the writings of 35 years ago with the current implementation of SQL. << Gee, better not use the math of 8000 years ago when I am balancign mybooks :) >> SQL is not capable of handling the RM well, and certainly does not well in the data munging area as well. << Sorry, but it does RM very well, if youy learn it; it was never, nevermeant to munge data. We defined it without any real I/O on purpose. >> But I accept it is the standard, and I have to live with it, but please do not ever confuse the RM with SQL. << Hey, I have never said I was the RM; I am the SQL guy.>> So please, mr Celko, please dive into the books again (I could recommend 'An introduction to relational database systems' by C.J.DATE)to see the differences ) << I have all the editions and have read them cover to cover, summitederrata and been asked to review one edition. I like the book, I recommend it, but would prefer that peopel read Meyers THEORY for the foundations. If you thought freshman calculius was bad, this will kill you.... I disagree with Date, and mostly agree with Codd on the RM model. I like NULLs and 3VL. I also know when SQL should not be used. .. have two separate Master's -- one in Math and one in Comp Sci and I
still get surprised ; Codd and Date are not the same RM model; no NULL (Date), one NULL (RM #1 and SQL), and two NULLs (RM#2 and parts of SQL-99 OLAP). Comp SCI/Software development with (non)relational databases specialty here, not that I think it is relevant here. I'm well aware of all the quarrels between CODD and DATE and I have to take sides here as I simply do not agree with DATE in most of those issues here. My point is however on that your referal to CODD in the other message does not make much sense here. CODD did write the initial papers leading the the RM. SQL in itself is a product that is developped with the RM in mind, but is really a poor implementation of it, esspecially as the RM was designed with the 'less is more' phrase in mind. have two separate Master's -- one in Math and one in Comp Sci and I
still get surprised ; Codd and Date are not the same RM model; no NULL (Date), one NULL (RM #1 and SQL), and two NULLs (RM#2 and parts of SQL-99 OLAP). Comp SCI/Software development with (non)relational databases specialty here, not that I think it is relevant here. I'm well aware of all the quarrels between CODD and DATE and I have to take sides here as I simply do not agree with DATE in most of those issues here. My point is however on that your referal to CODD in the other message does not make much sense here. CODD did write the initial papers leading the the RM. SQL in itself is a product that is developped with the RM in mind, but is really a poor implementation of it, esspecially as the RM was designed with the 'less is more' phrase in mind.
Other interesting topics
|
|||||||||||||||||||||||