Accumulating Aggregator?

I have the following SQL:

    select * from cars c inner join carcolours cc on c.car_id = cc.car_id

This returns (for example):

    Volvo | Red
    Volvo | Blue
    Alfa | Red
    Alfa | Green
    Fiat | Black
    Fiat | Pink
    Fiat | Brown

What I need to return is this:

    Volvo | Red | 1
    Volvo | Blue | 2
    Alfa | Red | 1
    Alfa | Green | 2
    Fiat | Black | 1
    Fiat | Pink | 2
    Fiat | Brown | 3

where the numbers correspond to the record number for the current car. So there are two Volvo rows, 1 and 2, two Alfa rows, 1 and 2, and three Fiat rows, 1, 2 and 3. Does that make sense?

How can I achieve this?

Thanks

Andrew

Andrew (infoREM***@THISmuonlab.com) writes:
> I have the following sql:
>
> select * from cars c inner join carcolours cc on c.car_id = cc.car_id
>
> This returns (for example):
>
> Volvo | Red
> Volvo | Blue
> Alfa | Red
> Alfa | Green
> Fiat | Black
> Fiat | Pink
> Fiat | Brown
>
> What i need to return is this:
>
> Volvo | Red | 1
> Volvo | Blue | 2
> Alfa | Red | 1
> Alfa | Green | 2
> Fiat | Black | 1
> Fiat | Pink | 2
> Fiat | Brown | 3
>
> where the numbers correspond to the record number for the current car. So
> there are two Volvo rows, 1 and 2, two Alfa rows, 1 and 2 and three fiat
> rows, 1, 2 and 3. Does that make sense?
>
> How can i achieve this?

Please tell us next which version of SQL Server you are using.

SQL 2005:

    SELECT car, colour, row_number() OVER (PARTITION BY car ORDER BY colour)
    FROM tbl

SQL 2000:

    SELECT a.car, a.colour,
           (SELECT COUNT(*)
            FROM tbl b
            WHERE a.car = b.car
              AND a.colour <= b.colour)
    FROM tbl a

--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Aww, crap, you beat me to it:

> SELECT a.car, a.colour,
>        (SELECT COUNT(*)
>         FROM tbl b
>         WHERE a.car = b.car
>           AND a.colour <= b.colour)
> FROM tbl a

We got almost identical syntax. But you wound up with 1, 2, 3 where I got 0, 1, 2. But that was a fun brainteaser.

--
Peace & happy computing,

Mike Labosh, MCSD MCT
Owner, vbSensei.Com

"y = (-b ± (b^2 - 4 * a * c)^.5) / 2 * a" -- Dr. Houser

Erland Sommarskog wrote:
> Please tell us next which version of SQL Server you are using.

2000.

> SQL 2005:
>
> SELECT car, colour, row_number() OVER (PARTITION BY car ORDER BY colour)
> FROM tbl
>
> SQL 2000:
>
> SELECT a.car, a.colour,
>        (SELECT COUNT(*)
>         FROM tbl b
>         WHERE a.car = b.car
>           AND a.colour <= b.colour)
> FROM tbl a

Can you please elaborate on what tbl a and tbl b are?

Is this some clever SQL syntax I don't know about?

car and colour are in separate tables but to me the way you've written it seems to suggest they are in the same table.

Thanks!

Andrew

Andrew (infoREM***@THISmuonlab.com) writes:
> Erland Sommarskog wrote:
>> Please tell us next which version of SQL Server you are using.
>>
>> SQL 2005:
>>
>> SELECT car, colour, row_number() OVER (PARTITION BY car ORDER BY colour)
>> FROM tbl
>>
>> SQL 2000:
>>
>> SELECT a.car, a.colour,
>>        (SELECT COUNT(*)
>>         FROM tbl b
>>         WHERE a.car = b.car
>>           AND a.colour <= b.colour)
>> FROM tbl a
>
> Can you please elaborate on what tbl a and tbl b are?
>
> Is this some clever SQL syntax I don't know about?
>
> car and colour are in separate tables but to me the way you've written
> it seems to suggest they are in the same table.

Sorry, overlooked that part. If they are in separate tables, we get:

    SELECT c1.car, cc1.colour,
           (SELECT COUNT(*)
            FROM cars c2
            JOIN carcolours cc2 ON c2.car_id = cc2.car_id
            WHERE c1.car = c2.car
              AND cc1.colour <= cc2.colour)
    FROM cars c1
    JOIN carcolours cc1 ON c1.car_id = cc1.car_id

--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

After agonizing over my first flawed attempts with obscene CASE blocks, WITH
ROLLUP, WITH CUBE, and pounding my face against the COMPUTE BY clause, it finally hit me:

    CREATE TABLE #Cars (
        Model VARCHAR(10),
        Color VARCHAR(10)
    )

    INSERT INTO #Cars (Model, Color) VALUES ('Volvo', 'Red')
    INSERT INTO #Cars (Model, Color) VALUES ('Volvo', 'Blue')
    INSERT INTO #Cars (Model, Color) VALUES ('Alfa', 'Red')
    INSERT INTO #Cars (Model, Color) VALUES ('Alfa', 'Green')
    INSERT INTO #Cars (Model, Color) VALUES ('Fiat', 'Black')
    INSERT INTO #Cars (Model, Color) VALUES ('Fiat', 'Pink')
    INSERT INTO #Cars (Model, Color) VALUES ('Fiat', 'Brown')

    -- First attempt: each (Model, Color) group holds one row, so both counts are always 1
    SELECT Model, Color, COUNT(Model), COUNT(Color)
    FROM #Cars
    GROUP BY Model, Color
    ORDER BY Model, Color

    -- For each row, count the rows of the same model whose colour sorts after it (0-based)
    SELECT Model, Color, (
        SELECT COUNT(*)
        FROM #Cars c2
        WHERE c2.Model = c1.Model
        AND c2.Color > c1.Color
    ) AS Increment
    FROM #Cars c1
    ORDER BY Model, (
        SELECT COUNT(*)
        FROM #Cars c2
        WHERE c2.Model = c1.Model
        AND c2.Color > c1.Color
    )

    DROP TABLE #Cars

--
Peace & happy computing,

Mike Labosh, MCSD MCT
Owner, vbSensei.Com

"y = (-b ± (b^2 - 4 * a * c)^.5) / 2 * a" -- Dr. Houser

>> where the numbers correspond to the record [sic] number for the current car. <<

Let's get back to the basics of an RDBMS. Rows are not records; fields are not columns; tables are not files; there is no sequential access or ordering in an RDBMS, so "first", "next" and "last" are totally meaningless. If you want an ordering, then you need to have a column that defines that ordering. You must use an ORDER BY clause on a cursor or in an OVER() clause. Without a PARTITION clause the order will be random.

> Let's get back to the basics of an RDBMS. Rows are not records; fields
> are not columns; tables are not files; there is no sequential access or
> ordering in an RDBMS, so "first", "next" and "last" are totally
> meaningless. If you want an ordering, then you need to have a column
> that defines that ordering. You must use an ORDER BY clause on a
> cursor or in an OVER() clause. Without a PARTITION clause the order
> will be random.

Maybe I am just a brain-dead developer, but I have seen you post comments like this on many occasions, where you say a "Row" is not a "record". Could you please elaborate on what you mean so I can at least grok you? Because to people like me, those two terms are synonymous.

--
Peace & happy computing,

Mike Labosh, MCSD MCT
Owner, vbSensei.Com

"y = (-b ± (b^2 - 4 * a * c)^.5) / 2 * a" -- Dr. Houser

Mike Labosh (mlabosh_at_hotmail_dot_com) writes:
> Maybe I am just a brain-dead developer, but I have seen you post comments
> like this on many occasions, where you say a "Row" is not a "record".
> Could you please elaborate on what you mean so I can at least grok you?
> Because to people like me, those two terms are synonymous.

I wasn't there at the time - but rumour has it that when Joe Celko presented at PASS he was caught saying "record". Maybe that's why he keeps repeating it - in order to try to get it right himself.

Myself, I usually talk about rows and columns, rather than records and fields - when I write in English on the newsgroups. In daily discourse with colleagues in Swedish.

The whole thing is just a question of different terminology. SQL uses "rows" where most other languages use "record" etc. Of course, there are some important differences - records in a file are ordered, and that order is exposed through the access method. Not so with rows in a table.

You can see this elsewhere. When I learnt programming I learnt about procedures and functions - these days they talk about methods as if it was something else - but it isn't. It's just small talk.

--
Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se

If I'm not mistaken, the ANSI SQL-99 refers in several places to "fields"...
Come on, once upon a time they would scold you for saying "indexes" not "indices", saying that "index" is a Latin word and must have a Latin plural. Now everybody says "indexes" and nobody cares - I guess "index" is an English word now...

> Come on, once upon a time they would scold you for saying "indexes" not
> "indices", saying that "index" is a Latin word and must have a Latin
> plural. Now everybody says "indexes" and nobody cares - I guess "index"
> is an English word now...

Heh. Once, teaching a class, I used the word, "datum", and several people looked up and said, simultaneously, "huh?"

--
Peace & happy computing,

Mike Labosh, MCSD MCT
Owner, vbSensei.Com

"y = (-b ± (b^2 - 4 * a * c)^.5) / 2 * a" -- Dr. Houser

>> Could you please elaborate on what you mean so I can at least grok you? Because to people like me, those two terms are synonymous. <<

Like most new ideas, the hard part of understanding what the relational model is comes in un-learning what you know about file systems. As a line often attributed to Artemus Ward (Charles Farrar Browne, 1834-1867) puts it, "It ain't so much the things we don't know that get us into trouble. It's the things we know that just ain't so."

If you already have a background in data processing with traditional file systems, the first things to un-learn are:

(0) Databases are not file sets.
(1) Tables are not files.
(2) Rows are not records.
(3) Columns are not fields.

Modern data processing began with punch cards. The influence of the punch card lingered on long after the invention of magnetic tapes and disk for data storage. This is why early video display terminals were 80 columns across. Even today, files which were migrated from cards to magnetic tape files or disk storage still use 80 column records.

But the influence was not just on the physical side of data processing. The methods for handling data from the prior media were imitated in the new media.

Data processing first consisted of sorting and merging decks of punch cards (later, sequential magnetic tape files) in a series of distinct steps.
The result of each step feeds into the next step in the process. This leads to temp tables and other tricks to mimic that kind of processing.

Relational databases do not work that way. Each user connects to the entire database all at once, not to one file at a time in a sequence of steps. The users might not all have the same database access rights once they are connected, however. Magnetic tapes could not be shared among users at the same time, but shared data is the point of a database.

Tables versus Files

A file is closely related to its physical storage media. A table may or may not be a physical file. DB2 from IBM uses one file per table, while Sybase puts several entire databases inside one file. A table is a set of rows of the same kind of thing. A set has no ordering and it makes no sense to ask for the first or last row.

A deck of punch cards is sequential, and so are magnetic tape files. Therefore, a physical file of ordered sequential records also became the mental model for data processing and it is still hard to shake. Anytime you look at data, it is in some physical ordering.

The various access methods for disk storage systems came later, but even these access methods could not shake the mental model.

Another conceptual difference is that a file is usually data that deals with a whole business process. A file has to have enough data in itself to support applications for that business process. Files tend to be "mixed" data which can be described by the name of the business process, such as "The Payroll file" or something like that.

Tables can be either entities or relationships within a business process. This means that the data which was held in one file is often put into several tables. Tables tend to be "pure" data which can be described by single words. The payroll would now have separate tables for timecards, employees, projects and so forth.

Tables as Entities

An entity is a physical or conceptual "thing" which has meaning by itself.
A person, a sale or a product would be an example. In a relational database, an entity is defined by its attributes, which are shown as values in columns in rows in a table.

To remind users that tables are sets of entities, I like to use collective or plural nouns that describe the function of the entities within the system for the names of tables. Thus "Employee" is a bad name because it is singular; "Employees" is a better name because it is plural; "Personnel" is best because it is collective and does not summon up a mental picture of individual persons.

If you have tables with exactly the same structure, then they are sets of the same kind of elements. But you should have only one set for each kind of data element! Files, on the other hand, were PHYSICALLY separate units of storage which could be alike -- each tape or disk file represents a step in the PROCEDURE, such as moving from raw data, to edited data, and finally to archived data. In SQL, this should be a status flag in a table.

Tables as Relationships

A relationship is shown in a table by columns which reference one or more entity tables. Without the entities, the relationship has no meaning, but the relationship can have attributes of its own. For example, a show business contract might have an agent, an employer and a talent. The method of payment is an attribute of the contract itself, and not of any of the three parties.

Rows versus Records

Rows are not records. A record is defined in the application program which reads it; a row is defined in the database schema and not by a program at all. The name of a field is given in the READ or INPUT statements of the application; a row is named in the database schema. Likewise, the PHYSICAL order of the field names in the READ statement is vital (READ a,b,c is not the same as READ c, a, b; but SELECT a,b,c is the same data as SELECT c, a, b).
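The READ versus SELECT point can be tried out in a few lines. What follows is only a rough sketch, not anything from the posts above: it assumes Python with its bundled sqlite3 module standing in for a real SQL product, a packed struct standing in for a file record, and a made-up table t with columns a, b and c.

```python
import sqlite3
import struct

# A "record": three packed integers. Which value ends up in which
# variable depends only on the order the reading program names them in.
raw = struct.pack("<3i", 1, 2, 3)
a, b, c = struct.unpack("<3i", raw)      # like READ a, b, c
c2, a2, b2 = struct.unpack("<3i", raw)   # like READ c, a, b: same bytes, different meaning

# A "row": columns are addressed by name, so reordering the select list
# changes the layout of the result but not which value belongs to which column.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row           # lets us look columns up by name
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER)")
conn.execute("INSERT INTO t (c, a, b) VALUES (3, 1, 2)")
row_abc = conn.execute("SELECT a, b, c FROM t").fetchone()
row_cab = conn.execute("SELECT c, a, b FROM t").fetchone()

print((a, b, c), (a2, b2, c2))           # the record reads disagree
print(dict(row_abc) == dict(row_cab))    # the rows agree, name by name
```

The record side gives different answers for the two reads, while the two SELECT results hold the same value under each column name.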
All empty files look alike; they are a directory entry in the operating system with a name and a length of zero bytes of storage. Empty tables still have columns, constraints, security privileges and other structures, even though they have no rows.

This is in keeping with the set theoretical model, in which the empty set is a perfectly good set. The difference between SQL's set model and standard mathematical set theory is that set theory has only one empty set, but in SQL each table has a different structure, so they cannot be used in places where non-empty versions of themselves could not be used.

Another characteristic of rows in a table is that they are all alike in structure and they are all the "same kind of thing" in the model. In a file system, records can vary in size, datatypes and structure by having flags in the data stream that tell the program reading the data how to interpret it. The most common examples are Pascal's variant record, C's struct syntax and Cobol's OCCURS clause.

The OCCURS keyword in Cobol and the Variant records in Pascal have a number which tells the program how many times a record structure is to be repeated in the current record.

Unions in 'C' are not variant records, but variant mappings for the same physical memory. For example:

    union x {int ival; char j[4];} myStuff;

defines myStuff to be either an integer (which are 4 bytes on most modern C compilers, but this code is non-portable) or an array of 4 bytes, depending on whether you say myStuff.ival or myStuff.j[0];

But even more than that, files often contained records which were summaries of subsets of the other records -- so called control break reports. There is no requirement that the records in a file be related in any way -- they are literally a stream of binary data whose meaning is assigned by the program reading them.

Columns versus Fields

A field within a record is defined by the application program that reads it.
A column in a row in a table is defined by the database schema. The datatypes in a column are always scalar.

The order of the application program variables in the READ or INPUT statements is important because the values are read into the program variables in that order. In SQL, columns are referenced only by their names. Yes, there are shorthands like the SELECT * clause and INSERT INTO <table name> statements which expand into a list of column names in the physical order in which the column names appear within their table declaration, but these are shorthands which resolve to named lists.

The use of NULLs in SQL is also unique to the language. Fields do not support a missing data marker as part of the field, record or file itself. Nor do fields have constraints which can be added to them in the record, like the DEFAULT and CHECK() clauses in SQL.

Relationships among tables within a database

Files are pretty passive creatures and will take whatever an application program throws at them without much objection. Files are also independent of each other simply because they are connected to one application program at a time and therefore have no idea what other files look like.

A database actively seeks to maintain the correctness of all its data. The methods used are triggers, constraints and declarative referential integrity.

Declarative referential integrity (DRI) says, in effect, that data in one table has a particular relationship with data in a second (possibly the same) table. It is also possible to have the database change itself via referential actions associated with the DRI.

For example, a business rule might be that we do not sell products which are not in inventory. This rule would be enforced by a REFERENCES clause on the Orders table which references the Inventory table and a referential action of ON DELETE CASCADE.

Triggers are a more general way of doing much the same thing as DRI.
A trigger is a block of procedural code which is executed before, after or instead of an INSERT INTO or UPDATE statement. You can do anything with a trigger that you can do with DRI and more.

However, there are problems with TRIGGERs. While there is a standard syntax for them in the SQL-99 standard, most vendors have not implemented it. What they have is very proprietary syntax instead. Secondly, a trigger cannot pass information to the optimizer like DRI. In the example in this section, I know that for every product number in the Orders table, I have that same product number in the Inventory table. The optimizer can use that information in setting up EXISTS() predicates and JOINs in the queries. There is no reasonable way to parse procedural trigger code to determine this relationship.

The CREATE ASSERTION statement in SQL-92 will allow the database to enforce conditions on the entire database as a whole. An ASSERTION is not like a CHECK() clause, but the difference is subtle. A CHECK() clause is executed when there are rows in the table to which it is attached. If the table is empty then all CHECK() clauses are effectively TRUE. Thus, if we wanted to be sure that the Inventory table is never empty, and we wrote:

    CREATE TABLE Inventory
    ( ...
      CONSTRAINT inventory_not_empty
      CHECK ((SELECT COUNT(*) FROM Inventory) > 0), ... );

it would not work. However, we could write:

    CREATE ASSERTION Inventory_not_empty
    CHECK ((SELECT COUNT(*) FROM Inventory) > 0);

and we would get the desired results. The assertion is checked at the schema level and not at the table level.

<4-year college education snipped />
OK, cool. Every time in the past, when you have jumped up and said a row is not a record, I have been terrified that I have been teaching my students improperly.

You're saying that a row is different from a record, because of logical storage, right? It's just a semantic thing that differs between physical bits on the RAID stack, and what folks like me call an ADODB.Recordset or.... a .NET System.Data.DataTable

Am I understanding you correctly?

--
Peace & happy computing,

Mike Labosh, MCSD MCT
Owner, vbSensei.Com

"y = (-b ± (b^2 - 4 * a * c)^.5) / 2 * a" -- Dr. Houser

>> You're saying that a row is different from a record, because of logical storage, right? It's just a semantic thing .. <<

It is conceptual, but in 100 words or less:

1) Records take meaning because of a host program reading them from contiguous storage. No constraints, defaults, datatypes etc. exist in a file. Records are read and processed, one at a time, field by contiguous field. A record can have all kinds of non-scalar fields.

2) Rows have meaning in themselves, regardless of which host program is using them. They have constraints, defaults, datatypes etc. Rows are read as complete units, which are elements of a set. All columns are scalars.

When you have the right mental model, you ask the right questions and find the right answers. Look at all the postings here that stem from row/record confusion.

> 1) Records take meaning because of a host program reading them from
> contiguous storage. No constraints, defaults, datatypes etc. exist in a
> file. Records are read and processed, one at a time, field by contiguous
> field. A record can have all kinds of non-scalar fields.
>
> 2) Rows have meaning in themselves, regardless of which host program is
> using them. They have constraints, defaults, datatypes etc. Rows are
> read as complete units, which are elements of a set. All columns are
> scalars.
>
> When you have the right mental model, you ask the right questions and
> find the right answers. Look at all the postings here that stem from
> row/record confusion.

Yeah, but you're talking about physical storage and low level I/O. We developers (I hope) don't code at that level anyway, so this is why my brain does not make that distinction, and...now that I think of it, a person that makes such a distinction is programming way too close to the bare silicon.

I follow your train of thought, and at my level, I agree with your semantics, and you are correct.

But seriously, just imagine the chaos that would ensue if Mikey the VB.NET guy tried to do raw file system I/O on your .mdf and .ldf files. Would you not beat me to death with a phone pole?

--
Peace & happy computing,

Mike Labosh, MCSD MCT
Owner, vbSensei.Com

"y = (-b ± (b^2 - 4 * a * c)^.5) / 2 * a" -- Dr. Houser

>> Yeah, but you're talking about physical storage and low level I/O. We developers (I hope) don't code at that level anyway, so this is why my brain does not make that distinction, and...now that I think of it, a person that makes such a distinction is programming way too close to the bare silicon. <<

Yes and no. The concept of contiguous storage can be both physical and logical -- I issue a "Next Record" on a file. I get the data and I do not have to know how to write code to move the read-write disk head like I did on a magnetic tape system (i.e. "buffer (n) bytes in to main storage address (s) and put the location address in register (x) from the tape on channel (c)" in the reallllly old days). But I still have the semantics of physically contiguous fields of data, blocked into various shaped records read and processed in a sequential manner. That is the problem that leads to cursors, temp tables as scratch tapes, etc.

>> I follow your train of thought, and at my level, I agree with your semantics, and you are correct. <<

Thank you sir.

>> But seriously, just imagine the chaos that would ensue if Mikey the VB.NET guy tried to do raw file system I/O on your .mdf and .ldf files. Would you not beat me to death with a phone pole? <<

I ain't got the muscle :)

"--CELKO--" <jcelko***@earthlink.net> wrote in message
news:1154273630.887918.183100@p79g2000cwp.googlegroups.com...
> (3) Columns are not fields.

Somebody needs to tell those people over at ANSI. As long as they put information in the standards contradictory to your statements (like the 390+ references to "field" in the ANSI SQL-99 standard) you can expect ordinary users to continue to use what you consider incorrect terminology. Assuming, of course, that the ANSI standard is a reliable and adequate reference...

Here's a sample from the ANSI SQL-99 standard:

"6.2 <field definition>

Function

Define a field of a row type.

Format

<field definition> ::= <field name> <data type> [ <reference scope check> ] [ <collate clause> ]

Syntax Rules"
> The CREATE ASSERTION statement in SQL-92 will allow the database to
> enforce conditions on the entire database as a whole. An ASSERTION is
> not like a CHECK() clause, but the difference is subtle. A CHECK()
> clause is executed when there are rows in the table to which it is
> attached. If the table is empty then all CHECK() clauses are
> effectively TRUE. Thus, if we wanted to be sure that the Inventory
> table is never empty, and we wrote:
>
> CREATE TABLE Inventory
> ( ...
> CONSTRAINT inventory_not_empty
> CHECK ((SELECT COUNT(*) FROM Inventory) > 0), ... );
>
> it would not work. However, we could write:
>
> CREATE ASSERTION Inventory_not_empty
> CHECK ((SELECT COUNT(*) FROM Inventory) > 0);
>
> and we would get the desired results. The assertion is checked at the
> schema level and not at the table level.

Joe,

With snapshot isolation or just on Oracle the following will happen: Suppose there are 2 rows in Inventory. Suppose connection 1 begins a transaction, successfully deletes a row (there is another row so far), but does not commit yet. Suppose connection 2 deletes another row - it succeeds because it does not see uncommitted changes from connection 1. Now connection 1 can commit and Inventory is empty. Sounds like a loophole, right?
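The two-connection scenario above can be walked through with a toy model. This is only a sketch in Python and makes no claim about how SQL Server or Oracle implement snapshot isolation: the SnapshotTxn class, the inventory set and the row names are all invented for illustration, and the "not empty" rule is checked against each transaction's private snapshot, which is the assumption the scenario rests on.

```python
class SnapshotTxn:
    """Toy transaction that only sees the rows present when it began."""

    def __init__(self, table):
        self.table = table
        self.snapshot = set(table)   # private view taken at transaction start
        self.deleted = set()

    def delete(self, row):
        # The "table must not be empty" rule is evaluated against this
        # transaction's own view, not against other uncommitted work.
        remaining = self.snapshot - self.deleted - {row}
        if not remaining:
            raise ValueError("Inventory must not be empty")
        self.deleted.add(row)

    def commit(self):
        # The two transactions touched different rows, so no write conflict is raised.
        self.table -= self.deleted


inventory = {"widget", "gadget"}
t1 = SnapshotTxn(inventory)
t2 = SnapshotTxn(inventory)
t1.delete("widget")   # t1 still sees "gadget", so its check passes
t2.delete("gadget")   # t2 still sees "widget", so its check passes
t1.commit()
t2.commit()
print(inventory)      # both checks passed, yet nothing is left
```

Run one after the other instead (the second transaction starting after the first commits), the second delete is rejected, which is the behaviour the assertion was meant to give.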