|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
[OT] SQL HumorI am sure many of you folks are already subscribed, but for those who are
not, here's a story of some DBA job interviews of candidates that should just be summarily shot: (This will wrap) http://www.sqlservercentral.com/columnists/sMcCown/howdoyouspellsql_printversion.asp -- Peace & happy computing, Mike Labosh, MCSD "Musha ring dum a doo dum a da!" -- James Hetfield Another source for quote of the week.
Although the author is unfair emphasizing that DBA has to be aware char vs varchar2 difference. The character string types in SQL are just plain silly. For how long ordinary programming languages have [a single] type String [with unlimited bounds] already? Also when did you design a database schema with char datatype last time? In a word, there is not a single advantage of char over varchar2. Therefore, char is not even worth mentioned, and should just be deprecated: there are to many really important things on DBA plate. Mike Labosh wrote: Show quote > I am sure many of you folks are already subscribed, but for those who are > not, here's a story of some DBA job interviews of candidates that should > just be summarily shot: > > (This will wrap) > > http://www.sqlservercentral.com/columnists/sMcCown/howdoyouspellsql_printversion.asp > If this was intended to be a humorous quote from another faux DBA, then
thanks for the laugh! If that was your own opinion (and I'm assuming not), then do society a favor by quiting your job and living off unemployment. ;-) Show quote "Mikito Harakiri" <mikharakiri_nosp***@yahoo.com> wrote in message news:1124385453.899643.54220@f14g2000cwb.googlegroups.com... > Another source for quote of the week. > > Although the author is unfair emphasizing that DBA has to be aware char > vs varchar2 difference. The character string types in SQL are just > plain silly. For how long ordinary programming languages have [a > single] type String [with unlimited bounds] already? Also when did you > design a database schema with char datatype last time? In a word, there > is not a single advantage of char over varchar2. Therefore, char is not > even worth mentioned, and should just be deprecated: there are to many > really important things on DBA plate. > > Mike Labosh wrote: >> I am sure many of you folks are already subscribed, but for those who are >> not, here's a story of some DBA job interviews of candidates that should >> just be summarily shot: >> >> (This will wrap) >> >> http://www.sqlservercentral.com/columnists/sMcCown/howdoyouspellsql_printversion.asp >> > No, that was my opinion. Do you imply I'm not qualified to be a DBA?
Well, I would be ashamed to be called a master of extents and segment management. JT wrote: Show quote > If this was intended to be a humorous quote from another faux DBA, then > thanks for the laugh! If that was your own opinion (and I'm assuming not), > then do society a favor by quiting your job and living off unemployment. > ;-) > > "Mikito Harakiri" <mikharakiri_nosp***@yahoo.com> wrote in message > news:1124385453.899643.54220@f14g2000cwb.googlegroups.com... > > Another source for quote of the week. > > > > Although the author is unfair emphasizing that DBA has to be aware char > > vs varchar2 difference. The character string types in SQL are just > > plain silly. For how long ordinary programming languages have [a > > single] type String [with unlimited bounds] already? Also when did you > > design a database schema with char datatype last time? In a word, there > > is not a single advantage of char over varchar2. Therefore, char is not > > even worth mentioned, and should just be deprecated: there are to many > > really important things on DBA plate. > > > > Mike Labosh wrote: > >> I am sure many of you folks are already subscribed, but for those who are > >> not, here's a story of some DBA job interviews of candidates that should > >> just be summarily shot: > >> > >> (This will wrap) > >> > >> http://www.sqlservercentral.com/columnists/sMcCown/howdoyouspellsql_printversion.asp > >> > > Mikito Harakiri wrote:
> No, that was my opinion. Do you imply I'm not qualified to be a DBA? What's so shameful about being called a DBA ?> Well, I would be ashamed to be called a master of extents and segment > management. > Show quote > > You smell like someone who spends most of their time lurking around the
halls of a university or research lab rather than developing database models for real production systems. Show quote "Mikito Harakiri" <mikharakiri_nosp***@yahoo.com> wrote in message news:1124388849.145863.189590@g47g2000cwa.googlegroups.com... > No, that was my opinion. Do you imply I'm not qualified to be a DBA? > Well, I would be ashamed to be called a master of extents and segment > management. > > JT wrote: >> If this was intended to be a humorous quote from another faux DBA, then >> thanks for the laugh! If that was your own opinion (and I'm assuming >> not), >> then do society a favor by quiting your job and living off unemployment. >> ;-) >> >> "Mikito Harakiri" <mikharakiri_nosp***@yahoo.com> wrote in message >> news:1124385453.899643.54220@f14g2000cwb.googlegroups.com... >> > Another source for quote of the week. >> > >> > Although the author is unfair emphasizing that DBA has to be aware char >> > vs varchar2 difference. The character string types in SQL are just >> > plain silly. For how long ordinary programming languages have [a >> > single] type String [with unlimited bounds] already? Also when did you >> > design a database schema with char datatype last time? In a word, there >> > is not a single advantage of char over varchar2. Therefore, char is not >> > even worth mentioned, and should just be deprecated: there are to many >> > really important things on DBA plate. >> > >> > Mike Labosh wrote: >> >> I am sure many of you folks are already subscribed, but for those who >> >> are >> >> not, here's a story of some DBA job interviews of candidates that >> >> should >> >> just be summarily shot: >> >> >> >> (This will wrap) >> >> >> >> http://www.sqlservercentral.com/columnists/sMcCown/howdoyouspellsql_printversion.asp >> >> >> > > >> I would be ashamed to be called a master of extents and segment management. <<what is so shameful in it? > there is not a single advantage of char over varchar2 "varchar2" ? You seem to live in an Oracle world, but here we are MSSQL Server people. And in SQL Server there is an advantage for char over varchar: char(n) has takes two bytes less to store than varchar(n), if the string has always n characters. Razvan Razvan Socol wrote:
> > there is not a single advantage of char over varchar2 Hmm... With todays terabytes of disc memory, are 2 extra bytes really> > "varchar2" ? You seem to live in an Oracle world, but here we are MS > SQL Server people. And in SQL Server there is an advantage for char > over varchar: char(n) has takes two bytes less to store than > varchar(n), if the string has always n characters. > > Razvan that important? If storage size is really critical, then maybe column compression might help? Not that I totally disagree with the concept of deprecating char vs
varchar, but I see this argument used a lot, and there's two flaws with it. 1. Storage size may be cheap and plentiful, but performance should always be foremost in the DBA mind. 2 bytes in a single column pf storage may not be much, but you also need to write queries that retrieve that extra 2 bytes. It's not just disk space; it's also memory and CPU. 2. Failing to appreciate the differences between varchar and char sets us on a path of lazy design. If it doesn't matter if I use char(10) vs varchar(10), then what's the matter with varchar(50)? How about varchar(51) etc? Eventually we could get away with "oh screw it; disks are cheap, memory is cheap, CPU's are powerful, let's just put everything in an text column and parse it on the fly". I realize that what you are saying is a long way from suggesting that we just throw away relational design altogether, but it's the small steps that lead to the bigger leaps that lead to the cliffs. Just my .02 Stu Stu wrote:
> Not that I totally disagree with the concept of deprecating char vs How much performance difference does it really make? Keep in mind that> varchar, but I see this argument used a lot, and there's two flaws with > it. > > 1. Storage size may be cheap and plentiful, but performance should > always be foremost in the DBA mind. 2 bytes in a single column pf > storage may not be much, but you also need to write queries that > retrieve that extra 2 bytes. It's not just disk space; it's also > memory and CPU. in a typical table you can declare char a couple of boolean (Y/N) columns at most. > 2. Failing to appreciate the differences between varchar and char sets Failing to appreciate the differences between varchar and char sets is> us on a path of lazy design. If it doesn't matter if I use char(10) vs > varchar(10), then what's the matter with varchar(50)? How about > varchar(51) etc? Eventually we could get away with "oh screw it; disks > are cheap, memory is cheap, CPU's are powerful, let's just put > everything in an text column and parse it on the fly". an ability to raise the level of abstraction. (BTW, the skill many DBAs lack.) With low level of abstraction you would never quit chaising perceived problems (eg. trying to figure out the "optimal" block size, pondering if select * from table is faster than select col1, col2, ... from table etc.) >> > storage may not be much, but you also need to write queries that How much performance difference does it really make?> retrieve that extra 2 bytes. It's not just disk space; it's also > memory and CPU. << beleive me or not, in some cases it's 100% or more AK wrote:
> >> > storage may not be much, but you also need to write queries that Do you have a bencmark to prove it?> > retrieve that extra 2 bytes. It's not just disk space; it's also > > memory and CPU. > > > > How much performance difference does it really make? > << > > beleive me or not, in some cases it's 100% or more "AK" <AK_TIREDOFSPAM@hotmail.COM> wrote in message I simply don't believe you. I don't believe the difference can be 100%. Inews:1124396559.557686.77080@g43g2000cwa.googlegroups.com... > >> > storage may not be much, but you also need to write queries that > > retrieve that extra 2 bytes. It's not just disk space; it's also > > memory and CPU. > How much performance difference does it really make? > << > > beleive me or not, in some cases it's 100% or more certainly don't believe it can be more. And I don't believe you can even measure what difference there is. And even if I did believe all that, I don't believe it is anything but a bug in your software. So I 100% don't believe you. However, I am willing to be persuaded by proof. Roy >> I simply don't believe you. I don't believe the difference can be> beleive me or not, in some cases it's 100% or more 100%. I certainly don't believe it can be more. And I don't believe you can even measure what difference there is. And even if I did believe all that, I don't believe it is anything but a bug in your software. So I 100% don't believe you. However, I am willing to be persuaded by proof. << Vow, I missed such a statement. Well, imagine a table with rows that are almost 4K each. That's 2 rows per an 8K page. Add some additional width, and you end up with just one row per data page, a 100% increase in table size. Believe me or not, that's not a bug in our software, that's the way SQL Server stores data. On one hand, moving from 2K pages in 65 to 8K pages in 70 in 200 is a big improvement. On the other hand, in Oracle we can choose page size (blocksize in Oracle universe) On 18 Aug 2005 13:17:37 -0700, Mikito Harakiri wrote:
Show quote >Stu wrote: Hi Mikito,>> Not that I totally disagree with the concept of deprecating char vs >> varchar, but I see this argument used a lot, and there's two flaws with >> it. >> >> 1. Storage size may be cheap and plentiful, but performance should >> always be foremost in the DBA mind. 2 bytes in a single column pf >> storage may not be much, but you also need to write queries that >> retrieve that extra 2 bytes. It's not just disk space; it's also >> memory and CPU. > >How much performance difference does it really make? Keep in mind that >in a typical table you can declare char a couple of boolean (Y/N) >columns at most. The extra performance cost of declaring Y/N columns varchar instead of char is trivial compared to the extra cost of doing so for foreign key columns. Most databases have lots of so-called "lookup tables". In the main table, the state is stored as a 2-letter code, country as a 3-letter ISO code and currency as another 3-letter ISO code. These codes are foreign keys into the States, Countries and Currencies tables, that have the primary key 2- or 3-letter code, a unique long name/description and possibly some other columns as well. All these 2- and 3-letter codes will be in indexes, and these indexes will be heavily used during inserts and updates (to verify the foreign key constraint) and in queries (because users generally prefer to see the full name of the state/country/currency instead of the code). Using varchar for a 2-letter code means that the space taken is doubled. This halves the number of rows that fit on one leaf page of the index. The result will be: more logical reads, lower cache hit ration, more physical reads --> slower performance. >> 2. Failing to appreciate the differences between varchar and char sets It's not a prime skill for DBAs. The abstraction level where the actual>> us on a path of lazy design. If it doesn't matter if I use char(10) vs >> varchar(10), then what's the matter with varchar(50)? How about >> varchar(51) etc? Eventually we could get away with "oh screw it; disks >> are cheap, memory is cheap, CPU's are powerful, let's just put >> everything in an text column and parse it on the fly". > >Failing to appreciate the differences between varchar and char sets is >an ability to raise the level of abstraction. (BTW, the skill many DBAs >lack.) data type is irrelevant is where you'll find the information analyst, functional designer, conceptual modeler or whatever the current name of those people is. It's not the DBA's task to check if they did their job properly. The DBA should ensure that the database runs smoothly, that as blocking and deadlocks are minimized and that queries perform as fast as possible. Especially the latter is impossible to do without appreciating the difference between char, nchar, varchar, nvarchar, text, and ntext. > With low level of abstraction you would never quit chaising No SQL Server DBA will ever try tio figure out a block size. (Remember>perceived problems (eg. trying to figure out the "optimal" block size, that this thread started in a SQL Server group and was crossposted to ..theory later!) >pondering if No need to ponder that -- all SQL Server DBAs (and presumably all DBAs> >select * from table > >is faster than > >select col1, col2, ... from table for all serious RDBMS's) know that SELECT * should never be used in production code (except in a EXISTS(..) subquery). Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address) Hugo Kornelis wrote:
Show quote > On 18 Aug 2005 13:17:37 -0700, Mikito Harakiri wrote: Now, we are talking! Verifying the foreign key constraint requires a> > >Stu wrote: > >> Not that I totally disagree with the concept of deprecating char vs > >> varchar, but I see this argument used a lot, and there's two flaws with > >> it. > >> > >> 1. Storage size may be cheap and plentiful, but performance should > >> always be foremost in the DBA mind. 2 bytes in a single column pf > >> storage may not be much, but you also need to write queries that > >> retrieve that extra 2 bytes. It's not just disk space; it's also > >> memory and CPU. > > > >How much performance difference does it really make? Keep in mind that > >in a typical table you can declare char a couple of boolean (Y/N) > >columns at most. > > Hi Mikito, > > The extra performance cost of declaring Y/N columns varchar instead of > char is trivial compared to the extra cost of doing so for foreign key > columns. > > Most databases have lots of so-called "lookup tables". In the main > table, the state is stored as a 2-letter code, country as a 3-letter ISO > code and currency as another 3-letter ISO code. These codes are foreign > keys into the States, Countries and Currencies tables, that have the > primary key 2- or 3-letter code, a unique long name/description and > possibly some other columns as well. > > All these 2- and 3-letter codes will be in indexes, and these indexes > will be heavily used during inserts and updates (to verify the foreign > key constraint) and in queries (because users generally prefer to see > the full name of the state/country/currency instead of the code). > > Using varchar for a 2-letter code means that the space taken is doubled. > This halves the number of rows that fit on one leaf page of the index. > The result will be: more logical reads, lower cache hit ration, more > physical reads --> slower performance. unique index scan. If the index grows in size by factor of two, the number of levels might go up by 1, but typically would stay the same. In short, verifying a foreign key constraint would be 2 or 3 logical reads, in either case. Normally, upper levels of index are cached. I guess for country codes, the whole index is cached, so you are correct about country code PK index taking twice the space. Let's see 200 countries multipled by 2 bytes. 800 bytes versus 400 bytes! Show quote > >> 2. Failing to appreciate the differences between varchar and char sets Well, operating a machine indeed doesn't require abstract thinking. I> >> us on a path of lazy design. If it doesn't matter if I use char(10) vs > >> varchar(10), then what's the matter with varchar(50)? How about > >> varchar(51) etc? Eventually we could get away with "oh screw it; disks > >> are cheap, memory is cheap, CPU's are powerful, let's just put > >> everything in an text column and parse it on the fly". > > > >Failing to appreciate the differences between varchar and char sets is > >an ability to raise the level of abstraction. (BTW, the skill many DBAs > >lack.) > > It's not a prime skill for DBAs. The abstraction level where the actual > data type is irrelevant is where you'll find the information analyst, > functional designer, conceptual modeler or whatever the current name of > those people is. It's not the DBA's task to check if they did their job > properly. The DBA should ensure that the database runs smoothly, that as > blocking and deadlocks are minimized and that queries perform as fast as > possible. Especially the latter is impossible to do without appreciating > the difference between char, nchar, varchar, nvarchar, text, and ntext. never said it does. I'm just implying that without critical analysis of what you are doing, you would never be able to quit fighting the mess of real world. BTW, in my (admittedly very limited) SQL tuning experience, I have yet to see the case there the size of a datatype make any difference. On 18 Aug 2005 14:17:54 -0700, Mikito Harakiri wrote:
(snip quoteback) >Now, we are talking! Verifying the foreign key constraint requires a Hi Mikito,>unique index scan. If the index grows in size by factor of two, the >number of levels might go up by 1, but typically would stay the same. >In short, verifying a foreign key constraint would be 2 or 3 logical >reads, in either case. > >Normally, upper levels of index are cached. I guess for country codes, >the whole index is cached, so you are correct about country code PK >index taking twice the space. Let's see 200 countries multipled by 2 >bytes. 800 bytes versus 400 bytes! Ah, cynicism - I love it. I chose these examples because they are well known by everyone. My bad; I should have chosen more convincing (though less well-known) examples. Many companies use a short mnemonic code for their customers. The use of a short mnemonic code for products is not uncommon either. Imagine a customers table with 10,000 customers, a products table with 5,000 products and an orders table with a few million rows. Now would you prefer char(6) or varchar(6) for CustomerCode? And char(5) or varchar(5) for ProductCode? Another example: ticker symbols. Take a look at http://finance.yahoo.com To be able to generate all the graphs they offer, they have to have quite a few rows of historic quotes in their DB. How would you rate the overhead of char(5) vs varchar(5) for ticker symbol in a row that only has three columns: ticker, date/time and quote. Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address) Hugo Kornelis wrote:
> Many companies use a short mnemonic code for their customers. The use of But if lookup table cardinality goes up, then, the storage factor> a short mnemonic code for products is not uncommon either. Imagine a > customers table with 10,000 customers, a products table with 5,000 > products and an orders table with a few million rows. Now would you > prefer char(6) or varchar(6) for CustomerCode? And char(5) or varchar(5) > for ProductCode? char(n) vs varchar(n) goes down! Perhaps, you can convince me that the effect of the two trailing bytes is not miniscule, as I previously thought, but it just can't be significant. Any benchmark demonstrating that the performance degradation is not in single percentage digits is welcome. >> Any benchmark demonstrating that the performance degradation is not in single percentage digits is welcome. << I am more worried about various numbers of extra blanks in theVARCHAR(n) making display problems. Trimming and padding are costly. Perhaps I'm misreading your point, but it's data in a Char column that
typically contains trailing blanks that need a Trim. A VarChar won't contain extraneous blanks unless the developer explicitly includes them in the value. Show quote "-CELKO-" <jcelko***@earthlink.net> wrote in message news:1124403285.842828.168210@g43g2000cwa.googlegroups.com... >>> Any benchmark demonstrating that the performance degradation is not in >>> single percentage digits is welcome. << > > I am more worried about various numbers of extra blanks in the > VARCHAR(n) making display problems. Trimming and padding are costly. > >> A VARCHAR(n) won't contain extraneous blanks unless the developer explicitly includes them in the value. << Not the developer, the data entry person who is probably an end userthese days. The DB guy's job is to add a constaint like "CHECK (TRIM (BOTH foobar) = UPPER (foobar))" to the column on his side. Now how often have you seen Newbies go to that trouble? Instead you see a lot of VARCHAR(50) -- notice the magic number fifty from ACCESS programmers -- columns which will eventually collect garbage, like 40 spaces and a period. Even worse, NVARCHAR(50) whcih collect garbage in Chinese! > Not the developer, the data entry person who is probably an end user Quite.> these days. The DB guy's job is to add a constaint like > "CHECK (TRIM (BOTH foobar) = UPPER (foobar))" to the column on his > side. Now how often have you seen Newbies go to that trouble? Last week I discovered a record in our "City" table called "A** HOLE" linked to a "CompanyLocation" record called "F*** YOU" and had a contact at that location named after some 1970's British speed-punk band, who was in charge of IBM mainframe products like "EAT SH**" I have also come across contacts with names like "[TYPE CONTACT NAME HERE]" and "#Error" (That's the funny one. #Error is what you get when you're in MS Access and there's a data binding issue) Working here is like living inside a Salvador Dali or MC Escher painting. -- Peace & happy computing, Mike Labosh, MCSD "Musha ring dum a doo dum a da!" -- James Hetfield My favorite was a Data Warehousing project migration with the part
number "I hate my job" repeated 9000 or so times. Jack Nicholson does data entry!! >> Working here is like living inside a Salvador Dali or MC Escher painting. << Nah! Escher and Dali had a pattern to their works -- projectivegeometery and math for Escher, Freudian Psychology for Dali. we are in Hieronymus Bosch! Crude, brutal views of Hell from the Middle Ages mixed with stupidity. > Nah! Escher and Dali had a pattern to their works -- projective I had never heard of this Hieronymus Bosch person so I Googled the name. He > geometery and math for Escher, Freudian Psychology for Dali. we are in > Hieronymus Bosch! Crude, brutal views of Hell from the Middle Ages > mixed with stupidity. is definately in need of some psychological help :) -- Peace & happy computing, Mike Labosh, MCSD "Musha ring dum a doo dum a da!" -- James Hetfield Mike Labosh wrote:
>>Nah! Escher and Dali had a pattern to their works -- projective I think a 490 year old corpse is a little beyond help at this stage (but >>geometery and math for Escher, Freudian Psychology for Dali. we are in >>Hieronymus Bosch! Crude, brutal views of Hell from the Middle Ages >>mixed with stupidity. >> >> > >I had never heard of this Hieronymus Bosch person so I Googled the name. He >is definately in need of some psychological help :) > > I agree that in his time he could have done with a good psychiatrist or 10). > Instead you see a lot of VARCHAR(50) -- notice the magic number fifty Would you prefer 42 or some other magic number for data of undeterminable > from ACCESS programmers -- columns which will eventually collect > garbage, like 40 spaces and a period. Even worse, NVARCHAR(50) whcih > collect garbage in Chinese! length? And how does the data type prevent users from entering garbage data? I've lost count of the number of times I've seen big corporate databases designed by those who hate Access with wierd problems, such as the assumption that a surname will always be less than an arbitrary number which forces many names to be entered as abbrieviations, or odd constructs such as 'Address Line 1', 'Address Line 2'... (confusing formatting with data and forever obstructing the possibility of analysis by region), or the assumption that all phone numbers must have a 2 digit area code and will never be mobiile or international numbers. Heck, there are people out there who think phone numbers are INTEGERS instead of STRINGS. Of course, it's not easy - say if you allow 50 for a locality, the longest possible placename is at least 3 times that (though probably not used in most cases). From the business perspective, you can't have the attitude that "n% of customers are lost to our business because their data won't fit in our database" (think the infamous example of the database designer who decided surnames must be at least 3 characters long and therefore prevented people with single letter surnames from obtaining drivers licenses). All of us need to be learning and a little more humble, methinks (and looking at some of the knowlege floating around in this group, that goes double for me). >From the business perspective, you can't have the attitude >that "n% Bravo!>of customers are lost to our business because their data won't fit >in our >database" (think the infamous example of the database designer >who decided >surnames must be at least 3 characters long and therefore >prevented people >with single letter surnames from obtaining drivers licenses). >All of us need to be learning and a little more humble, methinks >(and >looking at some of the knowlege floating around in this group, that >goes >double for me). On Fri, 19 Aug 2005 10:49:59 -0400, JT wrote:
>Perhaps I'm misreading your point, but it's data in a Char column that Hi JT,>typically contains trailing blanks that need a Trim. A VarChar won't contain >extraneous blanks unless the developer explicitly includes them in the >value. I think that Joe refers to the ANSI comparison rules for char and varchar: * Comparing two char (assume same length): Do character by character compare. * Comparing two varchar: First, find if actual length (not maximum length!) is different. Then, pad shortest string with spaces until it's just as long as the longest string. Finally, do character by character compare of longest string to padded version of shortest string. For comparison of char with different length, padding is also required, but this padding is independent of the actual data; it's the same for each row and the amount of padding can be determined in the compile phase. (But in joins, you should not compare two different-length char columns anyway!) Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address)
Show quote
"Mikito Harakiri" <mikharakiri_nosp***@yahoo.com> wrote in message Surely, when developing a database application and performance tuning it news:1124402576.470238.211960@o13g2000cwo.googlegroups.com... > Hugo Kornelis wrote: >> Many companies use a short mnemonic code for their customers. The use of >> a short mnemonic code for products is not uncommon either. Imagine a >> customers table with 10,000 customers, a products table with 5,000 >> products and an orders table with a few million rows. Now would you >> prefer char(6) or varchar(6) for CustomerCode? And char(5) or varchar(5) >> for ProductCode? > > But if lookup table cardinality goes up, then, the storage factor > char(n) vs varchar(n) goes down! Perhaps, you can convince me that the > effect of the two trailing bytes is not miniscule, as I previously > thought, but it just can't be significant. Any benchmark demonstrating > that the performance degradation is not in single percentage digits is > welcome. > sqeezing every single performance point out of the database is worthwhile. Everytime that you make a decision which costs a percent of two of performance you degrade your database performance. Given enough of these compomises the database performance will drop significatly, I've seen this happen for real on databases with hundreds of thousands of rows, the performance degraded so much that the application became unusable. A few minor tweeks here and there and the program was running faster than it had ever run before. It may seem that it's a very academic kind of argument but in real terms the more pedantic and performance oriented the DBA is the better the final product will function. Yes, you can always throw more kit at a problem, but when you can litterally flick a switch or two and boost the performance it's worth doing. To do otherwise could be contrude as neglegence! Regards Colin Dawson www.cjdawson.com Mikito Harakiri wrote:
>>Many companies use a short mnemonic code for their customers. The use of I guess the char() is really just being used as a hint to the DBMS so it>>a short mnemonic code for products is not uncommon either. Imagine a >>customers table with 10,000 customers, a products table with 5,000 >>products and an orders table with a few million rows. Now would you >>prefer char(6) or varchar(6) for CustomerCode? And char(5) or varchar(5) >>for ProductCode? > > But if lookup table cardinality goes up, then, the storage factor > char(n) vs varchar(n) goes down! Perhaps, you can convince me that the > effect of the two trailing bytes is not miniscule, as I previously > thought, but it just can't be significant. Any benchmark demonstrating > that the performance degradation is not in single percentage digits is > welcome. can make an informed decision of what physical data structures to use. Maybe a better solution would be to have a single type ("string" or whatever) but then have a check constraint like len(column) < 6. So you could regard the char datatype as shorthand for a varchar datatype with a check constraint. The two ways of looking at it are functionally identical. I think in practice certain DBMSs will internally store a varchar(n) as a char(n) for sufficiently low values of n anyway. In theory, check constraints should be helpful to the DBMS, both for queries and for deciding physical storage structures. So there are three possibilities: 1) have char(n) and varchar(n) datatypes 2) scrap char(n) and just have varchar(n). The DBMS can decide to use different internal storage methods if n is small enough. 3) have varchar with no maximum length specified. Now if you forget to specify a maximum length via a check constraint, the DBMS isn't able to optimize things so well. You might have a similar argument with tinyint, smallint, int, bigint types. tinyint is really just an int column with a check constaint of 0 <= column <= 255 The question is: should the length constraint be part of the type or part of the database? Does it matter even? Paul. Paul. On 18 Aug 2005 15:02:56 -0700, Mikito Harakiri wrote:
Show quote >Hugo Kornelis wrote: Hi Mikito,>> Many companies use a short mnemonic code for their customers. The use of >> a short mnemonic code for products is not uncommon either. Imagine a >> customers table with 10,000 customers, a products table with 5,000 >> products and an orders table with a few million rows. Now would you >> prefer char(6) or varchar(6) for CustomerCode? And char(5) or varchar(5) >> for ProductCode? > >But if lookup table cardinality goes up, then, the storage factor >char(n) vs varchar(n) goes down! Perhaps, you can convince me that the >effect of the two trailing bytes is not miniscule, as I previously >thought, but it just can't be significant. Any benchmark demonstrating >that the performance degradation is not in single percentage digits is >welcome. Below is the script I used to compare performance. I compared three tasks: 1. Performing lots of single inserts (speed of inserts matters in databases that have to process thousands of new rows per second) 2. Adding an index (not an everyday task, but there are situations where a process can be sped up by adding an index, doing the process, then dropping the index again) 3. Generating a typical report that involves some joins (the CASE expressions I used may look silly, but imagine that I am using a payment status instead - also note that I store the results in a temporary table first, to make sure that network speed won't influence the results) You'll find the full code further below. But first the results: 1. For the inserts, the elapsed time on my machine was 118156 ms with char, 124406 ms with varchar. A degradation of over 5 %. 2. Adding the index took 2840 ms with char, 3236 ms with varchar. A performance degradation of almost 14 %. 3a. Generating the report with the index created in step 2 took 224 ms on average for char; 234 ms on average for varchar (I used three consecutive executions for both tests, each time starting with a clean cache). Performance degradation: 4.5 %. 3b. Generating the same report after dropping the supporting index showed an even bigger difference: avg 2281 for char; avg 3250 for varchar. Degradation: over 40 %. 1a. To top it off, I repeated the insertion test, but this time AFTER creating the extra index. For char, the time taken was 227936 ms; for varchar 313110 ms. Performance degradation: 37 %. So here's your choice: either you use the index, degrade the queries by "only" 4.5% but at the cost of slowing inserts down by 37%, or you discard the index to reverse these figures (inserts slowed down 5%; queries slowed down 40%). Or you choose char instead of varchar when you know that the length is more or less fixed. :-) Here's the code I used for this benchmark: -- Set up the tables CREATE TABLE Clients (ClientID char(6) NOT NULL, ClientName varchar(36) NOT NULL, PRIMARY KEY (ClientID) ) CREATE TABLE Products (ProdID char(5) NOT NULL, ProdName varchar(30) NOT NULL, PRIMARY KEY (ProdID) ) CREATE TABLE Orders (ClientID char(6) NOT NULL, ProdID char(5) NOT NULL, PRIMARY KEY (ClientID, ProdID), FOREIGN KEY (ClientID) REFERENCES Clients, FOREIGN KEY (ProdID) REFERENCES Products ) go -- Put some rows in the Clients table DECLARE @i int, @r int, @NumClients int SET @NumClients = 10000 SET @i = 0 WHILE @i < @NumClients BEGIN SET @r = CAST(@i AS bigint) * 308915776 / @NumClients INSERT Clients (ClientID, ClientName) VALUES(CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) + CHAR(65 + ((@r / 676) % 26)) + CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976) % 26)) + CHAR(65 + ((@r / 11881376) % 26)), LEFT(CAST(NewID() as VARCHAR(36)), 10 + RAND() * 26)) SET @i = @i + 1 END select count(*) from Clients go -- Put some rows in the Products table DECLARE @i int, @r int, @NumProds int SET @NumProds = 5000 SET @i = 0 WHILE @i < @NumProds BEGIN SET @r = CAST(@i AS bigint) * 11881376 / @NumProds INSERT Products (ProdID, ProdName) VALUES(CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) + CHAR(65 + ((@r / 676) % 26)) + CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976) % 26)), LEFT(CAST(NewID() as VARCHAR(36)), 5 + RAND() * 25)) SET @i = @i + 1 END select count(*) from Products go -- Have each client order up to 10 products DECLARE @Start datetime, @End datetime SET @Start = CURRENT_TIMESTAMP DECLARE @c int, @i int, @p int, @r int, @NumClients int, @NumProds int, @ClientID char(6), @ProdID char(5), @Amt tinyint SET @NumProds = 5000 SET @NumClients = 10000 SET @c = 0 WHILE @c < @NumClients BEGIN SET @r = CAST(@c AS bigint) * 308915776 / @NumClients SET @ClientID = CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) + CHAR(65 + ((@r / 676) % 26)) + CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976) % 26)) + CHAR(65 + ((@r / 11881376) % 26)) SET @Amt = RAND() * 20 + 20 SET @i = 0 WHILE @i < @Amt BEGIN SET @r = CAST(RAND() * @NumProds AS bigint) * 11881376 / @NumProds SET @ProdID = CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) + CHAR(65 + ((@r / 676) % 26)) + CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976) % 26)) INSERT Orders (ClientID, ProdID) SELECT @ClientID, @ProdID WHERE NOT EXISTS (SELECT * FROM Orders WHERE ClientID = @ClientID AND ProdID = @ProdID) SET @i = @i + 1 END SET @c = @c + 1 IF @c % 100 = 0 PRINT @c END SET @End = CURRENT_TIMESTAMP SELECT 'Generating orders', @Start, @End, DATEDIFF(ms, @Start, @End) AS Elapsed select count(*) from Orders go -- Add an index DECLARE @Start datetime, @End datetime SET @Start = CURRENT_TIMESTAMP CREATE INDEX Orders_ProdID ON Orders(ProdID) SET @End = CURRENT_TIMESTAMP SELECT 'Adding an index', @Start, @End, DATEDIFF(ms, @Start, @End) AS Elapsed go -- DROP INDEX Orders.Orders_ProdID -- Flush buffers CHECKPOINT DBCC DROPCLEANBUFFERS DBCC FREEPROCCACHE go -- Perform a typical query DECLARE @Start datetime, @End datetime SET @Start = CURRENT_TIMESTAMP SELECT p.ProdID, COUNT(CASE WHEN LEN(c.ClientName) < 20 THEN 1 END) AS ShortNameOrders, COUNT(CASE WHEN LEN(c.ClientName) > 30 THEN 1 END) AS LongNameOrders, COUNT(c.ClientName) AS TotalOrders INTO #Results FROM Products AS p INNER JOIN Orders AS o ON o.ProdID = p.ProdID INNER JOIN Clients AS c ON c.ClientID = o.ClientID WHERE p.ProdID LIKE 'D%' GROUP BY p.ProdID SET @End = CURRENT_TIMESTAMP SELECT 'Query with join', @Start, @End, DATEDIFF(ms, @Start, @End) AS Elapsed SELECT * FROM #Results DROP TABLE #Results go -- Clean up the mess DROP TABLE Orders DROP TABLE Products DROP TABLE Clients go Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address) Hugo Kornelis wrote:
Show quote > On 18 Aug 2005 15:02:56 -0700, Mikito Harakiri wrote: Hugo,> > >Hugo Kornelis wrote: > >> Many companies use a short mnemonic code for their customers. The use of > >> a short mnemonic code for products is not uncommon either. Imagine a > >> customers table with 10,000 customers, a products table with 5,000 > >> products and an orders table with a few million rows. Now would you > >> prefer char(6) or varchar(6) for CustomerCode? And char(5) or varchar(5) > >> for ProductCode? > > > >But if lookup table cardinality goes up, then, the storage factor > >char(n) vs varchar(n) goes down! Perhaps, you can convince me that the > >effect of the two trailing bytes is not miniscule, as I previously > >thought, but it just can't be significant. Any benchmark demonstrating > >that the performance degradation is not in single percentage digits is > >welcome. > > Hi Mikito, > > Below is the script I used to compare performance. I compared three > tasks: > > 1. Performing lots of single inserts (speed of inserts matters in > databases that have to process thousands of new rows per second) > 2. Adding an index (not an everyday task, but there are situations where > a process can be sped up by adding an index, doing the process, then > dropping the index again) > 3. Generating a typical report that involves some joins (the CASE > expressions I used may look silly, but imagine that I am using a payment > status instead - also note that I store the results in a temporary table > first, to make sure that network speed won't influence the results) > > You'll find the full code further below. But first the results: > > 1. For the inserts, the elapsed time on my machine was 118156 ms with > char, 124406 ms with varchar. A degradation of over 5 %. > > 2. Adding the index took 2840 ms with char, 3236 ms with varchar. A > performance degradation of almost 14 %. > > 3a. Generating the report with the index created in step 2 took 224 ms > on average for char; 234 ms on average for varchar (I used three > consecutive executions for both tests, each time starting with a clean > cache). Performance degradation: 4.5 %. > 3b. Generating the same report after dropping the supporting index > showed an even bigger difference: avg 2281 for char; avg 3250 for > varchar. Degradation: over 40 %. > > 1a. To top it off, I repeated the insertion test, but this time AFTER > creating the extra index. For char, the time taken was 227936 ms; for > varchar 313110 ms. Performance degradation: 37 %. > > So here's your choice: either you use the index, degrade the queries by > "only" 4.5% but at the cost of slowing inserts down by 37%, or you > discard the index to reverse these figures (inserts slowed down 5%; > queries slowed down 40%). > > Or you choose char instead of varchar when you know that the length is > more or less fixed. :-) > > > Here's the code I used for this benchmark: > > -- Set up the tables > CREATE TABLE Clients > (ClientID char(6) NOT NULL, > ClientName varchar(36) NOT NULL, > PRIMARY KEY (ClientID) > ) > CREATE TABLE Products > (ProdID char(5) NOT NULL, > ProdName varchar(30) NOT NULL, > PRIMARY KEY (ProdID) > ) > CREATE TABLE Orders > (ClientID char(6) NOT NULL, > ProdID char(5) NOT NULL, > PRIMARY KEY (ClientID, ProdID), > FOREIGN KEY (ClientID) REFERENCES Clients, > FOREIGN KEY (ProdID) REFERENCES Products > ) > go > -- Put some rows in the Clients table > DECLARE @i int, @r int, @NumClients int > SET @NumClients = 10000 > SET @i = 0 > WHILE @i < @NumClients > BEGIN > SET @r = CAST(@i AS bigint) * 308915776 / @NumClients > INSERT Clients (ClientID, ClientName) > VALUES(CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) + CHAR(65 > + ((@r / 676) % 26)) > + CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976) % > 26)) > + CHAR(65 + ((@r / 11881376) % 26)), > LEFT(CAST(NewID() as VARCHAR(36)), 10 + RAND() * 26)) > SET @i = @i + 1 > END > select count(*) from Clients > go > -- Put some rows in the Products table > DECLARE @i int, @r int, @NumProds int > SET @NumProds = 5000 > SET @i = 0 > WHILE @i < @NumProds > BEGIN > SET @r = CAST(@i AS bigint) * 11881376 / @NumProds > INSERT Products (ProdID, ProdName) > VALUES(CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) + CHAR(65 > + ((@r / 676) % 26)) > + CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976) % > 26)), > LEFT(CAST(NewID() as VARCHAR(36)), 5 + RAND() * 25)) > SET @i = @i + 1 > END > select count(*) from Products > go > -- Have each client order up to 10 products > DECLARE @Start datetime, @End datetime > SET @Start = CURRENT_TIMESTAMP > DECLARE @c int, @i int, @p int, @r int, @NumClients int, @NumProds int, > @ClientID char(6), @ProdID char(5), @Amt tinyint > SET @NumProds = 5000 > SET @NumClients = 10000 > SET @c = 0 > WHILE @c < @NumClients > BEGIN > SET @r = CAST(@c AS bigint) * 308915776 / @NumClients > SET @ClientID = > CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) + CHAR(65 > + ((@r / 676) % 26)) > + CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976) % > 26)) > + CHAR(65 + ((@r / 11881376) % 26)) > SET @Amt = RAND() * 20 + 20 > SET @i = 0 > WHILE @i < @Amt > BEGIN > SET @r = CAST(RAND() * @NumProds AS bigint) * 11881376 / > @NumProds > SET @ProdID = > CHAR(65 + (@r % 26)) + CHAR(65 + ((@r / 26) % 26)) + > CHAR(65 + ((@r / 676) % 26)) > + CHAR(65 + ((@r / 17576) % 26)) + CHAR(65 + ((@r / 456976) > % 26)) > INSERT Orders (ClientID, ProdID) > SELECT @ClientID, @ProdID > WHERE NOT EXISTS (SELECT * > FROM Orders > WHERE ClientID = @ClientID > AND ProdID = @ProdID) > SET @i = @i + 1 > END > SET @c = @c + 1 > IF @c % 100 = 0 > PRINT @c > END > SET @End = CURRENT_TIMESTAMP > SELECT 'Generating orders', @Start, @End, DATEDIFF(ms, @Start, @End) AS > Elapsed > select count(*) from Orders > go > -- Add an index > DECLARE @Start datetime, @End datetime > SET @Start = CURRENT_TIMESTAMP > CREATE INDEX Orders_ProdID ON Orders(ProdID) > SET @End = CURRENT_TIMESTAMP > SELECT 'Adding an index', @Start, @End, DATEDIFF(ms, @Start, @End) AS > Elapsed > go > -- DROP INDEX Orders.Orders_ProdID > -- Flush buffers > CHECKPOINT > DBCC DROPCLEANBUFFERS > DBCC FREEPROCCACHE > go > -- Perform a typical query > DECLARE @Start datetime, @End datetime > SET @Start = CURRENT_TIMESTAMP > SELECT p.ProdID, > COUNT(CASE WHEN LEN(c.ClientName) < 20 THEN 1 END) AS > ShortNameOrders, > COUNT(CASE WHEN LEN(c.ClientName) > 30 THEN 1 END) AS > LongNameOrders, > COUNT(c.ClientName) AS TotalOrders > INTO #Results > FROM Products AS p > INNER JOIN Orders AS o > ON o.ProdID = p.ProdID > INNER JOIN Clients AS c > ON c.ClientID = o.ClientID > WHERE p.ProdID LIKE 'D%' > GROUP BY p.ProdID > SET @End = CURRENT_TIMESTAMP > SELECT 'Query with join', @Start, @End, DATEDIFF(ms, @Start, @End) AS > Elapsed > SELECT * FROM #Results > DROP TABLE #Results > go > -- Clean up the mess > DROP TABLE Orders > DROP TABLE Products > DROP TABLE Clients > go I was going to perform this test in oracle, but then I came across the following passage: Tom... I was told once that if you only need a single char use CHAR(1) since using VARCHAR2(1) has overhead due to the Oracle having to maintain how long the value is in a VARCHAR2 field Is this a load of rubbish or is there some truth to it? Followup: that is rubbish. A char(n) is a varchar2(n) that is blank padded to it's maximum length. The implementation of a char physically on disk is identical to a varchar2 -- there is a length byte for a char(1) just as there is for a varchar2(1). http://www.jlcomp.demon.co.uk/faq/char_vs_varchar2.html 3. As opposed to a wide-spread misunderstanding, char (1) does *not* use less space in the database than varchar2 (1) when the value 'a' is stored because also char (1) has a length field. This can be proved by the dump () function. I guess this storage quirk doesn't really affect TPC-C benchmarks:) It might be interesting, however, to dig down into the insert performance difference. As I mentioned, 2 extra bytes shouldn't really affect the speed of index unique scan. BTW, I always use INTEGERs for id columns. On 19 Aug 2005 16:18:03 -0700, Mikito Harakiri wrote:
(snip) >Hugo, Hi Mikito,> >I was going to perform this test in oracle, but then I came across the >following passage: (snip) >I guess this storage quirk doesn't really affect TPC-C benchmarks:) I've never worked with Oracle, so I'll just have to take your word for it, I guess. >It might be interesting, however, to dig down into the insert I think the culprint for the inserts is the extra index. The size of one>performance difference. As I mentioned, 2 extra bytes shouldn't really >affect the speed of index unique scan. entry in the index grows from 11 bytes to 15 bytes (a large percentage). And that will mean that less entries fit into one page - and since the inserts are sequential for the PRIMARY KEY, but non-sequential for the UNIQUE, the supporting index for the UNIQUE constraint will face lots of page splits. I include these two statements in the main loop for the inserts to have some indication that the system is still busy: >> IF @c % 100 = 0 If you run the script, you'll see that the speed at which the counter>> PRINT @c increase will graudually slow down as the number of index pages grows. >BTW, I always use INTEGERs for id columns. Yeah, I guess I should have named the key columns ClientCode andProdCode instead of ClientID and ProdID. :-) Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address) Hugo Kornelis wrote:
> On 18 Aug 2005 13:17:37 -0700, Mikito Harakiri wrote: My bad. I meant> >pondering if > > > >select * from table > > > >is faster than > > > >select col1, col2, ... from table > > No need to ponder that -- all SQL Server DBAs (and presumably all DBAs > for all serious RDBMS's) know that SELECT * should never be used in > production code (except in a EXISTS(..) subquery). select count(1) from table vs. select count(*) from table BTW, you triggered the other example: is EXISTS or IN faster? This question could come up only from somebody who is completely unaware of SQL expression equivalency and query rewrite. Well, making sure the extents and segments are layed out on disk properly, leaves little room for education and abstract thinking. Mikito Harakiri wrote:
Show quote >Hugo Kornelis wrote: "select count(1) from table" and "select count(*) from table" will both > > >>On 18 Aug 2005 13:17:37 -0700, Mikito Harakiri wrote: >> >> >>>pondering if >>> >>>select * from table >>> >>>is faster than >>> >>>select col1, col2, ... from table >>> >>> >>No need to ponder that -- all SQL Server DBAs (and presumably all DBAs >>for all serious RDBMS's) know that SELECT * should never be used in >>production code (except in a EXISTS(..) subquery). >> >> > >My bad. I meant > >select count(1) from table > >vs. > >select count(*) from table > > come up with the same execution plan. They will both count the number of entries in the narrowest index on that table. That's, perhaps, a bad example as the optimiser is specifically designed to deal with that case I believe. See example below (both cases require 227 logical reads and return 121371 rows in this example): StmtText ------------------------------------------- select count(1) from dbo.SalesOrderDetail (1 row(s) affected) StmtText ------------------------------------------------------------------------------------------------------------------ |--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1003]))) Scan(OBJECT:([AdventureWorks2000].[dbo].[SalesOrderDetail].[IX_SalesOrderDetail_ProductID]))|--Stream Aggregate(DEFINE:([Expr1003]=Count(*))) |--Index (3 row(s) affected) StmtText ----------------------------------------------- select count(*) from dbo.SalesOrderDetail (1 row(s) affected) StmtText ------------------------------------------------------------------------------------------------------------------ |--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1003]))) Scan(OBJECT:([AdventureWorks2000].[dbo].[SalesOrderDetail].[IX_SalesOrderDetail_ProductID]))|--Stream Aggregate(DEFINE:([Expr1003]=Count(*))) |--Index (3 row(s) affected) >BTW, you triggered the other example: is EXISTS or IN faster? This The EXISTS() predicate is typically a fairly efficient predicate because >question could come up only from somebody who is completely unaware of >SQL expression equivalency and query rewrite. Well, making sure the >extents and segments are layed out on disk properly, leaves little room >for education and abstract thinking. > > it only needs to scan until it gets a match, at which time it returns. The worst case scenario (it finds a match on the last physical row, or it doesn't find any matching row) is the same I/O as the IN() predicate case because IN() will evaluate the entire subquery. Mike Hodgson wrote:
> The EXISTS() predicate is typically a fairly efficient predicate because Why does IN() need to evaluate the entire subquery? Couldn't it in> it only needs to scan until it gets a match, at which time it returns. > The worst case scenario (it finds a match on the last physical row, or > it doesn't find any matching row) is the same I/O as the IN() predicate > case because IN() will evaluate the entire subquery. theory work exactly the same as EXISTS() at the physical level? Paul. Paul wrote:
> Mike Hodgson wrote: Yes it could, and indeed does (Oracle 9i). In the following example,> > The EXISTS() predicate is typically a fairly efficient predicate because > > it only needs to scan until it gets a match, at which time it returns. > > The worst case scenario (it finds a match on the last physical row, or > > it doesn't find any matching row) is the same I/O as the IN() predicate > > case because IN() will evaluate the entire subquery. > > Why does IN() need to evaluate the entire subquery? Couldn't it in > theory work exactly the same as EXISTS() at the physical level? IN and EXISTS are processed the same way, and DO NOT evaluate the entire subquery: SQL> create table t1 as select object_id, object_name from all_objects; Table created. SQL> alter table t1 add constraint t1_pk primary key (object_id); Table altered. SQL> create table t2 as select object_id, object_name from all_objects where rownum=1; Table created. SQL> alter table t2 add constraint t2_pk primary key (object_id); Table altered. SQL> analyze table t1 compute statistics; Table analyzed. SQL> analyze table t2 compute statistics; Table analyzed. SQL> select count(*) from t1; COUNT(*) ---------- 47355 SQL> select count(*) from t2; COUNT(*) ---------- 1 SQL> set autotrace on SQL> select * from t2 where object_id in (select object_id from t1); OBJECT_ID OBJECT_NAME ---------- ------------------------------ 18164 /1005bd30_LnkdConstant Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=30) 1 0 NESTED LOOPS (Cost=1 Card=1 Bytes=30) 2 1 TABLE ACCESS (FULL) OF 'T2' (Cost=1 Card=1 Bytes=26) 3 1 INDEX (UNIQUE SCAN) OF 'T1_PK' (UNIQUE) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 3 physical reads 0 redo size 227 bytes sent via SQL*Net to client 314 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed SQL> select * from t2 where exists (select null from t1 where t1.object_id = t2.object_id); OBJECT_ID OBJECT_NAME ---------- ------------------------------ 18164 /1005bd30_LnkdConstant Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=26) 1 0 FILTER 2 1 TABLE ACCESS (FULL) OF 'T2' (Cost=1 Card=1 Bytes=26) 3 1 INDEX (UNIQUE SCAN) OF 'T1_PK' (UNIQUE) (Cost=1 Card=1 B ytes=4) Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 5 consistent gets 0 physical reads 0 redo size 227 bytes sent via SQL*Net to client 314 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed Perhaps it's an implementation detail of Oracle then (I'm not very
familiar with how Oracle performs the various operations). What would happen if you had slightly more realistic data (like both tables with more than 1 row in them)? For example, say you had: InsurancePolicies {policy_id, broker_id, inception_date, ...} - (80000 rows) clustered index on policy_id (don't know what Oracle parlance for "clustered index" is) InsuranceBrokers {broker_id, broker_name, ...} - (200 rows) clustered index on broker_id then you said: select * from InsuranceBrokers b where broker_id in ( select distinct broker_id from InsurancePolicies p where p.inception_date > '20050101' ) select * from InsuranceBrokers b where exists ( select * from InsurancePolicies p where p.broker_id = b.broker_id and p.inception_date > '20050101' ) Both queries should return the same data, namely all the brokers who own 1 or more policies that started this year. Now, with the IN() query, if the physical data in InsurancePolicies is sorted by policy_id, then how does the query engine know it's got all of the policies that started this year unless it goes through every single row in InsurancePolicies? With the EXISTS() version, as soon as the query engine finds that the broker in question owns a single policy that started this year it would stop trawling through the 80000 row policy table. Best case scenario for EXISTS(), the first policy row for that broker started this year so that broker is included in the result set (scan 1 row out of 80000); worst case scenario, the only policy the broker owns that started this year was created yesterday (and so has the greatest policy_id and so is last in the physical order of rows in the table - ie. full index scan; scan 80000 rows out of 80000). For all cases for IN() the query engine needs to go through every policy row that (that started this year) to compile the distinct list to present back to the outer query - i.e. full index scan. Perhaps Oracle have done some particular optimisations in that area, but I believe that's the way Microsoft deal with it. Bit of a dumb example really because an inner join would be the best way to write that query anyway (well it would in SQL Server - I assume the same would hold true for Oracle) but it's the simplest example my poor tired brain would come up with at 10:30 on a Friday night. Show quote >Paul wrote: > > >>Mike Hodgson wrote: >> >> >>>The EXISTS() predicate is typically a fairly efficient predicate because >>>it only needs to scan until it gets a match, at which time it returns. >>>The worst case scenario (it finds a match on the last physical row, or >>>it doesn't find any matching row) is the same I/O as the IN() predicate >>>case because IN() will evaluate the entire subquery. >>> >>> >>Why does IN() need to evaluate the entire subquery? Couldn't it in >>theory work exactly the same as EXISTS() at the physical level? >> >> > >Yes it could, and indeed does (Oracle 9i). In the following example, >IN and EXISTS are processed the same way, and DO NOT evaluate the >entire subquery: > >SQL> create table t1 as select object_id, object_name from all_objects; > >Table created. > >SQL> alter table t1 add constraint t1_pk primary key (object_id); > >Table altered. > >SQL> create table t2 as select object_id, object_name from all_objects >where rownum=1; > >Table created. > >SQL> alter table t2 add constraint t2_pk primary key (object_id); > >Table altered. > >SQL> analyze table t1 compute statistics; > >Table analyzed. > >SQL> analyze table t2 compute statistics; > >Table analyzed. > >SQL> select count(*) from t1; > > COUNT(*) >---------- > 47355 > >SQL> select count(*) from t2; > > COUNT(*) >---------- > 1 > >SQL> set autotrace on >SQL> select * from t2 where object_id in (select object_id from t1); > > OBJECT_ID OBJECT_NAME >---------- ------------------------------ > 18164 /1005bd30_LnkdConstant > > >Execution Plan >---------------------------------------------------------- > 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=30) > 1 0 NESTED LOOPS (Cost=1 Card=1 Bytes=30) > 2 1 TABLE ACCESS (FULL) OF 'T2' (Cost=1 Card=1 Bytes=26) > 3 1 INDEX (UNIQUE SCAN) OF 'T1_PK' (UNIQUE) > > > > >Statistics >---------------------------------------------------------- > 0 recursive calls > 0 db block gets > 5 consistent gets > 3 physical reads > 0 redo size > 227 bytes sent via SQL*Net to client > 314 bytes received via SQL*Net from client > 2 SQL*Net roundtrips to/from client > 0 sorts (memory) > 0 sorts (disk) > 1 rows processed > >SQL> select * from t2 where exists (select null from t1 where >t1.object_id = t2.object_id); > > OBJECT_ID OBJECT_NAME >---------- ------------------------------ > 18164 /1005bd30_LnkdConstant > > >Execution Plan >---------------------------------------------------------- > 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=26) > 1 0 FILTER > 2 1 TABLE ACCESS (FULL) OF 'T2' (Cost=1 Card=1 Bytes=26) > 3 1 INDEX (UNIQUE SCAN) OF 'T1_PK' (UNIQUE) (Cost=1 Card=1 B > ytes=4) > > > > > >Statistics >---------------------------------------------------------- > 0 recursive calls > 0 db block gets > 5 consistent gets > 0 physical reads > 0 redo size > 227 bytes sent via SQL*Net to client > 314 bytes received via SQL*Net from client > 2 SQL*Net roundtrips to/from client > 0 sorts (memory) > 0 sorts (disk) > 1 rows processed > > > Mike Hodgson wrote:
Show quote > select * from InsuranceBrokers b The optimizer should be able to recognise that the distinct is> where broker_id in > ( > select distinct broker_id from InsurancePolicies p > where p.inception_date > '20050101' > ) .... > Best case scenario for EXISTS(), the first policy row for that broker > started this year so that broker is included in the result set (scan 1 > row out of 80000); worst case scenario, the only policy the broker owns > that started this year was created yesterday (and so has the greatest > policy_id and so is last in the physical order of rows in the table - > ie. full index scan; scan 80000 rows out of 80000). For all cases for > IN() the query engine needs to go through every policy row that (that > started this year) to compile the distinct list to present back to the > outer query - i.e. full index scan. unnecessary, so it wouldn't need to go through the whole table. Just because the subquery uses "distinct" it doesn't mean that the DBMS must materialise that internally if it's not necessary. Though I guess index statistics may cause it to choose this option if it thinks it would be faster. If you think about it, both queries are logically identical so it would be possible (at least in theory) for them to use identical query plans, if the query optimizer is clever enough. Paul. Mike Hodgson wrote:
Show quote > Perhaps it's an implementation detail of Oracle then (I'm not very Here is the best simulation of your example I can come up with at the> familiar with how Oracle performs the various operations). > > What would happen if you had slightly more realistic data (like both > tables with more than 1 row in them)? For example, say you had: > InsurancePolicies {policy_id, broker_id, inception_date, ...} - (80000 > rows) clustered index on policy_id (don't know what Oracle parlance for > "clustered index" is) > InsuranceBrokers {broker_id, broker_name, ...} - (200 rows) clustered > index on broker_id > > then you said: > > select * from InsuranceBrokers b > where broker_id in > ( > select distinct broker_id from InsurancePolicies p > where p.inception_date > '20050101' > ) > > select * from InsuranceBrokers b > where exists > ( > select * from InsurancePolicies p > where p.broker_id = b.broker_id > and p.inception_date > '20050101' > ) > > Both queries should return the same data, namely all the brokers who own > 1 or more policies that started this year. Now, with the IN() query, > if the physical data in InsurancePolicies is sorted by policy_id, then > how does the query engine know it's got all of the policies that started > this year unless it goes through every single row in InsurancePolicies? > With the EXISTS() version, as soon as the query engine finds that the > broker in question owns a single policy that started this year it would > stop trawling through the 80000 row policy table. > > Best case scenario for EXISTS(), the first policy row for that broker > started this year so that broker is included in the result set (scan 1 > row out of 80000); worst case scenario, the only policy the broker owns > that started this year was created yesterday (and so has the greatest > policy_id and so is last in the physical order of rows in the table - > ie. full index scan; scan 80000 rows out of 80000). For all cases for > IN() the query engine needs to go through every policy row that (that > started this year) to compile the distinct list to present back to the > outer query - i.e. full index scan. Perhaps Oracle have done some > particular optimisations in that area, but I believe that's the way > Microsoft deal with it. > > Bit of a dumb example really because an inner join would be the best way > to write that query anyway (well it would in SQL Server - I assume the > same would hold true for Oracle) but it's the simplest example my poor > tired brain would come up with at 10:30 on a Friday night. moment. It shows that Oracle is indeed choosing different plans this time, though the elapsed time is similar either way (EXISTS slightly faster, 0.3 secs rather than 0.4) >From the Oracle docs, it appears that "where x in (select PKCOL from y)" is a special case, that can be optimized into a join.SQL> select count(*) from insurance_brokers; COUNT(*) ---------- 67 SQL> select count(*), count(distinct broker_id) from insurance_policies; COUNT(*) COUNT(DISTINCTBROKER_ID) ---------- ------------------------ 30881 30 SQL> select * from Insurance_Brokers b 2 where broker_id in 3 ( 4 select distinct broker_id from Insurance_Policies p 5 where p.inception_date > date '2005-01-01' 6 ); BROKER_ID BROKER_NAME ---------- ------------------------------ 0 SYS 5 SYSTEM 11 OUTLN 18 DBSNMP 20 WMSYS 41 AURORA$JIS$UTILITY$ 29 ORDSYS 30 ORDPLUGINS 31 MDSYS 32 CTXSYS 34 XDB 38 FSC 39 RB 42 OSE$HTTP$ADMIN 56 PORTAL30 121 FLOWS_010600 120 FLOWS_FILES 58 PORTAL30_SSO 60 PORTAL30_SSO_PS 61 PORTAL30_DEMO 62 SCOTT 118 AFOSTER 71 WORKFLOW 122 SX3COM 128 TOAD 132 BMS1 134 DESDIR 141 JREED 145 NARROW 146 ARROW 30 rows selected. Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=94 Card=30 Bytes=720 ) 1 0 HASH JOIN (Cost=94 Card=30 Bytes=720) 2 1 VIEW OF 'VW_NSO_1' (Cost=92 Card=30 Bytes=390) 3 2 SORT (UNIQUE) (Cost=92 Card=30 Bytes=270) 4 3 TABLE ACCESS (FULL) OF 'INSURANCE_POLICIES' (Cost=9 Card=30881 Bytes=277929) 5 1 TABLE ACCESS (FULL) OF 'INSURANCE_BROKERS' (Cost=1 Card= 67 Bytes=737) SQL> select * from Insurance_Brokers b 2 where exists 3 ( 4 select * from Insurance_Policies p 5 where p.broker_id = b.broker_id 6 and p.inception_date > date '2005-01-01' 7 ); BROKER_ID BROKER_NAME ---------- ------------------------------ 0 SYS 5 SYSTEM 11 OUTLN 18 DBSNMP 20 WMSYS 41 AURORA$JIS$UTILITY$ 29 ORDSYS 30 ORDPLUGINS 31 MDSYS 32 CTXSYS 34 XDB 38 FSC 39 RB 42 OSE$HTTP$ADMIN 56 PORTAL30 121 FLOWS_010600 120 FLOWS_FILES 58 PORTAL30_SSO 60 PORTAL30_SSO_PS 61 PORTAL30_DEMO 62 SCOTT 118 AFOSTER 71 WORKFLOW 122 SX3COM 128 TOAD 132 BMS1 134 DESDIR 141 JREED 145 NARROW 146 ARROW 30 rows selected. Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=3 Bytes=33) 1 0 FILTER 2 1 TABLE ACCESS (FULL) OF 'INSURANCE_BROKERS' (Cost=1 Card= 3 Bytes=33) 3 1 TABLE ACCESS (BY INDEX ROWID) OF 'INSURANCE_POLICIES' (C ost=6 Card=1029 Bytes=9261) 4 3 INDEX (RANGE SCAN) OF 'IP_IB' (NON-UNIQUE) (Cost=2 Car d=1029) On 18 Aug 2005 15:13:28 -0700, Mikito Harakiri wrote:
Show quote >Hugo Kornelis wrote: Hi Mikito,>> On 18 Aug 2005 13:17:37 -0700, Mikito Harakiri wrote: >> >pondering if >> > >> >select * from table >> > >> >is faster than >> > >> >select col1, col2, ... from table >> >> No need to ponder that -- all SQL Server DBAs (and presumably all DBAs >> for all serious RDBMS's) know that SELECT * should never be used in >> production code (except in a EXISTS(..) subquery). > >My bad. I meant > >select count(1) from table > >vs. > >select count(*) from table As Mike said: absolutely no difference between the two. >BTW, you triggered the other example: is EXISTS or IN faster? This SQL Server will often produce the same execution for both versions. If>question could come up only from somebody who is completely unaware of >SQL expression equivalency and query rewrite. performance is really critical, always test all versions. If performance is important but not criticat, use EXISTS - I've seen cases where it's faster than IN, but I haven't seen the reverse yet. > Well, making sure the Huh? I don't know what your DMBS of choice is, but SQL Server doesn't>extents and segments are layed out on disk properly, leaves little room >for education and abstract thinking. bother the DBA with extents and segments. Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address) [...removed comp.databases.theory...]
>> As Mike said: absolutely no difference between the two. However, COUNT comes in different flavors, esp. with NULLs, it is inconsistent at best: COUNT( column ) COUNT( * ) -- for the whole resultset COUNT( constant ) --- this is silly COUNT( some column expression ) COUNT( some column expression involving NULLs ) -- weird results, changes with datatypes! For instance with numeric expressions & values. SELECT COUNT(*), COUNT( 2 ), COUNT( c * 2 + 1 ), COUNT( c * 2 + NULL ), COUNT( c ) FROM ( SELECT 1 UNION SELECT NULL UNION SELECT 2 ) N ( c ) ; >> SQL Server will often produce the same execution for both versions. First of all, not all statements with EXISTS() can be re-written using IN() or vice versa. For simple one-liners and straight forward queries involving a few tables, the plans are often comparable but in certain complex scenarios, with deeply nested SQL expressions this varies significantly. >> If performance is really critical, always test all versions. If I have seen it, though rarely esp. with small tables, when there are >> performance is important but not criticat, use EXISTS - I've seen cases >> where it's faster than IN, but I haven't seen the reverse yet. correlated subqueries nested deeply. Another case is a derived table with hard-wired values can sometimes generate plans which favors IN() over EXISTS(). So as you said, testing & finding it out is the only solution. >> Huh? I don't know what your DMBS of choice is, but SQL Server doesn't True; perhaps DBCC commands, but in reality it can be a benefit rather than >> bother the DBA with extents and segments. botheration. -- Anith On 18 Aug 2005 15:13:28 -0700, Mikito Harakiri wrote:
(snip) >BTW, you triggered the other example: is EXISTS or IN faster? Woops - forgot to add this to my previous reply:Before thinking about speed, you should think about the difference between the two. IN can result in UNKNOWN; EXISTS can only result in TRUE or FALSE. Since UNKNOWN and FALSE are treated the same in a WHERE, WHEN, or HAVING clause, many people doon't notice the difference - until they start combining IN with NOT and getting unexpected results! Only choose the faster version if they truly are equal!! Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address) Hugo Kornelis wrote:
Show quote > On 18 Aug 2005 15:13:28 -0700, Mikito Harakiri wrote: This tiny difference is really a pain in the butt. This UNNKNOWN based> > (snip) > >BTW, you triggered the other example: is EXISTS or IN faster? > > Woops - forgot to add this to my previous reply: > > Before thinking about speed, you should think about the difference > between the two. IN can result in UNKNOWN; EXISTS can only result in > TRUE or FALSE. Since UNKNOWN and FALSE are treated the same in a WHERE, > WHEN, or HAVING clause, many people doon't notice the difference - until > they start combining IN with NOT and getting unexpected results! > > Only choose the faster version if they truly are equal!! logic is rubbish, and any query involving NULLs still could give a surprising result. Therefore, the life would be much simpler if SQL standard just defined NOT IN and NOT EXIST to be identical. After all, they both are just dumbed down aggregate scalar subquery expressions: --IN, EXIST select * from dept d where 0<(select count(*) from emp where dept.deptno=emp.deptno) --NOT IN, NOT EXIST select * from dept d where 0=(select count(*) from emp where dept.deptno=emp.deptno) (both also known as semijoin and antijoin). Aggregate scalar subquery syntax is more general, and generic solution always rule. On Thu, 18 Aug 2005 22:50:33 +0200, Hugo Kornelis
<hugo@pe_NO_rFact.in_SPAM_fo> wrote: [snip] >The abstraction level where the actual The current name appears to be "Java programmer". <cough>>data type is irrelevant is where you'll find the information analyst, >functional designer, conceptual modeler or whatever the current name of >those people is. [snip] -- mike sherrill >> You have varchar2 limited to 4000 bytes. Then you have text to cover 4K to 2G range. Then you have to rely onsome other option to be able store data bigger than 2G (split it into chunks in your application????). If you suggested such a design to programmic language community, you would be laughed at. << oh really? then how comes C# has 8 types to store integers only, not counting decimal type? Is it funny too? >> In fact, those ugly length limited datatypes are just artifacts of the early SQL days,<< on the contrary, it is a very powerful tool in enforsing data integrity
Show quote
On 19 Aug 2005 10:22:41 -0700, "AK" <AK_TIREDOFSPAM@hotmail.COM> Is there anything different between a string 4096 long and awrote: >>> You have varchar2 limited to 4000 bytes. >Then you have text to cover 4K to 2G range. Then you have to rely on >some other option to be able store data bigger than 2G (split it into >chunks in your application????). > >If you suggested such a design to programmic language community, you >would be laughed at. ><< > >oh really? then how comes C# has 8 types to store integers only, not >counting decimal type? Is it funny too? string 4097 long? Are there any different machine instructions that are required to handle them? No? What about with integers? Yes? There is your difference. >>> In fact, those ugly length Oh? How? Data values have very little to do the difference>limited datatypes are just artifacts of the early SQL days, ><< > >on the contrary, it is a very powerful tool in enforsing data integrity between 4096 and 4097 long strings. Sincerely, Gene Wirchenko >> Oh? How?>>> In fact, those ugly length >limited datatypes are just artifacts of the early SQL days, ><< >on the contrary, it is a very powerful tool in enforsing data integrity << by declaring STATE_CODE CHAR(2) I don't let old style abbreviations 'Fla' and 'Ill' in by declaring COUNTRY_CODE CHAR(3), etc. On 19 Aug 2005 12:31:15 -0700, "AK" <AK_TIREDOFSPAM@hotmail.COM> [??? wrote:]wrote: >>>> In fact, those ugly length [AK wrote:]>>limited datatypes are just artifacts of the early SQL days, >>on the contrary, it is a very powerful tool in enforsing data integrity [Gene Wirchenko wrote:]> Oh? How? True, but that was not the point. The point was that datatypes>by declaring STATE_CODE CHAR(2) I don't let old style abbreviations >'Fla' and 'Ill' in > >by declaring COUNTRY_CODE CHAR(3), etc. limited to arbirary characters are artifacts. (Note: "datatypes" not "columns". Your usage is good.) (Please do not strip attributions when you quote someone.) Sincerely, Gene Wirchenko AK wrote:
Show quote > >> Why not> >>> In fact, those ugly length > >limited datatypes are just artifacts of the early SQL days, > ><< > > >on the contrary, it is a very powerful tool in enforsing data integrity > > > > Oh? How? > << > > by declaring STATE_CODE CHAR(2) I don't let old style abbreviations > 'Fla' and 'Ill' in > > by declaring COUNTRY_CODE CHAR(3), etc. CHECK CONSTRAINT(LENGTH(COUNTRY_CODE)<=2) BTW, this way you could also declare that COUNTRY_CODE has exactly 2 characters. On Fri, 19 Aug 2005 12:00:15 -0700, Gene Wirchenko wrote:
Show quote >On 19 Aug 2005 10:22:41 -0700, "AK" <AK_TIREDOFSPAM@hotmail.COM> Hi Gene,>wrote: > >>>> You have varchar2 limited to 4000 bytes. >>Then you have text to cover 4K to 2G range. Then you have to rely on >>some other option to be able store data bigger than 2G (split it into >>chunks in your application????). >> >>If you suggested such a design to programmic language community, you >>would be laughed at. >><< >> >>oh really? then how comes C# has 8 types to store integers only, not >>counting decimal type? Is it funny too? > > Is there anything different between a string 4096 long and a >string 4097 long? Are there any different machine instructions that >are required to handle them? No? > > What about with integers? Yes? > > There is your difference. So what exactly is the difference between a number that might reach 30,000 or a number that might reach 40,000? And how exactly is that difference different from the difference between a 4096 character string and 4097 character string? Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address) On Sat, 20 Aug 2005 00:49:14 +0200, Hugo Kornelis
<hugo@pe_NO_rFact.in_SPAM_fo> wrote: [snip] >So what exactly is the difference between a number that might reach Different machine instructions.>30,000 or a number that might reach 40,000? And how exactly is that >difference different from the difference between a 4096 character string This is just iterating a little more on the longer string.>and 4097 character string? Sincerely, Gene Wirchenko On Mon, 22 Aug 2005 14:00:27 -0700, Gene Wirchenko wrote:
>On Sat, 20 Aug 2005 00:49:14 +0200, Hugo Kornelis Hi Gene,><hugo@pe_NO_rFact.in_SPAM_fo> wrote: > >[snip] > >>So what exactly is the difference between a number that might reach >>30,000 or a number that might reach 40,000? And how exactly is that > > Different machine instructions. Isn't an RDBMS all about abstracting away from implementation details? >>difference different from the difference between a 4096 character string Toppling the size needed just over the maximum amount reserved for it in>>and 4097 character string? > > This is just iterating a little more on the longer string. specific internal structures. Or, put more simply, it takes other machine instructions to handle. Best, Hugo -- (Remove _NO_ and _SPAM_ to get my e-mail address) On Tue, 23 Aug 2005 00:38:01 +0200, Hugo Kornelis
<hugo@pe_NO_rFact.in_SPAM_fo> wrote: >On Mon, 22 Aug 2005 14:00:27 -0700, Gene Wirchenko wrote: And here I thought it was about storing/retrieving checked data.> >>On Sat, 20 Aug 2005 00:49:14 +0200, Hugo Kornelis >><hugo@pe_NO_rFact.in_SPAM_fo> wrote: >> >>[snip] >> >>>So what exactly is the difference between a number that might reach >>>30,000 or a number that might reach 40,000? And how exactly is that >> >> Different machine instructions. >Isn't an RDBMS all about abstracting away from implementation details? Whether one uses, for example, a string or an integer for a particular column is an implementation detail. >>>difference different from the difference between a 4096 character string "more" not "other".>>>and 4097 character string? >> >> This is just iterating a little more on the longer string. > >Toppling the size needed just over the maximum amount reserved for it in >specific internal structures. Or, put more simply, it takes other >machine instructions to handle. Sincerely, Gene Wirchenko >>Toppling the size needed just over the maximum amount reserved for it in SHEESH!>>specific internal structures. Or, put more simply, it takes other >>machine instructions to handle. > > "more" not "other". I meant to pass on some amusement last week. I cannot believe you people are still arguing about the miniscule details. -- Peace & happy computing, Mike Labosh, MCSD "Musha ring dum a doo dum a da!" -- James Hetfield "Mike Labosh" <mlab***@hotmail.com> wrote in message SQL experts... give them a bone and they'll chew on it for days.news:%2387KehAqFHA.1136@TK2MSFTNGP12.phx.gbl... > SHEESH! > > I meant to pass on some amusement last week. I cannot believe you people > are still arguing about the miniscule details. | |||||||||||||||||||||||