|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Optimizing an IN clauseMembersTable.MemberID IN ( SELECT ZipcodesTable.MemberID FROM ZipcodesTable.Zipcode IN ( '01234','03631','55902' ... '03036' ) That is, it's looking for entries in the ZipcodeTable where the Zipcode value is any one of a very large set of zipcodes, up to 500 Zipcodes. My Zipcode field is a 5 character field. Would the query run faster if I made the Zipcode field an int instead of 5 chars? Are there other ways to speed up the zipcode-matching part of my query? - Roger Garrett Changing to integer is not possible in this case, cause you got leading
zero in your zip code. Converting them to INT would trim them away. The best way to speed this up would be top store the ZIP codes in a table to join them. It would be even more easy to manage. HTH, Jens Suessmeyer. Jens,
I don't see that leading zeros would cause any problem. I'm suggesting that I change the Zipcodes column from char 5 to int and to just store the numerical integer values of the zipcodes rather than the 5-character string. A zipcode of "03036" would become a 3036 in the column, and when I'm searching for an "03036" it would match with the 3036 value. Of course, I would specify numeric values (e.g. 03036) rathet than quoted strings ('03036'). I'm assuming that a 5-char column occupies at least 6 bytes (to make it on an even byte boundary) and that an int occupies 4 bytes. That at a minium saves some space in the database. And then, when I'm looking for a particular value, it only has to compare 4 bytes (which is mostly likely a single hardware instruction) instead of six bytes, so the queries should run a bit faster. I suggest you don't do this. While it might save a little space
and make some pieces of your application a little faster, the bottom line is that a ZIP code is not a number, but a string, and you'll likely regret any decision to store it as something it isn't. For example, when you need to deal with ZIP+4s some day, you won't enjoy tracking the problems down. select 32837 + '-3420' select 32837 + '-' + '3420' You'll also forget the leading 0 somewhere, and as a result, mail you send to me will go to Texas or the dead letter bin. Steve Kass Drew University Madison, NJ 07940 Show quote "Roger Garrett" <RogerGarr***@discussions.microsoft.com> wrote in message news:19CF2F1A-54E9-4DB4-A51E-F0CA427B7E9F@microsoft.com... > Jens, > > I don't see that leading zeros would cause any problem. I'm suggesting > that > I change the Zipcodes column from char 5 to int and to just store the > numerical integer values of the zipcodes rather than the 5-character > string. > A zipcode of "03036" would become a 3036 in the column, and when I'm > searching for an "03036" it would match with the 3036 value. Of course, I > would specify numeric values (e.g. 03036) rathet than quoted strings > ('03036'). > > I'm assuming that a 5-char column occupies at least 6 bytes (to make it on > an even byte boundary) and that an int occupies 4 bytes. That at a minium > saves some space in the database. And then, when I'm looking for a > particular > value, it only has to compare 4 bytes (which is mostly likely a single > hardware instruction) instead of six bytes, so the queries should run a > bit > faster. > For the performace reason: Try it out, once you converted this into
have your data as a char. (which is like I said and meanwhile also Steve pointed out, preferable, because you don´t have to deal with later problems around this. We has ourselves in Germany a change from 4digit numbers to 5 digits with a trailing zero. I can tell you, that was for many software vendors like the Y2k problem). These is my opinion, my personal experience and advice for you. HTH, Jens Suessmeyer. Relations. The performance of this query would benefit from using a join
instead of using the IN operator. How do the values get into the query? ML --- http://milambda.blogspot.com/ ML,
My application program constructs the query. What's happenning here is this: I have a Zipcodes table. That table has two columns, a MemberID column and a Zipcode column. Each member has one entry in this table, signifying the zipcode of where he lives. At certain times during the running of my application it needs to know the MemebrIDs of all the members that live within a certain radius of a given member. My application program figures out which zipcodes are within that radius and constructs an array of strings signifying that set of zipcodes. It then constructs a query, using that array of strings, in order to get from the database the set of MemberIDs, from the Zipcodes table, of those members who reside in any of those zipcodes. The query looks something like: SELECT ZipcodesTable.MemberID FROM ZipcodesTable.Zipcode IN ( '01234','03631','55902' ... lots of zipcodes here ... '03036' ) As far as I can tell that means that SQL Server has to compare each and every zipocde in the Zipcodes table with (possibly all of) the zipcodes within the IN clause of the query. In fact, for MOST of the rows in the Zipcodes table it will have to do the comparison against ALL of the zipcodes in the IN clause, since most of the members will not be within ANY of those zipcodes. Now, if SQL Server were smart it might order those zipcodes from the IN clause and determine the smallest and largest zipcode values and thereby do a much quicker comparison at each row. But I don't know that I can rely on SQL Server being that smart. So I'm looking for a better way to express the query so that it runs as fast as possible. Have a look at these:
http://www.sommarskog.se/arrays-in-sql.html Arrays & Lists http://www.aspfaq.com/show.asp?id=2248 Arrays & Lists http://www.sommarskog.se/dynamic_sql.html Dynamic SQL http://vyaskn.tripod.com/passing_arrays_to_stored_procedures.htm Arrays & SP's -- Show quoteAndrew J. Kelly SQL MVP "Roger Garrett" <RogerGarr***@discussions.microsoft.com> wrote in message news:6191E649-0F16-4F91-A944-25AA34F8C831@microsoft.com... > ML, > > My application program constructs the query. > > What's happenning here is this: I have a Zipcodes table. That table has > two > columns, a MemberID column and a Zipcode column. Each member has one entry > in > this table, signifying the zipcode of where he lives. > At certain times during the running of my application it needs to know the > MemebrIDs of all the members that live within a certain radius of a given > member. My application program figures out which zipcodes are within that > radius and constructs an array of strings signifying that set of zipcodes. > It > then constructs a query, using that array of strings, in order to get from > the database the set of MemberIDs, from the Zipcodes table, of those > members > who reside in any of those zipcodes. The query looks something like: > > SELECT ZipcodesTable.MemberID > FROM ZipcodesTable.Zipcode IN > ( > '01234','03631','55902' ... lots of zipcodes here ... '03036' > ) > > As far as I can tell that means that SQL Server has to compare each and > every zipocde in the Zipcodes table with (possibly all of) the zipcodes > within the IN clause of the query. In fact, for MOST of the rows in the > Zipcodes table it will have to do the comparison against ALL of the > zipcodes > in the IN clause, since most of the members will not be within ANY of > those > zipcodes. > > Now, if SQL Server were smart it might order those zipcodes from the IN > clause and determine the smallest and largest zipcode values and thereby > do a > much quicker comparison at each row. But I don't know that I can rely on > SQL > Server being that smart. > > So I'm looking for a better way to express the query so that it runs as > fast > as possible. > Have you tried placing the zipcodes in a table and using EXISTS instead of
IN? -- Show quoteAndrew J. Kelly SQL MVP "Roger Garrett" <RogerGarr***@discussions.microsoft.com> wrote in message news:5C55A707-6E72-4CDF-BFC0-7320AA8CDA12@microsoft.com... >I have a fairly straightforward SELECT query that includes the following: > > MembersTable.MemberID IN > ( > SELECT ZipcodesTable.MemberID > FROM ZipcodesTable.Zipcode IN > ( > '01234','03631','55902' ... '03036' > ) > > That is, it's looking for entries in the ZipcodeTable where the Zipcode > value is any one of a very large set of zipcodes, up to 500 Zipcodes. My > Zipcode field is a 5 character field. > > Would the query run faster if I made the Zipcode field an int instead of 5 > chars? > > Are there other ways to speed up the zipcode-matching part of my query? > > - Roger Garrett > Andrew,
I wasn't familiar with the EXISTS operator (I'm very new to all this) so I just now read up on it. I don't see how EXISTS will help. How are you suggesting that the zipcodes be put in a table? Do you mean the set of zipcodes that I'm looking for for the specific current query? WHat would the EXISTS query look like? Please see my reply to Jens for a (hopefully) clearer description of what I'm trying to accomplish. Show quote "Andrew J. Kelly" wrote: > Have you tried placing the zipcodes in a table and using EXISTS instead of > IN? > > -- > Andrew J. Kelly SQL MVP > > > "Roger Garrett" <RogerGarr***@discussions.microsoft.com> wrote in message > news:5C55A707-6E72-4CDF-BFC0-7320AA8CDA12@microsoft.com... > >I have a fairly straightforward SELECT query that includes the following: > > > > MembersTable.MemberID IN > > ( > > SELECT ZipcodesTable.MemberID > > FROM ZipcodesTable.Zipcode IN > > ( > > '01234','03631','55902' ... '03036' > > ) > > > > That is, it's looking for entries in the ZipcodeTable where the Zipcode > > value is any one of a very large set of zipcodes, up to 500 Zipcodes. My > > Zipcode field is a 5 character field. > > > > Would the query run faster if I made the Zipcode field an int instead of 5 > > chars? > > > > Are there other ways to speed up the zipcode-matching part of my query? > > > > - Roger Garrett > > > > > >> Would the query run faster if I made the Zipcode field [sic] an INTEGER instead of CHAR(5)? << ZIP codes are CHAR(5) and not INTEGER. Columns are not fields.>> Are there other ways to speed up the zipcode-matching part of my query? << For a large number of zip codes, you might find using a table insteadof a list is faster. It would have an index on its single column. SELECT member_id FROM Membership WHERE zip_code IN (SELECT zip_code FROM ZipLists); CELKO,
Are you suggesting that, prior to performing the query, I create a table, fill it up with the set of zipcodes that I'm looking for, and perform the query (and presumably then get rid of the table of zipcodes)? Each time that I do the query it is almost guaranteed that I will be looking for a different set of zipcodes. Is it really more efficient to create a new table each time? - Roger Show quote "--CELKO--" wrote: > >> Would the query run faster if I made the Zipcode field [sic] an INTEGER instead of CHAR(5)? << > > ZIP codes are CHAR(5) and not INTEGER. Columns are not fields. > > >> Are there other ways to speed up the zipcode-matching part of my query? << > > For a large number of zip codes, you might find using a table instead > of a list is faster. It would have an index on its single column. > > SELECT member_id > FROM Membership > WHERE zip_code > IN (SELECT zip_code FROM ZipLists); > > >> Are you suggesting that, prior to performing the query, I create a table, fill it up with the set of zipcodes that I'm looking for, andperform the query (and presumably then get rid of the table of zipcodes)? << Why would you want to get rid of that table? Just do inserts, updates and deletes on it. That means other procedures for managing it. Writing a new query with a new zip code list in it will cost you compilation time. If you have standard zip code lists for mailings, then the table is a handy thing: CREATE TABLE Mailings (mailing_id INTEGER NOT NULL, zip_code CHAR(5) NOT NULL, PRIMARY KEY (mailing_id, zip_code)); Now mailing_id becomes a parameter to the original procedure. You are using procedures and not writing in-line code, aren't you? The question as to which is faster and if it is worth it depends on the SQL product and the release and the size of the list of zip codes. For example, Ingres optimizes IN() and EXISTS() the same way. DB2 builds a hash table if the IN() list is over a certain size and a scratch table if it is bigger than that. Sybase and others automatically link PK-FK references in a single index structure. Gross generalization: joining to another table is better as the list gets longer. A small table without a key can be read into main storage and will act like a list of constants. A larger table can be indexed or hashed to speed up searching. --CELKO-- Please post DDL in a human-readable format and not a machine-generated one. This way people do not have to guess what the keys, constraints, DRI, datatypes, etc. in your schema are. Sample data is also a good idea, along with clear specifications. *** Sent via Developersdex http://www.developersdex.com *** > The question as to which is faster and if it is worth it depends on the Read the name of the group!> SQL product and the release and the size of the list of zip codes. For It would help if you kept it to Microsoft SQL Server. Show quote "--CELKO--" <remove.jcelko***@earthlink.net> wrote in message news:e$Jco7gGGHA.1032@TK2MSFTNGP11.phx.gbl... >>> Are you suggesting that, prior to performing the query, I create a > table, fill it up with the set of zipcodes that I'm looking for, and > perform the query (and presumably then get rid of the table of > zipcodes)? << > > Why would you want to get rid of that table? Just do inserts, updates > and deletes on it. That means other procedures for managing it. Writing > a new query with a new zip code list in it will cost you compilation > time. > > If you have standard zip code lists for mailings, then the table is a > handy thing: > > CREATE TABLE Mailings > (mailing_id INTEGER NOT NULL, > zip_code CHAR(5) NOT NULL, > PRIMARY KEY (mailing_id, zip_code)); > > Now mailing_id becomes a parameter to the original procedure. You are > using procedures and not writing in-line code, aren't you? > > The question as to which is faster and if it is worth it depends on the > SQL product and the release and the size of the list of zip codes. For > example, Ingres optimizes IN() and EXISTS() the same way. DB2 builds a > hash table if the IN() list is over a certain size and a scratch table > if it is bigger than that. Sybase and others automatically link PK-FK > references in a single index structure. > > Gross generalization: joining to another table is better as the list > gets longer. A small table without a key can be read into main storage > and will act like a list of constants. A larger table can be indexed or > hashed to speed up searching. > > > > --CELKO-- > Please post DDL in a human-readable format and not a machine-generated > one. This way people do not have to guess what the keys, constraints, > DRI, datatypes, etc. in your schema are. Sample data is also a good > idea, along with clear specifications. > > > *** Sent via Developersdex http://www.developersdex.com ***
Show quote
"--CELKO--" wrote:
> >> Are you suggesting that, prior to performing the query, I create a > table, fill it up with the set of zipcodes that I'm looking for, and > perform the query (and presumably then get rid of the table of > zipcodes)? << > > Why would you want to get rid of that table? Just do inserts, updates > and deletes on it. That means other procedures for managing it. Writing > a new query with a new zip code list in it will cost you compilation > time. > > If you have standard zip code lists for mailings, then the table is a > handy thing: > > CREATE TABLE Mailings > (mailing_id INTEGER NOT NULL, > zip_code CHAR(5) NOT NULL, > PRIMARY KEY (mailing_id, zip_code)); > > Now mailing_id becomes a parameter to the original procedure. You are > using procedures and not writing in-line code, aren't you? > > The question as to which is faster and if it is worth it depends on the > SQL product and the release and the size of the list of zip codes. For > example, Ingres optimizes IN() and EXISTS() the same way. DB2 builds a > hash table if the IN() list is over a certain size and a scratch table > if it is bigger than that. Sybase and others automatically link PK-FK > references in a single index structure. > > Gross generalization: joining to another table is better as the list > gets longer. A small table without a key can be read into main storage > and will act like a list of constants. A larger table can be indexed or > hashed to speed up searching. > > > > --CELKO-- > Please post DDL in a human-readable format and not a machine-generated > one. This way people do not have to guess what the keys, constraints, > DRI, datatypes, etc. in your schema are. Sample data is also a good > idea, along with clear specifications. > > > *** Sent via Developersdex http://www.developersdex.com *** > CELKO,
My application is one in which I have millions of Members. Each member lives in some zipocde. My query is intended to find the other members who live within a specific radius of a given member. In order to perform the query I therefore must create a list of zipcodes within that radius and use that list within the query. Each such query, being as it is for a different member, will have a different list of zipcodes. That means I have to build the query (at least the zipcode portion of it) each time I do the query. I don't see how I could put that zipcode info into a table, seeing as it is different for each query. Or would it be reasonable to have a (temporary) zipcode table that I fill up prior to my query, with the specific zipocdes that I'm looking for, and then reference that table withinthe query, rather than having the list of zipcodes included explicitly within the query? (and then deleting that zipocde table after the query). Roger Garrett >> Each member lives in some zipocde. My query is intended to find the other members who live within a specific radius of a given member.<< Ask the USPS about Zone tables. This is a list the ZIP code bydistance from a given ZIP code. It is updates, free and close enough for practical work. NO, I am suggesting that you go to the USPS can get the Zone tables
tapes of CDs on s subscription basis. Never create data that you can buy from a trusted source. Hi Roger,
Make sure you have an index ZipCode, MemberID on the ZipcodesTable. Also, use EXISTS instead of IN. AND EXISTS ( SELECT * FROM ZipcodesTable zt WHERE zt.Zipcode IN ( ........ ) AND zt.MemberID = MembersTable.MemberID ) Even better if you could put the IN clause into a table of its own... AND EXISTS ( SELECT * FROM #ZCodes zt WHERE zt.MemberID = MembersTable.MemberID ) Show quote "Roger Garrett" <RogerGarr***@discussions.microsoft.com> wrote in message news:5C55A707-6E72-4CDF-BFC0-7320AA8CDA12@microsoft.com... >I have a fairly straightforward SELECT query that includes the following: > > MembersTable.MemberID IN > ( > SELECT ZipcodesTable.MemberID > FROM ZipcodesTable.Zipcode IN > ( > '01234','03631','55902' ... '03036' > ) > > That is, it's looking for entries in the ZipcodeTable where the Zipcode > value is any one of a very large set of zipcodes, up to 500 Zipcodes. My > Zipcode field is a 5 character field. > > Would the query run faster if I made the Zipcode field an int instead of 5 > chars? > > Are there other ways to speed up the zipcode-matching part of my query? > > - Roger Garrett > Just curious...
How long does it take your code to run with the long list of zip codes? Is MembersTable already indexed on Zipcode? Lastly, at the risk of making things much more complicated that they need to be.... How are you determining which zip codes are within a certain area of each other? I am guessing that you have some sort of coordinates attached to each zip code in order to determine its relative location to another zip code? Point being that somewhere you have logic which looks up a list of zip codes. You may very well be able to use that same logic to join to your MembersTable, rather than generating a list then using that list as an in clause. I am just guessing at how you are doing this, and tossing out an idea, so forgive me if I am way off base. Show quote "Roger Garrett" <RogerGarr***@discussions.microsoft.com> wrote in message news:5C55A707-6E72-4CDF-BFC0-7320AA8CDA12@microsoft.com... > I have a fairly straightforward SELECT query that includes the following: > > MembersTable.MemberID IN > ( > SELECT ZipcodesTable.MemberID > FROM ZipcodesTable.Zipcode IN > ( > '01234','03631','55902' ... '03036' > ) > > That is, it's looking for entries in the ZipcodeTable where the Zipcode > value is any one of a very large set of zipcodes, up to 500 Zipcodes. My > Zipcode field is a 5 character field. > > Would the query run faster if I made the Zipcode field an int instead of 5 > chars? > > Are there other ways to speed up the zipcode-matching part of my query? > > - Roger Garrett > |
|||||||||||||||||||||||