|
database
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
many small queries vs one large queryabout 5000 objects from the database using MSSQL and DotNet 2.0. My boss thinks that the recommendation of keeping the connections open for as short a time as possible means that one shoule open a connection, fetch one row, close the connection, open another connection, fetch one row and close the connection and so on. I belive that the extra overhead of opening and closing connections all the time, of generating execution plans for each query, and running the queries many times gives his solution a large performance hit. My recommendation will be to fetch all the rows needed in one large SQL query. I've done several other projects where I have proven this to be true, there is indeed a large performance hit from generating many small queries instead of one large. But my boss just says "no" and disagrees. Can you give me some good arguments and/or point me to some best practice documents that describe this so I can convince my boss he's wrong? Kind Regards, Allan Ebdrup FUT: microsoft.public.sqlserver.programming
Show quote
"Allan Ebdrup" <ebdrup@noemail.noemail> wrote in message Don't argue with your boss. Implement it both ways and show him.news:%23kIhprXWGHA.3332@TK2MSFTNGP02.phx.gbl... >I just had a discussion with my boss, we are running a query that fetches >about 5000 objects from the database using MSSQL and DotNet 2.0. > My boss thinks that the recommendation of keeping the connections open for > as short a time as possible means that one shoule open a connection, fetch > one row, close the connection, open another connection, fetch one row and > close the connection and so on. > I belive that the extra overhead of opening and closing connections all > the time, of generating execution plans for each query, and running the > queries many times gives his solution a large performance hit. My > recommendation will be to fetch all the rows needed in one large SQL > query. > I've done several other projects where I have proven this to be true, > there is indeed a large performance hit from generating many small queries > instead of one large. But my boss just says "no" and disagrees. > > Can you give me some good arguments and/or point me to some best practice > documents that describe this so I can convince my boss he's wrong? > David > Don't argue with your boss. Implement it both ways and show him. Hi DavidUnfortunately it's him who's doing the implementing. SQL Server is quite capable of handling long connections, and as long as
they're actually doing something they make perfect sense. Many short lasting connections may seem to put less stress on the server, but in fact the overhead of many connections versus a single connection, when calls to the server constitute a single business operation, must be multiplied with the number of connections to give a true estimate of the actual (and expected) stress on the server. Will puts it very simply (and very true): 5000 connections where each lasts 50ms might kick the server pretty hard. 2nd grade mathematics. ML --- http://milambda.blogspot.com/ ML (M*@discussions.microsoft.com) writes:
> SQL Server is quite capable of handling long connections, and as long as Yes, if you design a Windows-based application, it makes perfect sense> they're actually doing something they make perfect sense. to have one global conection that stays connected as long as the application is running. At least as long as the number of users is moderate, and most users are active most of the time. Nevertheless, disconnected mode has become the prime model for client programming these days. I guess there are two major reasons for this: 1) For a web-based app, it's not a good idea to have connection per web user, as there could be a lot of user. At the same time, all connections are from the same process: the web server. So pooling seems like a very good idea. 2) The programming becomes easier to handle when you have to drag a global object around, but each function that needs to get or update some data, can connect to perform its operation and so, without interferring with others. For instance, assume that when you run loop with ExecuteReader and in the middle of the loop you call some some other function that needs to access the database. With a global object you run into conflict. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx As I've mentioned im my other post, for parallel tasks one connection may not
be enough, but for consecutive tasks many conections are wasteful. ML --- http://milambda.blogspot.com/ I can guarantee, having had to optimise an application that used your
bosses tactics, it will destroy your app. it's not just the opening and closing of connections, the whole act of retrieving data adds a significant overhead, you see if you were getting a load of records based on record id, you'd be making about 5000 separate scans of a table based on it's id column, each one getting one record. This means that when your index get's fragmented and takes 1 second more, your operation takes 5000 seconds more. If you were making one large statement, your index getting fragmented adds 1 second on to the whole operation. I can't stress this enough - your boss does not know SQL server very well Cheers Will "Allan Ebdrup" <ebdrup@noemail.noemail> wrote in I would have submitted one large bulk query as well.news:#kIhprXWGHA.3332@TK2MSFTNGP02.phx.gbl: > I've done several other projects where I have proven this to be true, > there is indeed a large performance hit from generating many small > queries instead of one large. But my boss just says "no" and disagrees Perhaps multiple small queries would reduce the time a table is locked? Any other solutions? Maybe open a connection - execute a long running SP... and fetch the results after the fact? My post just now seems to have got lost (wonder where it went).
To summarise what I said: your boss is making a massive mistake. I've seen this tactic destroy a large application. it's fine at first, your queries all run nearly instantaneously, and you'll load test it later, no need to worry about that right now. then a table gets big. suddenly the instant query takes 50 milliseconds to get the data. not a problem if you're only doing one select. if you're doing 5000 separate ones it takes 5000x50 milliseconds. IF your query is executing 5000 times against the same tables and retrieving
the same columns with selection criteria against the same columns, then running a single query to retreive all 5000 rows would be faster, without question. i.e. select employee_id, employee_name from employees Will have a connection open for much less time than cycling through each employee with: select employee_id, employee_name from employee where employee_id = 'SOMEID' The first one will also use less CPU on the database, less IO, less network traffic, etc. You can play with the code to control how you open the dataset once you have retrieved it, but the first SQL will almost always be faster. The only exception I can think of is if your client does not have enough memory to handle the entire dataset. If you have more complicated logic, and you are running against all different tables, and combining this into one query means extensive joins and business logic in the where clause, then you need to test each approach and see what the performance gain is with the view. Opening and closing connections for each communication wiht the database is usually most efficient in my experience. Connection pooling manages the resources really well. The connections do not really get opened and closed, rather they stay open and available, but the client is able to free up some of the resources associated with them between each call. This assumes that you actually need to execute multiple queries to begin with. Making 5000 seperate calls to the database will use much more network resources than making a single call to the database to retreive the same data. That part is a no brainer. Dependign on what you have for SQL, it will not necessarily need to generate a plan every time (one would hope you are using stored procedures with parameters and the plans are cached). However, there would be some small overhead associated with running each command. Show quote "Allan Ebdrup" <ebdrup@noemail.noemail> wrote in message news:%23kIhprXWGHA.3332@TK2MSFTNGP02.phx.gbl... > I just had a discussion with my boss, we are running a query that fetches > about 5000 objects from the database using MSSQL and DotNet 2.0. > My boss thinks that the recommendation of keeping the connections open for > as short a time as possible means that one shoule open a connection, fetch > one row, close the connection, open another connection, fetch one row and > close the connection and so on. > I belive that the extra overhead of opening and closing connections all the > time, of generating execution plans for each query, and running the queries > many times gives his solution a large performance hit. My recommendation > will be to fetch all the rows needed in one large SQL query. > I've done several other projects where I have proven this to be true, there > is indeed a large performance hit from generating many small queries instead > of one large. But my boss just says "no" and disagrees. > > Can you give me some good arguments and/or point me to some best practice > documents that describe this so I can convince my boss he's wrong? > > Kind Regards, > Allan Ebdrup > > FUT: microsoft.public.sqlserver.programming > > If the calls to the database are made against isolated objects (i.e. against
different tables or to procedures that target different tables), then individual connections may benefit from being initiated asynchronously. But it would make more sense in issuing a single call to related objects for a single business operation. At least IMHO. ML --- http://milambda.blogspot.com/ All posts made a good point and I agree that your boss is wrong. It's a no
wonder he is a not a SQL Programmer but he is your boss. He signs your paycheck so you may not have a choice. I often had to bow to what my boss wants, hate it and it sucks but what can I do. -- Show quoteGrant Who gives a {censored} if I am wrong. "Allan Ebdrup" <ebdrup@noemail.noemail> wrote in message news:%23kIhprXWGHA.3332@TK2MSFTNGP02.phx.gbl... >I just had a discussion with my boss, we are running a query that fetches >about 5000 objects from the database using MSSQL and DotNet 2.0. > My boss thinks that the recommendation of keeping the connections open for > as short a time as possible means that one shoule open a connection, fetch > one row, close the connection, open another connection, fetch one row and > close the connection and so on. > I belive that the extra overhead of opening and closing connections all > the time, of generating execution plans for each query, and running the > queries many times gives his solution a large performance hit. My > recommendation will be to fetch all the rows needed in one large SQL > query. > I've done several other projects where I have proven this to be true, > there is indeed a large performance hit from generating many small queries > instead of one large. But my boss just says "no" and disagrees. > > Can you give me some good arguments and/or point me to some best practice > documents that describe this so I can convince my boss he's wrong? > > Kind Regards, > Allan Ebdrup > > FUT: microsoft.public.sqlserver.programming > I forget to add this, leave a note so that way the future sql programming
will not blame you for that bad sql programming. I have plenty of notes if only my boss reads my notes LOL. -- Show quoteGrant Who gives a {censored} if I am wrong. "Grant" <em***@nowhere.com> wrote in message news:%23wgVuOZWGHA.4452@TK2MSFTNGP04.phx.gbl... > All posts made a good point and I agree that your boss is wrong. It's a no > wonder he is a not a SQL Programmer but he is your boss. He signs your > paycheck so you may not have a choice. I often had to bow to what my boss > wants, hate it and it sucks but what can I do. > > > -- > Grant > > Who gives a {censored} if I am wrong. > "Allan Ebdrup" <ebdrup@noemail.noemail> wrote in message > news:%23kIhprXWGHA.3332@TK2MSFTNGP02.phx.gbl... >>I just had a discussion with my boss, we are running a query that fetches >>about 5000 objects from the database using MSSQL and DotNet 2.0. >> My boss thinks that the recommendation of keeping the connections open >> for as short a time as possible means that one shoule open a connection, >> fetch one row, close the connection, open another connection, fetch one >> row and close the connection and so on. >> I belive that the extra overhead of opening and closing connections all >> the time, of generating execution plans for each query, and running the >> queries many times gives his solution a large performance hit. My >> recommendation will be to fetch all the rows needed in one large SQL >> query. >> I've done several other projects where I have proven this to be true, >> there is indeed a large performance hit from generating many small >> queries instead of one large. But my boss just says "no" and disagrees. >> >> Can you give me some good arguments and/or point me to some best practice >> documents that describe this so I can convince my boss he's wrong? >> >> Kind Regards, >> Allan Ebdrup >> >> FUT: microsoft.public.sqlserver.programming >> > > If responsiveness of your application is important, you might consider to
fetch a certain amount of objects per fetch (say 100 objects, depending on the size of your objects), instead of fetching 5000 objects together. If you are going to get 5000 objects at once, while responsiveness is not a problem, you should fetch all rows using one query. -- Show quoteMartin C K Poon Senior Analyst Programmer ==================================== "Allan Ebdrup" <ebdrup@noemail.noemail> ¦b¶l¥ó news:%23kIhprXWGHA.3332@TK2MSFTNGP02.phx.gbl ¤¤¼¶¼g... > I just had a discussion with my boss, we are running a query that fetches > about 5000 objects from the database using MSSQL and DotNet 2.0. > My boss thinks that the recommendation of keeping the connections open for > as short a time as possible means that one shoule open a connection, fetch > one row, close the connection, open another connection, fetch one row and > close the connection and so on. > I belive that the extra overhead of opening and closing connections all the > time, of generating execution plans for each query, and running the queries > many times gives his solution a large performance hit. My recommendation > will be to fetch all the rows needed in one large SQL query. > I've done several other projects where I have proven this to be true, there > is indeed a large performance hit from generating many small queries instead > of one large. But my boss just says "no" and disagrees. > > Can you give me some good arguments and/or point me to some best practice > documents that describe this so I can convince my boss he's wrong? > > Kind Regards, > Allan Ebdrup > > FUT: microsoft.public.sqlserver.programming > > you are right, opening / closing a connection is time consuming.
I have no number for this overhead. A developper try to minimize the number of connections and monitor the connection pool used by the .Net 2.0. With SQL 2005 and the MARS capability, you are able to execute multiple commands using the same connection. maybe your boss misunderstand the difference between retrieving 1 row and releasing a connection as quickly as possible. For good performance you have to release the connection as quickly as possible to allow another user to use it and to release the database resources. also you have to retrieve just what you need, don't read 5000rows if you'll display only 100rows. but if you want to cache your result, retrieving more rows is better. using a good transaction option, you can insure that you'll get all the data from all the objects at a point in time. so you insure a good integrity in your data. But this could cause some lock issues. in another hand, retrieving each row individually insure that the user will have the latest version of the row, but the data integrity is not guarantee. if the employeeID change in table 1, and you don't have read the employee name for this specific ID, you'll have some problems if you try to join the data in your application. But this method could fill your application pool more quickly because you'll ask for a new connection before the previous connection has been correctly re-pooled. And each connection in the pool consume your SQL Server resource (the connection still alive) I Think you can simply create a sample application to test the 2 options and monitor the performance. Show quote "Allan Ebdrup" <ebdrup@noemail.noemail> wrote in message news:%23kIhprXWGHA.3332@TK2MSFTNGP02.phx.gbl... >I just had a discussion with my boss, we are running a query that fetches >about 5000 objects from the database using MSSQL and DotNet 2.0. > My boss thinks that the recommendation of keeping the connections open for > as short a time as possible means that one shoule open a connection, fetch > one row, close the connection, open another connection, fetch one row and > close the connection and so on. > I belive that the extra overhead of opening and closing connections all > the time, of generating execution plans for each query, and running the > queries many times gives his solution a large performance hit. My > recommendation will be to fetch all the rows needed in one large SQL > query. > I've done several other projects where I have proven this to be true, > there is indeed a large performance hit from generating many small queries > instead of one large. But my boss just says "no" and disagrees. > > Can you give me some good arguments and/or point me to some best practice > documents that describe this so I can convince my boss he's wrong? > > Kind Regards, > Allan Ebdrup > > FUT: microsoft.public.sqlserver.programming > Allan Ebdrup (ebdrup@noemail.noemail) writes:
> I just had a discussion with my boss, we are running a query that You conclusions are correct, but some of the arguments are not.> fetches about 5000 objects from the database using MSSQL and DotNet 2.0. > My boss thinks that the recommendation of keeping the connections open > for as short a time as possible means that one shoule open a connection, > fetch one row, close the connection, open another connection, fetch one > row and close the connection and so on. I belive that the extra overhead > of opening and closing connections all the time, of generating execution > plans for each query, and running the queries many times gives his > solution a large performance hit. My recommendation will be to fetch all > the rows needed in one large SQL query. I've done several other projects > where I have proven this to be true, there is indeed a large performance > hit from generating many small queries instead of one large. But my boss > just says "no" and disagrees. There is indeed an overhead to open and a close a physical connection. For this reason, ADO .Net maintains a connection pool, so that when you say cn.Close, ADO .Net does not close the physical connection to the server. Instead, it keeps the connection open, and if you request a new connection with the same connection properties within 60 seconds, the connection will be reused. In this way the overhead is minimized. There is still a network roundtrip when you reconnect however, as ADO .Net issues the command sp_reset_connection to make sure that no settings from the previous connection lingers around. The problem with your boss's reasoning is not really that he wants to connect and disconnect. It would be equally bad if he srayed connected and then fetched one row at a time. Of course, if he stays connected, he can use ExecuteReader which will buffer under the hood. When reconnects, he needs to issue a query each time, and all those network roundtrips will be detrimental to the performance. It should be easy to write an sample application that emplooys different models of getting data: 1) The boss method: connect, issue a query, fetch one row, disconnect. 2) Loop over ExecuteRead. 3) Datadatper.Fill. 2 and 3 will probably have very similar performance. 1) will crawl like a snail. If the boss does not change his mind, get a new boss. -- Erland Sommarskog, SQL Server MVP, esq***@sommarskog.se Books Online for SQL Server 2005 at http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx Books Online for SQL Server 2000 at http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx Alan, your question is very vague - which may be leading to the lack of
answers. When you say you are fetching 5000 objects, do you mean records? What is going to be best depends on a lot. In general, yes, continually opening and closing connections is going to incur overhead on both your DB server and your app server. Your concerns about the small queries can be somewhat allayed by the efficient plan reuse from SQL 2000 and 2005. Once a query is run, subsequent queries with similar paramters (same column args and similar cardinality) will use the same compiled plan from the cache. So, that part doesn't disprove your boss. Now, where your argument comes into play is in network traffic. 50 query calls of 100 records each will require 49 more ack/syn TCP handshakes (server and network CPU) and network traversal (network latency)than is necessary - not to mention that you are also sending a simiar query command (as text) to the server 50 times which does take up additional network bandwith. Furthmore, 50 different queries will result in many more sql server RPC's than just one query - which again increases overall CPU. Also, if you are trying to break up 5000 records into smaller symetrical chunks (but in the end you need all 5000), then your queries become more complex. You are now selecting records in a pseudo ordinal format by assigning a numbering sequence - which is likely to increase the complexity of the SQL, require more logic, and require record sorts - which will all increase the CPU. In my experience with web apps, its better to increase the memory on your web/application server by a gig or so and simply return all 5000 records in one fell swoop (as long as all records are required). Let the web server render multiple pages and separate the records into workable chunks for the users. If you are looking for a very active SQL forum, check out http://www.sqlservercentral.com/Forums/Default.aspx Best of luck. Ryan Show quote "Allan Ebdrup" <ebdrup@noemail.noemail> wrote in message news:%23kIhprXWGHA.3332@TK2MSFTNGP02.phx.gbl... > I just had a discussion with my boss, we are running a query that fetches > about 5000 objects from the database using MSSQL and DotNet 2.0. > My boss thinks that the recommendation of keeping the connections open for > as short a time as possible means that one shoule open a connection, fetch > one row, close the connection, open another connection, fetch one row and > close the connection and so on. > I belive that the extra overhead of opening and closing connections all the > time, of generating execution plans for each query, and running the queries > many times gives his solution a large performance hit. My recommendation > will be to fetch all the rows needed in one large SQL query. > I've done several other projects where I have proven this to be true, there > is indeed a large performance hit from generating many small queries instead > of one large. But my boss just says "no" and disagrees. > > Can you give me some good arguments and/or point me to some best practice > documents that describe this so I can convince my boss he's wrong? > > Kind Regards, > Allan Ebdrup > > FUT: microsoft.public.sqlserver.programming > > One point that is perhaps not relevant to SQL, but might be worth
considering - is it necessary to have 5000 objects in a web application. If you're basically turning your database structure into a massive framework, you'll be transferring huge amounts of data (whether done in one big query or lots of small ones), and you'll be storing that in memory on your web server (I would assume in Session), so not only will your web server have to handle the web requests and the presentation layer, you'll also be making it handle a large amount of application logic. Also if you're storing the result of your query in session then you'll be storing that per user, which defeats the scalability benefits of a web application. If you're not persisting these objects throughout the users session, that means you'll be pulling 5000 objects over your network per request! I'm just guessing at the architecture here I admit, but might it be the time to put a bit more logic in the stored procedures, then just return a small result set? ok, that caught me out.
I think everyone was working on the assumption that you were trying to reduce user response times. If it's just a scheduled task that no-one even sees, there's no problem with having a long running sql statement, just remember to set the timeout property on your sqlcommand object. Maybe you should elaborate on this. What exactly is the business requirement?
ML --- http://milambda.blogspot.com/ "ML" <M*@discussions.microsoft.com> wrote in message We're running an agent that matches 5000 XML structures against 50000 news:FAE1AE0D-5CE3-46D5-B220-FEB057783A48@microsoft.com... > Maybe you should elaborate on this. What exactly is the business > requirement? others, we're loading it all into memory and processing there on one big sweep. It's to run in under 3 hours. Much less if possible. "ML" <M*@discussions.microsoft.com> wrote in message We can split them up into large chunks, but running them one at a time is to news:C97F49D4-0F3D-4738-B957-7FDFAF9524C4@microsoft.com... > Do all the objects need to be compared in a single run? Why? slow, running more is faster. What DML is involved here? Only retrieving or is data modified somehow and
written back to the database? Read-only tasks (where no locking is needed) could run in parallel. Parallel execution might also be possible if data needs to be locked (e.g. in order to be modified), if individual processes can be isolated on the database level (i.e. row level locks). ML --- http://milambda.blogspot.com/ "ML" <M*@discussions.microsoft.com> wrote in message Good idea to run the stuff parallel, I'm not sure it's a multi-processor news:4CA62DDA-77A5-4A9A-AEBB-70DC4D0D8235@microsoft.com... > What DML is involved here? Only retrieving or is data modified somehow and > written back to the database? > > Read-only tasks (where no locking is needed) could run in parallel. > > Parallel execution might also be possible if data needs to be locked (e.g. > in order to be modified), if individual processes can be isolated on the > database level (i.e. row level locks). machine it's running on, but I'll check it out. (The sql server is 8-way) Kind Regards, Allan Ebdrup it depends. are you retrieving objects, or rows?
how many bytes is the 5000 objects? is the client capable of receiving the 5000 objects as fast as the server can send them? "Doug" <drmiller***@hotmail.com> wrote in message We're retrieving rows, the objects are of different sizes as there is a XML news:1144814016.810297.282890@z34g2000cwc.googlegroups.com... > it depends. are you retrieving objects, or rows? > how many bytes is the 5000 objects? > is the client capable of receiving the 5000 objects as fast as the > server can send them? Text field in the data that describes a small object model contained in the object, this will not be made into objects by our program, simply modified as XML. I guess the average size of an object is 10-30KB. The client is an application server with a 1Gbs network between the two servers. I think there would likely be a performance degredation of doing 5000
queries instead of one. Not necessarily from opening and closing the connection because.net uses connection pooling,so the actual the connection to the db would likely stay open anyhow and when you close a connection in ..net it just gets returned to the pool, but sql server has to process a query 5000 times instead of once. As well, more dataset objects will get instantiated and the garbage collection will work harder. Put 5000 papers on your boss's desk. Tell him he can run from your desk to his and return 1 paper 5000 times, or he can run once and return all 5000. If he says he won't be working harder by running back and forth 5000 times get him to prove it! If for example, you were needing to query client data and there were 5000 clients applicable, I disagree that 5000 queries would be the right way to go, at least under normal circumstances, maybe there are special cases but I'd think that would be very rare. Sql is designed to deal with SETS of data, a set being 5000 records of data in your case. Lets say you were to go with your boss's idea, how would you determine what the 5000 queries would be anyhow? I mean you need to return some records to the application but which ones? If you wanted all of them, using a standard approach, you'd select * from 1 or more tables. Or maybe you'd add a where clause to return based on a specific criteria. Using your boss's approach, how would you do this? You'd need to know the records you wanted, and how would you know that? Lastly, if you queried 5000 times, internally sql server is going to have to do a bunch of logic, unless you're doing 5000 vastly different queries, there will be a lot of duplication of effort on sql servers' part such as table/index scans, joins, etc. If' I'm understanding your situation correctly I'd say your boss is out to lunch. Show quote "Allan Ebdrup" <ebdrup@noemail.noemail> wrote in message news:#kIhprXWGHA.3332@TK2MSFTNGP02.phx.gbl... > I just had a discussion with my boss, we are running a query that fetches > about 5000 objects from the database using MSSQL and DotNet 2.0. > My boss thinks that the recommendation of keeping the connections open for > as short a time as possible means that one shoule open a connection, fetch > one row, close the connection, open another connection, fetch one row and > close the connection and so on. > I belive that the extra overhead of opening and closing connections all the > time, of generating execution plans for each query, and running the queries > many times gives his solution a large performance hit. My recommendation > will be to fetch all the rows needed in one large SQL query. > I've done several other projects where I have proven this to be true, there > is indeed a large performance hit from generating many small queries instead > of one large. But my boss just says "no" and disagrees. > > Can you give me some good arguments and/or point me to some best practice > documents that describe this so I can convince my boss he's wrong? > > Kind Regards, > Allan Ebdrup > > FUT: microsoft.public.sqlserver.programming > > |
|||||||||||||||||||||||