Subject: Performance Tuning of Three Tier Applications
Category: Computers > Programming
Asked by: david617-ga
List Price: $5.00
Posted: 09 Oct 2002 00:03 PDT
Expires: 07 Nov 2002 23:03 PST
Question ID: 74341
How do I optimize a three-tier (Internet Explorer/Java/any commercial database) application for broad retrievals from the database (e.g., select * from table a where var1=x, var2=y, etc.)?
Subject: Re: Performance Tuning of Three Tier Applications
Answered By: mathtalk-ga on 22 Oct 2002 17:41 PDT
Hi, David:

About six or seven years ago I was heavily involved in "rolling our own" middleware in C++, a platform that has had a pretty long shelf life. We pooled "worker threads" as well as database connections (DBLIB pointers to a separate SQL Server box), and loaded DLLs dynamically in response to "action" requests.

One of the "lessons learned" is worth mentioning. We used a homegrown "code generator" that consulted the database to get the definition of a table and construct a C++ class that would instantiate a set of rows from that table (per a where-clause property added to a select) and allow updates and deletes. There were two important advantages to this. One is not so relevant to your quest for performance, but still remarkable: since the data access classes were automated in their creation, we could be pretty efficient in making database changes and checking that middleware code stayed in sync, because the C++ syntax changes (if any) were essentially caught at compile/link time. The other aspect was that by using a "lazy evaluation" scheme in the classes, we avoided populating a lot of data rows that were not needed for a particular request. That is, if a request involved little more than looking up an employee's home address, we didn't waste energy pulling the salary history for that request.

In a less formal way we came up with what was still an efficient "cookie cutter" approach to supporting data browsing/entry/update screens, by creating parameterized SQL queries and/or stored procedures that effectively dealt with all the special "foreign key" lookups, etc. In later 3-tier designs we even began to "automate" the SQL generation for this sort of thing.

I know that the database connection type makes a big difference. DBLIB was very low level and high performance, but a real pain to code directly and of course tied to MS SQL Server. ODBC was more "universal" (perhaps the closest of any data access technology to universality) but suffered in performance comparatively. We found that a good bit of the performance was recovered with ADO using the OLE DB Provider for SQL Server.

Hopefully I've given you some food for thought. I could go on about where we've evolved (XML), but let me stop and give you a chance to ask for clarification.

regards, mathtalk-ga
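[Editor's note] The lazy-evaluation idea carries over directly to a Java middle tier such as the one in the question. Below is a minimal sketch of what one of those generated data-access classes might look like in Java; the class, table, and column names (EmployeeRow, employee, home_address) are invented for illustration, and a real generator would emit one on-demand accessor per column or child table.

    import java.sql.*;

    // Hypothetical generated data-access class, sketched in Java (the
    // middle tier named in the question). Table and column names are made up.
    public class EmployeeRow {
        private final Connection conn;
        private final int employeeId;
        private String homeAddress;   // fetched only when first requested

        public EmployeeRow(Connection conn, int employeeId) {
            this.conn = conn;
            this.employeeId = employeeId;
        }

        // Lazy evaluation: a request that only needs the address never runs
        // the (possibly expensive) salary-history query at all.
        public String getHomeAddress() throws SQLException {
            if (homeAddress == null) {
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT home_address FROM employee WHERE employee_id = ?")) {
                    ps.setInt(1, employeeId);
                    try (ResultSet rs = ps.executeQuery()) {
                        if (rs.next()) {
                            homeAddress = rs.getString(1);
                        }
                    }
                }
            }
            return homeAddress;
        }

        // getSalaryHistory() would follow the same on-demand pattern with
        // its own query against a salary_history table.
    }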
david617-ga rated this answer:
Through a series of well-written inquiries, mathtalk was able to create a detailed response to a question which was initially proposed in a very shallow fashion by me. His answers were thorough and he had expertise in all of the aspects needed to understand what is actually a quite complex problem. Based on his advice to date, I would feel confident in hiring him to perform actual services in the complex n-tier and data mining field, as he seems to have broad experience in the area. I feel that I have definitely gotten a tremendous return for the money invested in this question. I only hope that more talented developers and architects use this service to make it even better.
Regards, David S. Lawrence, President, MVF Associates, Inc., DavidL@MVF.com
Subject: Re: Performance Tuning of Three Tier Applications
From: mathtalk-ga on 22 Oct 2002 11:57 PDT
Hi, david617-ga:

This is a pretty broad topic (but close to my heart!), so feel free to break in and curtail my going off on tangents.

The question of database retrieval performance is often crucial to three-tier applications. The time spent in calculations/representations/transformations of the retrieved data on the middle tier (here, server-side Java components presumably) is usually much less than the time needed to retrieve the data from the database. So there are two sides of that process which deserve tweaking. One is the time spent creating the database connection, and the general solution here is using a "pool" of database connections. If you are using ADO, Microsoft has tried to automate this pooling for you with COM+/MTS/IIS technology.

Your example, however, perhaps indicates a focus on the other side of the coin, which is how to optimize a SELECT query once the database connection is in place. Although your example is for extracting records from a single table, one will ordinarily need a JOIN query to combine data from multiple tables. This is especially true if the database is a highly normalized relational design.

The single most important factor in efficient query design is the presence of useful indexes on the underlying table(s). In your single-table example, you apparently have values for fields x and y (or were var1, var2 the column names?). So it would be good to have indexes on one or on the combination of these two columns, in order to optimize that particular query.

But your question is probably: how to optimize for "broad retrievals", meaning some ad hoc generated queries from the front end. The best advice I can give is to start with a highly normalized database design. Yes, specific queries can always benefit from a degree of denormalization, but a competent DBMS like Oracle or MS SQL Server will perform well using appropriately factored (normalized) tables that have the right primary keys (with unique indexes).

Given that starting point, there are several "tweaks" that can be considered. Some are platform specific, like using clustered indexes on SQL Server. Some are platform independent (standard SQL), like using VIEWs and secondary indexes to help out with specific queries. And some techniques, like using stored procedures, are generally applicable (for optimizing specific or parameterized queries) but are more useful/easier to implement depending on platform.

If you want a deeper comment or answer, the DBMS platform and the method of database connectivity (Java/ODBC?) need to be specified.

regards, mathtalk-ga
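[Editor's note] To make the indexing and parameterized-query advice concrete, here is a rough JDBC sketch under stated assumptions: the table and columns (table_a, var1, var2) are taken from the example query, the index name is invented, and the DataSource is assumed to be a pooled connection source configured in the application server.

    import java.sql.*;
    import javax.sql.DataSource;

    public class BroadRetrievalExample {

        // Run once by the DBA: a composite index on (var1, var2) lets the
        // optimizer satisfy the example WHERE clause without a full table
        // scan. Exact DDL syntax varies by DBMS.
        static final String CREATE_INDEX =
            "CREATE INDEX ix_table_a_var1_var2 ON table_a (var1, var2)";

        // Parameterized query: the execution plan can be cached and reused
        // for any values of x and y.
        static final String QUERY =
            "SELECT * FROM table_a WHERE var1 = ? AND var2 = ?";

        public static void fetch(DataSource pool, int x, int y) throws SQLException {
            // getConnection() borrows from the pool instead of opening a new
            // connection, addressing the connection-creation cost noted above.
            try (Connection conn = pool.getConnection();
                 PreparedStatement ps = conn.prepareStatement(QUERY)) {
                ps.setInt(1, x);
                ps.setInt(2, y);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // process each row here
                    }
                }
            }
        }
    }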
Subject: Re: Performance Tuning of Three Tier Applications
From: david617-ga on 22 Oct 2002 14:28 PDT
mathtalk,

This is a real problem from a real (potential) client. The situation is that they have rolled their own middleware in C++ and Perl, which basically has some (though not all) of the characteristics of .NET or Java. Since they own the code base, they could extend it as needed. The database is SQL Server, but I don't think it's as important an issue, since I am really trying to look at middleware options. I was kind of putting this question out as a straw man to test whether I could get good answers on Google (hey, 5 bucks can't hurt, you know?).

Your answer covered many aspects that are certainly important. In addition to the direction that you went in your analysis, I would like to hear about creating persistent, cached objects (such as entity beans) in the middleware to speed up performance. What are the pros and cons of such an approach? Of course, without the specifics of the application, its data structure, and its sizing at hand, a specific recommendation may be difficult. Do you have any ideas as to the pros and cons of using middleware objects to speed up large-scale data retrievals in such circumstances? Any other ideas that would not fit in the context of straight database performance tuning?

My client indicated that, no matter what they tried on the server side, they seemed to be constrained in terms of what they could pull across the network. Connection pooling is obviously a good idea, but I am looking to examine possible middleware alternatives to accelerate data retrieval, given that the database seems to be peaked out. Any thoughts?

David
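[Editor's note] For concreteness, the kind of persistent, cached middleware object being asked about here could be as simple as the following read-through cache sketch; the class and method names are invented for illustration, and a production design would also need eviction and invalidation when the underlying rows change.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Minimal read-through cache for middleware data objects, in the spirit
    // of cached entity beans. All names here are hypothetical.
    public class MiddlewareCache<K, V> {
        private final Map<K, V> cache = new ConcurrentHashMap<>();
        private final Function<K, V> loader;   // e.g. a JDBC lookup by key

        public MiddlewareCache(Function<K, V> loader) {
            this.loader = loader;
        }

        public V get(K key) {
            // Hit: served from memory, no round trip to the database.
            // Miss: loaded once, then shared by all subsequent requests.
            return cache.computeIfAbsent(key, loader);
        }

        public void invalidate(K key) {
            cache.remove(key);   // call when the underlying row is updated
        }
    }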
Subject: Re: Performance Tuning of Three Tier Applications
From: mathtalk-ga on 23 Oct 2002 07:47 PDT
This may be more an outline of questions than of suggestions.

I hear you saying the client thinks the database is "peaked out". But I would examine that bottleneck carefully. How many processors (CPUs) is it using? How much memory is allocated to the database? It's a cliche, of course, but hardware is cheap, and when I think of the performance we used to milk out of 2- and 3-CPU boxes (at 200MHz speeds!) using much less than a gigabyte of memory, I just have to believe that a good database server these days could chop and slice a few million-record tables with comparative ease.

But all right, let's say the database can't be touched for reasons having nothing to do with this project. It's still a limiting factor, unless you want to extract all the data and put it into a new database server! [This isn't as crazy as it sounds, by the way; one track of performance enhancement for distributed applications is "database replication".]

The next option is to use the highest-performance database connections you can. My guess is that they are using ODBC connections with Java, and this is killing their performance. Because ODBC is such a generic data connection, it cannot handle the case of returning very large datasets elegantly. If I remember correctly, if the database connection does not allow all the records to be returned at once, ODBC depends on polling of the database connection to determine whether more records are available. So the questions are: what database connections are they using, and is there a better alternative?

Finally, I wonder what timing profiles are available or could be done to assess the actual contributions of database retrieval versus calculation time. Statistical calculations being done now with Java and/or Perl could inevitably be done more efficiently with C++; my only doubt would be whether this really is taking up much time now. Rewriting the calculations is an expensive task, generally, if not because of the actual coding time, then for the proper regression testing to accept it into production. But if we accept that the database is "peaked out", your focus will naturally have to turn to middleware issues.

All the above would point to recommending initial development of a test harness for the client's application. It should be able to measure both timing of responses and log the responses, e.g. into an Access database, for regression analysis. Building this test harness would give you a chance to gain greater familiarity with the client's situation, if there is any uncertainty there. And with more information you might well be able to provide a sharper time and cost estimate for actual performance enhancements.

regards, mathtalk-ga
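[Editor's note] The suggested test harness could start as small as the following sketch, which times a request and logs the result for later regression analysis. The timing_log table, its columns, and the Runnable handed in are placeholders for whatever the client's application actually exposes; the logging target could just as well be an Access database reached over JDBC.

    import java.sql.*;

    // Rough sketch of a timing/logging harness. All table and column names
    // are invented for illustration.
    public class TimingHarness {
        private final Connection logDb;   // connection to the logging database

        public TimingHarness(Connection logDb) {
            this.logDb = logDb;
        }

        public void timeRequest(String requestName, Runnable runRequest) throws SQLException {
            long start = System.nanoTime();
            runRequest.run();   // issue the request against the middleware
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;

            // Log one row per timed request for later regression analysis.
            try (PreparedStatement ps = logDb.prepareStatement(
                    "INSERT INTO timing_log (request_name, elapsed_ms, run_at) VALUES (?, ?, ?)")) {
                ps.setString(1, requestName);
                ps.setLong(2, elapsedMs);
                ps.setTimestamp(3, new Timestamp(System.currentTimeMillis()));
                ps.executeUpdate();
            }
        }
    }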