Q: Performance Tuning of Three Tier Applications (Answered, 5 out of 5 stars, 3 Comments)
Question  
Subject: Performance Tuning of Three Tier Applications
Category: Computers > Programming
Asked by: david617-ga
List Price: $5.00
Posted: 09 Oct 2002 00:03 PDT
Expires: 07 Nov 2002 23:03 PST
Question ID: 74341
How do I optimize a three-tier (Internet Explorer/Java/any commercial
database) application for broad retrievals from the
database (e.g., SELECT * FROM table_a WHERE var1 = x AND var2 = y, etc.)?
Answer  
Subject: Re: Performance Tuning of Three Tier Applications
Answered By: mathtalk-ga on 22 Oct 2002 17:41 PDT
Rated: 5 out of 5 stars
 
Hi, David:

About six or seven years ago I was heavily involved in "rolling our own"
middleware in C++, a platform that has had a pretty long shelf life. 
We pooled "worker threads" as well as database connections (DBLIB
pointers to a separate SQL Server box), and loaded DLLs dynamically in
response to "action" requests.

One of the "lessons learned" is worth mentioning.  We used a "code
generator" (homegrown) that consulted the database to get the
definition of a table and construct a C++ class that would instantiate
a set of rows from that table (per a where clause property added to a
select) and allow updates, deletes.

There were two important advantages to this.  One is not so relevant
to your quest for performance, but still remarkable.  Since the data
access classes were automated in their creation, we could be pretty
efficient in making database changes and checking that middleware code
stayed in sync because the C++ syntax changes (if any) were
essentially caught at compile/link time.

The other aspect was that by using a "lazy evaluation" scheme in the
classes, we avoided populating a lot of data rows that were not needed
for a particular request.  That is, if a request involved little more
than looking up an employee's home address, we didn't waste energy
pulling the salary history for that request.
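
A minimal sketch of that lazy idea in Java (the class, tables, and
columns here are hypothetical): the cheap columns load up front, and
the expensive salary history is only fetched the first time somebody
asks for it.

import java.sql.*;
import java.util.*;

// Hypothetical sketch of lazy loading: cheap columns load eagerly,
// the expensive salary history is fetched only on first use.
public class Employee {
    private final Connection conn;
    private final int empId;
    private String homeAddress;               // loaded eagerly
    private List<Double> salaryHistory;       // null until actually needed

    public Employee(Connection conn, int empId) throws SQLException {
        this.conn = conn;
        this.empId = empId;
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT home_address FROM employee WHERE emp_id = ?")) {
            ps.setInt(1, empId);
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) homeAddress = rs.getString(1);
            }
        }
    }

    public String getHomeAddress() { return homeAddress; }

    // Only touches the salary_history table if a caller actually asks.
    public List<Double> getSalaryHistory() throws SQLException {
        if (salaryHistory == null) {
            salaryHistory = new ArrayList<>();
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT amount FROM salary_history WHERE emp_id = ?")) {
                ps.setInt(1, empId);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) salaryHistory.add(rs.getDouble(1));
                }
            }
        }
        return salaryHistory;
    }
}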

In a less formal way we came up with what was still an efficient
"cookie cutter" approach to supporting data browsing/entry/update
screens, by creating parameterized SQL queries and/or stored
procedures that effectively dealt with all the special "foreign key"
lookups, etc.  In later 3-tier designs we even began to "automate" the
SQL generation for this sort of stuff.
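
In JDBC terms the same pattern would look roughly like the sketch
below; the stored procedure name and its parameters are invented for
illustration, but a CallableStatement like this works against SQL
Server, Oracle, and most other commercial databases.

import java.sql.*;
import java.util.*;

// Hypothetical sketch: one parameterized stored procedure stands behind
// a browse screen and handles the foreign-key lookups on the server.
public class OrderBrowse {
    public static List<String> fetchOrderIds(Connection conn, int customerId, String status)
            throws SQLException {
        List<String> ids = new ArrayList<>();
        try (CallableStatement cs = conn.prepareCall("{call usp_browse_orders(?, ?)}")) {
            cs.setInt(1, customerId);
            cs.setString(2, status);
            try (ResultSet rs = cs.executeQuery()) {
                while (rs.next()) ids.add(rs.getString("order_id"));
            }
        }
        return ids;
    }
}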

I know that the database connection type makes a big difference. 
DBLIB was very low level and high performance, but a real pain to code
directly and of course tied to MS SQL Server.  ODBC was more
"universal" (perhaps the closest of any data access technology to
universality) but suffered in performance comparatively.  We found
that a good bit of the performance was recovered with ADO using the
OLE DB Provider for SQL Server.

Hopefully I've given you some food for thought.  I could go on about
where we've evolved (XML), but let me stop and give you a chance to
ask for clarification.

regards, mathtalk-ga

Request for Answer Clarification by david617-ga on 22 Oct 2002 19:31 PDT
Mathtalk,

I feel that you have already given great value for my $5. I like the
general direction of your comments. However, I wonder if you could try
to extend your analysis to additional specifics which I will provide
here.

In your response, you seem to recommend creating persistent
middleware objects to serve, as I understand it, as pointers to the
database and as encapsulated access methods. You mentioned "lazy
loading" as a way to prevent clogging up your application server with
cached data.

However, with this client engagement, the core performance problem is
that they have a central table of "patient outcome observations" which
can be broken out by the reporting medical institution, diagnosis,
etc. The client has put this all into one central table with built-in
code references (e.g., Location_code1, Location_code2, etc. for
Location Code Values), a design that is certainly worth considering
decomposing into several normalized tables.

Still, the basic issue is that this core data easily involves millions
of individual observations, and when they try to analyze a single
diagnosis across all patients or to compare patient outcomes within a
given facility against established norms they need to pull a lot of
records for analysis, which will be performed using SAS-type
techniques.

A possible solution which I had seen in a Java book, "Core J2EE
Patterns" was the pattern which they referred to as a Composite
Entity, where the Composite Entity represents a set of interrelated
persistent objects rather than representing each object as an
individual Entity Bean. (I would guess that in a .NET environment we
would be talking about Data Access Objects and related concepts). Have
you ever used such a construct to cache large amounts of data? What
types of considerations arise with regard to garbage collection of
the Composite Entity or similar construct? Are
there any other methods you might recommend in these circumstances?

Thanks for your responses to date. I was getting worried that this
question would expire before anybody got around to answering it!

Clarification of Answer by mathtalk-ga on 22 Oct 2002 21:15 PDT
Thanks for the additional specifics, David.  The emerging picture is
that of a "data mining" application, an arena that SAS and many other
vendors are interested in staking out.  Presumably the client wants to
do exploratory queries and validate hypothesized relations (e.g.
elective surgery patient morbidity by nursing workload, as in today's
news), but also the ability to run a "standardized" batch of "peer
review" reports.

SAS and its smaller competitors on the analytical side might seem at
first glance to bring the most important capabilities to the project. 
Yet what I believe is that the care and feeding of those computational
routines from the database side is where some real performance
opportunities are.  That, at any rate, is what the DBMS vendors, like
Microsoft and Oracle, are hoping to prove.  Here's a Web link that
periodically updates the "OLAP Wars":

http://www.olapreport.com/Comment_APIs.htm

I can validate the "composite entity" approach, within the framework
of our homegrown middleware, as follows.  On top of generating the
individual "table" classes in C++, these were aggregated as members of
a "composite" class.  This was done in such a way that tables which
shared a field like SSN as part of their composite key could be
coordinated through that common value.  I don't see any reason the
same sort of technique couldn't be pulled off in Java.

Lookup tables turned out to be lumped into their own composite class,
not directly integrated into the related "entity" classes.
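
In Java, the composite might look something like the sketch below.
The tables, columns, and the use of SSN as the shared key are all
invented for illustration; the point is just that one object
coordinates several per-table loads through a common key value.

import java.sql.*;
import java.util.*;

// Hypothetical composite entity: one object holds a patient row plus all
// of that patient's outcome observations, coordinated through the shared key.
public class PatientComposite {
    private final Map<String, Object> patientRow = new HashMap<>();
    private final List<Map<String, Object>> observationRows = new ArrayList<>();

    public void loadBySsn(Connection conn, String ssn) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT ssn, last_name FROM patient WHERE ssn = ?")) {
            ps.setString(1, ssn);
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    patientRow.put("ssn", rs.getString(1));
                    patientRow.put("last_name", rs.getString(2));
                }
            }
        }
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT diagnosis_code, outcome_score FROM patient_outcome WHERE ssn = ?")) {
            ps.setString(1, ssn);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    Map<String, Object> row = new HashMap<>();
                    row.put("diagnosis_code", rs.getString(1));
                    row.put("outcome_score", rs.getDouble(2));
                    observationRows.add(row);
                }
            }
        }
    }

    public Map<String, Object> getPatient() { return patientRow; }
    public List<Map<String, Object>> getObservations() { return observationRows; }
}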

Although these applications were not "data mining" applications, they
did involve feeding a calculation intensive middleware from a database
with millions of records.

The quickest win you might provide to your potential client is adding
some database indexes to speed up the routine queries; or, since you
don't seem to have any requirement to update the database, perhaps the
"data warehouse" approach of storing pre-computed summaries in the
database would be useful.
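
To make the summary idea concrete, here is a hedged sketch (table and
column names invented): an off-hours job folds the millions of
observation rows into a small summary table that the reporting
queries can hit instead.

import java.sql.*;

// Hypothetical sketch of the data-warehouse idea: roll the detail rows
// up into a small summary table once, off-hours, so routine reports
// read the summary instead of scanning millions of observations.
public class SummaryRefreshJob {
    public static void refresh(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate("DELETE FROM outcome_summary");
            st.executeUpdate(
                "INSERT INTO outcome_summary (facility_id, diagnosis_code, n_cases, avg_outcome) " +
                "SELECT facility_id, diagnosis_code, COUNT(*), AVG(outcome_score) " +
                "FROM patient_outcome " +
                "GROUP BY facility_id, diagnosis_code");
        }
    }
}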

Best wishes,
mathtalk-ga
david617-ga rated this answer: 5 out of 5 stars
Through a series of well-written inquiries, mathtalk was able to
create a detailed response to a question which was initially proposed
in a very shallow fashion by me. His answers were thorough and he had
expertise in all of the aspects needed to understand what is actually
a quite complex problem.

Based on his advice to date, I would feel confident in hiring him to
perform actual services in the complex n-tier and data mining field,
as he seems to have broad experience in the area. I feel that I have
definitely gotten a tremendous return for the money invested in this
question. I only hope that more talented developers and architects use
this service to make it even better.

Regards,

David S. Lawrence
President, MVF Associates, Inc.
DavidL@MVF.com

Comments  
Subject: Re: Performance Tuning of Three Tier Applications
From: mathtalk-ga on 22 Oct 2002 11:57 PDT
 
Hi, david617-ga:

This is a pretty broad topic (but close to my heart!), so feel free to
break in and curtail my going off on tangents.

The question of database retrieval performance is often crucial to
three-tier applications.  The time spent in
calculations/representations/transformations of the retrieved data on
the middle-tier (here, server-side Java components presumably) is
usually much less than the time needed to retrieve the data from the
database.

So there are two sides of that process which deserve tweaking.  One is
the time spent creating the database connection, and the general
solution here is using a "pool" of database connections.  If you are
using ADO, Microsoft has tried to automate this pooling for you with
COM+/MTS/IIS technology.
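
Just to illustrate what pooling buys you, here is a bare-bones sketch
in Java. This is illustrative only; in practice you would rely on the
platform's built-in pooling or a tested library rather than rolling
your own.

import java.sql.*;
import java.util.concurrent.*;

// Minimal illustrative connection pool: hand out pre-opened connections
// and take them back, instead of paying the connect cost on every request.
public class SimpleConnectionPool {
    private final BlockingQueue<Connection> idle;

    public SimpleConnectionPool(String url, String user, String pwd, int size)
            throws SQLException {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(DriverManager.getConnection(url, user, pwd));
        }
    }

    public Connection checkOut() throws InterruptedException {
        return idle.take();                   // blocks if all connections are busy
    }

    public void checkIn(Connection conn) {
        idle.offer(conn);                     // return the connection for reuse
    }
}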

Your example, however, perhaps indicates a focus on the other side of
the coin, which is how to optimize a SELECT query once the database
connection is in place.  Although your example is for extracting
records from a single table, one will ordinarily need a JOIN query to
combine data from multiple tables.  This is especially true if the
database is a highly normalized relational design.

The single most important factor in efficient query design is the
presence of useful indexes on the underlying table(s).  In your single
table example, you apparently have values for fields x and y (or were
var1,var2 the column names?).  So it would be good to have indexes on
one or on the combination of these two columns, in order to optimize
that particular query.
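
For example (the index name is invented, and normally you would just
hand the one-line SQL to the DBA rather than run it from code), a
composite index covering both filter columns lets the WHERE clause be
satisfied by an index seek:

import java.sql.*;

// Hypothetical: create a composite index on the two filter columns from
// the example query (table_a.var1, table_a.var2).
public class AddIndex {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement st = conn.createStatement()) {
            st.executeUpdate("CREATE INDEX ix_table_a_var1_var2 ON table_a (var1, var2)");
        }
    }
}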

But your question is probably:  how to optimize for "broad
retrievals", meaning some ad hoc generated queries from the front-end.
 The best advice I can give is to start with a highly normalized
database design.  Yes, specific queries can always benefit from a
degree of denormalization, but a competent DBMS like Oracle or MS SQL
Server will perform well using appropriately factored (normalized)
tables that have the right primary keys (with unique indexes).

Given that starting point, there are several "tweaks" that can be
considered.  Some are platform specific, like using clustered indexes
on SQL Server.  Some are platform independent (standard SQL), like
using VIEWs and secondary indexes to help out with specific queries. 
And some techniques, like using stored procedures, are generally
applicable (for optimizing specific or parameterized queries) but are
more useful/easier to implement depending on platform.

If you want a deeper comment or answer, the DBMS platform and the
method of database connectivity (Java/ODBC?) need to be specified.

regards, mathtalk-ga
Subject: Re: Performance Tuning of Three Tier Applications
From: david617-ga on 22 Oct 2002 14:28 PDT
 
mathtalk,

This is a real problem from a real (potential) client. The situation
is that they have rolled their own middleware from C++ and Perl which
basically has some (though not all) of the characteristics of .NET or
Java. Since they own the code base, they could extend it as needed.
The database is SQL Server, but I don't think it's as important an
issue, since I am really trying to look at middleware options.

I was kind of putting this question out as a straw man to test whether
I could get good answers on Google (Hey, 5 bucks can't hurt, you
know?)

Your answer covered many aspects that are certainly important. In
addition to the direction that you went in your analysis, I would like
to hear about creating persistent, cached objects (such as entity
beans) in the middleware to speed up performance. What are the pros
and cons of such an approach? Of course, without the specifics of the
application, its data structure and its sizing at hand, a specific
recommendation may be difficult.

Do you have any ideas as to the pros and cons of using middleware
objects to speed up large-scale data retrievals in such circumstances?
Any other ideas that would not fit in the context of straight database
performance tuning? My client indicated that, no matter what they
tried on the server-side, it seemed to be constrained in terms of what
they could pull across the network. Connection pooling is obviously a
good idea, but I am looking to examine possible middleware
alternatives to accelerate data retrieval, given that the database
seems to be peaked out.

Any thoughts?

David
Subject: Re: Performance Tuning of Three Tier Applications
From: mathtalk-ga on 23 Oct 2002 07:47 PDT
 
This may be more of an outline of questions, instead of suggestions.

I hear you saying the client thinks the database is "peaked out".  But
I would examine that bottleneck carefully.  How many processors (CPUs)
is it using?  How much memory is allocated to the database?  It's a
cliche, of course, but hardware is cheap, and when I think of the
performance we used to milk out of 2 and 3 CPU boxes (at 200MHz
speeds!) using much less than a gigabyte of memory, I just have to
believe that a good database server these days could chop and slice a
few million-record tables with comparative ease.

But alright, let's say the database can't be touched for reasons
having nothing to do with this project.  It's still a limiting factor,
unless you want to extract all the data and put it into a new database
server!  [This isn't as crazy as it sounds, btw, one track of
performance enhancement for distributed applications is "database
replication".]  The next option is to use the highest performance
database connections you can.  My guess is that they are using ODBC
connections with Java, and this is killing their performance.  Because
ODBC is such a generic data connection, it cannot handle the case of
returning very large datasets elegantly.  If I remember correctly, if
the database connection does not allow all the records to be returned
at once, ODBC depends on polling of the database connection to
determine whether more records are available.
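
If the middle tier is (or becomes) Java over JDBC, one concrete knob
worth checking, hedged because driver behavior varies, is the
statement fetch size; the default on some drivers means many small
round trips for a multi-million-row result.

import java.sql.*;

// Illustrative only: ask the driver to stream rows in large blocks rather
// than many small round trips. Whether this helps depends on the driver.
public class BulkExtract {
    public static long countRows(Connection conn, String sql) throws SQLException {
        long n = 0;
        try (Statement st = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(10000);           // hint: fetch rows in big batches
            try (ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) n++;        // stand-in for the real analysis step
            }
        }
        return n;
    }
}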

So the question is, what database connections are they using?  and is
there a better alternative?

Finally, I wonder what timing profiles are available or could be done
to assess the actual contributions of database retrieval versus
calculation time.  It is inevitable that statistical calculations
being done now with Java and/or Perl could be done more efficiently
with C++.  My only doubt would be whether this really is taking up
much time now.  Rewriting the calculations is an expensive task,
generally, if not because of the actual coding time, then for the
proper regression testing to accept it into production.  But if we
accept that the database is "peaked out", your focus will naturally
have to turn to middleware issues.

All the above would point to recommending initial development of a
test harness for the client's application.  It should be able both to
measure the timing of responses and to log the responses, e.g. into an
Access database, for regression analysis.  Building this test harness
would give you a chance to gain greater familiarity with the client's
situation, if there is any uncertainty there.  And with more
information you might well be able to provide a sharper time and cost
estimate for actual performance enhancements.
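
A sketch of the timing side of such a harness might look like the
following; all of the class and table names are hypothetical
scaffolding.

import java.sql.*;

// Hypothetical timing harness: run a query, then record the elapsed time
// and row count into a log table for later regression analysis.
public class TimingHarness {
    public static void timeQuery(Connection appDb, Connection logDb, String label, String sql)
            throws SQLException {
        long start = System.currentTimeMillis();
        long rows = 0;
        try (Statement st = appDb.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) rows++;
        }
        long elapsedMs = System.currentTimeMillis() - start;
        try (PreparedStatement ps = logDb.prepareStatement(
                "INSERT INTO timing_log (label, row_count, elapsed_ms, run_at) VALUES (?, ?, ?, ?)")) {
            ps.setString(1, label);
            ps.setLong(2, rows);
            ps.setLong(3, elapsedMs);
            ps.setTimestamp(4, new Timestamp(System.currentTimeMillis()));
            ps.executeUpdate();
        }
    }
}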

regards, mathtalk-ga
