Google Answers: Web server scaling / concurrency

Q: Web server scaling / concurrency ( No Answer, 1 Comment )

Question

Subject: Web server scaling / concurrency
Category: Computers > Internet
Asked by: wily-ga
List Price: $20.00

Posted: 06 Apr 2005 13:25 PDT
Expires: 06 May 2005 13:25 PDT
Question ID: 505925

I am developing a web application where clients will poll the web
server once per minute for new data.  Assume the HTTP request sent
from client to server will be 166 bytes.  When there is no new data,
assume the HTTP response will also be 166 bytes.  Assume that once per
hour there will be new data and, when there is, the HTTP response will
be 512 bytes.

The web server will be a dedicated Linux box running Apache / MySQL /
PHP.  Server hardware:
512 MB RAM
1.8 GHz Celeron CPU
80 GB ATA 7200 RPM HD

I think you can assume that bandwidth is not limiting.  (You can also
comment on the validity of that assumption for this situation if you
want.)

How would you determine a reasonable maximum number of clients that
the web server could handle for this situation?  What would be the
maximum number of clients for this situation?

Request for Question Clarification by alienintelligence-ga on 06 Apr 2005 13:51 PDT

Hi wily...

Is your question specifically oriented
around using a Celeron for the server?

You are aware that a Celeron is not
a good choice for a dedicated, high-
volume web server? You can of course
disregard that, if it's just a question
for the sake of scholastic research.

But if you are actively considering
using a Celeron for a business and
require a dependable platform, then
you should choose a different CPU.

Regardless of your response, realize
the CPU will more than likely be the 
limiting factor.

thanks,
-AI

Clarification of Question by wily-ga on 06 Apr 2005 16:44 PDT

No, the question is not geared around the type of CPU.  I just used
that because that's what a quote for a cheap web hosting solution was
using.  If CPU is a limiting factor, then I'll pay more for a better
one if it makes sense.  Perhaps assume a more standard CPU, or
multiple CPUs.  If that changes the question too much, assume dual Xeon
2 GHz.

I'm at the application design stage.  The requirement is to handle
100,000 clients.  I can alter the polling frequency; for this
specific application the higher the frequency the better, but of
course the trade-off is higher server load and associated costs.

I guess I really need to know how best to handle 100,000 clients
polling the same database.  If I have to load balance with 100
different servers, I need to change the design.  Or if the 100,000
clients can maintain open socket connections to the web application
servers, I can really reduce the bandwidth because the server can then
notify clients with the new data instead of the clients continuously
polling.

I realize that polling is not ideal, and am looking at alternatives. 
Your answer will help me evaluate them.
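
For a sense of scale, here is a back-of-envelope sketch in Python of the
load these numbers imply.  It assumes the stated byte counts are whole
HTTP messages, ignores TCP/IP overhead, and assumes the polls are spread
evenly over the interval.  The function name polling_load is just an
illustrative choice.

```python
# Rough load estimate for the polling design described in the question.
# Assumptions: byte counts are whole HTTP messages, no TCP/IP overhead,
# and polls are spread evenly across the polling interval.

def polling_load(clients, poll_interval_s, update_interval_s,
                 request_bytes=166, idle_response_bytes=166,
                 data_response_bytes=512):
    """Return (requests per second, HTTP payload bytes per second)."""
    requests_per_s = clients / poll_interval_s
    # Fraction of polls that find new data waiting.
    data_fraction = poll_interval_s / update_interval_s
    avg_response = ((1 - data_fraction) * idle_response_bytes
                    + data_fraction * data_response_bytes)
    bytes_per_s = requests_per_s * (request_bytes + avg_response)
    return requests_per_s, bytes_per_s


rps, bps = polling_load(clients=100_000, poll_interval_s=60,
                        update_interval_s=3600)
print(f"{rps:.0f} requests/s, {bps * 8 / 1e6:.1f} Mbit/s of HTTP payload")
```

Under those assumptions this works out to roughly 1,667 requests per
second and about 4.5 Mbit/s of payload, which suggests the request rate
rather than raw bandwidth is the figure to worry about.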

Answer

There is no answer at this time.

Comments

Subject: Re: Web server scaling / concurrency
From: willcodeforfood-ga on 07 Apr 2005 11:28 PDT

Theory is great, but practical testing is what you need.

Since you are assuming that bandwidth is not your limiting factor,
write a testing harness on one or more different machines on your
local network and hammer your development server with 100,000 requests
to see how long it takes your server to process them.  Be sure each
harness closes its socket connection before making the subsequent
request to force the server to open 100,000 sockets and perform
100,000 data fetches.  You need to have a certain portion of your test
harness requests cause your server to fetch data to better simulate
production conditions.  The actual proportion of your requests that
return data should be calculated as follows:

      time between polls
   -------------------------  = proportion of requests that will return data
   time between data updates

So 1-minute polling and hourly data updates means returning data for 1/60
= 1.7% of the test requests.
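
As a quick check of the 1.7% figure, a couple of lines of Python (the
function name data_fraction is only for illustration) give the
proportion and the matching request count for a 100,000-request run:

```python
def data_fraction(poll_interval_min, update_interval_min):
    """Fraction of polls that should return new data."""
    return poll_interval_min / update_interval_min


fraction = data_fraction(1, 60)
print(f"{fraction:.1%} of requests")             # 1.7% of requests
print(round(100_000 * fraction), "of 100,000")   # 1667 of 100,000
```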

Now you have a benchmark based on your dev server's hardware.  If your
server can easily handle 100,000 requests in 1 minute, then a 1 minute
polling interval will work.  If the server requires more time, then
set your polling interval to slightly longer than the time it takes to
process all 100,000 requests.  Rerun your test with the new proportion
of requests returning data to verify.
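
A minimal sketch of such a harness in Python, using only the standard
library, might look like the following.  The URL, the worker count and
the 10% safety margin are all assumptions for illustration, not
recommendations.  Python's urllib opens a fresh connection for every
request, which matches the "close the socket each time" advice above.

```python
import http.client
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_once(url, timeout_s=10):
    """Issue one GET on a fresh connection; True if it returned HTTP 200."""
    request = urllib.request.Request(url, headers={"Connection": "close"})
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            response.read()
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts and HTTP error statuses.
        return False


def hammer(url, total_requests, workers=50):
    """Send total_requests GETs from a thread pool.

    Returns (number of successful requests, elapsed seconds).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetch_once, [url] * total_requests))
    return sum(results), time.perf_counter() - start


def suggested_poll_interval(elapsed_s, current_interval_s=60, margin=1.1):
    """Keep the current interval if the run fit inside it,
    otherwise pad the measured time by the (assumed) margin."""
    if elapsed_s <= current_interval_s:
        return current_interval_s
    return elapsed_s * margin


# Example with a hypothetical URL (not run here):
# ok, elapsed = hammer("http://dev-server.example/poll.php", 100_000)
# print(ok, "ok in", elapsed, "s; poll every", suggested_poll_interval(elapsed))
```

A single Python process may well become the bottleneck before the server
does, so treat the measured time as an upper bound and run several
copies from different machines where possible.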

If all clients will be getting the same data at the same time, then
your testing should place all of the data-returning requests at the
very end or very beginning of the test.  This won't simulate potential
data access concurrency problems but it will help you determine if
your system is going to become backlogged with requests each time data
arrives, since those requests will take slightly longer to process.
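
One way to lay out such a run, sketched in Python with a hypothetical
helper called build_request_plan, is to precompute which requests should
trigger a data fetch and group them at one end:

```python
def build_request_plan(total_requests, data_fraction, placement="start"):
    """Return a list of booleans, True where a request should fetch data.

    All data-returning requests are grouped at the start or the end of
    the run to mimic every client picking up the same update at once.
    """
    data_requests = round(total_requests * data_fraction)
    idle_requests = total_requests - data_requests
    if placement == "start":
        return [True] * data_requests + [False] * idle_requests
    if placement == "end":
        return [False] * idle_requests + [True] * data_requests
    raise ValueError("placement must be 'start' or 'end'")


plan = build_request_plan(100_000, 1 / 60, placement="start")
print(sum(plan), "data-returning requests, first at index", plan.index(True))
```

The harness would then walk the plan in order and request the
data-returning URL (however the application exposes it) wherever the
entry is True.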

Ideally you would run multiple instances of the test harness on each
computer to better simulate concurrency.  Once you have some idea how
this system will behave you can make decisions regarding your
production hardware based on empirical analysis rather than guesswork.

You might want to post another question and see if anyone knows a good
piece of shareware/freeware that can do the hammering for you.  That
would save you some development time and good testing software could
help you identify potential concurrency problems.
