I've set up several clusters, both for testing and for production use
(a large real time simulation). I will provide a general solution and
then suggest some additional tools you may find helpful - depending on
your specific application.
First, I will assume your "server" has a pair of network interfaces,
with a setup like this:
(users) -- network -- server -- cluster switch -- (the 10 clients)
So, the users of the network would only see the "server" system (which
manages the cluster). Alternatively, you could login directly on the
"server" using its console (if you don't have an outside network).
Make a clarification request if you have dual network interfaces on
the clients (and three on the server), and I can describe some
additional steps to "channel bond" the network interfaces to improve
performance within the cluster (along with some drawbacks of doing
so). If you have exotic hardware such as Myrinet, let me know about
that as well and I can provide further information on that kind of
setup.
Second, I assume your "clients" (or, as I would call them, "compute
nodes") are expected to run the same configuration of Linux. The
application may vary by node, but the OS and system applications would
be the same. If this is not true, please make a clarification request
and I can suggest some methods to handle heterogeneous clusters.
 Set up your server system using SuSE Linux. Install all the
software you expect to use on that system. I recommend you set up DHCP
on the cluster network to assist in booting the client systems.
There is a nice tutorial at IBM (registration required; be sure to
click on "read more") which describes the steps necessary to set up a
DHCP server. There is also a shorter explanation (of assigning
dynamic addresses) which describes both DNS setup & DHCP. Note that
DNS is optional (you can use fixed addresses & names in /etc/hosts on
each system).
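If you go the /etc/hosts route, the file might look something like the
sketch below. The 192.168.1.x addresses and the node names are only an
illustration - use whatever private address range and naming scheme
you prefer, and keep the same copy on the server and every client.

```
# /etc/hosts -- identical on the server and all clients
127.0.0.1     localhost
192.168.1.1   server      # cluster-side interface of the server
192.168.1.11  node01
192.168.1.12  node02
# ... one line per client, up through your tenth node
```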
The System Imager manual describes how mkdhcpserver can be used to
set up the DHCP server once the DHCP server software is installed.
To give you a couple of options: the production clusters I set up used
fixed addresses, mapped to each client's MAC address. This was because
the production cluster clients had slightly different hardware
configurations and it was important that each client machine had the
same address each time it rebooted. You can generally "discover" the
MAC addresses by network booting the client machines & viewing the
messages on the console and/or the DHCP log file on the server system.
If you don't have that constraint, it is simpler to set up a range of
addresses for your client systems. I also forced my DHCP server to
give out addresses only on the cluster network (and not the more
general network) - you may want to do the same.
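As a sketch, a dhcpd.conf for the fixed-address approach might look
like the fragment below. The subnet, MAC addresses, and host names are
placeholders - substitute the MAC addresses you discover from your own
clients.

```
# /etc/dhcpd.conf -- serve addresses only on the cluster network
subnet 192.168.1.0 netmask 255.255.255.0 {
    option routers 192.168.1.1;

    # one fixed-address entry per client, keyed by its MAC address
    host node01 {
        hardware ethernet 00:11:22:33:44:01;
        fixed-address 192.168.1.11;
    }
    host node02 {
        hardware ethernet 00:11:22:33:44:02;
        fixed-address 192.168.1.12;
    }
    # ... repeat for the remaining clients
}
```

Because the subnet declaration only covers the cluster network, the
server will not hand out addresses on the general network.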
 To set up the client systems I suggest using a tool such as
"System Imager" or a more complete package such as "SI Suite". These
work in a "client" & "server" mode, similar to your setup.
I have used both, but tend to prefer using System Imager directly
(your mileage may vary). I also found Brian Finley (System Imager's
lead developer) very helpful. There is a very nice "how to" which
describes the basic steps, and a more comprehensive System Imager
guide which goes into much more detail, describing several
alternatives. The "how to" describes the steps for a Debian-style
system; the RPM equivalent (for SuSE) is also described.
Be sure to scroll down and look at the dependencies - when I used Red
Hat a few years ago I had to download those packages before System
Imager would work. Once the dependencies are satisfied, the steps go
something like this for the image server:
(bring down the install script)
chmod +x install
(see the available packages)
(download and install the packages you want as root)
./install --verbose <package_one> <package_two> ...
Assuming x86's (not IA64's - I can't be sure from your system
descriptions), pick the current x86 packages.
(I assume you don't want "flame thrower", the multicast deployment
option, for the small number of systems you have.)
Note the comment about enabling the System Imager service once
everything else is set up on your image server.
Look at SI Suite if you want the other tools - they are installed in
basically the same way.
At this point, you should have an image server ready to set up.
 Now you should set up a "golden client" to be cloned onto the
other nine systems. Perhaps the simplest method is to swap the golden
client with your server: install SuSE Linux for that configuration,
download the packages (similar to the image server), and make sure
systemimager-client (and its dependencies) are installed on that
system. Don't worry about host names or addresses at this point; some
tools that System Imager uses will help fix those later. Be sure you
have some way to log in to the client from the network (the "r"
commands - rsh, rlogin, rcp - are OK since the network is private, but
some prefer the secure versions like ssh). The guides I've
referenced also describe possible issues with a firewall - I suggest
disabling the firewall (if any) on the client systems and on the
cluster network interface on the server.
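Once the clients are reachable, a small shell loop makes it easy to
verify that you can log in to every one of them. This is just a
sketch - the node names are hypothetical, and you can point it at ssh
instead of rsh by setting the RSH environment variable.

```shell
#!/bin/sh
# check_nodes: run a quick command on each compute node to verify
# that remote login works. Node names are placeholders for your own.
# Set RSH=ssh in the environment to use the secure version instead.
check_nodes() {
    rsh_cmd="${RSH:-rsh}"
    for node in "$@"; do
        if "$rsh_cmd" "$node" uptime; then
            echo "$node: ok"
        else
            echo "$node: FAILED" >&2
        fi
    done
}

# Example: check_nodes node01 node02 ... node10
```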
 Prepare the image of the golden client. Connect the server and
client the way you intend to eventually use them. The steps in the
"how to" are pretty much the ones you need to follow. Note the
addresses used and fix them if needed for your configuration. Note
also that the options on the getimage command line need to match your
DHCP setup.
 You did not explain whether your client systems have a floppy or
CD-ROM drive. If so, you can follow the "how to" to generate the boot
floppy (or use mkinstallcd for a CD-ROM). If you want to network boot
the clients to do the first install, there is an explanation of the
steps required for that setup. (It uses network boot for the "first
time" and then boots from the local hard disk.)
 Add the rest of the clients. Step 5 of the "how to" describes
loading the boot media & rebooting (make sure the BIOS is set up
correctly) to get the system image onto each system. I found this step
doesn't take too long - it obviously varies by image size, but with a
10G limit it should not take too long.
At this point, you should have your "server" and 10 clients set up and
connected to each other & up and running. There are a variety of
methods you can use to operate the cluster.
For initial testing, I suggest logging into the server system & trying
to log in to each client, moving files between the server system & the
clients, starting applications, etc. Shell scripts go a long way
toward automating these steps, and there are some tools that help
automate that as well.
For the production systems, we used
o shell scripts on the server
o an "image" (like a System Imager image) on the server that was pushed to the clients by rsh
o rsh to run the shell script on each client
to download & run our application on the cluster. The cluster nodes
would communicate across the network or a high speed interface we
used. We had our own synchronization software - you'll probably need
some as well. We also had a shell script on the server that would kill
the jobs on all the clients (if necessary) using rcp (the client
script) and rsh (to run the job killing script on each client).
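A sketch of that push-and-run pattern is below. The node names, file
paths, and application name are all hypothetical - this is the shape
of the scripts we used, not the scripts themselves.

```shell
#!/bin/sh
# deploy_and_run: copy an application to each client and start it.
# kill_all: stop the application on each client.
# Node names and paths below are placeholders for your own.
NODES="node01 node02"            # extend to all ten clients
RSH="${RSH:-rsh}"                # or ssh
RCP="${RCP:-rcp}"                # or scp

deploy_and_run() {
    app="$1"                     # path to the application to run
    for node in $NODES; do
        $RCP "$app" "$node:/tmp/$(basename "$app")" &&
        $RSH "$node" "/tmp/$(basename "$app") &" &&
        echo "started on $node"
    done
}

kill_all() {
    app_name="$1"
    for node in $NODES; do
        $RSH "$node" "pkill -x '$app_name'" \
            && echo "killed on $node" \
            || echo "nothing to kill on $node"
    done
}
```

In practice you would also want the per-node commands backgrounded so
the ten clients start in parallel rather than one at a time.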
Other resources or tools that can help with your cluster include:
The original clusters were named "Beowulf" - the Beowulf site has
links to the mailing list and some references (unfortunately, the
tools section is empty) such as turnkey vendors.
Cluster schedulers such as "Maui", or the more comprehensive Moab
cluster suite. There are other schedulers as well - a search will turn
up several others, such as the "Portable Batch System" (PBS).
MPI (the Message Passing Interface), or more recently "Open MPI", is
used on several high performance clusters. MPI provides services
to exchange data across the network and is generally bundled with
tools to start / stop applications. Versions work over the network as
well as with high-speed interconnects such as SCI or Myrinet.
The NCSA also has a number of applications that can help set up /
manage a cluster.
Ganglia is an EXCELLENT cluster monitoring application
which allows your server to collect information about the clients
(e.g., CPU time, I/O) and display it in a web client. This requires
you to run a web server as well (I've used Apache), but I found it
extremely valuable when diagnosing problems. There are several
publicly available reports of Ganglia on the internet - one example
is a cluster in Italy.
When setting up clusters & trying to refine application design,
another tool, "netpipe", was also helpful.
Netpipe gives you some nice analysis of packet size / throughput for
your network. I considered channel-bonded network interfaces, but got
some odd results from netpipe which illustrated some problems in the
Linux network stack (related to out-of-order packets) - this may be
fixed now, since I haven't rerun the tests in over five years.
To get the best performance, you should consider how many calculations
are performed on a system for each byte (word, float, etc.) exchanged
on the network. If it takes longer to pass the data across the network
than to compute it, it can be better to recalculate a value on several
nodes than to exchange the data.
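A back-of-envelope sketch of that trade-off (all the numbers here are
made up for illustration): on a 100 Mbit/s network with roughly 100
microseconds of latency, sending one 8-byte value costs about
latency + size/bandwidth, which is dominated by the latency - so if
recomputing the value locally is cheaper than that, recomputing wins.

```shell
#!/bin/sh
# Estimate the time to send one 8-byte value over the cluster network,
# then compare with a (hypothetical) local recompute cost.
# Assumptions: 100 Mbit/s bandwidth, 100 microseconds of latency.
transfer_us=$(awk 'BEGIN {
    bytes = 8
    latency_us = 100
    bw_bytes_per_s = 100e6 / 8          # 100 Mbit/s -> bytes/second
    printf "%.1f", latency_us + bytes / bw_bytes_per_s * 1e6
}')
recompute_us=50                          # hypothetical local cost
echo "transfer: ${transfer_us} us, recompute: ${recompute_us} us"
```

With these (illustrative) numbers the transfer costs about 100.6 us
against 50 us to recompute, so recomputing on each node is faster.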
If some part of the answer is unclear or if you need additional
references, please make a clarification request. I would be glad to
help.