Hello Harrisbn,
There are several ways to do this, at various price / performance
levels. This answer describes a typical low cost solution, a
mid-range solution, and a high performance solution. It covers the
equipment as well as the software that can be used to set up your
"super computer".
Some of the initial work on clusters of PCs was done by Donald Becker
when he was at Goddard Space Flight Center in the mid to late 1990s.
He is now at Scyld Computing Corporation - a short summary is at
http://www.scyld.com/corporate.html#becker
They hosted a site at
http://www.beowulf.org/
though I had problems accessing it - try
http://216.239.37.100/search?q=cache:BfgJkuJJ-mYC:www.beowulf.org/+beowulf&hl=en&start=1&ie=UTF-8
to get it from Google's cache.
The basic approach for a Beowulf cluster is a set of heterogeneous
PCs connected by switched Ethernet. Often there is a "head node" that
connects to a public network, while a set of compute nodes operates on
one or more private LANs behind the head node. This approach gives
good performance at low cost. Its main disadvantage is the need to
divide the task into relatively large pieces and minimize the amount
of data exchanged between systems. A price per node of $1000 to $1500
complete (computers, network, racks) is easily achieved.
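To illustrate why large pieces matter, here is a rough sketch of a
timing model. It is an illustration only: the function names are made
up for this example, the model assumes the work splits evenly and that
each message costs one network latency, and the 500 microsecond figure
is an assumed value for switched Ethernet, not a measurement.

```python
def parallel_time(total_work_s, nodes, messages_per_node, latency_s):
    """Estimated wall-clock time: evenly split compute plus message latency."""
    compute = total_work_s / nodes
    communicate = messages_per_node * latency_s
    return compute + communicate


def speedup(total_work_s, nodes, messages_per_node, latency_s):
    """Ratio of single-machine time to the estimated cluster time."""
    return total_work_s / parallel_time(
        total_work_s, nodes, messages_per_node, latency_s
    )


if __name__ == "__main__":
    ETHERNET_LATENCY_S = 500e-6  # assumed value, not measured
    # 100 seconds of total work spread over 8 nodes
    for messages in (100, 10_000, 1_000_000):
        s = speedup(100.0, 8, messages, ETHERNET_LATENCY_S)
        print(f"{messages:>9} messages per node: speedup {s:.2f}")
```

Under these assumptions, a hundred messages per node leaves the
speedup close to 8, while a million small messages per node makes the
cluster slower than a single machine.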
Another site describing this is
http://www.beowulf-underground.org/
which has a variety of hardware and software resources.
Search using phrases such as
pc cluster
beowulf cluster
to get more sites with this kind of information.
The next level of performance uses a faster interconnect such as
- switched Gigabit Ethernet (roughly 10x the I/O performance of
  10/100 Ethernet)
- USB 2.0
- IEEE 1394 (FireWire)
or a similar low cost / high performance connection. The price per
connection starts to increase at this level. Using Gigabit Ethernet as
an example, the price for an 8 port switch is several hundred dollars
instead of $50 to $100 for an unmanaged 8 port 10/100 switch. The I/O
performance increase is significant and allows many more
applications to run effectively. USB 2.0 and FireWire are
alternatives for smaller clusters of only a handful of systems.
At the highest level of performance there are a number of specialized
interfaces including
- Myrinet http://www.myri.com/
- Scalable Coherent Interface (SCI) http://www.dolphinics.com/
- Reflective Memory http://www.vmic.com/products/reflectivememory/
- Scramnet http://www.systran.com/scmain.html
These interfaces use custom cards with a high speed link. For
example, SCI has a link rate of 1 Gbyte / second and the interface is
generally limited by the bus of the computer it is connected to. On a
33 MHz / 32 bit PCI bus, this limit is about 100 Mbyte/sec. On a
66 MHz / 64 bit PCI bus, the limit is about 400 Mbyte/sec. Latencies
for data transfer are generally in the 2 to 5 microsecond range, which
compares well to 100-1000 microseconds for Ethernet. Prices for the
cards and switches vary by vendor - expect $2000 to $5000 per node for
these interfaces.
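The bus and latency figures above can be checked with some simple
arithmetic. The sketch below computes the theoretical peak of a PCI
bus as clock rate times bus width, and estimates the time for one
message as latency plus size divided by bandwidth. The function names
are made up for this example, and the 12.5 Mbyte/sec Ethernet
bandwidth is my assumption for a 100 Mbit link. Real throughput is
lower than the theoretical peak, which is consistent with the limits
of about 100 and 400 Mbyte/sec quoted above.

```python
def pci_peak_mbytes_per_s(clock_mhz, width_bits):
    """Theoretical peak: one transfer of width_bits per clock cycle."""
    return clock_mhz * width_bits / 8


def transfer_time_us(size_bytes, latency_us, bandwidth_mbytes_per_s):
    """Latency plus serialization time, in microseconds.

    With 1 Mbyte = 10**6 bytes, 1 Mbyte/sec equals 1 byte per microsecond.
    """
    return latency_us + size_bytes / bandwidth_mbytes_per_s


if __name__ == "__main__":
    print("33 MHz / 32 bit PCI peak:", pci_peak_mbytes_per_s(33, 32), "Mbyte/sec")
    print("66 MHz / 64 bit PCI peak:", pci_peak_mbytes_per_s(66, 64), "Mbyte/sec")

    size = 1024  # a 1 Kbyte message
    # Assumed: 100 us latency, 12.5 Mbyte/sec (100 Mbit Ethernet)
    ethernet = transfer_time_us(size, 100, 12.5)
    # From the text: 2 us latency, about 100 Mbyte/sec on 33 MHz / 32 bit PCI
    sci = transfer_time_us(size, 2, 100)
    print(f"Ethernet: {ethernet:.2f} us, SCI: {sci:.2f} us")
```

For small messages the latency term dominates, which is why the
specialized interfaces help most with applications that exchange many
short messages.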
Software to run on these systems is available from several sources. A
few examples include:
- Message Passing Interface (MPI) - http://www-unix.mcs.anl.gov/mpi/
- Parallel Virtual Machine (PVM) -
http://www.csm.ornl.gov/pvm/pvm_home.html
- Mosix - http://www.mosix.org/
- Netpipe - http://www.scl.ameslab.gov/netpipe/
- Portable Batch System (PBS) - http://www.openpbs.org/
and so on. MPI and PVM provide message passing and synchronization
between parallel applications. Mosix is an add-on to Linux that
operates the cluster as a single large system. Netpipe measures the
performance of the network connection and your software (e.g., MPI).
PBS can be used to schedule tasks, stage and start jobs, and retrieve
the results.
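To give a feel for the kind of measurement a tool like Netpipe makes,
here is a small ping-pong timing sketch using only the Python standard
library. It is not Netpipe and does not use MPI: it bounces messages
over a local socket pair on one machine, so the numbers it prints say
nothing about your cluster network. All function names are made up
for this example.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes from sock (a single recv may return fewer)."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_server(sock, nbytes, rounds):
    """Send each received message straight back to the sender."""
    for _ in range(rounds):
        sock.sendall(recv_exact(sock, nbytes))


def ping_pong(nbytes, rounds=50):
    """Return the average one-way time in seconds for nbytes messages."""
    a, b = socket.socketpair()
    server = threading.Thread(target=echo_server, args=(b, nbytes, rounds))
    server.start()
    payload = b"x" * nbytes
    start = time.perf_counter()
    for _ in range(rounds):
        a.sendall(payload)
        recv_exact(a, nbytes)
    elapsed = time.perf_counter() - start
    server.join()
    a.close()
    b.close()
    # Each round is one trip out and one trip back.
    return elapsed / (2 * rounds)


if __name__ == "__main__":
    for size in (64, 1024, 65536):
        one_way = ping_pong(size)
        print(f"{size:>6} bytes: {one_way * 1e6:.1f} us one-way")
```

Running the same kind of test between two nodes, over a range of
message sizes, shows both the latency (small messages) and the
bandwidth (large messages) of a connection.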
A few references from the Google Directory include
http://directory.google.com/Top/Computers/Parallel_Computing/Vendors/?il=1
http://directory.google.com/Top/Computers/Parallel_Computing/Beowulf/Vendors/
http://directory.google.com/Top/Computers/Supercomputing/Companies/
The last of these provides a comparison with traditional
supercomputers.
There are also a number of "how to" resources - try
http://directory.google.com/Top/Computers/Parallel_Computing/Beowulf/Documentation/
for a good list to start from. I found the SCL Cluster Cookbook
http://www.scl.ameslab.gov/Projects/ClusterCookbook/
to be a good reference when I first started to work with clusters.
--Maniac