Dear dirtech,
Although I have no direct experience with GFS or VxFS, I am familiar
with the inadequacies of NFS in a terascale distributed environment. At
the information-retrieval laboratory where I worked previously, we used
NFS to share files among our 24 Linux boxes, half of which were hosting a
terabyte corpus and its associated index. We consistently had performance
problems, even after virtual-memory thrashing was cured by upgrading to
the 2.6 kernel. Ultimately, we migrated the terabyte corpus to a novel
distributed storage system that was designed by one of our researchers
and implemented by a graduate student for his thesis.
I was interested to read that GFS also began life as a thesis
project. This is neither good nor bad in itself, but the recent provenance
of GFS suggests that it may still be maturing.
GFS [was] originally developed as part of a thesis project at the
University of Minnesota. At some point it made its way to
Sistina Software, where it lived for a time as an open source
project. Sometime in 2001 Sistina made the choice to make GFS a
commercial product -- not under an open source license. OpenGFS
forked from the last public release of GFS.
In December 2003 Red Hat purchased Sistina. In late June 2004,
Red Hat released GFS and many cluster infrastructure pieces under
the GPL. Red Hat's current goal for the project (aside from the
normal bug fixing and stabilization) envisages inclusion in the
mainline Linux kernel. GFS now forms part of the Fedora Core
4 distribution and can be purchased as a commercial product on
top of Red Hat Enterprise Linux.
Wikipedia: Global File System
http://en.wikipedia.org/wiki/Global_File_System
There are several academic papers by the authors of GFS outlining its
theoretical advantages, but the few snippets of practical feedback I
have found on the web have been negative. Part of this, of course, is
merely due to the human propensity to raise one's voice in irritation
and to stay quiet in contentment. Observe, however, that the author of
the following passage compares GFS unfavorably to the SGI file system,
XFS, about which I shall say more later.
> does anybody out there have a configuration with at least a 2
> terabyte filesystem? i am using gfs 5.1 and in even doing simple
> things like 'ls -l' it takes MINUTES to return a result. basically
> we have a large ftp site currently running on an sgi. the same
> command on the same directory structure behaves normally, or as
> one would expect it to, on the sgi. on our linux box, running
> gfs 5.1, the results take up to two minutes to return a listing.
Global File System general discussion: "RE: really poor performance on
a 2 terabyte gfs"
http://permalink.gmane.org/gmane.comp.file-systems.gfs.user/29
The following research paper points out a weakness in the GFS design
that may explain gripes such as the one above.
In GFS, whenever a transaction modifies a buffer, a copy is
made to preserve its old contents. If the transaction must be
aborted, GFS simply restores all affected buffers by using their
frozen copies. Such a scheme is expensive in terms of its memory
footprint and copying overhead.
USENIX: "yFS: A Journaling File System Design for Handling Large Data
Sets with Reduced Seeking"
http://www.usenix.org/events/fast/tech/full_papers/zhang/zhang_html/index.html
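To make the criticized scheme concrete, here is a minimal Python sketch
-- illustrative only, not GFS source -- of the general "frozen copy"
rollback technique the paper describes: every buffer a transaction
touches is duplicated up front so that an abort can restore the old
contents, which is exactly the memory and copying overhead the authors
object to.

    # Illustrative sketch only, not GFS code: the copy-before-write
    # rollback scheme the yFS paper criticizes.

    class Transaction:
        def __init__(self):
            self.frozen = {}  # buffer id -> copy of original contents

        def modify(self, buffers, buf_id, new_data):
            if buf_id not in self.frozen:
                # Copy before the first write, doubling memory for this buffer.
                self.frozen[buf_id] = bytes(buffers[buf_id])
            buffers[buf_id] = new_data

        def abort(self, buffers):
            # Roll back by restoring every frozen copy.
            for buf_id, old_data in self.frozen.items():
                buffers[buf_id] = old_data

        def commit(self):
            # On commit the frozen copies are simply discarded.
            self.frozen.clear()

    buffers = {0: b"old block 0", 1: b"old block 1"}
    tx = Transaction()
    tx.modify(buffers, 0, b"new block 0")
    tx.abort(buffers)  # buffers[0] is b"old block 0" again

The cost grows with the number of dirty buffers held by open
transactions, which is why the paper calls the scheme expensive in both
memory footprint and copying overhead.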
The following report states outright that GFS is ill-suited to accessing
large files and to intra-file sharing.
Another example of a shared-disk cluster file system is the
Global File System (GFS) [20], which originated as an open source
file system for Linux. The newest version (GFS-4) implements
journaling, and uses logging, locking, and recovery algorithms
similar to those of GPFS and Frangipani. Locking in GFS is closely
tied to physical storage. Earlier versions of GFS [21] required
locking to be implemented at the disk device via extensions to
the SCSI protocol. Newer versions allow the use of an external
distributed lock manager, but still lock individual disk blocks
of 4kB or 8kB size. Therefore, accessing large files in GFS
entails significantly more locking overhead than the byte-range
locks used in GPFS. Similar to Frangipani/Petal, striping in GFS
is handled in a "Network Storage Pool" layer; once created,
however, the stripe width cannot be changed (it is possible to add
new "sub-pools", but striping is confined to a sub-pool,
i.e., GFS will not stripe across sub-pools). Like Frangipani,
GFS is geared more towards applications with little or no intra-
file sharing.
IBM Almaden Research Center: "GPFS: A Shared-Disk File System for Large
Computing Clusters"
http://www.almaden.ibm.com/StorageSystems/File_Systems/GPFS/Fast02.pdf
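To illustrate the distinction the paper draws, the byte-range approach
it credits to GPFS is the same idea that POSIX exposes to applications
as advisory record locks. The Python sketch below -- again illustrative
only, not GPFS internals -- locks just the region of a file that a
writer touches, so writers working on disjoint ranges of one large file
need not serialize, whereas locking whole 4 kB or 8 kB blocks, as GFS
does, forces far more contention on big shared files.

    # Illustrative only: POSIX byte-range advisory locking from user space.
    # GPFS performs its range locking inside the file system; this merely
    # shows the byte-range idea, as opposed to locking fixed disk blocks.

    import fcntl
    import os

    def write_range(path, offset, data):
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            # Lock only [offset, offset + len(data)); other ranges stay free.
            fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
            os.pwrite(fd, data, offset)
            fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)
        finally:
            os.close(fd)

    # Two cooperating writers can update disjoint regions of the same
    # large file concurrently without blocking each other.
    write_range("/tmp/shared.dat", 0, b"header")
    write_range("/tmp/shared.dat", 1 << 20, b"payload at 1 MiB")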
I haven't found much user feedback on VxFS, but what I have seen has
been positive. In the following message, one fellow opines that VxFS
works well for terascale file systems.
> So I would like to ask for your experiences with filesystems
> larger than 4 TB.
[...]
Doug Hughes said "not a problem with something like VxFS. Less
of a problem with UFS+ with logging turned on, but VxFS has a
marginal edge with larger sizes. The larger you get, the faster
it gets for crash recovery (in comparison). It goes up to 32TB
and many sites use the whole thing (or more, depending upon OS
version and Veritas version)"
SunManagers: Summaries: January 2005
http://www.sunmanagers.org/pipermail/summaries/2005-January.txt
More praise here.
Most of us at this point are [used] to building file[s]ystems with
UFS. However[,] Veritas offers the Veritas File System[, a]
journaling filesystem with [a] performance advantage over UFS. My
favorite use of VxFS, however, is that very large filesystems
can be created very quickly.
Cuddletech: A Brief Discussion of VxFS
http://www.cuddletech.com/veritas/advx/x69.html
Based on this feedback, which I admit is insufficient in quantity to
make for a statistically valid sampling, I would venture to guess
that VxFS is more reliable, for the time being, as a terascale SAN
solution. Before shelling out any bucks, however, I would certainly look
into the possibility of running Clustered XFS, or CXFS, on Linux.
SGI: CXFS
http://www.sgi.com/products/storage/tech/file_systems.html
The core of CXFS is XFS, which has a long history of supporting large
storage systems. The guys at Gelato, who were early advocates for 64-bit
Linux, are among those who have plumped for XFS.
At present XFS looks like the most appropriate file system for
large file work (but check out JFS, reiserfs version 4, and ext3
with large block sizes).
Gelato: Large File System support in Linux 2.5.x
http://www.gelato.unsw.edu.au/~peterc/lfs.html
Another word of support.
A modern journaling file system designed for large disks is XFS,
which is included in Linux and has no real limitation in disk
or file size (multi exabyte). So that's the configuration we're
running now.
Volker Gaibler: Large disks with Linux (multi-terabyte)
http://www.lsw.uni-heidelberg.de/users/vgaibler/comp.html
CERN has interesting things to say about CXFS.
The Clustered XFS file system technology
(http://www.sgi.com/products/storage/software.html) is developed
by Silicon Graphics for high-performance computing environments
like their Origin. It is supported on IRIX 6.5, and also Linux
and Windows NT. CXFS is designed as an extension to their XFS
file system, and its performance, scalability and properties are
for the main part similar to XFS; for instance, there is API
support for hierarchical storage management. Quite good.
Like XFS, CXFS is a high-performance and scalable file system,
journaled for fast recovery, and has 64-bit scalability to support
extremely large files and file systems. Size limits are similar
to XFS: maximum file size 9 EB, maximum file system size 18 EB,
block and extent (contiguous data) sizes are configurable at
file system creation, block size from 512 B to 64 kB for normal
data and up to 1 MB for real-time data, and single extents can
be up to 4 GB in size. There can be up to 64k partitions, 64k
wide stripes and dynamic configurations.
CXFS differs from XFS by being a distributed, clustered shared
access file system, allowing multiple computers to share large
amounts of data. All systems in a CXFS file system have the same,
single file system view, i.e. all systems read and write all
files at the same time at near-local file system speeds. CXFS
performance approaches the speed of standalone XFS even when
multiple processes on multiple hosts are reading from and writing
to the same file. This makes CXFS suitable for applications
with large files, and even with real-time requirements like
video streaming. Dynamic allocation algorithms ensure that a
file system can store and a single directory can contain millions
of files without wasting disk space or degrading performance.
CXFS extends XFS to Storage Area Network (SAN) disks, working with
all storage devices and SAN environments supported by SGI. CXFS
provides the infrastructure allowing multiple hosts and operating
systems to have simultaneous direct access to shared disks,
and the SAN provides high-speed physical connections between
the hosts and disk storage.
CERN: DataGrid: Data Access and Mass Storage Systems
http://edg-wp2.web.cern.ch/edg-wp2/docs/DataGrid-02-D2.1-0105-2_0.doc
I have found it challenging but instructive to work on your question. I
hope you are pleased with my findings. If you are not, please advise me
through a Clarification Request and give me a chance to fully meet your
needs before you rate this answer.
Regards,
leapinglizard |