DBM files are "files with holes". A file with "holes" (sometimes
called a "sparse" file) is one where the application wrote some data,
then seeked to a later offset in the file and wrote some more data.
On most (all?) Unix systems the part of the file that was skipped is
not stored on disk and reads back as zeros if another application
reads it.
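To make that concrete, here is a minimal Python sketch (Unix only, since it relies on st_blocks) that creates a file with a hole and compares its apparent size with the space actually allocated. The function names and the 100 MB hole size are made up for illustration, and whether the hole really takes no disk space depends on the filesystem.

```python
import os
import tempfile


def make_sparse_file(path, hole_size, payload=b"end"):
    """Write payload, skip hole_size bytes without writing, write payload again."""
    with open(path, "wb") as f:
        f.write(payload)
        f.seek(hole_size, os.SEEK_CUR)  # move forward; nothing is written here
        f.write(payload)


def sizes(path):
    """Return (apparent size in bytes, bytes allocated on disk)."""
    st = os.stat(path)
    # st_blocks is reported in 512-byte units on Unix systems.
    return st.st_size, st.st_blocks * 512


def read_hole(path, offset, length):
    """Read length bytes starting at offset, e.g. from inside the hole."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sparse.bin")
    make_sparse_file(path, 100 * 1024 * 1024)  # hypothetical 100 MB hole
    apparent, allocated = sizes(path)
    print("apparent size:", apparent)
    print("allocated    :", allocated)
    print("hole reads as zeros:", read_hole(path, 3, 16) == b"\0" * 16)
```

On a filesystem that supports holes, the allocated figure should come out far smaller than the apparent size, which is the same effect you see with a .pag file.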
I'd guess that the 3 GB file was either copied, archived and extracted
(tar), or transferred over the network (ftp/scp), which filled in the
holes. See the --sparse option of the GNU cp command for an example of
how to copy these files efficiently.
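For a rough idea of what a hole-preserving copy does, here is a small Python sketch. It is not how GNU cp is implemented, just the general technique under simple assumptions: read the source in blocks, and whenever a block is all zeros, seek forward in the destination instead of writing. The name sparse_copy and the 4096-byte block size are made up for illustration.

```python
import os


def sparse_copy(src, dst, block_size=4096):
    """Copy src to dst, seeking over all-zero blocks instead of writing them."""
    zero_block = bytes(block_size)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break
            if chunk == zero_block[:len(chunk)]:
                # All zeros: skip ahead so the filesystem can leave a hole.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # If the file ends in zeros, nothing was written at the end, so set
        # the final size explicitly to the current position.
        fout.truncate()
```

Note that this also turns runs of zeros that were really written in the source into holes, so the copy can end up using less disk space than the original while still reading back byte for byte the same.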
This old Sun manpage (the first I found),
http://mirrors.ccs.neu.edu/cgi-bin/unixhelp/man-cgi?dbm+3
says:
---
The .pag file will contain holes so that its apparent size
may be larger than its actual content. Older versions of the
UNIX operating system may create real file blocks for these
holes when touched. These files cannot be copied by normal
means ( cp(1), cat(1), tar(1), ar(1)) without filling in the
holes.
---
You'll also find that those files are platform dependent. The standard
dbm interfaces are really only there for backward compatibility now.
You should really consider moving to something like GDBM to get (among
other things) longer key/value pairs and non-sparse, portable files:
http://www.delorie.com/gnu/docs/gdbm/gdbm_16.html
You can bind to GDBM directly in Perl. See the AnyDBM_File module; the
good news is that you'll probably only have to change a few lines of
code to switch:
http://aspn.activestate.com/ASPN/docs/ActivePerl/lib/AnyDBM_File.html
Good luck.