Hi ddb-ga,
My initial guess was that the time would increase by a factor of less
than 2, and I ran some tests that supported this guess.
Test Results
I compared the time it took to copy a set of files with the time it
took to extract the same set from a compressed archive. For both
operations, the time to write the output files should be the same, so
the difference should be due to the difference between reading
uncompressed data and reading compressed data and decompressing it.
Unfortunately, I couldn't tell how the time was divided among input,
decompression, and output, but still, the results should be
indicative.
I used the popular WinZip program for compression and decompression on
an 800 MHz Pentium III running Windows XP. The disk drive was similar
to the one you specified. I used big data sets and timed the
operations with the stopwatch function of a watch.
First I tried about 375 megabytes of mostly audio files. It took
35.26 seconds to copy them, 141.48 seconds to compress them, and 41.91
seconds to extract them. If we guess that the copying time was
equally divided between input and output, each was 17.63 seconds. The
decompression added 6.65 seconds, or about 38% of the input time.
(I'm reporting hundredths of a second because that's what my watch
shows, but due to the limitations of my nervous system, the
measurements are not really that precise.)
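The arithmetic behind that 38% figure can be written out as a short Python sketch. The function name decompression_overhead and the input_share parameter are illustrative only; the 50/50 split between input and output is the guess stated above, not something I measured.

```python
def decompression_overhead(copy_s, extract_s, input_share=0.5):
    """Estimate how much decompression adds relative to the input time.

    input_share is the assumed fraction of the copy time spent reading;
    0.5 reflects the guess that input and output took equal time.
    """
    input_s = copy_s * input_share   # estimated time spent reading
    added_s = extract_s - copy_s     # extra time when extracting instead of copying
    return input_s, added_s, added_s / input_s


# Stopwatch readings from the first test (mostly audio files).
input_s, added_s, ratio = decompression_overhead(copy_s=35.26, extract_s=41.91)
print(f"input {input_s:.2f} s, added {added_s:.2f} s, {ratio:.0%} of input time")
```

Changing input_share shows how sensitive the percentage is to that guess: if reading took more than half of the copy time, the relative overhead of decompression would be smaller.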
My second test used about 233 megabytes, mostly image and
word-processing files. It was a better test in that I got more
compression of the data, but worse in that there were about 20 times
as many files involved, and the file system overhead appeared to be
significant. It took 123.59 seconds to copy the files, 141.54 seconds
to archive them, and only 91.97 seconds to extract them. So in this
case, decompressing them was actually faster. I guess that this is
because in the decompression case, the program avoided the file system
overhead of reading thousands of files, since it was only reading from
the archive file.
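To put the two tests side by side, the sketch below computes how extraction compared with copying in each case, along with the copy throughput. The sizes are the approximate figures given above, so the throughput numbers are rough, and the helper names relative_change and throughput_mb_s are made up for this example.

```python
def relative_change(copy_s, extract_s):
    """Fractional change in elapsed time when extracting instead of copying."""
    return (extract_s - copy_s) / copy_s


def throughput_mb_s(size_mb, seconds):
    """Average data rate in megabytes per second."""
    return size_mb / seconds


# (approximate size in MB, copy seconds, extract seconds) from the two tests.
tests = {
    "audio files": (375, 35.26, 41.91),
    "image and word-processing files": (233, 123.59, 91.97),
}

for name, (size_mb, copy_s, extract_s) in tests.items():
    change = relative_change(copy_s, extract_s)
    rate = throughput_mb_s(size_mb, copy_s)
    print(f"{name}: extract vs. copy {change:+.1%}, copy rate {rate:.1f} MB/s")
```

Extraction took about 19% longer than copying in the first test and about 26% less time in the second. The copy rate fell from roughly 10.6 MB per second to roughly 1.9 MB per second between the two tests, which is consistent with per-file overhead dominating when there are many more files.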
Additional Links
I searched the Web for information about this, but didn't find much.
Most of the performance evaluation of compression software seems to
focus on how much it compresses the data rather than how fast it runs.
The ones that do measure speed generally don't separate the time
consumed by the compression algorithm from input/output time, but just
run a number of programs with the same data and measure their elapsed
time.
Jeff Gilchrist's Archive Comparison Test site reports extraction as
well as compression times for a number of programs.
http://compression.ca/
Another site that reports compression speed is the Compression
Comparison Guide on Adrian Wong's Rojakpot:
http://www.rojakpot.com/default.aspx?location=3&var1=4&var2=0
It reports speeds ranging from 24.3 to 3,492.8 KB per second. These
figures include disk input/output as well as compression. The guide
doesn't report on decompression.
Another site with test results is Maximum Compression.
http://www.maximumcompression.com/
Although it reports only how much compression each program attained,
not how fast it ran, the list of programs might be of interest.
Both commercial and open source software components for data
compression are available. The Google Directory offers a starting
point.
http://directory.google.com/Top/Computers/Software/Data_Compression/
Search Strategy
Search terms included:
data compression software
data compression software performance
data compression software components
data compression speed benchmark OR benchmarks
Conclusion
Many variables will affect the impact of compression on your
application's performance, such as how compressible your data is, what
compression software or algorithm you use, whether the application is
CPU-bound or I/O-bound, and what else is going on in the computer.
However, the processor is fast compared to the disk drive, and
compression can save some of the input time by reducing the amount of
data that has to be read, so if doubling the time seems like an
acceptable worst case, compression is probably worth further
investigation.
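The trade-off between a fast processor and a slow disk can be made concrete with a back-of-the-envelope model. This is only a sketch under simplifying assumptions: reading and decompression happen one after the other with no overlap, the rates are constant, and every number in the example is hypothetical rather than measured. The function names are made up for illustration.

```python
def uncompressed_read_s(size_mb, disk_mb_s):
    """Time to read size_mb of uncompressed data from disk."""
    return size_mb / disk_mb_s


def compressed_read_s(size_mb, ratio, disk_mb_s, decompress_mb_s):
    """Time to read the compressed form and then decompress it.

    ratio is compressed size / original size (0.5 means half the size).
    decompress_mb_s is measured in MB of decompressed output per second.
    """
    read_s = size_mb * ratio / disk_mb_s
    cpu_s = size_mb / decompress_mb_s
    return read_s + cpu_s


def break_even_decompress_mb_s(disk_mb_s, ratio):
    """Decompression rate above which the compressed path is faster."""
    return disk_mb_s / (1.0 - ratio)


# Hypothetical inputs: 100 MB of data, a 20 MB/s disk, 2:1 compression,
# and a decompressor that produces 50 MB of output per second.
plain = uncompressed_read_s(100, 20)
packed = compressed_read_s(100, 0.5, 20, 50)
print(f"uncompressed: {plain:.1f} s, compressed: {packed:.1f} s")
print(f"break-even decompression rate: {break_even_decompress_mb_s(20, 0.5):.0f} MB/s")
```

With these made-up inputs the compressed path wins slightly. With poorly compressible data (ratio close to 1) or a slow decompressor it loses, which is why measuring with your own data matters.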
If you would like me to look further for relevant test results on the
web, or if any of this answer is unclear or you need any other
information, please ask for a clarification. I hope this information
is helpful.
--efn-ga