Q: Performance: Decompression vs. disk access (Answered, 5 out of 5 stars, 2 Comments)
Question  
Subject: Performance: Decompression vs. disk access
Category: Computers > Software
Asked by: ddb-ga
List Price: $20.00
Posted: 31 Oct 2003 13:29 PST
Expires: 30 Nov 2003 13:29 PST
Question ID: 271514
Is there a reasonable compression/decompression scheme 
which will allow me to access and decompress a portion of 
a file on disk (approximately 512K chunks, random point in 
file) with speeds that would compare to the time it would take 
to pull that same chunk from disk uncompressed?  I am 
targeting a Pentium4 system with ~9ms access/7200 rpm hard 
drive.  If it is possible, can you identify at least one
such decompression scheme and its relative performance?
If it is not possible, can you explain why?

Clarification of Question by ddb-ga on 31 Oct 2003 16:19 PST
Thanks for considering this question and the comments.

By scheme, I meant source code, an algorithm, or a
vendor library/DLL which I could integrate into my project
to request the data.  I would hope to work with a
dataset that has at least 2x compression.  I would 
be reading a smaller chunk that decompresses into 
the 512K data set.

The goal is to use less disk space without penalizing the
runtime loading tremendously.

As to the "reasonable" statement, I am trying
to find out if this decompression will cost 2x-10x-100x 
the time it would take to read the corresponding 
uncompressed segment.  I do not have a frame of reference 
for decompression and just need someone with expertise in 
the area to give me a WAG based on their results.
Answer  
Subject: Re: Performance: Decompression vs. disk access
Answered By: efn-ga on 31 Oct 2003 20:40 PST
Rated: 5 out of 5 stars
 
Hi ddb-ga,

My initial guess was that the time would increase by a factor less
than 2, and I ran some tests that supported this guess.


Test Results

I compared the time it took to copy a set of files with the time it
took to extract the same set from a compressed archive.  For both
operations, the time to write the output files should be the same, so
the difference should be due to the difference between reading
uncompressed data and reading compressed data and decompressing it. 
Unfortunately, I couldn't tell how the time was divided among input,
decompression, and output, but still, the results should be
indicative.
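
(If you want to isolate the decompression cost more precisely than my
stopwatch did, a rough timing harness along these lines would do it.
This is just a sketch in Python using the standard zlib module; the
file name is a placeholder, and zlib is my assumption for the codec,
not something your project has to use.)

import time
import zlib

# Read the test file once and write a zlib-compressed copy of it.
with open("testfile.bin", "rb") as f:
    raw = f.read()
with open("testfile.bin.z", "wb") as f:
    f.write(zlib.compress(raw))

# Time a plain read of the uncompressed file.
start = time.perf_counter()
with open("testfile.bin", "rb") as f:
    f.read()
plain_time = time.perf_counter() - start

# Time a read of the compressed copy plus decompression.
start = time.perf_counter()
with open("testfile.bin.z", "rb") as f:
    zlib.decompress(f.read())
compressed_time = time.perf_counter() - start

# Note: both files are probably in the operating system's cache at
# this point, so this mostly measures the decompression overhead
# itself; for cold-cache numbers, flush the cache between runs.
print("plain read:      %.3f s" % plain_time)
print("read+decompress: %.3f s" % compressed_time)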

I used the popular WinZip program for compression and decompression on
an 800 MHz Pentium III running Windows XP.  The disk drive was similar
to the one you specified.  I used big data sets and timed the
operations with the stopwatch function of a watch.

First I tried about 375 megabytes of mostly audio files.  It took
35.26 seconds to copy them, 141.48 seconds to compress them, and 41.91
seconds to extract them.  If we guess that the copying time was
equally divided between input and output, each was 17.63 seconds.  The
decompression added 6.65 seconds, or about 38% of the input time. 
(I'm reporting hundredths of a second because that's what my watch
shows, but due to the limitations of my nervous system, the
measurements are not really that precise.)

My second test used about 233 megabytes, mostly image and
word-processing files.  It was a better test in that I got more
compression of the data, but worse in that there were about 20 times
as many files involved, and the file system overhead appeared to be
significant.  It took 123.59 seconds to copy the files, 141.54 seconds
to archive them, and only 91.97 seconds to extract them.  So in this
case, decompressing them was actually faster.  I guess that this is
because in the decompression case, the program avoided the file system
overhead of reading thousands of files, since it was only reading from
the archive file.


Additional Links

I searched the Web for information about this, but didn't find much. 
Most of the performance evaluation of compression software seems to
focus on how much it compresses the data rather than how fast it runs.
The ones that do measure speed generally don't separate the time
consumed by the compression algorithm from input/output time, but just
run a number of programs with the same data and measure their elapsed
time.

Jeff Gilchrist's Archive Comparison Test site reports extraction as
well as compression times for a number of programs.

http://compression.ca/

Another site that reports compression speed is the Compression
Comparison Guide on Adrian Wong's Rojakpot:

http://www.rojakpot.com/default.aspx?location=3&var1=4&var2=0

It reports speeds ranging from 24.3 to 3,492.8 KB per second.  These
figures include disk input/output as well as compression.  It doesn't
report on decompression.

Another site with test results is Maximum Compression.

http://www.maximumcompression.com/

Although it only reports how much compression the programs
attained, not how fast they ran, the list of programs might be of
interest.

Both commercial and open source software components for data
compression are available.  The Google Directory offers a starting
point.

http://directory.google.com/Top/Computers/Software/Data_Compression/


Search Strategy

Search terms included:

data compression software
data compression software performance
data compression software components
data compression speed benchmark OR benchmarks


Conclusion

Many variables will affect the impact of compression on your
application's performance, such as how compressible your data is, what
compression software or algorithm you use, whether the application is
CPU-bound or I/O-bound, and what else is going on in the computer. 
However, the processor is fast compared to the disk drive, and
compression can save some of the input time by reducing the amount of
data that has to be read, so if doubling the time seems like an
acceptable worst case, compression is probably worth further
investigation.
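
As a concrete illustration of the kind of scheme you described, here
is a minimal sketch of chunked compression with random access.  It is
written in Python against the standard zlib library purely as an
example; the 512K chunk size comes from your question, but the
function names and the choice of zlib are my assumptions, not a
recommendation of a particular product.  The idea is to compress the
file in independent 512K blocks, keep an index of where each
compressed block starts, and later seek to and decompress just the
one block you need.

import zlib

CHUNK = 512 * 1024  # uncompressed block size from your question

def compress_file(src_path, dst_path):
    # Compress src in independent blocks and return an index of
    # (offset, compressed_length) pairs, one entry per block.
    index = []
    offset = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(CHUNK)
            if not block:
                break
            comp = zlib.compress(block)
            dst.write(comp)
            index.append((offset, len(comp)))
            offset += len(comp)
    return index

def read_block(path, index, n):
    # Random access: read and decompress block n only.
    offset, length = index[n]
    with open(path, "rb") as f:
        f.seek(offset)
        return zlib.decompress(f.read(length))

In a real application you would store the index in the file itself,
say in a header.  Compressing the blocks independently costs a little
compression ratio compared with one continuous stream, but it is
exactly what makes the random access possible.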

If you would like me to look further for relevant test results on the
web, or if any of this answer is unclear or you need any other
information, please ask for a clarification.  I hope this information
is helpful.

--efn-ga
ddb-ga rated this answer: 5 out of 5 stars, and gave an additional tip of $10.00
Excellent service and team

Comments  
Subject: Re: Performance: Decompression vs. disk access
From: efn-ga on 31 Oct 2003 14:58 PST
 
I think you will get a better answer if you can provide more
information, such as answers to the following questions.  By "scheme,"
do you mean an algorithm?  What makes a scheme "reasonable" or not? 
Is there some minimum amount of compression required?  How much of a
difference is implied by "speeds that would compare"?  With
compression, would you be reading a smaller chunk that decompresses to
512K, or would you read 512K and decompress it to something larger? 
Is the goal to use less disk space, or is it something else?

Decompression inevitably takes some time, so reading plus
decompressing will take longer than just reading.  But the processor
is much faster than the disk drive, so the decompression time may be
relatively insignificant; it's up to you to decide how much is
significant.
Subject: Re: Performance: Decompression vs. disk access
From: efn-ga on 01 Nov 2003 07:56 PST
 
Thanks for the rating and the tip!

--efn-ga
