Q: data compression (Answered, 5 out of 5 stars, 3 Comments)
Question  
Subject: data compression
Category: Computers > Algorithms
Asked by: walton-ga
List Price: $2.00
Posted: 16 Nov 2004 19:37 PST
Expires: 16 Dec 2004 19:37 PST
Question ID: 429962
What's the best I can expect for English text-only lossless
data compression, in terms of the ratio of raw data to compressed data,
using a current state-of-the-art public-domain algorithm?
Answer  
Subject: Re: data compression
Answered By: maniac-ga on 20 Nov 2004 12:31 PST
Rated: 5 out of 5 stars
 
Hello Walton,

The "best" ratio of raw data (English text) to compressed data is
roughly
  8 : 2.3
[that is, each byte (8 bits) of text is encoded into about 2.3 bits on
average]
  http://www.cs.waikato.ac.nz/~singlis/ratios.html
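
To make the "8 : x" notation easier to compare with other ways of quoting compression, here is a minimal Python sketch that converts a bits-per-character figure into a compression factor and a space saving. The function name describe_ratio and its output keys are made up for this illustration; the three input figures are the ones quoted in this answer.

```python
def describe_ratio(bits_per_char: float, raw_bits: float = 8.0) -> dict:
    """Restate an '8 : x' figure as a factor and as a fraction saved.

    bits_per_char is the 'x' in '8 : x'; raw_bits assumes one 8-bit byte
    per character of plain English text.
    """
    if bits_per_char <= 0:
        raise ValueError("bits_per_char must be positive")
    fraction = bits_per_char / raw_bits
    return {
        "factor": raw_bits / bits_per_char,  # e.g. 8 / 4 means 2:1
        "compressed_fraction": fraction,     # compressed size / raw size
        "space_saved": 1 - fraction,         # share of the raw size removed
    }


# Figures quoted in this answer (bits of output per 8-bit character).
for label, bits in [("Lempel-Ziv", 4.0), ("gzip", 2.7), ("bzip2", 2.3)]:
    d = describe_ratio(bits)
    print(f"{label:10s} 8 : {bits:<4} is {d['factor']:.2f}:1, "
          f"{d['space_saved']:.0%} saved")
```

So 8 : 4 is the same as saying the file halves in size, and 8 : 2.3 is a little better than 3:1.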

For "popular" methods, the ratios are roughly
  8 : 4 Lempel-Ziv (.zip)
  8 : 2.7 Gzip
  8 : 2.3 Bzip2
Source code for each of these compression algorithms is available
online. For example:
  http://www.gzip.org/
has links to both the algorithm description and the source code for gzip.
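
If you want to check figures like these against your own text, the following is a rough sketch using Python's standard zlib and bz2 modules. It assumes that zlib's DEFLATE output is a fair stand-in for gzip (gzip adds a small header and trailer on top of DEFLATE) and that the bz2 module behaves like the bzip2 tool. The helper names bits_per_char and measure are invented for this example, and the built-in sample is deliberately repetitive, so it will compress far better than real prose.

```python
import bz2
import zlib


def bits_per_char(raw: bytes, compressed: bytes) -> float:
    """Compressed bits spent per input byte (the 'x' in '8 : x')."""
    if not raw:
        raise ValueError("input must not be empty")
    return 8 * len(compressed) / len(raw)


def measure(raw: bytes) -> dict:
    """Compress raw with each codec at its highest setting and report
    bits per character for each."""
    packed = {
        "zlib/DEFLATE (as in gzip)": zlib.compress(raw, 9),
        "bz2": bz2.compress(raw, 9),
    }
    return {name: bits_per_char(raw, data) for name, data in packed.items()}


# Toy input only: repeated text is unrealistically easy to compress.
sample = b"It was the best of times, it was the worst of times. " * 200
for name, bits in measure(sample).items():
    print(f"{name:28s} 8 : {bits:.2f}")
```

For a meaningful number, pass in the bytes of a large plain-text file instead of the toy sample, for example measure(open("book.txt", "rb").read()), where book.txt is a placeholder for your own file.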

Notice also that some algorithms are adjustable. For example, with
bzip2 there are several settings possible as described at
  http://www.mkssoftware.com/docs/man1/bzip2.1.asp
[scroll down for a table of results with the Calgary corpus]
The results will also vary based on the input file (size, type of text).
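
As a small illustration of the adjustable settings, the sketch below tries each bzip2 level through Python's standard bz2 module, where compresslevel runs from 1 to 9 and selects the block size. The function name bz2_sizes_by_level is made up for this example. For inputs smaller than the smallest block (about 100 kB), expect the level to make little or no difference.

```python
import bz2


def bz2_sizes_by_level(raw: bytes) -> dict:
    """Return {level: compressed size in bytes} for bzip2 levels 1..9."""
    sizes = {}
    for level in range(1, 10):
        packed = bz2.compress(raw, compresslevel=level)
        # Lossless check: decompressing must give back the original bytes.
        if bz2.decompress(packed) != raw:
            raise RuntimeError(f"round trip failed at level {level}")
        sizes[level] = len(packed)
    return sizes


# Toy input; substitute the bytes of a real text file for useful numbers.
text = b"Call me Ishmael. Some years ago, never mind how long precisely. " * 300
for level, size in bz2_sizes_by_level(text).items():
    print(f"level {level}: {size} bytes, 8 : {8 * size / len(text):.2f}")
```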

A few other references:
  http://en.wikipedia.org/wiki/Data_compression

Search phrases:
  lossless text compression ratio

  --Maniac
walton-ga rated this answer: 5 out of 5 stars and gave an additional tip of: $5.00
Exactly the answer I wanted. I especially appreciated the links for
additional info.

Comments  
Subject: Re: data compression
From: 12345a-ga on 16 Nov 2004 19:52 PST
 
Not sure if it's the best compression ratio, but it is good.
http://www.7-zip.org/7z.html
Subject: Re: data compression
From: bluephoenixalpha-ga on 24 Dec 2004 10:22 PST
 
rzip takes a bit longer, but it searches a much larger window of the
input for repeated data, which can result in significantly smaller
compressed files.  http://rzip.samba.org
Subject: Re: data compression
From: nanyuki-ga on 28 Dec 2004 10:36 PST
 
RAR provides a very good compression ratio for plain text files (about 8 : 2)
http://www.rarlab.com/
