Hi,
Here is a short summary of the history of text compression. You can
also access more detailed information using the links at the end of
the summary:
One of the earliest forms of text compression is Morse code, which is
invented in 1838. The compression is simply using shorter codes for
more commonly used letters. With the development of information
theory, statistical compression also developed further. In 1949,
Claude Shannon and Robert Fano introduced the method of compressing
the information using probabilities of blocks.
In 1977, Abraham Lempel and Jacob Ziv published a paper with a
different approach to the symbol coding problem. Instead of assigning
codewords to the symbols (or messages) in advance, they assigned
codewords to repeating source words, or patterns in the text. This
method is still the basis of most lossless modern data compression
techniques today.
In 1984, Terry Welch improved the Lempel-Ziv method known as LZ78, and
introduced the method known as LZW. This algorithm is used by
commercial compression softwares such as PKZIP, and by the mainstream
image formats GIF and some versions of TIFF. LZ78 and LZW algorithms
are patented (1981 and 1983, respectively) but they both are
presumably expired.
Lempel-Ziv method has been improved by many auxiliary algorithms, such
as LZSS (Lempel-Ziv-Storer-Szymanski), LZARI (Lempel-Ziv + Arithmetic
encoding), LZH (Lempel-Ziv + Huffman encoding, used in the archiving
softwares LHArc and ARJ).
There are both online and offline resources on the topic:
Data Compression Information
http://datacompression.info/
This is a very good site on data compression history, background and
algorithms.
A Brief History of Data Compression
http://www.eee.bham.ac.uk/WoolleySI/All7/intro_3.htm
"The late 40's were the early years of Information Theory, the idea of
developing efficient new coding methods was just starting to be
fleshed out. Ideas of entropy, information content and redundancy
were explored. One popular notion held that if the probability of
symbols in a message were known, there ought to be a way to code the
symbols so that the message will take up less space."
History of Data Compression
http://www.wolframscience.com/reference/notes/1069b
Lossless Data Compression
http://datacompression.info/Lossless.shtml
"Lossless compression methods that can't be pinned down to one of the
more refined topics, such as LZW or ZIP. In some cases, the items here
are esoteric algorithms that don't merit their own topics. In other
cases, they span one or more existing topics. The unifying theme is of
course that they are lossless. This means that after compressing and
then decompressing, the data set in quesition will be bit-for-bit
identical."
Lempel-Ziv 1977 (LZ77) Coding Algorithm
http://www.cs.technion.ac.il/Labs/Isl/Project/Projects_done/VisionClasses/DIP/Lossless_Compression/node16.html
"The LZ77 coding algorithm also referred to as sliding-window coding
algorithm. It keeps n-Ls last seen symbols (Ls is maximal length of
the string to be coded and n is the size of input buffer) to find the
maximal substring in these n-Ls symbols which is equal to the prefix
of the string to be coded. Then the found prefix is coded by the
position of the found substring, its length and by the first symbol in
the input stream coming after the found prefix. The length of the
found prefix may vary between 0 and Ls-1 and the greater it is the
more coding efficiency is achieved."
Lempel-Ziv 1978 (LZ78) Coding Algorithm
http://www.cs.technion.ac.il/Labs/Isl/Project/Projects_done/VisionClasses/DIP/Lossless_Compression/node17.html
"The LZ78 coding algorithm also referred to as dictionary-based coding
algorithm. Instead of searching for maximal substring among constant
number of last seen symbols which is equal to the prefix of the string
to be coded, as LZ77 does, the underlying idea of LZ78 is to search
for maximal substring among all last seen symbols which is equal to
the prefix of the string to be coded. Hence for infinitely long input
string the position (and therefore the binary representation of it) of
that maximal substring is unbound, thus complicating coding."
Info on LZ77/LZSS and Derivatives
http://datacompression.info/LZSS.shtml
"LZ77 algorithm and its descendant, LZSS. This is one of two seminal
LZ compression algorithms developed in the late 70s. LZSS forms the
core of the popular deflate algorithm when combined with a Huffman
encoder on the back end."
Lempel-Ziv-Welch (LZW) Coding Algorithm
http://www.cs.technion.ac.il/Labs/Isl/Project/Projects_done/VisionClasses/DIP/Lossless_Compression/node18.html
"LZW is a so-called commercial version of LZ78. The LZW algorithm uses
the same principles and ideas as LZ78 does, while adding several
technical improvements over LZ78."
LZW (Lempel-Ziv-Welch) Algorithm
http://www.wikipedia.org/wiki/Data_compression/LZW
"The key insight of the method is that it is possible to automatically
build a dictionary of previously seen strings in the text being
compressed. The dictionary does not have to be transmitted with the
compressed text, since the decompressor can build it the same way the
compressor does, and if coded correctly, will have exactly the same
strings that the compressor dictionary had at the same point in the
text."
Hope this helps,
Regards,
Bio
Google Answers Resarcher
Search strategy:
"history of text compression"
"data compression" algorithm
"lempel ziv"
lzw
pkzip |