Google Answers Logo
View Question
 
Q: LZW Algorithm For GIF89A Formats ( Answered 5 out of 5 stars,   3 Comments )
Question  
Subject: LZW Algorithm For GIF89A Formats
Category: Computers > Programming
Asked by: thinkingskull-ga
List Price: $10.00
Posted: 02 Jul 2002 15:10 PDT
Expires: 01 Aug 2002 15:10 PDT
Question ID: 35987
Specifically, I would like a better definition of building the bit
blocks after the LZW code size character.  I know the LZW code size
character defines how long each bits length is, and the next character
is the block size (how many characters).  I also know the bits are
placed right to left, but I am having trouble understanding how the
bits should be placed correctly.  The test image I am creating is a 2
pixel by 2 pixel image with a checkered pattern of black/white.  A
graphics program ends up with the 3 bytes being 12, 16 and 5.  I
assume the 5 is the end of the data block (bit size*2)+1, but don't
understand the arrangement of the 12, 16 bits in binary (00001100 and
00010000).  I have the test image (2 pixel by 2 pixel) online at:
http://goldborder.com/test.gif

Thank you for your help.

Request for Question Clarification by googlebrain-ga on 02 Jul 2002 15:40 PDT
Your test image doesn't seem to have a header. "GIF89a" should be the
first 6 characters of any gif file. Did you create the file with your
own software, or did you use some other program?

Clarification of Question by thinkingskull-ga on 02 Jul 2002 15:44 PDT
I may have changed the header with my new program.  I have resaved
through Photoshop and resent to the server.

Please try the image:  http://goldborder.com/test.gif

dave

Request for Question Clarification by googlebrain-ga on 02 Jul 2002 16:24 PDT
Are you aware that you image isn't a checkerboard? This is what it
looks like.

*0
00

Where * is a Black square, and 0 is a White square.

I will continue to investigate your problem, but perhaps this is what
has been causing your confusion?

Anyway, how do you wish to procede?

Request for Question Clarification by googlebrain-ga on 02 Jul 2002 16:58 PDT
"0C 10"

The first byte is 	0x0c (12) 00001100 
the second byte is  	0x10 (16) 00010000

together they read like this

0001000000001100

reading Right to left.

the first three bits (100) represent a "Clear Code" 2**<code size> or
100

The next three bits 001
The next three bits 000
The next three bits 000
The next three bits 001

These would seem to represent (progressing Left to right, and Top to
Bottom)
-------------
! 001 ! 000 !
-------------
! 000 ! 001 !
-------------

Which looks awfully like the grid you were trying for.

However, that would only be the case if there were no LZW compression
being applied to your data. Are you, in fact, doing the compression,
or just storing the bits raw?

Request for Question Clarification by googlebrain-ga on 02 Jul 2002 17:26 PDT
Alright, this is getting weird. Originally, I used Paint Shop Pro to
view your image, and it showed the broken version of the graphic (*0,
00) but all of the Microsoft viewers I use (Internet Explorer, and
Paint) show the grid the way I assume you ment for it to be. (*0, 0*)

Anyway, there's obviously something hinky going on here. I'd bet it
has to do with the compression/non-compression of those bits I
mentioned before.

Clarification of Question by thinkingskull-ga on 02 Jul 2002 17:33 PDT
I looked at the image and it is:
   0X
   X0    (0 = black, X = white)
Img Location: http://goldborder.com/test.gif

I pulled it up for a co-worker and he also saw the checkered pattern. 
I then opened it up in Photoshop at 1600 percent zoom and saw the same
pattern.

I then created a byte checker program to look at the GIF byte by byte.
 Here is the test.gif file byte by byte:

<table border=0 cellpadding=3 cellspacing=1>
<tr>
<td><b>CHAR</b></td>
<td><b>ASCII</b></td>
<td><b>Position</b></td>
</tr>
<tr><td>G</td><td>71</td><td>1</td></tr>
<tr><td>I</td><td>73</td><td>2</td></tr>
<tr><td>F</td><td>70</td><td>3</td></tr>
<tr><td>8</td><td>56</td><td>4</td></tr>
<tr><td>9</td><td>57</td><td>5</td></tr>
<tr><td>a</td><td>97</td><td>6</td></tr>
<tr><td></td><td>2</td><td>7</td></tr>
<tr><td> </td><td>0</td><td>8</td></tr>
<tr><td></td><td>2</td><td>9</td></tr>
<tr><td> </td><td>0</td><td>10</td></tr>
<tr><td>€</td><td>128</td><td>11</td></tr>
<tr><td> </td><td>0</td><td>12</td></tr>
<tr><td> </td><td>0</td><td>13</td></tr>
<tr><td>ÿ</td><td>255</td><td>14</td></tr>
<tr><td>ÿ</td><td>255</td><td>15</td></tr>
<tr><td>ÿ</td><td>255</td><td>16</td></tr>
<tr><td> </td><td>0</td><td>17</td></tr>
<tr><td> </td><td>0</td><td>18</td></tr>
<tr><td> </td><td>0</td><td>19</td></tr>
<tr><td>,</td><td>44</td><td>20</td></tr>
<tr><td> </td><td>0</td><td>21</td></tr>
<tr><td> </td><td>0</td><td>22</td></tr>
<tr><td> </td><td>0</td><td>23</td></tr>
<tr><td> </td><td>0</td><td>24</td></tr>
<tr><td></td><td>2</td><td>25</td></tr>
<tr><td> </td><td>0</td><td>26</td></tr>
<tr><td></td><td>2</td><td>27</td></tr>
<tr><td> </td><td>0</td><td>28</td></tr>
<tr><td> </td><td>0</td><td>29</td></tr>
<tr><td></td><td>2</td><td>30</td></tr>
<tr><td></td><td>3</td><td>31</td></tr>
<tr><td></td><td>12</td><td>32</td></tr>
<tr><td></td><td>16</td><td>33</td></tr>
<tr><td></td><td>5</td><td>34</td></tr>
<tr><td> </td><td>0</td><td>35</td></tr>
<tr><td>;</td><td>59</td><td>36</td></tr>
</table>

Clarification of Question by thinkingskull-ga on 02 Jul 2002 17:41 PDT
I was saving a new gif image in Photoshop and then playing with it.  I
was leaving the LZW up to Photoshop and then backwards engineering the
file to understand how it was building the bit arrangement.

Your example of the bit count makes sense but I thought Position 30
which is ASCII 2 is stating the bit count, and Position 31 is stating
the Byte length of the Block.  By your example, the bit count is 3?

Thank you in advance

Clarification of Question by thinkingskull-ga on 02 Jul 2002 17:46 PDT
The table code didn't show, so I put up a cleaner version at:

http://goldborder.com/testgif.htm

Request for Question Clarification by googlebrain-ga on 02 Jul 2002 18:22 PDT
Output codes are a minimum of <code size>+1 and a maximum of 12 bits
long. The extra bit is to allow for the clear code (100) which would
never occur naturally with a 2 bit code size. Basically, if the
highest bit is a 1, it's a special code. Otherwise, it's data.
Answer  
Subject: Re: LZW Algorithm For GIF89A Formats
Answered By: googlebrain-ga on 02 Jul 2002 18:31 PDT
Rated:5 out of 5 stars
 
"0C 10" 
 
The first byte is  0x0c (12) 00001100  
the second byte is   0x10 (16) 00010000 
 
together they read like this 
 
0001000000001100 
 
reading Right to left. 
 
the first three bits (100) represent a "Clear Code" 2**<code size> or
100
 
The next three bits 001 
The next three bits 000 
The next three bits 000 
The next three bits 001 
 
These would seem to represent (progressing Left to right, and Top to
Bottom)
------------- 
! 001 ! 000 ! 
------------- 
! 000 ! 001 ! 
------------- 

Output codes are a minimum of <code size>+1 and a maximum of 12 bits
long. The extra bit is to allow for the clear code (100) which would
never occur naturally with a 2 bit code size. Basically, if the
highest bit is a 1, it's a special code. Otherwise, it's data. 


Additional Links:

I believe you can find everything else you need to know on this web
page
http://256.com/gray/docs/gifspecs/

The LZH block is discussed here
Appendix F. Variable-Length-Code LZW Compression
http://256.com/gray/docs/gifspecs/appendices.html


Search Strategy:

gif file format 89a rfc
://www.google.com/search?sourceid=navclient&q=gif+file+format+89a+rfc

Graphics Interchange Format  compuserve
://www.google.com/search?sourceid=navclient&q=Graphics+Interchange+Format++compuserve

Clarification of Answer by googlebrain-ga on 02 Jul 2002 18:34 PDT
Please feel free to ask for more clarifications if the information
already provided doesn't completely explain how the bits should be
placed.

Request for Answer Clarification by thinkingskull-ga on 03 Jul 2002 07:17 PDT
Thank you for the hardwork on answering this.

I understand your answer, but if I try to apply it, it doesn't work
out with a different arrangement.  For example, in the pattern we have
been discussing:
------------- 
! 001 ! 000 ! 
------------- 
! 000 ! 001 ! 
-------------   2 byte code: 0001000000001100  (12 and 16 ASCII)
Your answer seems to work:
  001 000 000 001 100 (clear code)

But if I want to make the pattern the opposite (i.e. switch the black
white positions)
------------- 
! 000 ! 001 ! 
------------- 
! 001 ! 000 ! 
-------------  

I should be able to use the 2 byte code: 0000001001000100 (2 and 68
ASCII)
to produce the reverse affect.  But after replace (12 and 16) with (2
and 68), the pattern shows up the same in graphic programs, browsers..
etc..  I placed the altered GIF image on-line using the (2 and 68) so
you can see:
  http://goldborder.com/test2.gif

Request for Answer Clarification by thinkingskull-ga on 03 Jul 2002 08:05 PDT
To further the research into the pattern I took the same 2 pixel image
and instead of just having black, white, black, white, I put Red (255
0 0), Green (0 255 0), Blue (0 0 255) and Black (0 0 0) into a 3rd
test image:
http://goldborder.com/test3.gif

Looking at the byte values in positions 50, 51 and 52:
  50 = 24 (00011000)
  51 = 50 (00110010)
  52 = 148 (10010100)

stringed together: 000110000011001010010100
Table reference for this image at:
http://goldborder.com/test3gif.htm

I can not seem to find the pattern you discussed above (except the 100
clear code at the end).

Maybe this 4 colored image will help?

Clarification of Answer by googlebrain-ga on 03 Jul 2002 10:44 PDT
In your latest example image, you have increased your code size. Also,
you have strung your bytes together backwards.

10010100 00110010 00011000 would be the correct sequence.

Since you increased your color depth (by adding more colors) your code
size now 3, and your output codes are <code size>+1

1000 Clear code. 2**<code size> (The clear code is always Highest bit
set, low bits clear, no matter how many low bits there are.)

0001
0010
0011
0100

Which correspond to your four colors. 

The last four bits (1001) are probably a stop code, but it's been a
long time since I did this, so I don't remember exactly.

Because of the way LZW compression works, you can't simply rearrange
these bits to get a different color sequence, as the computer is
building a decoding matrix as it goes.

LZW compression is enough to give people fits. It's the difficult part
of what you are trying to accomplish.

You WILL need to fully and thoroughly understand how LZW compression
works before you can hope to build a gif encoder or decoder. The small
examples you have been working have only seemed to be easy because of
a quirk in the way the compression works. The first 256 codes (0000 -
00FF in this example) are already assumed to be (00 - FF) so no
compression seems to be going on. But get more than a few bytes in,
and you will quickly begin seeing codes that won't make any sense
without a THROUGH understanding of LZW. (For example, since you are
currently at code size 3, you can't have 256 codes, but the code size
can increase as you get further into the image.)

I am providing some links to help. *Teaching* you LZW is beyond the
scope of this question, and my ability. I understand it enough to help
you with your original question. (The Code Sizes and Block Sequencing)

If you move slowly and carefully through these files, (plus the ones
I've already mentioned) I'm sure you will be able to build a working
encoder/decoder.
http://www.msg.net/utility/whirlgif/lzw.html

This page has some sample encoders-decoders written in psuedocode.
http://www.daubnet.com/formats/GIF.html

Also, know that GIF is a copyrighted standard. It was created by
CompuServe (Unisys holds the standard now) a few years ago, and there
was a big push a few years ago for everyone to stop using GIF files,
and switch to PNG, which was a free standard.

Here you can find information about the Unisys patent, the
controversy, and how to get a license.
http://directory.google.com/Top/Computers/Data_Formats/Graphics/2D/GIF/

About the images showing up correctly... I am beginning to think that
Microsoft isn't following the standard as strictly as one would hope.
When I try to save these images, it seems convinced that they are .bin
instead of .gif files. That may be why they display in some programs,
but not others (like Paint Shop Pro.) It's really nice that they still
show the image, but it makes for a lousy way to confirm if a gif is
built correctly or not.

Clarification of Answer by googlebrain-ga on 03 Jul 2002 14:17 PDT
I erred when I said that GIF was a copyrighted standard. It's the LZW
compression that's copyrighted.
thinkingskull-ga rated this answer:5 out of 5 stars
Awesome job.  You definetly opened up the light on the question and
gave an answer that allows me to pursue the solution on my own.

I wish I was rich and could have put a higher price tag on the
question.

*big cheers*  (hip hip hoory, hip hip hoory, hip hip hoory)

dave

Comments  
Subject: Re: LZW Algorithm For GIF89A Formats
From: googlebrain-ga on 03 Jul 2002 11:15 PDT
 
>I wish I was rich and could have put a higher price tag on the
question.

Me too. :D

Seriously, tho.. I'm just glad I could help. This is the kind of
question I became a researcher for.

googlebrain-ga
Subject: Re: LZW Algorithm For GIF89A Formats
From: googlebrain-ga on 06 Jul 2002 14:50 PDT
 
While putzing around the internet, I found an even better written
explanation for LZW compression. It even comes complete with a C
implementation.

http://dogma.net/markn/articles/lzw/lzw.htm
Subject: Re: LZW Algorithm For GIF89A Formats
From: thinkingskull-ga on 06 Jul 2002 20:32 PDT
 
Thank you.

I have implemented a couple of files with LZW compression successfully
using my new program.

Thank you again for your digelence in this subject.

dave

Important Disclaimer: Answers and comments provided on Google Answers are general information, and are not intended to substitute for informed professional medical, psychiatric, psychological, tax, legal, investment, accounting, or other professional advice. Google does not endorse, and expressly disclaims liability for any product, manufacturer, distributor, service or service provider mentioned or any opinion expressed in answers or comments. Please read carefully the Google Answers Terms of Service.

If you feel that you have found inappropriate content, please let us know by emailing us at answers-support@google.com with the question ID listed above. Thank you.
Search Google Answers for
Google Answers  


Google Home - Answers FAQ - Terms of Service - Privacy Policy