All compression works by exploiting probabilities: the more likely something is, the fewer bits it should take to write down.
For example, take the text "Hello world!". That's twelve ASCII characters, each stored in an eight-bit byte (ASCII itself needs only seven bits). The first observation is that we are using far more bits per character than we need. There are only nine distinct characters in the message, so each one fits in just four bits. That alone halves the space used.
But we can do better. The trick is that some characters appear more often than others. In this example, there are three 'l's and two 'o's. Since an 'l' or an 'o' is more likely to appear in the message than any of the other characters, we can redistribute the number of bits needed to store each character so that characters that appear more often use fewer bits.
'l' 00
'o' 01
'H' 100
'e' 101
' ' 1100
'w' 1101
'r' 1110
'd' 11110
'!' 11111
Our original message of 48 bits (using four bits per character) is now only 38 bits long. That's a pretty good start.
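To make that concrete, here is a quick C++ sketch (with the code table above hard-coded) that performs the encoding and lets you count the bits yourself:

```cpp
#include <map>
#include <string>

// Encode a message using the variable-length code table from above.
std::string encode(const std::string& message) {
    static const std::map<char, std::string> codes = {
        {'l', "00"},   {'o', "01"},   {'H', "100"},  {'e', "101"},
        {' ', "1100"}, {'w', "1101"}, {'r', "1110"}, {'d', "11110"},
        {'!', "11111"}
    };
    std::string bits;
    for (char c : message)
        bits += codes.at(c);  // append each character's bit pattern
    return bits;
}
```

Calling encode("Hello world!") produces a 38-character string of '0's and '1's, matching the count above.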
This is the basis of all data compression.
(Notice that the bit patterns are specifically designed so that it is impossible to have ambiguities. You can see this by drawing them as a binary tree, where a 0 leads to a left branch and a 1 leads to a right branch; every code ends at a leaf, so no code is a prefix of another. The exact methods of coming up with these binary trees are not important in this particular discussion, but you may want to google around "Huffman coding" for some examples.)
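You can see the no-ambiguity property in action with a sketch of a decoder that simply accumulates bits until they match a code in the table — because no code is a prefix of another, the first match is always the right one:

```cpp
#include <map>
#include <string>

// Decode a prefix-free bit string using the code table from the example.
std::string decode(const std::string& bits) {
    static const std::map<std::string, char> codes = {
        {"00", 'l'},   {"01", 'o'},   {"100", 'H'},  {"101", 'e'},
        {"1100", ' '}, {"1101", 'w'}, {"1110", 'r'}, {"11110", 'd'},
        {"11111", '!'}
    };
    std::string out, buf;
    for (char bit : bits) {
        buf += bit;                   // accumulate bits...
        auto it = codes.find(buf);
        if (it != codes.end()) {      // ...until they match a code exactly
            out += it->second;
            buf.clear();
        }
    }
    return out;
}
```

Feeding it the 38-bit encoding from before recovers "Hello world!" with no backtracking needed.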
LZW works by finding common patterns ("strings" of bits) and representing them with shorter strings. For example, the pattern "he" appears often in English: "When the dog followed them to the store they bought it licorice." (Count them: "when", "the", "them", "the", "they".) Notice that the "he" pattern is also part of a couple of larger patterns: "the" and "he ".
In order to keep track of all these fancy bit-patterns, we must keep a dictionary. (In C++ that would be a std::map.) Also, the decoder must have a copy of the dictionary the encoder used in order to understand the message.
LZW uses a trick to embed the dictionary in the data stream itself. When the encoder finds a new pattern and adds it to its dictionary, it does not transmit the pattern explicitly; the decoder can deduce every new entry from the codes it has already seen, so it makes the identical additions to its own dictionary, staying exactly one step behind the encoder.
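Here is a minimal sketch of the core encoding loop. It is byte-oriented and deliberately simplified — no clear codes, no variable-width output, and a std::map where a real GIF encoder would use something faster — but it shows how the dictionary grows as a side effect of encoding:

```cpp
#include <map>
#include <string>
#include <vector>

// Simplified byte-oriented LZW encoder: just the core dictionary loop.
std::vector<int> lzw_encode(const std::string& input) {
    std::map<std::string, int> dict;
    for (int i = 0; i < 256; ++i)            // seed with all single bytes
        dict[std::string(1, static_cast<char>(i))] = i;
    int next_code = 256;

    std::vector<int> out;
    std::string w;                           // longest known pattern so far
    for (char c : input) {
        if (dict.count(w + c)) {
            w += c;                          // pattern still in dictionary
        } else {
            out.push_back(dict[w]);          // emit code for known pattern
            dict[w + c] = next_code++;       // add the new, longer pattern
            w = std::string(1, c);
        }
    }
    if (!w.empty()) out.push_back(dict[w]);  // flush the final pattern
    return out;
}
```

The decoder runs the same logic in reverse: each code it reads, plus the first byte of the next decoded string, tells it exactly which entry the encoder just added.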
Also, LZW starts out with a minimum code word size (number of bits per code), and the code word size may grow as the dictionary fills. Both the encoder and the decoder track the dictionary size, so both know exactly when to widen the codes, with no extra signaling needed.
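One way to see the rule (a sketch of the idea, not GIF's exact bookkeeping): the number of bits per code is just the number of bits needed to represent every code handed out so far, so it grows exactly when the next dictionary code would no longer fit.

```cpp
// Bits needed to represent codes 0 .. next_code-1.
// Both encoder and decoder know next_code at every step, so both
// widen their codes at the same moment.
int code_width(int next_code) {
    int width = 1;
    while ((1 << width) < next_code)
        ++width;
    return width;
}
```

For example, 256 codes fit in 8 bits, but the moment a 257th code exists, both sides switch to 9-bit codes.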
There are also a few special codes used to signal that the decoder should behave in a certain way (for example, GIF's "clear" code tells it to throw away the dictionary and start over).
Finally, a GIF LZW stream is packed into data sub-blocks of up to 255 bytes each, which your decoder must first unpack before feeding the raw bitstream to the actual LZW decoder. Oh, and did I mention that you must be able to read a single bit at a time?
For more help, google around "wilhite" and check out the links at the bottom of the Wikipedia page.
http://en.wikipedia.org/wiki/Graphics_Interchange_Format
Good luck!