
Lossless data compression algorithm question

Hello, I have the following question about LZ77, a lossless data compression algorithm, and I'm wondering if anyone here knows how to solve it:

Given the initial codebook (symbol is left column, code is right column):
Symbol | Code
_ (space) | 110
B | 1110
H | 11110
I | 10
L | 0

We are told that the lengths of both the search buffer and the lookahead buffer are 8, meaning 3 bits each for coding the distance and the match length. The coding window is positioned as follows: ILL_B is in the search buffer, and ILLI is in the lookahead buffer:
...ILL_B | ILLI...

We are asked to provide the binary code for this coding step. Can someone tell me what it is and how to arrive at it?
TTBOMK, LZ77 (in its "pure" form) is not concerned with how symbols are encoded in binary! Instead, it just deals with "symbols", regardless of how they are represented. It uses a "sliding window" approach, which means that all input symbols that are processed are also appended to the "window" buffer. Once the "window" has reached its maximum size, the oldest symbols are dropped so that the size limit is not exceeded.

Every sequence of input symbols is stored as a triple of:
1. The number of symbols to copy (i.e. re-use) from the current "window" buffer
2. The offset within the "window" buffer at which to start copying
3. The next symbol (literal) after the symbols that were copied from the buffer

Note: Values for (1) and (2) are chosen by the compressor in such a way that the longest possible sequence is copied from the "window" buffer, before the literal symbol is stored. This maximizes the compression. Still, the sequence to be copied may be of length zero, e.g. if the next input symbol is not found in the buffer at all.
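The matching step described above can be sketched in Python. This is a hypothetical minimal implementation, not production LZ77: it searches the buffer by brute force, does not let a match extend into the lookahead, and emits the triple in the same (length, offset, literal) order as the list above. Run on the window from the question (search buffer "ILL_B", lookahead "ILLI"), it finds the match "ILL" of length 3 at distance 5, followed by the literal I:

```python
def lz77_step(search_buf, lookahead):
    """Find the longest match of the lookahead's prefix inside the
    search buffer, then emit a (length, offset, literal) triple.
    Minimal sketch: brute-force search, no overlap into the lookahead."""
    best_len, best_off = 0, 0
    for off in range(1, len(search_buf) + 1):   # distance back from the boundary
        start = len(search_buf) - off
        length = 0
        # compare the search-buffer suffix against the lookahead prefix
        while (length < len(lookahead) - 1
               and start + length < len(search_buf)
               and search_buf[start + length] == lookahead[length]):
            length += 1
        if length > best_len:
            best_len, best_off = length, off
    literal = lookahead[best_len]               # next symbol after the match
    return best_len, best_off, literal

# The window from the question: search buffer "ILL_B", lookahead "ILLI"
print(lz77_step("ILL_B", "ILLI"))  # → (3, 5, 'I')
```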

In practice, we usually use a combination of LZ77 (or one of its variants) and entropy coding. It's the entropy coding that is concerned with how symbols (literals) are encoded in binary. And not just the symbols, but also the offsets and lengths! Common methods of entropy coding include Huffman trees and arithmetic coding.
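To illustrate the entropy-coding half, here is a minimal Huffman code builder in Python. It's a sketch only, the input string is made up, and Deflate's canonical Huffman codes differ in detail, but it shows the key property: more frequent symbols get shorter codes.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table from symbol frequencies.
    Minimal sketch; real formats use canonical code assignment."""
    # heap entries: [frequency, tie-breaker, {symbol: code-so-far}]
    heap = [[freq, i, {sym: ""}]
            for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)          # two least-frequent subtrees
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], counter, merged])
        counter += 1
    return heap[0][2]

codes = huffman_codes("HELLO_BILL_ILL")   # hypothetical sample input
# the most frequent symbol ('L') ends up with the shortest code
```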

For example, Deflate, the compression algorithm used in ZIP and also in many other applications (probably the most widely used compression algorithm), is a combination of LZSS (a variant of LZ77) and Huffman trees.
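Python's standard zlib module exposes Deflate directly, so you can observe the combined effect of the dictionary matching and the Huffman coding without implementing either (the sample input here is made up for illustration):

```python
import zlib

data = b"ILL_BILLI" * 100                 # repetitive input compresses well
packed = zlib.compress(data)              # Deflate: LZ77-style matching + Huffman coding
assert zlib.decompress(packed) == data    # lossless round trip
print(len(data), "->", len(packed))       # compressed size is far smaller
```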
