Since, apparently [1], you've been struggling with this for almost three
months without any progress, here's something to get you started:
A simple method you can use is run-length encoding [2].
Suppose you have the following number sequence:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10 |
An alternative representation to the above is to write it as the first
element followed by the differences between consecutive elements:
1, 1, 1, 1, 1, 1, 1, 1, 1, 1 |
You can now use run-length encoding to compress the above into:
There are several improvements you can make here.
Consider the following sequence:
1, 2, 3, 4, 5, 7, 10, 12, 13, 15 |
The sequence of differences is:
1, 1, 1, 1, 1, 2, 3, 2, 1, 2 |
And the run-length encoding is:
1, 5, 2, 1, 3, 1, 2, 1, 1, 1, 2, 1 |
This is a problem. The compressed version of your number list is actually bigger than the original one. Notice that the problem is the second half of the original list. Wouldn't it be great if you could somehow have the first half compressed and leave the second part uncompressed? I have good news for you: You can do that! Since -1 can't be an element of the sequence you have to compress, you can use it to signal interpretation changes in the compressed sequence:
Actually, you can use all negative numbers to signal similar changes in the compressed sequence. For example, you could use -13 to indicate that, in the following 4 entries, the interpretation changes, remains the same, changes, and then changes again (since 13 is 1101 in binary). This could be useful in encoding, for example:
1, 2, 3, 4, 6, 9, 10, 11, 12, 13, 16, 20, 25
1, 1, 1, 1, 2, 3, 1, 1, 1, 1, 3, 4, 5
1, 4, -13, 2, 3, 1, 4, 3, 4, 5 |
Using only -1 you would have to write it as:
1, 4, -1, 2, 3, -1, 1, 4, -1, 3, 4, 5 |
And without negative numbers you'd have to write it as:
1, 4, 2, 1, 3, 1, 1, 4, 3, 1, 4, 1, 5, 1 |
Also, notice that since negative powers of 2 are actually
shifted -1s, you could assign a different meaning to them.
Here's some python code that implements the simplest version of the compression method I described above:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
|
from itertools import chain, groupby
from random import random
maxNumber = 20
numbers = [n for n in range(1, maxNumber + 1) if random() < 0.75]
def seq2dif(l):
return [l[0]] + [l[i + 1] - l[i] for i in range(len(l) - 1)]
def dif2seq(l):
c = 0
s = []
for i in range(len(l)):
c += l[i]
s.append(c)
return s
def flatten(l):
return list(chain.from_iterable(l))
def encode(l):
return flatten([[k, len(list(g))] for k, g in groupby(l)])
def decode(l):
return flatten([[l[i * 2]] * l[i * 2 + 1] for i in range(int(len(l) / 2))])
print(numbers)
compressed = encode(seq2dif(numbers))
print(compressed)
print(dif2seq(decode(compressed)))
|
http://ideone.com/WqvrQw
Try to rewrite it in a language you're familiar with (perhaps C++). Then, try to extend it using negative numbers. And when you're done playing, go check the links
Catfish666 posted in your other thread.
[1]
http://www.cplusplus.com/forum/general/119456/
[2]
http://en.wikipedia.org/wiki/Run-length_encoding