Decompressing RLE

Hi, I have an RLE compresser which works perfectly fine, but how do I decompress it? I have no idea how I would come about doing it. Any help? Below is compressor.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
#include <fstream>
using namespace std;

int main(){
    ifstream data1;
    ofstream data2;
    
    data1.open("decomp_txt.txt",ios::in);
    data2.open("comp_txt.txt",ios::out);
    
    char s1,s2;
    int sk;
    
    data1.get(s1);
    while(!data1.eof()){
        sk=1;
        data1.get(s2);        
        while(s1==s2 && !data1.eof()){
            sk++;
            data1.get(s2);
        }
        if(sk==1) 
            data2<<s1;
        else data2<<s1<<s1<<sk;

        s1=s2;
    }
    data1.close();
    data2.close();
}
Looping until your file is at eof is going to not end well. Read the last line/character in the file and the flag for end-of-file still isn't set. You will then read past the end of the file and get bogus data.
Doesn't answer nor help the given question.
Last edited on
your format seems a little odd, but it looks like you get
xyy5zz3 to mean xyyyyyzzz ?
and it looks like you need to read 2 characters. If they are the same, the third one is how many of that to put in. If they are not the same, write the first one to the file, make the second one the first, and read the next one...
similar to what you have.

this is more complicated than just doing <letter count> pairs eg x1y5z3.
@jonnin i've fixed it by removing the not needed << s1 << by the else statement. You have an idea on how to decrypt RLE?
you just reverse the process. read in the symbol and how many, then write that many out. If you do something more complicated, then you have to reverse that too. RLE isnt a format, its a concept that you can do bitwise, bytewise, entity (object/class) wise, and so on ... if you write the compression, you reverse that to decompress. If you got a format from somewhere, follow it.
RLE is usually a waste of time. Very little data has enough repeated chunks to justify it, and even when the data is good for RLE, the better compression algorithms handle repetition better.
Last edited on
I know im begging and you already explained it, but can you show it in code? Like the full thing... Im new to this and kinda dont understand.
its ok to be new.
I won't do the whole thing for you. But here is a quick short example that may help you see what you are up against.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
string compress(string input)
{  //a function should do one thing well.  So, file I/O should be elsewhere. 
  int ct{};
  string result;  
  char a,b;
  b = input[0];
  for(int i{}; i < input.length();i++)
  {
    a = input[i];
	if(a==b) 
	{
		ct++;
	}
	else 
	{
		result+=b; 
		result +=to_string(ct); 
		ct = 1; 
		b = a;
	}//else		
  }	//for
   //be sure to catch the last char and its count!
   result+=b; 
   result +=to_string(ct); 
  return result;
}
string decompress(string input)
{
  string result;
  char c;
  int n;
  for(int i{}; i < input.length(); i+=2)
	  {
	    c = input[i];
        n = input[i+1]-'0'; //i leave it to you to handle 
		//the problem of getting the int from text when 
		//it is > 9 and figuring out what to do with a number
		//that was compressed like 1122233 compressed to 122332
		//which will break if you have >9 allowed as the count...
		for(int j{}; j < n; j++) result+=c;		  
	  }
	  return result;
}

int main()
{
  string s = "Seek the  allmighty  hoopajoo";	
  string r = compress(s);
  cout << r << endl;
  cout << decompress(r);
}



S1e2k1 1t1h1e1 2a1l2m1i1g1h1t1y1 2h1o2p1a1j1o2
Seek the  allmighty  hoopajoo


As you can see, the decompress just reads a char and the count then stuffs that many copies of the letter into the output, then moves to the next...

so, its bigger than the original, by a fair margin, and we didn't handle the problem of multi-digit counts (like 100 x's in a row) nor what to do with numerical input. Solve those problems first, THEN you can add file I/O to it. There is no use for files until you have debugged the tool and gotten exactly what you want for many test cases.
Also, your original ignored whitespace, mine does not but that depends on how you read the file later. If you skip ws when you read it and pass that along, it won't add any.
Last edited on
Looks to me like hevi is just asking for code and trying to avoid understanding his assignment.

hevi wrote:
Doesn't answer nor help the given question.

I am disinclined to help people who are so rude to begin with. George P is absolutely correct to point out that you are goobering your encoded data, and decoding it properly isn’t going to work —or be worth talking about— until you fix that.

[edit]
Of course, with a nicer attitude, you might get people happier to help explain how things work for you... RLE encoding and decoding is very easy... if you understand what is going on AND have a specific RLE encoding scheme decided.
Last edited on
@hevi. Is this an exercise - or have you found this compress code on-line somewhere?

What are trying to do? As has been pointed out above, if you try to compress something like 111222233333444444 (giving 13243546) and then decompress the result, you're in real trouble. As it stands, the compressor can't work on any text containing digits. If this is an exercise and it says to only use alpha-chars in the text, then OK. But otherwise... ???

D, the OP asserts his compression routine works. Without actually seeing the code I suspect the compression routine is likely as borked as this decompression one is.

It probably just APPEARS to work correctly, with a limited data set for testing. I've made that mistake more often than I almost care to admit.

GIGO....lather, rinse, repeat.
He clearly didn't write the compress code since the decompress code is easier.
I'm not sure the RLE scheme is sound, though.
It converts double letters to three letters, e.g., aa --> aa1
And I don't see how you can tell the difference between these cases:

1
2
3
4
5
6
7
aaaaaaaaaaaab --> aa11b
aa1b          --> aa11b

or

aaaaaaaaaaaaaaaaaaaaab  -->  aa20b
aaa0b                   -->  aa20b

Last edited on
To be fair, I believe every text based RLE scheme can be broken with the right input to the decompression. If your data is well defined, that is not an issue. Whatever symbols you use, that 'could' appear in the input file in a way that creates trouble. A (bought) program I used a few years back added about 20 characters of spew to the fields to limit this issue to only be broken for insane people, but even so, you *could* break it.
No code can be made idiot proof, 'cuz the universe keeps creating more and more idiots.

Some even write the code...
It is not unusual for an RLE compression scheme to only kick in if the run size exceeds the space needed to store it — though that doesn’t explain using both at the same time...

Though, honestly, I’m pretty sure I’ve seen this before somewhere.
hevi, STOP reporting posts you don't like. That makes people consider you to be a troll.
Topic archived. No new replies allowed.