However, what I need to do is to put each character and its position into memory (RAM), rejecting already existing combinations. I found some references on how to read an entire file into memory. But that is not what I need. I need these positions and characters in memory, not just file contents.
Here is my code. Please let me know what should be changed. Thanks.
I found some references on how to read an entire file into memory. But that is not what I need. I need these positions and characters in memory, not just file contents.
From what you've said, I don't understand the difference.
I also don't understand what you mean by "rejecting existing combinations".
First off - thank you for the reply, Helios.
I will try to answer your questions. Let me tell you WHY this is being done to begin with. I want to try to compress a text file (megabytes or gigabytes) by decomposing it into characters and their positions and linking to already existing combinations. This is just something I set myself as an interesting problem in order to learn C++ programming.
1) So the file is just a long string of characters?
Yes. There are millions of lines, each one containing 30-35 numbers or characters. Like this:
9758740009416740011848200935694
856567567034945668884234566556
...
2) From what you've said, I don't understand the difference.
Well, the difference is that if I read the entire file with ifstream, it will put the entire file (line by line) into memory. I need to take it apart by sequentially getting each character and its position and putting THESE values into memory.
So, instead of 9758740009416740011848200935694 we will get <position> <character> pairs (and this is what is supposed to be in memory):
0 9
1 7
2 5
3 8
4 7
5 4
6 0
7 0
8 0
9 9
10 4
11 1
12 6
13 7
14 4
15 0
16 0
17 1
18 1
19 8
20 4
21 8
22 2
23 0
24 0
25 9
26 3
27 5
28 6
29 9
30 4
I will then try to work with that code to compress what is being put into memory by discarding already existing sequences. So, if the next line also starts with 9758740009416 but the rest is different, the difference will point to the position of the last matching character.
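The shared-prefix idea described here can be sketched roughly as follows. This is only one possible reading of it (the technique is sometimes called front coding), and the names EncodedLine, commonPrefix, encodeLines and decodeLines are made up for illustration. Each line is stored as the number of leading characters it shares with the previous line, plus the part that differs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One encoded line (hypothetical layout): how many leading characters are
// shared with the previous line, plus the characters that differ.
struct EncodedLine {
    std::size_t shared;  // length of the prefix copied from the previous line
    std::string suffix;  // remainder of the line, stored literally
};

// Length of the common prefix of a and b.
std::size_t commonPrefix(const std::string& a, const std::string& b) {
    std::size_t n = 0;
    while (n < a.size() && n < b.size() && a[n] == b[n]) ++n;
    return n;
}

// Replace the part of each line that repeats the start of the previous line
// with a count. The first line has no predecessor, so its count is 0.
std::vector<EncodedLine> encodeLines(const std::vector<std::string>& lines) {
    std::vector<EncodedLine> out;
    std::string previous;
    for (const std::string& line : lines) {
        std::size_t shared = commonPrefix(previous, line);
        out.push_back({shared, line.substr(shared)});
        previous = line;
    }
    return out;
}

// Rebuild the original lines from the (shared, suffix) records.
std::vector<std::string> decodeLines(const std::vector<EncodedLine>& encoded) {
    std::vector<std::string> out;
    std::string previous;
    for (const EncodedLine& e : encoded) {
        assert(e.shared <= previous.size());  // record must refer to existing text
        std::string line = previous.substr(0, e.shared) + e.suffix;
        out.push_back(line);
        previous = line;
    }
    return out;
}
```

With the sample line followed by a line such as 9758740009416555, the second record would hold a shared count of 13 and the suffix 555. Note that this only catches repeats at the start of a line, compared against the line immediately before it.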
3) I also don't understand what you mean by "rejecting existing combinations".
Sorry, I did not make myself clear. By "rejecting existing combinations" I meant discarding positions and characters that already exist and linking instead to the memory address of the last matching position and character. The file will contain partially repeating strings (could be half a line or more).
Please keep in mind I am just a beginner at C++, so code examples and correct code will be greatly appreciated. Thanks.
Well, if you load the file into an array, you already have an implicit (position;character) pair, formed by (index;array[index]).
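A minimal sketch of that point, assuming the whole file fits in memory. The helper names loadFile and pairAt are invented for this example. Once the bytes are in a std::string, the index is the position, so no separate list of pairs needs to be stored.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>

// Read a whole file into a std::string, newlines included.
// If the file cannot be opened, the result is an empty string.
std::string loadFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// The (position, character) pair at index pos. Nothing extra is stored:
// the pair is formed on demand from the index and the byte found there.
std::pair<std::size_t, char> pairAt(const std::string& data, std::size_t pos) {
    assert(pos < data.size());
    return {pos, data[pos]};
}
```

For a file that starts with 975, pairAt(data, 0) gives (0, '9') and pairAt(data, 1) gives (1, '7'). Storing explicit pairs instead would cost several bytes per character, which works against the goal of compressing the data.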
By the way, the LZSS algorithm is exactly what you're looking for. I've previously posted about it: http://www.cplusplus.com/forum/general/17012/#msg85158
The implementation isn't optimal, especially in space, but it's straightforward and easy to understand.
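To give a rough idea of what an LZSS-style scheme does, here is a simplified sketch. It is not the code from the linked post, and it works on tokens in memory rather than packing bits. The names Token, lzssEncode and lzssDecode and the default window and minimum match length are assumptions chosen for illustration. Each step either emits one literal character or a reference (offset, length) to text that has already been seen.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token layout: either a literal character or a back-reference.
struct Token {
    bool isMatch;        // true: copy from earlier output, false: literal
    char literal;        // used when isMatch is false
    std::size_t offset;  // distance back from the current position
    std::size_t length;  // number of characters to copy
};

// Brute-force search for the longest earlier match inside the window.
// Matches shorter than minMatch are not worth a reference and stay literal.
std::vector<Token> lzssEncode(const std::string& in,
                              std::size_t window = 4096,
                              std::size_t minMatch = 3) {
    std::vector<Token> out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        std::size_t bestLen = 0, bestOff = 0;
        std::size_t start = pos > window ? pos - window : 0;
        for (std::size_t cand = start; cand < pos; ++cand) {
            std::size_t len = 0;
            while (pos + len < in.size() && in[cand + len] == in[pos + len]) ++len;
            if (len > bestLen) { bestLen = len; bestOff = pos - cand; }
        }
        if (bestLen >= minMatch) {
            out.push_back({true, '\0', bestOff, bestLen});
            pos += bestLen;
        } else {
            out.push_back({false, in[pos], 0, 0});
            ++pos;
        }
    }
    return out;
}

// Rebuild the text. Copying one character at a time lets a reference
// overlap the text it is currently producing.
std::string lzssDecode(const std::vector<Token>& tokens) {
    std::string out;
    for (const Token& t : tokens) {
        if (!t.isMatch) { out.push_back(t.literal); continue; }
        assert(t.offset >= 1 && t.offset <= out.size());
        std::size_t from = out.size() - t.offset;
        for (std::size_t i = 0; i < t.length; ++i) out.push_back(out[from + i]);
    }
    return out;
}
```

The search here is quadratic in the worst case, so it is only meant for reading, not for gigabyte files. A practical encoder would index the window (for example with a hash table) and pack literals and references into bytes.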
Thanks, Helios.
Does LZSS work by compressing data and then putting it into memory? I am by no means an expert but I think it is a disk compression algorithm.
Thanks a lot.