So I want to make a data compression program to test out my idea, but I'm met with a problem. The algorithm I will eventually develop requires that I have access to ALL of the file to be compressed at once, which means RAM usage would be through the roof. Am I going to have to redesign the entire algorithm or is there some tricky way to get around this?
Why does the file need to be in memory all at once? Can't you read it piece by piece, then re-read it if you need to go back? The file's not going anywhere as long as your program has an open stream attached to it...
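Just to illustrate what I mean, here's a minimal sketch of reading in chunks with the option to seek back and re-read; the filename and chunk size are made up for the example:

```python
# Read a file piece by piece; seek() lets you revisit earlier data,
# so the whole thing never has to sit in RAM at once.
CHUNK_SIZE = 64 * 1024  # 64 KiB per read (arbitrary choice)

with open("input.bin", "rb") as f:
    while True:
        pos = f.tell()              # remember where this chunk starts
        chunk = f.read(CHUNK_SIZE)
        if not chunk:
            break
        # ... process chunk here ...
        # if you need to go back over earlier data later:
        # f.seek(pos) and re-read from there
```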
Okay, the idea takes the numeric value of a file and translates it into a series of mathematical operations. To do this I need to know the numeric value of the file, and therefore need access to it in its entirety at all times.
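If it helps, here's one way to get "the numeric value of a file", assuming you just mean treating the whole byte string as one big-endian integer (Python ints are arbitrary precision, so no separate bignum library is needed):

```python
# Treat the file's bytes as a single big integer and round-trip it.
with open("input.bin", "rb") as f:
    data = f.read()

n = int.from_bytes(data, byteorder="big")         # the file as one integer
# note: you must keep len(data) around somewhere, or any leading
# zero bytes of the file are lost on the way back
restored = n.to_bytes(len(data), byteorder="big")
assert restored == data                           # round-trips exactly
```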
No, I take the integer value, such as 10000000002, and "reduce" it to a series of mathematical operations, such as 10^10 + 2. Also, if you could elaborate on this biginit or send a link, that would be nice.
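For concreteness, here's one naive way that kind of reduction could look: greedily peel off the largest power of a base. This is purely my own illustration of the example above, not the actual algorithm:

```python
# Greedy power-of-base decomposition: 10000000002 -> "10^10 + 2".
# A hypothetical, naive strategy; real formula search would be smarter.
import math

def reduce_to_powers(n: int, base: int = 10) -> str:
    terms = []
    while n >= base:
        exp = int(math.log(n, base))   # largest exponent that fits
        power = base ** exp
        if power > n:                  # guard against float rounding
            exp -= 1
            power = base ** exp
        terms.append(f"{base}^{exp}")
        n -= power
    if n or not terms:
        terms.append(str(n))
    return " + ".join(terms)

print(reduce_to_powers(10000000002))   # -> 10^10 + 2
```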
For loading, you may just have to load the whole file at once. Perhaps you could load up to a certain size (say... 500MB), compress that chunk, then load another. The compression ratio for HUGE files would take a hit, but not too bad a one, I wouldn't think.
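Something along these lines is what I have in mind; the per-chunk compressor here is just a zlib stand-in for whatever the integer-reduction scheme ends up being:

```python
# Chunked compression loop: load up to CHUNK_SIZE bytes, compress
# that block, write it out, repeat. Only one chunk is in RAM at a time.
import zlib

CHUNK_SIZE = 500 * 1024 * 1024   # 500 MB per chunk, as suggested above

def compress_chunk(chunk: bytes) -> bytes:
    # stand-in compressor; the real program would apply the
    # integer-reduction scheme to this chunk instead
    return zlib.compress(chunk)

def compress_file(src_path: str, dst_path: str) -> None:
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            compressed = compress_chunk(chunk)
            # length prefix so the chunks can be split apart again later
            dst.write(len(compressed).to_bytes(8, "big"))
            dst.write(compressed)
```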
I made a dummy program to test this concept, with *awesome results. I managed to get an 894-byte text file down to just over 34 bytes. However, my formula-generating algo took almost 40s to execute, and I'm pretty sure it's not creating the smallest (or close to smallest) formula. I already have some optimizations in mind that should reduce both size and execution time, but I have yet to implement them. I'll post some code when everything is done.
*with some not-so-awesome results as well
EDIT: I'm an idiot. The measured time was from a test run containing debug output, which significantly slowed down the program. Without the output, the program took just over 10s to compress a 5.14 KB file.