Hello there. I have been exploring ways of dealing with large amounts of data, hopefully leading to a fun project for myself, but at the minute I'm just concerned about some basic stuff that maybe someone could advise on:
I'll represent the issue using terms familiar to anyone who has ever played a game called Minecraft. In that, you have a world made of blocks. Millions of them. I'm hoping (one day) to build some sort of voxel-based terrain program (let's call it a very far off day then..), purely for my own amusement, but I have to come up with an effective way of storing the vast amounts of data that could be generated.
So I have a certain maximum area of terrain, which we'll say is 12,000 blocks by 48,000 blocks, and can be as high as 300 blocks (a very very very very far off day then). That's 1.728 × 10^11 blocks (172,800 million).
Obviously I'm not going to load up that many blocks, and I'm not asking you to help me build this insane project, but the basic question I have is this:
What is the best way to store (as in file storage) this information?
I have decided to partition this data into sets (or Minecraft chunks) to make it easier to deal with. So "blocks" have chunk x and y coordinates, then the blocks have x, y and z coordinates within the chunk. At present, all my explorations are based around the size of a single chunk. There is also one last value to store, which we'll call the blocktype, and which I'm giving a range of 0-9999. Now let me show you my test program:
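// basic file operations
#include <iostream>
#include <fstream>
#include <cstdlib>   // rand
using namespace std;

// One randomly generated test block.
int chunk_x = rand() % 240;      // chunk x, 0-239
int chunk_y = rand() % 960;      // chunk y, 0-959
int pos_x = rand() % 50;         // x within the chunk, 0-49
int pos_y = rand() % 50;         // y within the chunk, 0-49
int pos_z = rand() % 300;        // height, 0-299
int blocktype = rand() % 10000;  // 0-9999

int main () {
    ofstream myfile;
    myfile.open ("data.txt");
    // All fields are written back to back as decimal text, with no separators.
    myfile << chunk_x << chunk_y << pos_x << pos_y << pos_z << blocktype << "\n";
    myfile.close();
    return 0;
}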
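To make the chunk scheme described above a bit more concrete, here is a minimal sketch of how a world position might be split into chunk coordinates and a position within the chunk. It assumes chunks are 50 x 50 blocks (taken from the ranges in the test program); the names BlockAddress, to_address and kChunkSize are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed chunk footprint: 50 x 50 blocks, full world height (300).
constexpr int kChunkSize = 50;

// Hypothetical address of one block: which chunk it is in,
// and where it sits inside that chunk.
struct BlockAddress {
    std::uint16_t chunk_x;  // 0-239
    std::uint16_t chunk_y;  // 0-959
    std::uint8_t  pos_x;    // 0-49
    std::uint8_t  pos_y;    // 0-49
    std::uint16_t pos_z;    // 0-299
};

// Split world coordinates into a chunk index and an offset inside the chunk.
BlockAddress to_address(int world_x, int world_y, int world_z) {
    assert(world_x >= 0 && world_x < 12000);
    assert(world_y >= 0 && world_y < 48000);
    assert(world_z >= 0 && world_z < 300);
    BlockAddress a;
    a.chunk_x = static_cast<std::uint16_t>(world_x / kChunkSize);
    a.chunk_y = static_cast<std::uint16_t>(world_y / kChunkSize);
    a.pos_x   = static_cast<std::uint8_t>(world_x % kChunkSize);
    a.pos_y   = static_cast<std::uint8_t>(world_y % kChunkSize);
    a.pos_z   = static_cast<std::uint16_t>(world_z);
    return a;
}
```

For example, to_address(125, 50, 0) would land in chunk (2, 1) at position (25, 0, 0) within that chunk.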
At present, the test program generates a file 17 bytes in size. Very small, but when you multiply that by 172,800 million, that comes to over 2,700 gigabytes of information. Hard disk drives aren't that big! Is there any particular thing I could do to compress the data, or simply reduce the bytes-per-block? Storing the values as hexadecimal is perhaps an option, but two hex digits only cover values up to 255, so that would not really reduce the number of characters.
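One possible direction for reducing the bytes-per-block, sketched here under some assumptions: if every block of a chunk is stored in a fixed order, the position is implied by where the value sits in the file, so only the blocktype needs saving, and 0-9999 fits in 16 bits (2 bytes) when stored as raw binary rather than decimal text. The chunk size of 50 x 50 x 300 is taken from the test program; the names block_index and storage_bytes are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed chunk dimensions, taken from the ranges in the test program.
constexpr std::size_t kChunkX = 50;
constexpr std::size_t kChunkY = 50;
constexpr std::size_t kChunkZ = 300;
constexpr std::size_t kBlocksPerChunk = kChunkX * kChunkY * kChunkZ;  // 750,000

// Slot of a block inside a flat per-chunk array of block types.
// Because the order is fixed, x, y and z never need to be written to disk.
std::size_t block_index(std::size_t x, std::size_t y, std::size_t z) {
    assert(x < kChunkX && y < kChunkY && z < kChunkZ);
    return (z * kChunkY + y) * kChunkX + x;
}

// Total storage in bytes. 64-bit arithmetic is needed here: the block
// count alone (172,800,000,000) does not fit in a 32-bit integer.
std::uint64_t storage_bytes(std::uint64_t blocks, std::uint64_t bytes_per_block) {
    return blocks * bytes_per_block;
}
```

Under these assumptions, storage_bytes(172800000000, 17) gives 2,937,600,000,000 bytes for the text format, while 2 bytes per block gives 345,600,000,000 bytes (about 346 GB) for the whole world before any compression, and a single chunk comes to 1.5 MB.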
Also, not forgetting that the file will have to be read back in. With a decimal number string as in the code above (assuming each field is padded to a fixed width), it's easy to tell the program "the first 3 digits are variable ##, the next 3 digits are variable ##" etc. I'm not exactly sure how one would go about reading the data if it were stored in another number system...
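On the reading side, decimal and hexadecimal are only ways of printing a number as text; in a binary file the value is written as raw bytes and read back into the same integer type, so there are no digits to count. Below is a minimal sketch assuming 16-bit block types stored low byte first. The function names write_u16, read_u16 and round_trip are made up for illustration, and round_trip uses an in-memory stream purely as a quick check.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Write a 16-bit value as two raw bytes, low byte first (little-endian).
void write_u16(std::ostream& out, std::uint16_t value) {
    const char bytes[2] = {
        static_cast<char>(value & 0xFF),
        static_cast<char>((value >> 8) & 0xFF)
    };
    out.write(bytes, 2);
}

// Read two raw bytes back into a 16-bit value.
// Returns false if the stream ran out of data.
bool read_u16(std::istream& in, std::uint16_t& value) {
    char bytes[2];
    if (!in.read(bytes, 2)) return false;
    value = static_cast<std::uint16_t>(
        static_cast<unsigned char>(bytes[0]) |
        (static_cast<unsigned char>(bytes[1]) << 8));
    return true;
}

// Quick check: write a value to an in-memory stream and read it back.
std::uint16_t round_trip(std::uint16_t value) {
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    write_u16(buffer, value);
    std::uint16_t result = 0;
    const bool ok = read_u16(buffer, result);
    assert(ok);
    (void)ok;  // silence the unused-variable warning when asserts are disabled
    return result;
}
```

With a real file, the same two functions should work on a std::ofstream or std::ifstream opened with std::ios::binary, called once per block in whatever fixed order the blocks were written.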
Any advice or signposting in the right direction is appreciated!