I am using the method below to read a large space-delimited txt file (about 900 MB). It took me 879 s to load the data into memory. I am wondering if there is a more efficient way to read the txt file?
A related question: is it a good idea to store such a huge data set in a 2D vector?
Have you tried the highest optimization setting? The reason I'm asking is that stream usage often benefits a lot from optimization. Which compiler are you using?
I would have chosen to use a one-dimensional vector, and then index it by (row*ncols+col).
This will at least reduce memory consumption, and it may also have a significant impact on speed.
I don't remember whether a 'vector of vectors' is an idiom endorsed by the standard, but there is a risk of excessive copying and memory reallocation if the 'vector of vectors' case gets no special handling.
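As a minimal sketch of that flat layout (the Table struct, member names, and double element type below are illustrative, not from this thread):

#include <cstddef>
#include <vector>

// All nrows * ncols values live in one contiguous block;
// element (row, col) sits at index row * ncols + col.
struct Table
{
    std::size_t nrows, ncols;
    std::vector<double> data;

    Table(std::size_t r, std::size_t c)
        : nrows(r), ncols(c), data(r * c) {}

    double& at(std::size_t row, std::size_t col)
    {
        return data[row * ncols + col];
    }
};

One allocation replaces nrows separate row allocations, and each lookup is a single multiply-and-add.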
Sorry, I don't know how to use those "optimization settings" you referred to..., and the compiler I am using is Visual Studio 2008.
I am new to C++. I followed the suggestion given by a post in this forum (I could not find it now...) to use a 2D vector to contain the large data set. But I will try to follow your suggestion. Thanks for your help, and have a nice day!
while (ss >> strtmp)  // what are you doing here?
{
    istringstream(strtmp) >> ncols;  // the value of ncols will be overwritten
}

while (ss >> ncols)  // quasi-equivalent code
    ;
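To make that point concrete, here is a self-contained sketch that extracts the numbers directly from the stream, with no intermediate string (the file name data.txt is hypothetical):

#include <fstream>
#include <iostream>

int main()
{
    std::ifstream input("data.txt");  // hypothetical file name
    double value;
    while (input >> value)   // operator>> skips whitespace and parses the number
        std::cout << value << '\n';
    return 0;
}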
I am trying to read in a string (e.g. "1234") and convert it to a number.
Does that mean C++ is not capable of loading large txt files into memory efficiently? I doubt it...
Looking at your code, you load the data in parsed form, so it is never completely loaded into memory per se.
std::deque is better suited for arbitrary growth, because it does not copy all of its elements every time it has to grow. It just allocates additional storage and links it in. I think it is implemented as a vector of pointers to fixed-size arrays: when it has to grow, it creates a new array and adds its pointer to the vector.
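A minimal illustration of swapping in std::deque (the element type and count are made up for the example):

#include <deque>
#include <iostream>

int main()
{
    std::deque<float> values;
    // push_back on a deque never relocates existing elements: when a
    // block fills up, a new fixed-size block is allocated and linked in.
    for (int i = 0; i < 1000000; ++i)
        values.push_back(static_cast<float>(i));
    std::cout << values.size() << '\n';
    return 0;
}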
while (input.read(&c, sizeof(char)))  // 1 byte at a time
    v.push_back(c);

it took ~1m30s
with 4 bytes at a time -> 36s
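Following that trend, a sketch that reads in much larger chunks (the 64 KB buffer size is an arbitrary choice for illustration, and data.txt is again hypothetical):

#include <fstream>
#include <vector>

int main()
{
    std::ifstream input("data.txt", std::ios::binary);  // hypothetical file name
    std::vector<char> v;
    char buf[65536];                                    // 64 KB at a time
    while (input.read(buf, sizeof buf))                 // full chunks
        v.insert(v.end(), buf, buf + sizeof buf);
    v.insert(v.end(), buf, buf + input.gcount());       // final partial chunk
    return 0;
}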
ne555, your response gives me hope! Could you please kindly help further by providing code to load the following sample data I made? I guess the code you posted is used to read in strings, right? However, I need to convert the strings of numbers to float numbers. I tried to mimic your code but failed.
Regards
Simeonz and Duoas,
Thanks for your responses. I changed my vector to a std::deque, but the code seems to be even slower... Probably I am an idiot; I am really new to C++.