My project reads in a lot of data from a text file. Each line looks like this:
[1char] [several chars] [int1] [int2] [int3]
Depending on the first char, different things are done with the integers.
I currently read it by calling getline() to copy a line into a string, constructing a stringstream from that string, using a switch() on the first char to decide what to do, and then extracting the numbers with myStringStream >> mytempstring >> mytempstring >> myint1 >> myint2 >> myint3;
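In outline, the loop looks roughly like this (the case label and the work done with the integers are placeholders):

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Rough sketch of the current approach described above.
void readFile(const std::string& path) {
    std::ifstream file(path);
    std::string line;
    while (std::getline(file, line)) {
        std::istringstream myStringStream(line);
        std::string mytempstring;
        int myint1 = 0, myint2 = 0, myint3 = 0;
        switch (line.empty() ? '\0' : line[0]) {
        case 'a':   // placeholder case label
            myStringStream >> mytempstring >> mytempstring >> myint1 >> myint2 >> myint3;
            // ... do the 'a'-specific work with myint1/myint2/myint3 ...
            break;
        // ... other cases handled similarly ...
        default:
            break;
        }
    }
}
```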
Currently, this takes several minutes for large files (several GBs), which is annoying, since the algorithm itself runs in a few seconds. Which leads to my question: how can I speed up the reading?
First, profile the code to find out exactly where the time is being spent.
1. If it is wasted waiting for the I/O subsystem (e.g. you're mostly blocked in read() system calls), then you could give your input stream a larger buffer by calling filestream.rdbuf()->pubsetbuf(your_big_buffer, size) before doing any I/O (and even before opening the file, to be safe); see the first sketch after this list. Alternatively, you could try memory-mapped file I/O. Either way, if I/O is the bottleneck, your answer may lie in the filesystem/hardware: perhaps that's simply how long it takes to read that file, C++ notwithstanding.
2. If the time is spent burning CPU and churning memory while executing getline() and the subsequent operator>> calls, rework that part to reduce memory allocations and formatted I/O: skip the intermediate getline()/stringstream and parse directly from a buffer, as in the second sketch below.
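For point 1, a minimal sketch of installing a larger buffer might look like this. The 1 MiB size and the file name are arbitrary examples, and note that whether a filebuf honours pubsetbuf is implementation-defined (common implementations do use the supplied buffer when it is set before open()):

```cpp
#include <fstream>
#include <vector>

int main() {
    std::vector<char> big_buffer(1 << 20);   // 1 MiB; tune for your system
    std::ifstream filestream;
    // Install the bigger buffer before opening the file / doing any I/O.
    filestream.rdbuf()->pubsetbuf(big_buffer.data(),
                                  static_cast<std::streamsize>(big_buffer.size()));
    filestream.open("input.txt");            // placeholder file name
    // ... read lines as before ...
}
```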
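For point 2, here is one possible sketch, assuming C++17 is available for std::from_chars: it reads the file in large chunks and parses each line in place, avoiding the per-line stringstream and the temporary strings. handleLine() is a hypothetical stand-in for your switch on the first character, and the parser assumes exactly the line format you described; lines that don't match are simply skipped.

```cpp
#include <charconv>
#include <cstddef>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical stand-in for the switch on the first character.
static void handleLine(char type, int a, int b, int c) {
    (void)type; (void)a; (void)b; (void)c;   // ... dispatch on 'type' and use the integers ...
}

// Parse "[1char] [several chars] [int1] [int2] [int3]" from [begin, end).
static bool parseLine(const char* begin, const char* end, char& type, int (&v)[3]) {
    if (begin == end) return false;
    type = *begin++;
    while (begin != end && *begin == ' ') ++begin;   // skip separator
    while (begin != end && *begin != ' ') ++begin;   // skip the "[several chars]" token
    for (int i = 0; i < 3; ++i) {
        while (begin != end && *begin == ' ') ++begin;
        auto res = std::from_chars(begin, end, v[i]);
        if (res.ec != std::errc{}) return false;
        begin = res.ptr;
    }
    return true;
}

void readFileChunked(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    std::vector<char> chunk(1 << 20);   // 1 MiB read size, arbitrary
    std::string carry;                  // partial line left over from the previous chunk
    while (file) {
        file.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        std::streamsize got = file.gcount();
        if (got <= 0) break;
        carry.append(chunk.data(), static_cast<std::size_t>(got));
        std::size_t start = 0, nl;
        while ((nl = carry.find('\n', start)) != std::string::npos) {
            char type; int v[3];
            if (parseLine(carry.data() + start, carry.data() + nl, type, v))
                handleLine(type, v[0], v[1], v[2]);
            start = nl + 1;
        }
        carry.erase(0, start);          // keep only the unfinished trailing line
    }
    if (!carry.empty()) {               // last line without a trailing newline
        char type; int v[3];
        if (parseLine(carry.data(), carry.data() + carry.size(), type, v))
            handleLine(type, v[0], v[1], v[2]);
    }
}
```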
How would I go about profiling this? I have very little experience with profiling; I've only used the MSVS (Visual Studio) profiler, which in my hands is generally only useful for top-level analysis.