Optimizing I/O when writing a text file

Jun 20, 2008 at 9:16am
Hello, I have been looking everywhere for information on how to improve the output speed for large text files. Right now it takes ~3.25 minutes to write a formatted text file (~62MB). With all the outfile writes commented out, the calculations finish within seconds. My hard drive light hardly glows during the entire process, unlike when copying a file of similar size.

//Program performs simple calculations on several vectors (holding integers).
//Output is tabulated and the inner loop will produce 3 lines for every
//iteration of the main loop.
//open ofstream outfile in text mode, etc.
float gdiv, fa, ft, fc, fg, fn;
float ns = mega.ftaxa;
//codons is usually a large number (~716000)
for(unsigned int l=0;l<codons;l++)
{
     for(int m=0;m<3;m++)
     {
          int n = l*3 + m;
          fa = Avec[n]/ns;
          ft = Tvec[n]/ns;
          fc = Cvec[n]/ns;
          fg = Gvec[n]/ns;
          fn = Nvec[n]/ns;
          gdiv = 1 - (fa*fa + ft*ft + fc*fc + fg*fg + fn*fn);
          outfile << (n + 1) << "\t"
                  << (m + 1) << "\t";
          if(gdiv > 0)
          { 
               outfile.precision(4);
          } else {
               outfile.precision(5);
          }
          outfile << showpoint << gdiv << noshowpoint;
          if(synon_vec[n] == -1)
          {
               outfile << "\t" << "*";
          } else {
               outfile << "\t" << synon_vec[n];
          }
          outfile << "\t" << Avec[n] << "\t" << Tvec[n]
                  << "\t" << Cvec[n] << "\t" << Gvec[n]
                  << "\t" << Nvec[n] << endl;
     }
}
//...close file, etc. 


Any help or insight would be very much appreciated. :)
Last edited on Jun 20, 2008 at 8:03pm
Jun 20, 2008 at 9:21pm
It's not something I have tried in C++, but my first thoughts would be to try building a whole line and then using a single outfile << theLine type statement to send it to the disk.
Jun 20, 2008 at 9:57pm
Agreed. The << operator actually has quite a bit of overhead.
The other thing to do is not flush (don't use endl) until you are finished with all output.
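
As a rough illustration of the flushing point, here is a minimal sketch. The helper names writeRows and demoRows and the two-column layout are made up for the example and are not taken from the original program. The only thing it is meant to show is that '\n' ends a line without flushing, whereas endl inserts a newline and also flushes the stream every time.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original program): writes one
// tab-separated row per element. Each row ends with '\n', which inserts a
// newline without flushing; std::endl would flush the stream on every row.
void writeRows(std::ostream& out,
               const std::vector<int>& first,
               const std::vector<int>& second)
{
    assert(first.size() == second.size());  // both columns must line up
    for (std::size_t i = 0; i < first.size(); ++i)
    {
        out << (i + 1) << '\t' << first[i] << '\t' << second[i] << '\n';
    }
    out.flush();  // a single explicit flush once all rows are written
}

// Usage sketch: any std::ostream works, e.g. an std::ofstream or, as here,
// an in-memory std::ostringstream.
std::string demoRows()
{
    std::ostringstream buffer;
    writeRows(buffer, {1, 2}, {3, 4});
    return buffer.str();
}
```

With an ofstream the call would look the same; the stream's own buffer then decides when data actually goes to disk.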

Hope this helps.
Jun 21, 2008 at 12:20am
Thank you both for your insight. Minimizing the number of << operations has had a significant effect on how fast the text file is written. The << operator is convenient for producing formatted text output; however, this convenience apparently comes at a significant cost in output time for large files.

By formatting the text output myself into a string (acting as a buffer), and then moving the contents of the string to the file using the << operator, the output time dropped from ~3.25min to ~1min for a 62MB file. Changing the size of the buffer (in lines) did not significantly change the output time. In other words, the number of lines that the string holds before its contents are dumped into the file did not affect the rate.
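
The string-as-buffer approach described above might look roughly like the sketch below. This is not the poster's actual code: writeBuffered, demoBuffered, the two-column row layout, and the use of std::to_string (C++11, so newer than this thread) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: formats rows into a std::string acting as a buffer
// and hands the buffer to the stream once every `linesPerChunk` rows.
void writeBuffered(std::ostream& out,
                   const std::vector<int>& values,
                   std::size_t linesPerChunk)
{
    assert(linesPerChunk > 0);  // a chunk must hold at least one line

    std::string buffer;
    std::size_t linesInBuffer = 0;

    for (std::size_t i = 0; i < values.size(); ++i)
    {
        // Build the row by appending to the string instead of using <<.
        buffer += std::to_string(i + 1);
        buffer += '\t';
        buffer += std::to_string(values[i]);
        buffer += '\n';

        if (++linesInBuffer == linesPerChunk)
        {
            out.write(buffer.data(),
                      static_cast<std::streamsize>(buffer.size()));
            buffer.clear();  // typical implementations keep the capacity
            linesInBuffer = 0;
        }
    }

    // Write whatever is left over after the last full chunk.
    if (!buffer.empty())
    {
        out.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    }
}

// Usage sketch: the chunk size changes how often the stream is written to,
// not what ends up in the output.
std::string demoBuffered(std::size_t linesPerChunk)
{
    std::ostringstream sink;
    writeBuffered(sink, {10, 20, 30}, linesPerChunk);
    return sink.str();
}
```

The output is identical for any chunk size, which fits the observation that the buffer size made little difference to the timing.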

To determine where the remaining bottleneck was, I decided to forgo the line formatting and see how long it would take to output the same number of lines consisting of a constant string. The answer: 7 seconds. So the next bottleneck has become my own text formatting code. At least it is significantly more efficient than formatting text using the << operator!
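
A baseline measurement of this kind could be sketched as follows. The names kSampleLine, timeConstantLines and demoBaseline, the contents of the sample line, and the use of std::chrono (C++11) are assumptions for the example; the original test code was not posted.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Placeholder row standing in for a formatted output line (made up).
const std::string kSampleLine = "1\t1\t0.5000\t*\t1\t2\t3\t4\t5\n";

// Hypothetical helper: writes the same constant line `lineCount` times and
// returns the elapsed wall-clock time in seconds. No formatting is done, so
// the result approximates the cost of the raw output alone.
double timeConstantLines(std::ostream& out, std::size_t lineCount)
{
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < lineCount; ++i)
    {
        out.write(kSampleLine.data(),
                  static_cast<std::streamsize>(kSampleLine.size()));
    }
    out.flush();  // include the final flush in the measured time
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Usage sketch with an in-memory stream; a real measurement would pass an
// std::ofstream and a line count matching the real output.
void demoBaseline()
{
    std::ostringstream sink;
    const double seconds = timeConstantLines(sink, 1000);
    assert(seconds >= 0.0);
    assert(sink.str().size() == 1000 * kSampleLine.size());
}
```

Comparing this number against the time for the fully formatted output gives an estimate of how much of the total is spent on formatting rather than on I/O.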

So my next endeavor will be to figure out a way to optimize my own text-formatting code. A good lesson here: Use << for small files for ease of output and coding; format the text yourself for large files and control over efficiency.
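
One common starting point for hand-written formatting is converting integers to text directly into the buffer. The sketch below is only an illustration of that idea, not the poster's formatting code; appendUnsigned and demoAppend are hypothetical names.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical helper: appends the decimal text of `value` to `buffer`
// without going through a stream.
void appendUnsigned(std::string& buffer, unsigned int value)
{
    // digits10 + 1 characters are enough for any unsigned int value.
    char digits[std::numeric_limits<unsigned int>::digits10 + 1];
    int count = 0;

    // Extract digits from least significant to most significant.
    do
    {
        digits[count++] = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);

    // Append them in reverse so the most significant digit comes first.
    while (count > 0)
    {
        buffer += digits[--count];
    }
}

// Usage sketch: build one tab-separated row in a string.
void demoAppend()
{
    std::string row;
    appendUnsigned(row, 716000);
    row += '\t';
    appendUnsigned(row, 0);
    row += '\n';
    assert(row == "716000\t0\n");
}
```

Whether this is actually faster than the library routines depends on the compiler and standard library, so it is worth timing before adopting it.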

Thanks again both of you. If I get my time down any further I will try to finish up the thread with my code. As always comments are welcome. Thx :)