Speed up C++ code

I have a 243-gigabyte text file that I'm trying to break up into smaller pieces. Specifically, it's a text file with a lot of numbers in it. I want to split it so that each smaller piece has 1328098 numbers in it. I tried this code, which ran at a decent speed:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
#include <iostream>
#include <fstream>
#include <string.h>
#include <stdio.h>
#include <sstream>

using namespace std;

int main()
{
    int numbers = 0;
    int i = 1;
    string line;
    string file;
    string convert;
    ostringstream files;

    ifstream values ("../../matrix.txt");
    ofstream values2 ("row_0.txt");

        while(getline(values, line, '\n')){

            char * str;
            str = &line[0];
            char * pch;
            pch = strtok (str,", ");
            while (pch != NULL)
            {
                cout << numbers <<'\n';
                numbers = numbers + 1;
                values2 << pch <<", ";
                pch = strtok (NULL, ", ");
                if (numbers % 1328098 == 0){
                    file = "row_";
                    files << i;
                    convert = files.str();
                    file += convert;
                    file += ".txt";
                    values2.close();
                    ofstream values2 (file.c_str());
                    files.str("");
                    i = i + 1;
                }
            }
        }

    return 0;
}


The issue was that the code would copy the first 1328098 numbers to a text file called row_0.txt, and would then keep counting the numbers passing through and create the subsequent files (i.e. row_1.txt, row_2.txt), but it WOULD NOT write the numbers from the input file to those output files. So I changed the code to look like this:

#include <iostream>
#include <fstream>
#include <string.h>
#include <stdio.h>
#include <sstream>

using namespace std;

int main()
{
    int numbers = 0;
    int i = 1;
    string line;
    string file = "row_0.txt";
    string convert;
    ostringstream files;

    ifstream values ("../../matrix.txt");
    ofstream values2 ("row_0.txt");

        while(getline(values, line, '\n')){

            char * str;
            str = &line[0];
            char * pch;
            pch = strtok (str,", ");
            while (pch != NULL)
            {
                values2.close();
                ofstream values2 (file.c_str(), ios::app);
                cout << numbers <<'\n';
                numbers = numbers + 1;
                values2 << pch <<", ";
                pch = strtok (NULL, ", ");
                if (numbers % 1328098 == 0){
                    file = "row_";
                    files << i;
                    convert = files.str();
                    file += convert;
                    file += ".txt";
                    files.str("");
                    i = i + 1;
                }
            }
        }

    return 0;
}


which is too slow for what I need. Could someone advise me on how to get the first one working, or even how to increase the speed of the first one?
When I tested this, the thing which slowed it down was this cout at line 29:
 
                cout << numbers <<'\n';


The problem is line 40
 
                   ofstream values2 (file.c_str());

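Here values2 is a completely separate ofstream that merely shares its name with the outer values2, the one that was just closed on line 39. When the closing brace at line 43 is reached, that new ofstream goes out of scope and is destroyed at that point. Hence nothing can ever be written to it, and every later write goes to the outer stream, which is closed by then.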

Instead of declaring a new object, just re-open the existing stream.

My version of more or less the same code looks like this:
#include <iostream>
#include <fstream>
#include <cstring>
#include <stdio.h>
#include <sstream>

using namespace std;

const int block = 1328098;

string filename()
{
    static int count = 0;
    ostringstream oss;
    oss << "row_" << count++ << ".txt";
    cout << oss.str() << endl; // optional, for debugging
    return oss.str();   
}

int main()
{
    int numbers = 0;
    string line;
    ifstream values_in("../../matrix.txt");
    ofstream values_out;
    values_out.open(filename().c_str());

    while (getline(values_in, line, '\n'))
    {

        char * str;
        str = &line[0];
        char * pch;
        pch = strtok (str,", ");
        while (pch != NULL)
        {
            //cout << numbers <<'\n';
            numbers = numbers + 1;
            values_out << pch <<", ";
            pch = strtok (NULL, ", ");
            
            if (numbers % block == 0)
            {
                values_out.close();
                values_out.open(filename().c_str());
            }
        }
    }
    values_out.close();

    return 0;
}


When I've run into similar problems in the past, I've found that increasing the size of the write buffer has helped. To do this in C++, I think you use a std::filebuf instead of ofstream, and call pubsetbuf() to assign a large buffer. As a starting point, try 1MB.
Thank you both so much! That helped a ton. Now I understand input/output streams much better! I still don't completely understand, though, why the numbers were not being written to the files in the first code. Was it because I neglected to reopen values2?
Was it because I neglected to reopen values2?
Yes.