Questions about the use of fstream::seekg

Hello everyone, I was doing an exercise that appends the relative starting position of each line to the end of the file "test.txt". The code below compiles and runs, but the result looks weird to me. Could someone explain what leads to this result?

#include <iostream>
#include <fstream>

using namespace std;

int main()
{
	fstream inOut("test.txt", fstream::ate | fstream::in | fstream::out);
	if (!inOut) {
		cerr << "Unable to open file!" << endl;
		return EXIT_FAILURE;
	}

	auto end_mark = inOut.tellg();
	inOut.seekg(0, fstream::beg);
	size_t cnt = 0;
	string line;

	while (inOut && inOut.tellg() != end_mark && getline(inOut, line)) {
		cnt += line.size() + 1;
		auto mark = inOut.tellg();
		inOut.seekp(0, fstream::end);
		inOut << cnt;
		if (mark != end_mark)
			inOut << " ";
		inOut.seekg(mark);
	}
	inOut.seekp(0, fstream::end);
	inOut << "\n";
	return 0;
}

The test.txt file is below and the last line is the output written to this file. (The expected last line should be: 5 9 12 14.)
abcd
efg
hi
j
5 6 7 8
Hello maple,

Let's take a closer look at your code:
#include <iostream>
#include <string>  // <--- Added. For "std::getline".

#include <fstream>

using namespace std;

int main()
{
    fstream inOut("test.txt", fstream::ate | fstream::in | fstream::out);

    if (!inOut)
    {
        cerr << "Unable to open file!" << endl;

        return EXIT_FAILURE;
    }

    auto end_mark = inOut.tellg();

    inOut.seekg(0, fstream::beg);

    //size_t cnt = 0;
    string line;

    while (inOut && inOut.tellg() != end_mark && getline(inOut, line))
    {
        int cnt{};  // <--- Moved.

        cnt += line.size() + 1;

        auto mark = inOut.tellg();

        inOut.seekp(0, fstream::end);

        inOut << cnt;

        if (mark != end_mark)
            inOut << " ";

        inOut.seekg(mark);
    }

    inOut.seekp(0, fstream::end);

    inOut << "\n";

    return 0;
}

Your calls to "seekg" and "seekp" work, but you are not getting the file position out of them the way you need to.

My first question is about line 23 of the listing (the "size_t cnt = 0;" declaration): why do you feel the need to define this variable as a "size_t"? https://en.cppreference.com/w/c/types/size_t Do not make the mistake that I did of thinking that "size_t" is always an "unsigned int". The only guarantee you have is that it is an unsigned integer type; its width depends on the implementation.

By moving the definition of "cnt" inside the while loop, it is created each time the loop body is entered and destroyed at the end of each iteration. That alone does not solve your problem, because "cnt" is then the size of the string + 1 and not the position in the file. So all you are printing to the file is the length of each line plus one, not counting the line added at the bottom.

When I ran the program I got this output:

5 4 3 2



And you say the numbers should be:

5 9 12 14



And I am thinking the numbers should be:

0 5 9 12


Or just " 5 9 12".

After the while condition reads the first line of the file, the "input" file pointer is at the first character of the next line. This is what you need. Whether or not you make zero (0) your first number is up to you.

Andy
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
	fstream inOut("test.txt", fstream::ate | fstream::in | fstream::out);

	if (!inOut) {
		cerr << "Unable to open file!\n";
		return EXIT_FAILURE;
	}

	const auto end_mark {inOut.tellg()};
	size_t cnt {};
	string line;

	inOut.seekg(0, fstream::beg);

	while (inOut && (inOut.tellg() != end_mark) && getline(inOut, line)) {
		cnt += line.size() + 1;

		const auto mark {inOut.tellg()};

		inOut.seekp(0, fstream::end);
		inOut << cnt;

		if (mark != end_mark)
			inOut << " ";

		inOut.seekg(mark);
	}

	inOut.seekp(0, fstream::end);
	inOut << '\n';
}


Given the test.txt file as above, this changes the file to the expected:


abcd
efg
hi
j
5 9 12 14


and if run again, it gives:


abcd
efg
hi
j
5 9 12 14
5 9 12 14 24


as expected.
First, thanks for your replies, Handy Andy and seeplus.

@Handy Andy, I have to admit that I am not very clear about when to choose size_t over other unsigned integer types... thanks for pointing it out.

As for running this code: after running it in the Codeblocks IDE (which includes gcc), I get the following (which is wrong):

abcd
efg
hi
j
5678


When you (Handy Andy and seeplus) run this code, I guess the IDE you use is the VS IDE (which I am not using...), and you both seem to get the correct result, consistent with the code.

Then I copied the .cpp source file, created the same "test.txt" file on Ubuntu in VMware, ran it again and got a different result:

abcd
efg
hi
j

5 9 12 14 


Everything is as expected except for the additional blank line before the last line.

So I guess there is something wrong with how my Codeblocks IDE handles seekg and seekp (maybe undefined behavior...).
As for the blank line in the second result, tested on Ubuntu, I do not know what happened. Is it just the behavior of the compiler or the system?
The problem isn't your IDE - it's just a fancy editor.
You could have written the same code in notepad and compiled from the command line, and you'd still have the same problem.

It isn't your compiler either, or for that matter your OS.
It all boils down to the standard library implementation that your program is linked with, and your choice of text file format.

For example, I took seeplus's code, compiled it with C::B on Windows.
If "test.txt" is a normal Windows file with CRLF line endings, then I get the results you first posted.
If "test.txt" is a Unix file (which is dead easy to achieve using notepad++), then I get the results you expect.

Same everything except for the choice of line endings on the text file, and completely different answers.

The basic problem is that seek/tell is very poorly specified for text files (it's probably UB), especially when those text files undergo CRLF <-> LF transformations on input and output.

You could try opening the text file in binary mode -> https://en.cppreference.com/w/cpp/io/ios_base/openmode
But you'd end up with a file containing both CRLF and LF line endings, since it effectively turns off all the text mode translations (specifically your inOut << '\n';).

Linux doesn't have a problem because there is no distinction between text and binary, as lines in text files just end with \n.

Mixing up (seek/tell)g with (seek/tell)p probably isn't helping matters either.

@salem c, thanks a lot and I totally agree with your explanation.

Indeed, there is no difference between running the code from Codeblocks and from the command line on Windows.

Later I changed the open mode by adding fstream::binary, ran it again, and got an almost correct result (the only difference is the CRLF, as you said above):

abcd
efg
hi
j
6 11 15 18
6 11 15 18 29


Here, CRLF is "\r\n", and '\r' is the extra character on each line, which explains the small difference between the result and the expected values.
Then I ran it again (in binary mode) to see the effect of inOut << '\n'.
From the value 29, we can tell that the second-to-last line ends with just '\n' instead of "\r\n".