tellg() fails with .csv on Windows (works fine on Linux)

Hi everybody,

I'm creating a program which reads from a csv file and stores each column in an array of structs. On Linux with KDevelop everything works fine. My problem is that I also want to use it on Windows. I've compiled the code with Code::Blocks on Windows, and at first glance there's no problem, except for the text encoding. As I'm trying to create a German training program, I must make sure the input and output code pages work correctly.

First, I call SetConsoleCP(1252) and SetConsoleOutputCP(1252), and everything works great.

The problem comes here:

First, I saved the csv file on Linux with windows-1252 encoding. It works perfectly on Windows, and my program is completely usable, except that when I open the .csv file with Notepad, it appears without any line breaks.

After some searching, I noticed the \r\n (Windows) vs. \n (Linux) issue. I understand that the newline sequence is still there, but Notepad needs \r\n in order to display it properly (WordPad opens it correctly, though).

To get the file completely "windowsed", I opened the file with LibreOffice and saved it again. Now Notepad shows each line correctly, but to my surprise, my program doesn't run properly anymore!

It seems that the program works fine with only \n, but it fails with \r\n.

To find where the mistake is, I've put a simple loop at the beginning of the program:

fstream diccionari ("diccionari.csv");

string linia;

for (i=0; i<n; i++) // n has been previously assigned with the total amount of entries
    {
        ID [i] = diccionari.tellg (); //ID [] is an array with each newline position
        getline (diccionari, linia); 
        cout <<linia <<endl;  //it outputs the first line correctly, but not the others. 
    }
    cout <<ID[1]; // Just to check, but it doesn't point correctly, actually it points to where the getline WRONGLY starts getting the second line 


As I say in the comment, it seems like the tellg pointer doesn't point to the start of the new line, but elsewhere. (Actually, it points into the third line, at the same offset where the second line would finish, and of course some lines come out empty because they are shorter than the previous one. This makes me think of some issue with the carriage return...?)

I'm really stuck, and I repeat: it works perfectly with the .csv saved from Linux OR FROM NOTEPAD (ANSI) without the line breaks showing. The problem appears only when the file is saved so that the line breaks show in Notepad.

Thanks!
Ah, the infamous \r\n thing.

Perhaps try specifying the new line character for delimiting your lines with this:
getline(diccionari, linia, '\n');

P.S. Notepad really is a terrible program. Using notepad++, excel, or anything else might be a more fair way of testing.
Thanks Stewbond for your answer. I've already tried this solution, but it didn't work. And yes, I'm aware that Notepad is the worst thing ever, but I can't expect others to install Notepad++ in order to modify the .csv. Furthermore, I want users to have the option to create their own .csv file to practice their own vocabulary, etc., so whenever a Windows user creates their own .csv it won't work...

Anyway, after A LOT of checking, I've realised the problem is assigning the result of tellg to the dynamic array ID[i].

Does anybody know why?

This works:
for (i=0; i<n; i++) // n has been previously assigned with the total amount of entries
    {
        //ID [i] = diccionari.tellg (); //ID [] is an array with each newline position
        getline (diccionari, linia); 
        cout <<linia <<endl;  // with the tellg() line commented out, every line prints correctly
    }


When I uncomment the ID[i] assignment, it goes crazy and stores wrong tellg positions (it should point to the start of each line), which in turn makes getline start at the wrong position. (I really need this ID[] assignment!)

It seems like the assignment makes tellg change, somehow...

And remember that it must have something to do with \r\n, as files with only \n work perfectly.
More checking...

The problem is not the assignment, but simply calling tellg.

Not even printing it works:

for (i=0; i<n; i++) // n has been previously assigned with the total amount of entries
    {
        cout << diccionari.tellg ();
        getline (diccionari, linia); 
        cout <<linia <<endl;  //it outputs the first line correctly, but not the others. 
    }


Whenever I use tellg, it goes crazy, reporting the wrong position and therefore making getline misbehave.

Please tell me it isn't a bug, it isn't a bug....
I've found this (six-year-old?!) bug report:

http://cygwin.com/ml/cygwin/2006-06/msg00232.html

Does anyone know if there is a solution?
In this thread they say tellg and seekp are broken unless you open the file in binary mode. Well, that's not completely true, as they are broken only on Windows...

http://www.cplusplus.com/forum/general/45157/

Is this really true?
Are you sure the file is opened correctly? Is the ID array large enough to hold n elements?

cout <<ID[1]; // Just to check, but it doesn't point correctly, actually it points to where the getline starts getting the second line

This sounds very correct to me. ID[0] will be the first position in the file. Then you call getline that will read the whole line. ID[1] will then be the first position on the next line which is "where the getline starts getting the second line".
Thanks for participating, Peter87. Sorry, I explained this wrongly in the first post. What I mean is that ID[1] doesn't point to the start of the second line as it should, but to where getline (wrongly) starts reading the line (first post edited).

Nevertheless, as I pointed out earlier, even if you forget about the array and just try to print each tellg(), it starts to malfunction. Whenever you use tellg(), it starts failing. However, if you don't use tellg() at all, it works as expected.

Let's take this example code:

Assume you have a test file (test_file.csv) with a few rows and two filled columns (that's what I'm using):
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main(int argc, char *argv[]) {
    fstream csv("test_file.csv"); // opened in text mode (the default)
    if (!csv.is_open()) {
        cerr << "cannot open test_file.csv" << endl;
        return 1;
    }

    string line;
    int i, n = 0;

    // count the lines; testing getline itself avoids the extra pass of an eof() loop
    while (getline(csv, line))
        n++;

    csv.clear(); // reset eofbit/failbit before seeking back
    csv.seekg(0);
    csv.seekp(0);

    for (i = 0; i < n; i++) {
        cout << csv.tellg();
        getline(csv, line);
        cout << line << endl; // with a \r\n file on Windows: first line correct, the others not
    }

    return 0;
}


This code on Linux (KDevelop, CMake), with a Linux .csv, works as expected.

On Windows (Code::Blocks, GNU GCC) with a Windows .csv, it prints the lines wrongly, as well as the position that tellg() reports. However, when you comment out the cout << csv.tellg(); line, it works as expected.

Also, on Windows with a Linux .csv, it works as expected!

Summing up, it seems like a problem with tellg() and \r\n newlines...? My confusion couldn't be bigger...
it seems like a problem with tellg() and \r\n newlines...
Yes, there is. Opening the file in text mode means the line endings get modified. That's the default behavior.

You need to open it as binary:
http://www.cplusplus.com/reference/iostream/ifstream/open/
However, if you don't use tellg(), it does work as expected, and you are still opening the file in text mode...

And if the only solution is opening in binary mode, could I still store each line in a string, modify it, etc.? (I've never used binary mode.)
Yes. read/write work on the translated content, while tell/seek work on the real (untranslated) content.

And getline() simply discards the end-of-line character.

It's wise to use binary mode anyway, since the conversion takes time when it comes to huge files.

I didn't check if getline() works with binary mode though
Thanks, I'll learn how to use binary mode.

However, I'm still confused about why using tellg() messes everything up.
The Windows line ending \r\n is translated to the (Unix) \n, hence the translated and the untranslated content differ, and so do positions within them. As you observed, when you use Unix endings there are no problems.

But in text mode there might be other translations too.
Thanks, coder777!

I've checked and, as you pointed out, in binary mode I can still use getline. The only difference is that it won't translate \r\n to \n, so I have to erase the trailing \r manually.

Summing up:
fstream csv ("test_file.csv", ios::in | ios::out | ios::binary);
solves my problem.

Thanks!