#include <iostream>
#include <fstream>

int main () {
    std::ifstream is ("example.txt");
    if (is) {
        // get length of file:
        is.seekg (0, is.end);
        int length = is.tellg();
        is.seekg (0, is.beg);

        char * buffer = new char [length];

        std::cout << "Reading " << length << " characters... ";
        // read data as a block:
        is.read (buffer, length);

        if (is)
            std::cout << "all characters read successfully.";
        else
            std::cout << "error: only " << is.gcount() << " could be read";
        is.close();

        // ...buffer contains the entire file...
        std::cout << std::endl;
        for (int i = 0; i < length; i++)
        {
            std::cout << buffer[i];
        }

        delete[] buffer;
    }
    return 0;
}
Say I have a file with only a newline in it.
My question is: how is the newline character stored in buffer, why is the newline considered to be 2 characters, and why is only 1 of these 2 characters extracted?
How did you create your example.txt? Did you save it as Unicode (UTF-16)? Those characters are 2 bytes long. If so, the bytes are inverted in the buffer, meaning that the LSB is on the left and the MSB on the right. Thus the '\n' character is 0x000A, but in the buffer it is stored as the bytes 0x0A 0x00, and you are using char*.
That's what I think is happening.
Um, I just created example.txt in Notepad on Windows. I'm not sure, but I think it is Unicode characters.
I don't exactly understand the part about the bytes being inverted; could you elaborate a little, please? If I use is.read() to extract data from the file into the char pointer called buffer, do the bytes of the extracted data become inverted? In any case, even if the newline character is 2 bytes long, it's still only 1 character, right? So how come tellg(), which returns the current position in the input stream, returns 2 instead of 1?
Yes, sure. If we take a single character like 'H', its ASCII code is 72 and its Unicode value is also 72, but in a 2-byte encoding such as UTF-16 it occupies two bytes instead of one.
So you can see it as:
0(MSB) 72(LSB),
but in memory they are inverted, so you are going to see them as
72(LSB) 0(MSB).
I wrote this little piece of code so you can see how Unicode characters are stored in memory.
On the other hand, if your file example.txt is not saved with Unicode characters, I don't know what else could be happening. Maybe Notepad adds extra characters? I don't know.
It is due to line endings. On most Unix-like systems, the line ending is LF (line feed, '\n'). On Windows, however, the default line ending is CR+LF (carriage return + line feed, "\r\n"). Normally you never notice the difference, because in text mode Windows translates "\r\n" for you; you only see the raw bytes when you open your file in binary mode (which stops Windows from doing that translation).
On Windows, \r\n newlines are automatically read as \n when the file is opened in text mode (the default). This is useful because it makes it possible to handle text files the same way on Windows, Linux and everywhere else, despite the difference in how line endings are marked. If you don't want this behaviour, open the file in binary mode.
std::ifstream is ("example.txt", std::ios::binary);
Thanks, Peter. So if I'm understanding right, this is what happens when example.txt has only a newline: is.tellg() returns 2 because the default line ending is "\r\n", which is 2 characters. Then is.read(buffer, length) reads the "\r\n" as 1 character, and there aren't any more characters, but you told it to read 2 characters, which is why you get