Currently I'm having problems reading input files. The problem I'm encountering is that I actually want to read the newline character instead of using it as a delimiter.
After some testing, it looks like std::getline, which I'm using with an ifstream, reads the newline and puts it in the string as "\\n ". Does anyone know why this happens? And, more importantly, how should I read the newline so that it still behaves like a newline in the resulting string?
I think the problem is pretty self-explanatory; if you'd like to see some code that shows the problem, please say so and I'll post it.
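A minimal check using only the standard library: when std::getline is given a delimiter other than '\n', any newline it passes over is stored in the string as the single character '\n' (byte value 10), not as a backslash followed by n. The sketch reads from a std::istringstream instead of an ifstream to stay self-contained; read_until_comma and count_newlines are hypothetical helper names.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read everything up to the next ','.
// The ',' itself is extracted and discarded; newlines before it are kept.
std::string read_until_comma(std::istream& in) {
    std::string field;
    std::getline(in, field, ',');
    return field;
}

// Count real newline characters (the single byte '\n') in a string.
std::size_t count_newlines(const std::string& s) {
    std::size_t n = 0;
    for (char ch : s) {
        if (ch == '\n') ++n;
    }
    return n;
}
```

Reading "first\nsecond,rest" this way yields a 12-character field containing exactly one real newline; printing it with std::cout shows two lines.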
// Headers this fragment relies on
#include <fstream>
#include <string>
#include <utility>
#include <boost/algorithm/string/trim.hpp>

std::ifstream is(filename.c_str(), std::ios::in);
intermediate_mapper_format::source s;
int line_number = 1;
std::string name;
// Get the tag name, which is the first part before the separator.
// Testing getline itself (rather than !is.eof()) stops cleanly at end of file.
while (std::getline(is, name, ',')) {
    boost::algorithm::trim(name);
    // Get the value
    std::string value;
    int c = is.peek();
    // Skip possible white space
    while (c == ' ') {
        is.ignore(1);
        c = is.peek();
    }
    if (c == '\'' || c == '"') {
        // If it starts with ' or " then we should also copy \n characters that are within the quote pair
        const char quote = static_cast<char>(c);
        std::string partial_value;
        is.ignore(1);
        bool done = false;
        do {
            std::getline(is, partial_value, quote);
            value += partial_value;
            // Check if we're done or if the '/" has another occurrence in the value
            const int next = is.peek();
            if (next == '\n') {
                is.ignore(1);
                done = true;
            } else if (next == std::ifstream::traits_type::eof()) {
                // End of file (or an unterminated quote): stop instead of looping forever
                done = true;
            } else {
                value += quote;
            }
        } while (!done);
    } else {
        std::getline(is, value);
    }
    // Add the name and value to the source format
    intermediate_mapper_format::source::iterator i = s.find(name);
    if (i != s.end()) {
        // Use a reference; a copy would receive the new value and then be discarded
        intermediate_mapper_format::tag_values& tvs = i->second;
        tvs.push_back(std::make_pair(line_number, value));
    } else {
        intermediate_mapper_format::tag_values tvs;
        tvs.push_back(std::make_pair(line_number, value));
        s.insert(std::make_pair(name, tvs));
    }
    line_number++;
}
intermediate_mapper_format::name_description nd;
m_imf = intermediate_mapper_format_ptr(new intermediate_mapper_format(s, nd));
is.close();
Here's the code. To give a bit more background: it is used to parse CSV files in which each line contains a name and a value. However, if the value starts with ' or ", it can be a large piece of text containing newlines. I've tested this with a CSV file containing the following lines:
naam, "individual model 1"
omschrijving,'raar
vreemd's
lelijke omschrijving'
puntjes,101
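Here is a self-contained sketch of the same quote-aware parsing, run on the sample above through a std::istringstream so no file is needed. The record_list type and the parse_records and trim_spaces names are hypothetical stand-ins for intermediate_mapper_format and boost::algorithm::trim. As in the original code, a quoted value ends at a quote that is immediately followed by a newline (or by end of input), so the ' inside vreemd's stays part of the value.

```cpp
#include <istream>
#include <sstream>   // for std::istringstream in the usage example
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for intermediate_mapper_format::source:
// a flat list of (name, value) pairs in file order.
using record_list = std::vector<std::pair<std::string, std::string>>;

// Trim leading and trailing spaces (used here instead of boost::algorithm::trim).
static std::string trim_spaces(const std::string& s) {
    const std::string::size_type b = s.find_first_not_of(' ');
    if (b == std::string::npos) return "";
    const std::string::size_type e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

record_list parse_records(std::istream& is) {
    record_list out;
    std::string name;
    while (std::getline(is, name, ',')) {
        std::string value;
        while (is.peek() == ' ') is.ignore(1);      // skip spaces after ','
        const int c = is.peek();
        if (c == '\'' || c == '"') {
            const char quote = static_cast<char>(c);
            is.ignore(1);                           // drop the opening quote
            std::string part;
            for (;;) {
                std::getline(is, part, quote);      // any '\n' stays in part
                value += part;
                const int next = is.peek();
                if (next == '\n') { is.ignore(1); break; }          // closing quote
                if (next == std::istream::traits_type::eof()) break; // end of input
                value += quote;                     // quote inside the value
            }
        } else {
            std::getline(is, value);
        }
        out.emplace_back(trim_spaces(name), value);
    }
    return out;
}
```

For the sample, the omschrijving entry comes out as "raar\nvreemd's\nlelijke omschrijving" with two real newline characters, so printing it shows three lines.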
I've got a workaround where I replace "\\n " with "\n", which works, but is of course an ugly solution.
This comes from how backslashes are written in C code: when the compiler sees \ in a string or character literal, it does not treat it as a normal char; instead it looks at the next character to find a special character code (like \n, for example :-P). That means a literal backslash has to be written '\\' in C code, so "\\n" in C code is a backslash followed by n, while '\n' is the single newline character in memory. Tools that show strings in this escaped notation, such as some debuggers, display a real newline as \n.
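To make the distinction concrete (a sketch; escaped_view is a hypothetical helper, not a standard function): the literal "\n" is one character, the literal "\\n" is two, and a routine that escapes control characters for display turns a real newline into the two visible characters \ and n, while a genuine backslash gets doubled.

```cpp
#include <string>

// Hypothetical helper: render a string with C-style escapes, the way
// some debuggers show string contents.
std::string escaped_view(const std::string& s) {
    std::string out;
    for (char ch : s) {
        switch (ch) {
            case '\n': out += "\\n"; break;   // one newline byte -> backslash + n
            case '\r': out += "\\r"; break;   // carriage return
            case '\t': out += "\\t"; break;   // tab
            case '\\': out += "\\\\"; break;  // one backslash -> two backslashes
            default:   out += ch;             // everything else unchanged
        }
    }
    return out;
}
```

So if a viewer shows \n inside a string, the underlying data may well hold a real newline; only a doubled \\n in such a view would indicate an actual backslash character.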
That would be the case if I was reading my input from memory. I'm reading from a file, so this should not be the case.
OK, so if I understand this correctly: whenever you read from a file and keep the newline instead of discarding it, you never end up with an actual newline character, and what I want to do is impossible.
On a side note, I understand why this would give me a "\\n" where there should be a newline. But why does it also add a space? I end up with "\\n " wherever a newline occurs in the input file.
You can still read your file in binary mode: you'll get a single string containing the whole file, with the newlines intact (encoded as LF or CR+LF depending on the system that wrote the file), and then look for the line breaks yourself.
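A sketch of that suggestion using only the standard library (read_whole_file and split_lines are hypothetical names): opening with std::ios::binary means the library performs no newline translation, so the bytes land in the string exactly as stored, and the line ends are then found by hand. Only LF and CR+LF endings are handled here; old CR-only files are not.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read the whole file as raw bytes. In binary mode no newline translation
// happens, so a Windows file keeps its "\r\n" pairs.
std::string read_whole_file(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Split a buffer into lines, accepting both "\n" and "\r\n" endings.
std::vector<std::string> split_lines(const std::string& buffer) {
    std::vector<std::string> lines;
    std::string::size_type start = 0;
    while (start < buffer.size()) {
        std::string::size_type end = buffer.find('\n', start);
        if (end == std::string::npos) end = buffer.size();
        std::string line = buffer.substr(start, end - start);
        if (!line.empty() && line[line.size() - 1] == '\r') {
            line.erase(line.size() - 1);   // drop the CR of a CR+LF pair
        }
        lines.push_back(line);
        start = end + 1;
    }
    return lines;
}
```

Because the whole file is now in memory, a quote-aware parser can also run directly over the buffer, for example by wrapping it in a std::istringstream.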
Most people don't even use delimiters: they read the entire file into memory and parse it with some algorithm (which may use delimiters), turning it into something useful.