Curious problem here with splitting a std::string at a desired delimiter, whereby the split seems to be happening at the wrong place.
I am using the suggested code from the C++ Cookbook (Recipe 4.6) as follows (with my own var names):
(EDIT: the following code contains an error, the correct code is in a post further down the page)
void MyClass::splitQueries(const std::string sql, char delim, std::vector<std::string> &allQueries)
{
    string::size_type i = 0;
    string::size_type j = sql.find(delim);
    while (j != string::npos) {
        string ss = sql.substr(i, j-1);
        allQueries.push_back(ss);
        i = ++j;
        j = sql.find(delim, j);
        if (j == string::npos) {
            allQueries.push_back(sql.substr(i, sql.length()));
        }
    }
}
So I send it a string to be split at a ';' with content such as:
std::string sql = "select * from table1; select * from table2;";
and naturally enough j is initially 20, but as soon as the while loop starts and the substring ss is made, the value of ss is actually
select * from table
where an extra character before the ';' is also getting chopped.
'find' returns an index. substr's 2nd parameter is a count, not an end index. If you pass an index as the count while starting at position 0, it will take everything up to but not including that index.
Walk it through:
string example = "abc;def";
int f = example.find(';'); // f = 3
string sub = example.substr(0, f); // get 3 chars, starting from index 0: "abc".
// Note this doesn't include the semicolon
// what you're doing:
string badsub = example.substr(0, f-1); // get 2 chars: "ab"
I like to use wide chars and I program for Windows only, so I defined CharType to be wchar_t. I don't know if this fits your needs, so change the CharType definition to char or maybe a 32-bit data type if that's what you need/want.
Thanks for your helpful explanation and example.
So why, I wonder, does the original example from the book subtract 1 from the count? Surely this would never be what you want?
So after making the small edit to remove the minus 1, passing in a string that would be split in three places, say,
abc;def;ghi;
returns three substrings as expected but those values are
abc
def;gh
ghi;
Is this just a bad example out of the book, say a misprint?
I think I'm going to have to rewrite the function myself.
The book sample is probably OK. You are misreading it: that is not a '1', it is an 'i'. It is the length of the substring, which is the position of the found delimiter (j) minus the start of the substring (i): j - i
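To make the j - i arithmetic concrete, here is a small sketch that pulls out just the second field of a delimited string. The helper name secondField is made up for illustration and is not from the book; the assert is only there to document that j can never be smaller than i.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, for illustration only: returns the text between the
// first and second occurrence of delim, or an empty string if there are
// fewer than two delimiters.
std::string secondField(const std::string &text, char delim)
{
    std::string::size_type first = text.find(delim);   // index of the 1st delimiter
    if (first == std::string::npos) return "";

    std::string::size_type i = first + 1;              // start index of the 2nd field
    std::string::size_type j = text.find(delim, i);    // index of the 2nd delimiter
    if (j == std::string::npos) return "";

    assert(j >= i);                                    // so j - i cannot underflow
    return text.substr(i, j - i);                      // count = end index - start index
}
```

For "abc;def;ghi;" this gives i = 4 and j = 7, so the count passed to substr is 3 and the result is "def".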
webHose, that looks like a really useful solution, I'll check it out in more detail.
Also, at a second look, you are right that I misread the correct code of j-i as j-1.
So for future readers, the correct code is as follows:
void MyClass::splitQueries(const std::string sql, char delim, std::vector<std::string> &allQueries)
{
    string::size_type i = 0;
    string::size_type j = sql.find(delim);
    while (j != string::npos) {
        string ss = sql.substr(i, j-i); // j-i NOT j-1
        allQueries.push_back(ss);
        i = ++j;
        j = sql.find(delim, j);
        if (j == string::npos) {
            allQueries.push_back(sql.substr(i, sql.length()));
        }
    }
}
I should also say that I wasn't suggesting the book was bad. It has been really useful to me so far, and since the author has written a book on C++ he knows infinitely more about the topic than I do!
This is one more example of just how important variable names are. If you or the book had used 'subStart' and 'subEnd' instead of 'i' and 'j', you would never have been misled. Lesson: write code as if it were to be read by others.
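For what it's worth, here is a rough sketch of how the same loop might look with those names. It is written as a free function called splitOnDelimiter (a name I made up, not the book's or the original poster's) that returns the vector instead of filling one passed by reference. It also differs from the code above in one way: the remainder after the last delimiter is always pushed, so a string with no delimiter at all comes back as a single piece rather than nothing.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical free-function variant of splitQueries with descriptive names.
std::vector<std::string> splitOnDelimiter(const std::string &text, char delim)
{
    std::vector<std::string> pieces;
    std::string::size_type subStart = 0;                // index where the current piece begins
    std::string::size_type subEnd = text.find(delim);   // index of the delimiter that ends it

    while (subEnd != std::string::npos) {
        assert(subEnd >= subStart);                     // the length below cannot underflow
        // substr takes (start, count), so the count is subEnd - subStart
        pieces.push_back(text.substr(subStart, subEnd - subStart));
        subStart = subEnd + 1;                          // skip past the delimiter
        subEnd = text.find(delim, subStart);
    }

    // Whatever follows the last delimiter (the whole string if none was found).
    pieces.push_back(text.substr(subStart));
    return pieces;
}
```

With "abc;def;ghi;" this yields four pieces: "abc", "def", "ghi" and an empty string for the part after the trailing ';'. Whether that empty last piece should be kept or dropped depends on what the caller wants.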