splitting std string with delim

Hi,

Curious problem here with splitting a std::string at a desired delimiter: the split seems to be happening at the wrong place.
I am using the suggested code from the C++ Cookbook (Recipe 4.6) as follows (with my own var names):

(EDIT: the following code contains an error, the correct code is in a post further down the page)

void MyClass::splitQueries(const std::string sql, char delim, std::vector<std::string> &allQueries)
{
    string::size_type i = 0;
    string::size_type j = sql.find(delim);

    while (j != string::npos) {
        string ss = sql.substr(i, j-1);
        allQueries.push_back(ss);

        i = ++j;
        j = sql.find(delim, j);

        if (j == string::npos) {
            allQueries.push_back(sql.substr(i, sql.length()));
        }
    }
}


So I send it a string to be split at a ';' with content such as:
 
std::string sql = "select * from table1; select * from table2;";


and naturally enough j would initially be 20 but as soon as the while loop starts and the substring, ss, is made, the value of ss is actually
 
select * from table

where an extra character before the ';' is also getting chopped.

How is this happening?
C.
string ss = sql.substr(i, j-1);

The -1 here is telling it to chop 1 character.

'find' returns an index. substr's 2nd parameter is a count. If you start at position 0 and use an index as the count, it will take everything up to but not including that index.


Walk it through:

string example = "abc;def";
string::size_type f = example.find(';');  // f = 3 (find returns an index, not a count)
string sub = example.substr(0, f);  // get 3 chars, starting from index 0:  "abc".
  // Note this doesn't include the semicolon

// what you're doing:
string badsub = example.substr(0, f-1);  // get 2 chars:  "ab" 
FYI, a much simpler method would be to use std::getline() with a custom delimiter. I tested it on ideone.com for your particular case and it works OK.

http://ideone.com/wvEtJ

I like to use wide chars and I program for Windows only, so I defined CharType to be wchar_t. I don't know if this fits your needs, so change the CharType definition to char or maybe a 32-bit data type if that's what you need/want.
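For readers who cannot open the link, here is a minimal sketch of the std::getline() idea using plain char. It is not the code behind the ideone link, and the function name splitWithGetline is made up for this illustration:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the linked code): split `text` on `delim`
// by reading delimiter-terminated fields from a string stream.
std::vector<std::string> splitWithGetline(const std::string& text, char delim)
{
    std::vector<std::string> parts;
    std::istringstream stream(text);
    std::string field;

    // std::getline reads up to `delim`, discards the delimiter, and returns
    // the stream, which tests false once nothing more can be extracted.
    while (std::getline(stream, field, delim)) {
        parts.push_back(field);
    }
    return parts;
}
```

Called as splitWithGetline("abc;def;ghi;", ';'), this should give "abc", "def" and "ghi". Whitespace after a delimiter is kept, so the second query in the original example would come back with a leading space.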
Thanks for your helpful explanation and example.
So why, I wonder, does the original example from the book subtract 1 from the count? Surely this would never be what you want?

So after making the small edit to remove the minus 1, passing in a string that would be split in three places, say,
 
abc;def;ghi;

returns three substrings as expected but those values are
abc
def;gh
ghi;

Is this just a bad example out of the book, say a misprint?
I think I'm going to have to rewrite the function myself.
The book sample is probably OK. You are misreading it: that is not a '1', it is an 'i'. The second argument is the length of the substring, which is the position of the found delimiter (j) minus the start of the substring (i): j - i
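To make the length calculation concrete, here is a small sketch. The helper name fieldAt and the names start and end are invented for this illustration and do not come from the book:

```cpp
#include <string>

// Hypothetical helper: return the field that begins at index `start` and
// runs up to, but not including, the next `delim`.
// Assumes start <= text.size(); otherwise substr would throw std::out_of_range.
std::string fieldAt(const std::string& text, std::string::size_type start, char delim)
{
    std::string::size_type end = text.find(delim, start);
    if (end == std::string::npos) {
        return text.substr(start);          // no more delimiters: take the rest
    }
    return text.substr(start, end - start); // count = end index minus start index
}
```

With "abc;def;ghi;" and start = 4, find returns 7, so the count is 7 - 4 = 3 and the result is "def". Passing 7 itself as the count would ask for seven characters from index 4 and run past the next delimiter.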
Thanks to all for the replies.

webHose, that looks like a really useful solution, I'll check it out in more detail.
Also, at a second look, you are right that I misread the correct code of j-i as j-1.
So for future readers, the correct code is as follows:
void MyClass::splitQueries(const std::string sql, char delim, std::vector<std::string> &allQueries)
{
    string::size_type i = 0;
    string::size_type j = sql.find(delim);

    while (j != string::npos) {
        string ss = sql.substr(i, j-i); // j-i NOT j-1
        allQueries.push_back(ss);

        i = ++j;
        j = sql.find(delim, j);

        if (j == string::npos) {
            allQueries.push_back(sql.substr(i, sql.length()));
        }
    }
}

I should also say that I wasn't suggesting the book was bad. It has been really useful to me so far, and since the author has written a book on C++, he knows infinitely more about the topic than I do!
This is one more example of just how important variable names are. If you or the book had used 'subStart' and 'subEnd' instead of 'i' and 'j', you would never have been misled. Lesson: write code as if it were to be read by others.
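As a sketch of that suggestion, here is the corrected function from this thread with only the names changed. It is written as a free function rather than a MyClass member so that it stands alone, and it takes the string by const reference; both are assumptions made for this illustration:

```cpp
#include <string>
#include <vector>

// Same logic as the corrected code in this thread, with 'i' and 'j'
// renamed to subStart and subEnd.
void splitQueries(const std::string& sql, char delim, std::vector<std::string>& allQueries)
{
    std::string::size_type subStart = 0;
    std::string::size_type subEnd = sql.find(delim);

    while (subEnd != std::string::npos) {
        // The count is the distance between the two positions.
        allQueries.push_back(sql.substr(subStart, subEnd - subStart));

        subStart = subEnd + 1;              // skip past the delimiter
        subEnd = sql.find(delim, subStart);

        if (subEnd == std::string::npos) {
            // No further delimiter: keep whatever remains (may be empty).
            allQueries.push_back(sql.substr(subStart));
        }
    }
}
```

With these names, "subEnd - 1" looks wrong at a glance, whereas "j-1" does not. Note that, like the original, this version adds an empty final entry when the input ends with the delimiter and adds nothing at all when the input contains no delimiter.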
Yes, good lesson learned there. Thanks for the tip!