boost::split does not work as expected

Aug 22, 2009 at 4:00pm
http://www.boost.org/doc/libs/1_39_0/doc/html/string_algo/usage.html#id3408774

The documentation has an example of using boost::split:

#include <iostream>
#include <vector>
#include <string>
#include <boost/algorithm/string.hpp>

int main() {

    std::string str1("hello abc-*-ABC-*-aBc goodbye");

    typedef std::vector< std::string > split_vector_type;

    split_vector_type SplitVec; // #2: Search for tokens
    boost::split( SplitVec, str1, boost::is_any_of("-*") ); // SplitVec == { "hello abc","ABC","aBc goodbye" }

    for ( unsigned i = 0; i < SplitVec.size(); ++i ) {
        std::cout << "\"" << SplitVec[i] << "\"\n";
    }

    return 0;
}


It says in the comment that SplitVec contains { "hello abc", "ABC", "aBc goodbye" }, but when you run the code the following is printed:
"hello abc"
""
""
"ABC"
""
""
"aBc goodbye"


?? Shouldn't it discard those empty tokens?
Aug 22, 2009 at 7:50pm
It could be that is_any_of() treats each character in the set as a separate delimiter, so the string is split like this:

[hello abc]-[]*[]-[ABC]-[]*[]-[aBc goodbye]

The bracketed parts are the resulting strings; the characters between them are the delimiters.
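
To make the splitting described above concrete, here is a minimal standard-library sketch. The name split_any_of and its compress flag are hypothetical, invented for illustration. This is not Boost's implementation, only a model of the observed behavior, in which every delimiter character ends a token.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not Boost's implementation: splits `input` at every
// character found in `delims`. With `compress` set, a run of adjacent
// delimiters counts as a single separator.
std::vector<std::string> split_any_of(const std::string& input,
                                      const std::string& delims,
                                      bool compress) {
    std::vector<std::string> tokens;
    std::string current;
    bool previous_was_delim = false;
    for (std::string::size_type i = 0; i < input.size(); ++i) {
        const bool is_delim = delims.find(input[i]) != std::string::npos;
        if (!is_delim) {
            current += input[i];       // ordinary character: extend the token
        } else if (!(compress && previous_was_delim)) {
            tokens.push_back(current); // delimiter: close the current token
            current.clear();
        }
        previous_was_delim = is_delim;
    }
    tokens.push_back(current);         // whatever follows the last delimiter
    return tokens;
}
```

Called with "hello abc-*-ABC-*-aBc goodbye" and "-*", the sketch yields seven tokens when compress is false (four of them empty) and three tokens when compress is true.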
Aug 22, 2009 at 9:23pm
I understand that, I just wanted to show the difference between the documentation and the implementation.
Do you think it is possible to achieve the expected behavior with a similar call?
Aug 22, 2009 at 11:21pm
Ahh, I see now... It does seem like their documentation is incorrect in this case.

The only way I can see of doing it is: (although it's not really a "similar" call like you wanted)

for (unsigned int i = 0; i < SplitVec.size(); ++i ) {
    if(!SplitVec[i].empty()) std::cout << "\"" << SplitVec[i] << "\"\n";
}

Aug 24, 2009 at 12:20pm
// Requires <algorithm> and <boost/bind.hpp>.
SplitVec.erase( std::remove_if( SplitVec.begin(), SplitVec.end(),
    boost::bind( &std::string::empty, _1 ) ), SplitVec.end() );


This removes the zero-length entries from the vector after the call to boost::split.