Weird Strtok Issue


int search(char* keyw)
{
    char*           tkn;
    int             tokCnt = 0;
    unsigned char*  result;
    char            delims[] = " ,.:;_\n\r\t*-=()";

    /* Tokenizes word parameter to search */
    //tkn = strtok(kw, ",");
    tkn = strtok(kw, "' ' ,.:;_\n\r\t*-=()");
    while (tkn != NULL) {    // tokenizes kw and adds tokens to vector kWords
        printf("tkn %s: \n", tkn);
        kWords.push_back(tkn);
        //tkn = strtok(NULL, ",");
        tkn = strtok(NULL, "' ' ,.:;_\n\r\t*-=()");
        tokCnt++;
    }


When I run "./client --search hello sick papa dont " in the terminal, the loop doesn't print any tokens, even though code I haven't shown confirms that "hello" is read.
Output:
./client --search hello sick papa dont
tkn: 
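
The posted snippet is incomplete, so the following is only a sketch of a strtok loop that is known to work, not a diagnosis. Two properties of strtok are worth ruling out: it writes '\0' into the buffer it scans, so it needs a writable copy, and it only sees the one string it is given (the shell passes hello, sick, papa and dont as separate argv entries). The helper name strtok_tokens is hypothetical and not part of the original code.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): tokenizes a private,
// writable copy of `input` with strtok and returns the tokens as strings.
std::vector<std::string> strtok_tokens(const std::string& input)
{
    const char delims[] = " ,.:;_\n\r\t*-=()";

    // strtok overwrites delimiters with '\0', so it must not be pointed at a
    // string literal or at data the caller still needs unmodified.
    std::vector<char> buf(input.begin(), input.end());
    buf.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tkn = std::strtok(buf.data(), delims);
         tkn != nullptr;
         tkn = std::strtok(nullptr, delims))
    {
        tokens.push_back(tkn);  // copies the token out of the scratch buffer
    }
    return tokens;
}
```

Calling strtok_tokens("hello sick papa dont ") yields the four tokens hello, sick, papa and dont.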

C++:
#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <iterator>

std::vector<std::string> tokenize( std::string keyw )
{
    const std::string delims = ",.:;_\n\r\t*-=()" ;

    // replace each delimiter with a space
    for( char& c : keyw ) if( delims.find(c) != std::string::npos ) c = ' ' ;

    // construct an input stream which reads from the string
    std::istringstream stm(keyw) ;

    // read whitespace separated tokens from the stream into a vector and return it
    return { std::istream_iterator<std::string>(stm),
             std::istream_iterator<std::string>() } ;
}

int main()
{
    for( const auto& s : tokenize( "./client --search hello sick papa dont " ) )
        std::cout << s << '\n' ;
}

http://ideone.com/05pee2
Thank you.
Is there something less 'heavy'?

Your solution uses vectors.
> Your solution uses vectors.

What is wrong about using vectors?

And what was this? kWords.push_back(tkn); Isn't kWords a sequence container?
Yeah, but that vector will be used for very important things. It is necessary to deal with a lot of information.

There is nothing wrong with them in general. But I am looking for something less resource-heavy for this "trivial" tokenization. I am looking for something lightweight that would get the job done.

Thank you for the help so far, don't get me wrong.
> I am looking for something less resource-heavy for this "trivial" tokenization.
> I am looking for something lightweight that would get the job done.

If a one-time scan of the strings for tokens is all that is needed, and the tokens need not be extracted and stored in a sequence container for later use, it is hard to beat boost::tokenizer in either time or space.
http://www.boost.org/doc/libs/1_53_0/libs/tokenizer/char_separator.htm
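
For readers who would rather not pull in Boost, the same "scan once, store nothing" idea can be sketched with the standard library alone. This is not boost::tokenizer, only a rough stand-in that assumes C++17 (std::string_view); for_each_token is a hypothetical name. It hands each token to a callback as a view into the original text, so no token is copied or allocated.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of Boost or of the original post): calls
// `fn` once per token and returns how many tokens were seen.
template <typename Fn>
std::size_t for_each_token(std::string_view text, std::string_view delims, Fn fn)
{
    std::size_t count = 0;
    std::size_t pos = text.find_first_not_of(delims);        // skip leading delimiters
    while (pos != std::string_view::npos) {
        std::size_t end = text.find_first_of(delims, pos);   // one past the token's last char
        if (end == std::string_view::npos) end = text.size();
        fn(text.substr(pos, end - pos));                     // a view into `text`, no copy
        ++count;
        pos = text.find_first_not_of(delims, end);           // start of the next token, if any
    }
    return count;
}
```

The views passed to the callback are only valid while the underlying text is alive, which is fine for a one-time scan.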

std::vector<std::string> is movable, so returning it by value from a function does not incur an 'unnecessary-copy' performance penalty.

If the parsed tokens need to be extracted and stored, using boost::split to parse the tokens into a vector of references (iterator_range) would be a high performance option.
(The references would be to positions in the original string; that string needs to be kept).
http://www.boost.org/doc/libs/1_53_0/doc/html/string_algo/usage.html#idp163440592
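
The "vector of references into the original string" idea can also be sketched without Boost. This is not boost::split: it assumes C++17 and stores std::string_view instead of iterator_range, the name split_views is hypothetical, and this version skips empty tokens between adjacent delimiters.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: returns views into `text`. No characters are copied,
// so the string behind `text` must outlive the returned vector.
std::vector<std::string_view> split_views(std::string_view text, std::string_view delims)
{
    std::vector<std::string_view> tokens;
    std::size_t pos = text.find_first_not_of(delims);        // skip leading delimiters
    while (pos != std::string_view::npos) {
        std::size_t end = text.find_first_of(delims, pos);   // one past the token's last char
        if (end == std::string_view::npos) end = text.size();
        tokens.push_back(text.substr(pos, end - pos));       // stores pointer + length only
        pos = text.find_first_not_of(delims, end);           // start of the next token, if any
    }
    return tokens;
}
```

Each element costs a pointer and a length, so the vector stays small even for long inputs; the trade-off is the lifetime requirement noted in the comment.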


Thank you. I don't know much about C++, but you have given me good advice.
I will use your first option and not boost.

Thank you again.