Getting Tokens

Aug 11, 2010 at 10:23pm
I'm trying to write a program that will receive a string a user inputs, break the string up into "words", and then add these words to a vector. The problem is, it crashes when it tries to add the first word to the first vector slot.

void getTokens(string s, vector<string> &tkn) {
  char *pch;
  int slen = s.length();
  char chr[slen];
  for (int i = 0; i < slen; i++) {
    chr[i] = s[i];
  }
  pch = strtok(chr," ,.-");
  int x = 0;
  while (pch != NULL) {
    string schr;
    pch = strtok(NULL, " ,.-");
    x++;
    for (int i = 0; i < (int)strlen(pch); i++) {
      schr[i] = pch[i];
    }
    tkn[x] = schr;
  }
}


I've tried googling for a strtok() tutorial, but I can't seem to find one that goes into detail about how the function works. Should I change the vector to a char type? I do need to compare the tokens (for example, if the first token is "open" and the second is "door", do this).

Any hints on either the tutorial or just the code (although I prefer a tutorial, as it'll give me a chance to work out the core problem myself)?
Aug 11, 2010 at 11:24pm
Use tkn.push_back(schr);
Indexing with tkn[x] assumes the element already exists in the vector, just like an array, and since you never resized the vector it probably doesn't.
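
A minimal sketch of the difference, assuming nothing about the rest of the program. The helper name appendWord is hypothetical and exists only for illustration:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: appends a word and returns the vector's new size.
std::size_t appendWord(std::vector<std::string>& tkn, const std::string& word) {
    // tkn[tkn.size()] = word;  // undefined behavior: that element does not exist yet
    tkn.push_back(word);        // grows the vector by one element and copies word into it
    return tkn.size();
}
```

Calling appendWord on an empty vector with "open" and then "door" should leave those two strings at indices 0 and 1.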
Aug 12, 2010 at 1:44am
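Also, char chr[slen]; is not legal standard C++: an array size must be a compile-time constant. Some compilers accept it anyway as an extension.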

Instead of using those C functions, use the string functions instead.
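
One possible reading of "the string functions" is the std::string search members find_first_of and find_first_not_of. The sketch below is an assumption about what was meant, not something the poster spelled out. The helper name splitWithFind is hypothetical, and the default delimiters are copied from the original strtok() call:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits s on any character in delims,
// using only std::string member functions.
std::vector<std::string> splitWithFind(const std::string& s,
                                       const std::string& delims = " ,.-") {
    std::vector<std::string> tokens;
    // Skip any leading delimiters to find the start of the first token.
    std::string::size_type start = s.find_first_not_of(delims);
    while (start != std::string::npos) {
        // The token ends at the next delimiter (or at the end of the string).
        std::string::size_type end = s.find_first_of(delims, start);
        // substr clamps the count, so end == npos takes the rest of the string.
        tokens.push_back(s.substr(start, end - start));
        // Skip the run of delimiters that follows the token.
        start = s.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Because it never modifies the input, this version needs no writable char buffer at all.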
Aug 12, 2010 at 10:06pm
@Zhuge

Yeah, I completely forgot about that little thing about vectors. But there's still an issue: the while-loop is not terminating after the last word, and is trying to continue checking for tokens even though there are none. Afterwards, it crashes.

@firedraco

Which string functions are you referring to? c_str()? Or something else?
Last edited on Aug 12, 2010 at 10:07pm
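
A likely cause of the crash described above: the posted loop calls strtok(NULL, ...) before it uses the token, so the first word is skipped, and after the last word pch becomes NULL and is passed to strlen(). The loop also writes into an empty std::string with schr[i], and the char buffer is never null-terminated. A minimal sketch of a corrected strtok() loop, keeping the original function name but taking the string by const reference (that signature change is an assumption):

```cpp
#include <cstring>
#include <string>
#include <vector>

void getTokens(const std::string& s, std::vector<std::string>& tkn) {
    // strtok modifies its input, so copy into a writable, null-terminated buffer.
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0');

    const char* delims = " ,.-";
    char* pch = std::strtok(&buf[0], delims);
    while (pch != NULL) {
        tkn.push_back(std::string(pch));  // use the current token first...
        pch = std::strtok(NULL, delims);  // ...then advance; NULL ends the loop
    }
}
```

The order inside the loop is the key point: test, use, then fetch the next token.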
Aug 14, 2010 at 1:59pm
strtok().... maybe
Aug 14, 2010 at 9:11pm
Aug 14, 2010 at 10:07pm
Depending on what your token separators are, you might consider replacing all of them with a space and then using a std::istringstream to read the space-separated words:

#include <string>
#include <algorithm>
#include <sstream>
#include <cstdlib> // for ispunct() function

// replace all punctuation marks with space
std::replace_if(s.begin(), s.end(), std::ptr_fun(ispunct), ' ');

// turn the string into an input stream
std::istringstream iss(s);

//read one space separated word from the input stream
std::string word;
iss >> word; // if successful, word contains one token 


Last edited on Aug 14, 2010 at 10:17pm
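
The snippet above reads a single word. To collect every word, the extraction can be put in a loop condition. A minimal sketch, assuming the separators have already been turned into whitespace. The helper name splitOnWhitespace is hypothetical:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects every whitespace-separated word in s.
std::vector<std::string> splitOnWhitespace(const std::string& s) {
    std::vector<std::string> words;
    std::istringstream iss(s);
    std::string word;
    while (iss >> word) {        // extraction fails at end of input, which ends the loop
        words.push_back(word);
    }
    return words;
}
```

operator>> skips runs of whitespace, so repeated spaces do not produce empty tokens.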
Aug 15, 2010 at 2:11pm
There's a problem with ptr_fun(ispunct()):

ptr_fun() cannot take an integer value as an argument. And what am I supposed to put in ispunct()? Does this bit of code go into the current function, or do I create a new one altogether?

The rest makes sense to me, if my idea of putting this code as statements in a while loop is correct.
Aug 15, 2010 at 2:39pm
Sorry, I made a mistake. ispunct() function comes from #include <cctype>. If you include that it should work.
Aug 15, 2010 at 2:46pm
Sorry to disappoint, but it still doesn't work. Read my previous post. ispunct() needs an argument.

There really should be a tokens tutorial. :s
Aug 15, 2010 at 3:23pm
Can you post your code?
Aug 15, 2010 at 3:25pm
Ah, I see it. You put this ptr_fun(ispunct()) when it should be this ptr_fun(ispunct).

Notice that there are no () on the ispunct. That is because you are not calling the function, but sending a pointer to the function to the ptr_fun() function.
Last edited on Aug 15, 2010 at 3:26pm
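
The difference between naming a function and calling it can be shown in isolation. In this sketch countIf, isPunct, and countPunct are all hypothetical names made up for illustration:

```cpp
#include <cctype>

// Hypothetical helper: counts the characters in text for which pred returns nonzero.
int countIf(const char* text, int (*pred)(int)) {
    int n = 0;
    for (; *text != '\0'; ++text) {
        // Cast first: the <cctype> functions expect an unsigned char value.
        if (pred(static_cast<unsigned char>(*text))) ++n;
    }
    return n;
}

// Hypothetical wrapper with an unambiguous int(int) signature.
int isPunct(int c) { return std::ispunct(c); }

int countPunct(const char* text) {
    return countIf(text, isPunct);     // no parentheses: passes the function itself
    // countIf(text, isPunct('!'));    // would not compile: passes an int result instead
}
```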
Aug 15, 2010 at 4:22pm
If I do that, my compiler gives me this error:

error: no matching function for call to 'ptr_fun(<unknown type>)'

It doesn't know what ispunct is. I've already included the header files needed.

#include <cstring>
#include <algorithm>
#include <sstream>
#include <cctype>

// includes general functions
// for conversions, comparisons, etc.
#include <zsstd>

using namespace std;

void getTokens(string s, vector<string> &tkn) {
  // while loop will go here, methinks
  // replace all punctuation marks with space
  replace_if(s.begin(), s.end(), std::ptr_fun(ispunct), ' ');

  // turn the string into an input stream
  istringstream iss(s);

  //read one space separated word from the input stream
  string word;
  iss >> word; // if successful, word contains one token
  tkn.push_back(word);
}
...
Aug 15, 2010 at 4:34pm
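The cctype functions are often overloaded in funny ways (for example, <locale> adds a two-argument template version of std::ispunct), so the compiler cannot tell which one you mean and you need to be explicit.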

  replace_if(
    s.begin(),
    s.end(),
    std::ptr_fun <int, int> ( std::ispunct ),
    ' '
    );

You will of course need <algorithm> and <cctype> and <functional> for these...

Hope this helps.
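
For readers on a newer compiler: std::ptr_fun was deprecated in C++11 and removed in C++17. A lambda is one alternative that avoids having to pick an overload by hand. This is a sketch assuming C++11 or later, which the thread itself does not use. The helper name punctuationToSpaces is hypothetical:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of s with every punctuation
// character replaced by a space.
std::string punctuationToSpaces(std::string s) {
    std::replace_if(s.begin(), s.end(),
                    // Taking unsigned char avoids passing negative values to ispunct.
                    [](unsigned char c) { return std::ispunct(c) != 0; },
                    ' ');
    return s;
}
```

Note that this treats hyphens and apostrophes as separators too, which may or may not be what you want for your tokens.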
Aug 15, 2010 at 5:04pm
  // while loop will go here, methinks
  // replace all punctuation marks with space
  replace_if(s.begin(), s.end(), std::ptr_fun<int,int>(ispunct), ' ');


Actually, you don't need a loop there, because that function will replace all the punctuation. The first parameter is the beginning of the string (s.begin()) and the second parameter is the end of the string (s.end()), so it will replace everything between those two markers (iterators).

For every character between begin() and end() it will call the function ispunct() to decide if it is a punctuation mark or not. If it is it will replace it with a space ' ' .
Last edited on Aug 15, 2010 at 5:05pm
Aug 15, 2010 at 9:32pm
Duoas, you are a saint. The only thing left to do is to get each word inside the vector.

As for Galik and the rest, thanks for the help. Tip of the hat to you guys.

Guess it'll be wise to figure out how this worked.