However, I don't understand it. How does this work, as the result is stored in a char variable? How does the function know where a word ends (I understand the use of delimiters, but it stores the result in a char variable)?
One more thing... when I use the function myself, Visual Studio gives me the warning "warning C4996: 'strtok': This function or variable may be unsafe. Consider using strtok_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.".
Shouldn't I use this function or can I ignore this warning?
I still don't understand it (the pointer part). I just don't understand WHAT is stored in pch. I know it stores a pointer to the last token, but WHAT is the token and how/where is it saved? For example, why can you just print pch without dereferencing?
I tried several combinations, but I just can't get it to work because I don't really understand how the pch pointer works.
String streams can parse out words delimited by white space. In the following example, each word is extracted and stored as a string in a vector. Note that istringstream::operator>> can extract the words into a char array if you prefer, although that isn't what I would recommend. The vector, rather than an array, handles memory for you.
This is a C++ approach rather than using C functions.
#include <iostream>
#include <sstream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <string>
using namespace std;
int main( int argc, char * args[] )
{
// original data
char test[] = "This is a test";
// string stream to parse out words delimited by white space
istringstream ss( test );
// new container of words
vector<string> words;
// parse out words using string stream and store words in container
string word;
while( ss >> word )
{
words.push_back( word );
}
// dump all words to stdout, delimited by spaces
copy( words.begin(), words.end(), ostream_iterator<string>( cout, " " ) );
return 0;
}
I thought you didn't know pointers. Anyway, here's the explanation.
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
printf ("Splitting string \"%s\" into tokens:\n",str);
pch = strtok (str," ,.-");
while (pch != NULL)
{
printf ("%s\n",pch);
pch = strtok (NULL, " ,.-");
}
return 0;
}
The function returns a pointer to the start of the next token. On the first call here, that pointer points at the char 'T', and strtok marks the end of the token by overwriting the delimiter that follows it with a null character, which is '\0'. So the ',' after "This" is replaced with a '\0', and at this point the string actually looks like this: "- This\0 a sample string."
When you print a char array or a char pointer as a string, the output function prints each character until it finds a '\0', so printf will only print "This".
After the next call the string is actually "- This\0 a\0sample string."
This time, though, the function expects a null pointer as its first argument, which tells it to carry on with the string from the previous call.
But the more I read about this, the more I get confused...
pch = strtok (NULL, " ,.-");
How does the compiler know it should continue working on the string here, as you don't pass it as a parameter anymore? Also, where is the end of the token marked (I mean, where is the end stored)?
I've messed a bit with this code:
char test[] = "Dit is een test";
char* pointer = test;
cout << pointer << endl;
This actually works, but why? When I dereference pointer (cout << *pointer), it displays "D", but why?
char test[] = "Dit is een test";
char* pointer = &test;
Why doesn't this work?
I'm still getting confused with pointers. I did read the documentation about it here and on other websites/tutorials multiple times, but I still get stuck with them. If anyone has any good exercises or something, I would really appreciate them =)
That works because when you assign a char array to a pointer, the pointer actually points to index 0 of the array.
Also note that when you declare a char array like this: char test[] = "Dit is een test";
a null character is automatically added at the end, so you actually declared "Dit is een test\0". That is 15 visible characters plus the terminator, so the size of the array is 16.
Hope that helps. I'll post again tomorrow, it's 3am here. Bye.
char test[] = "Dit is een test";
char * pch;
pch = strtok (test," ");
pch = strtok (NULL," ");
cout << pch << endl;
cout << test << endl;
I really don't understand the output:
is
Dit
First of all, how does the compiler know it's working with the char array on this line:
pch = strtok (NULL," ");
Why don't you need to specify the variable test anymore?
I understand the output of pch, but not the output of test. It outputs "Dit", but why? (I expected it would also output "is".)
Also, when I use the while loop, like in the example, and try to output pch afterwards, my program crashes. Why?
My goal is to display the LAST word in the string. So it should display "test", that is what I'm trying to do here.
- 'test' is an array name.
- an array name without its brackets is a pointer to the array (technically this isn't true, but for now just think of it that way)
Therefore:
pointer = test; // this makes 'pointer' point to the 'test' array
// that is the same as this:
pointer = &test[0]; // point to the first character in the 'test' array
When I dereference pointer (cout << *pointer), it displays "D", but why?
- pointer points to the 'test' string.
- The dereference operator (*) is an alternative way to write the bracket operator with index 0 ([0])
cout << *pointer;
// is the same as
cout << pointer[0];
Why doesn't this work?
- an array name without brackets is a pointer to the array (again, technically not true, but bear with me)
- The & operator gets the address (pointer) to a variable
ie:
pointer = test; // works
pointer = &test; // doesn't work because 'test' is already a
// pointer, so you're getting a pointer to a pointer
// (note: technically not true, again -- I'm trying to keep this simple)
strtok_s is for wide characters, which is Unicode; char is for ASCII code. It's OK to ignore the warning.
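strtok_s() has nothing to do with Unicode. Unlike strtok(), the Visual Studio version takes an extra context argument in which it keeps its position between calls, instead of keeping it in hidden internal state, so it is related to reentrancy and concurrency.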
First of all, how does the compiler know it's working with the char array on this line:
strtok() maintains internal state that persists between calls.
Try this, for example:
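int f(int a){
    static int b;        // static local: starts at 0 and keeps its value between calls
    if (a>10)
        return b=a;      // remember a new value
    return b+a;          // use the value remembered by an earlier call
}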
It outputs "Dit", but why?
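strtok() modifies the char array to which its first parameter points. Specifically, it overwrites the delimiter after each token found with a '\0'. After your calls the array begins with "Dit\0", so printing test stops right after "Dit".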
Also, when I use the while loop, like in the example, and try to output pch afterwards, my program crashes. Why?
Because pch is a null pointer at that point, as guaranteed by the while condition. Dereferencing a null pointer is illegal.
My goal is to display the LAST word in the string.
You have to keep the last pointer obtained from strtok() in a different pointer, so that when strtok() finally returns zero, the pointer doesn't get overwritten.
Thank you Helios and Disch, I'm starting to understand it.
So how can I store the last pointer obtained in a different pointer? I'm thinking about putting it in the while, just before the function is called. The problem is that I'm stuck again when I want the SECOND last word that way.
Like this:
/* strtok example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] ="- This, a sample string.";
char * pch;
char* pointer;
pch = strtok (str," ,.-");
while (pch != NULL)
{
pointer = pch; // I'm pretty sure I'm doing it wrong again...
pch = strtok (NULL, " ,.-");
}
return 0;
}
Also, is it possible to store the contents of pch (or the new pointer variable, whatever) in a new char array?
/* strtok example */
#include <iostream>
#include <cstring>
int main ()
{
    char str[] ="- This, a sample string.";
    char * pch;
    const char *pointer, *secondLast;
    const char *tokens =" ,.-"; // the delimiter characters
    secondLast = pointer =""; // in case we are given an empty string, or a string consisting only of delimiters, to tokenize
    pch = strtok (str,tokens);
    while (pch != NULL)
    {
        secondLast = pointer; // what used to be the last token is now the second last
        pointer = pch;        // remember the current token before pch is overwritten
        std::cout << pointer << std::endl;
        pch = strtok (NULL, tokens);
    }
    return 0;
}
With regard to the second part, saving the strings or the pointers could be a bit messy, given that the string to be tokenized can be of arbitrary length and can therefore produce an arbitrary number of tokens (anywhere from 0 to about stringlength/2, depending on the content).
To answer one of your original questions, I recommend against using strtok() unless you really, really have to. It is important to understand strtok() and it sounds like you are getting there, but there is a better option.
The Boost String Algorithms library has this functionality and it is much easier to use.
Well, dealing with character arrays, pointers and pointer arrays is, as you are experiencing, pretty painful. The advantage of using C++ over C is that it has these nice data types, containers and algorithms to make programming a little more enjoyable.
The reason that Microsoft warns of the use of strtok() is that it is easy to use in an unsafe manner. The behavior of strtok() is undefined when the first call is passed a NULL pointer. It is also not thread safe -- you have to remember to use the thread-safe version.
I've looked at the code and I think I understand it (didn't test it yet), but the only thing confusing me is why a pointer is used on this line:
std::cout << *it << std::endl;