Why does strtok change the original string?

Jan 8, 2015 at 10:38pm
I am writing a function that counts how many items a string contains. I use strtok because I don't know another way to do it. I noticed that strtok replaces each occurrence of the ';' delimiter with a null character. That dismayed me, because later I want to tokenize the original str string again, but since the semicolons are gone I get only the first token and the others are unreachable. So my question is: why does this function change the original string? I don't see the sense in it, and it makes strtok look like a sh*tty function... haha. But maybe I have just misunderstood something here?

int getArgWordCount(char * str, SETTINGS * settings )
{
    if (!settings->loadSpecifiedKernels)
        return 0;

    const char s[2] = ";";
    char *token;
    token = strtok(str, s); // read first
    int count = strlen(token)?1:0;

    while ( token != NULL )
    {
          printf( " %s\n", token );
          count++;
          token = strtok(NULL, s); // READ NEXT TOKEN: char *
    }
    return count;
}
Jan 8, 2015 at 10:53pm
Why does strtok change the original string?
So that it can resume where it left off and return parts of the string without additional allocations (the standard C string functions do not allocate memory; they work only on the buffers you provide). It is the simplest and fastest way to do that.
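To make the in-place behaviour visible, here is a minimal sketch. The helper names count_nul_bytes and tokenize_in_place are made up for this illustration; only strtok itself is a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the '\0' bytes among the first len bytes of buf. */
size_t count_nul_bytes(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] == '\0')
            ++n;
    }
    return n;
}

/* Hypothetical helper: tokenizes buf in place with strtok and returns the
 * number of tokens. buf must be writable (not a string literal), because
 * strtok overwrites the delimiter that ends each token with '\0'. */
int tokenize_in_place(char *buf, const char *delims)
{
    assert(buf != NULL && delims != NULL);

    int count = 0;
    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims))
        ++count;
    return count;
}
```

With a writable buffer holding "a;b;c", tokenize_in_place(buf, ";") returns 3, and afterwards the two ';' bytes have become '\0', so printing buf as a string shows only "a". That is why a second strtok pass over the same buffer finds just the first token.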

You should consider using strpbrk and strspn/strcspn to implement non-intrusive reentrant tokenisation.
http://en.cppreference.com/w/cpp/string/byte/strpbrk
http://en.cppreference.com/w/cpp/string/byte/strspn
http://en.cppreference.com/w/cpp/string/byte/strcspn
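As a rough sketch of what non-intrusive tokenisation with strspn/strcspn could look like (next_token and count_tokens are names invented for this example, not functions from any library): the caller keeps its own cursor, so nothing is written to the string and no hidden static state is involved. Like strtok, this version skips empty fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds the next token at or after *cursor.
 * Returns a pointer to the token start and stores its length in *len,
 * or returns NULL when no tokens remain. The string is never modified. */
const char *next_token(const char **cursor, const char *delims, size_t *len)
{
    assert(cursor != NULL && *cursor != NULL && delims != NULL && len != NULL);

    /* Skip any run of delimiters in front of the token. */
    const char *start = *cursor + strspn(*cursor, delims);
    if (*start == '\0') {
        *cursor = start;
        return NULL;
    }

    /* The token extends up to the next delimiter or the end of the string. */
    *len = strcspn(start, delims);
    *cursor = start + *len;
    return start;
}

/* Counts the non-empty tokens in str without touching it. */
size_t count_tokens(const char *str, const char *delims)
{
    size_t count = 0;
    size_t len = 0;
    const char *cursor = str;

    while (next_token(&cursor, delims, &len) != NULL)
        ++count;
    return count;
}
```

Because tokens are returned as a pointer plus a length rather than as null-terminated strings, printing one needs something like printf("%.*s\n", (int)len, tok). With these definitions count_tokens("a;b;c", ";") gives 3, and the same string can be scanned again as often as needed.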
Jan 9, 2015 at 1:47am
Jan 9, 2015 at 10:45am
Thank you. I have corrected my function and it works great. However, it is still not clear to me how to parse the string. Should I first copy the original string with strcpy and then parse the copy? That seems to me like the only solution. I think it makes no difference whether the copy is made inside a standard function or inside my own function. It will not affect performance at all, huh?

Or maybe I should let strtok replace the separators with null characters and read the string with strspn next time.
Last edited on Jan 9, 2015 at 10:54am
Jan 9, 2015 at 4:15pm
If you are using strtok() and not the function I gave you, then yes, make a copy first.

Copying strings always costs. You'll have to run your program and see if it noticeably affects performance.

You could just use the function I gave you.
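For the copy-first route, a minimal sketch might look like this. The name count_tokens_on_copy is hypothetical, and the -1 return value on allocation failure is an assumption made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts tokens by running strtok on a private copy,
 * so the caller's string stays intact.
 * Returns the token count, or -1 if the copy cannot be allocated. */
int count_tokens_on_copy(const char *str, const char *delims)
{
    assert(str != NULL && delims != NULL);

    size_t size = strlen(str) + 1;      /* include the terminating '\0' */
    char *copy = malloc(size);
    if (copy == NULL)
        return -1;
    memcpy(copy, str, size);

    int count = 0;
    for (char *tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims))
        ++count;

    free(copy);                         /* only the copy was modified */
    return count;
}
```

The cost is one allocation and one pass over the string per call, which is the copying overhead mentioned above. Whether that matters is something to measure in the real program.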
Jan 18, 2015 at 11:31am
I finally corrected the program. I have no idea why Duoas recommended me those functions. There is a much simpler and faster approach.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
/* Return count of parsed items */
int getArgWordCount(char * str, SETTINGS * settings )
{
    if (!settings->loadSpecifiedKernels)
        return 0;

    unsigned int count = 0;
    char* pCopy = str; // just copy pointer
    // char* sep = ";"; // This would be for char* (word or string separator)
    do {
       /* This would be for char*: */
       // pCopy = strposBreak(pCopy, sep); // find separator
       // if (pCopy) pCopy += strSameLength(pCopy, sep); // skip separator
       /* This is for char: */
        pCopy = strchr(pCopy, settings->separator);
        pCopy++;
       ++count; // increment word count
    } while(pCopy && *pCopy);

    return count;
}


strchr() finds the separator character, ';' or ':' depending on how it is defined in settings. Then pCopy++ skips the separator and count is incremented.
Last edited on Jan 18, 2015 at 11:32am
Jan 18, 2015 at 11:54am
If the character does not occur in the rest of the string, your program will try to dereference the pointer 0x1 and can crash.
Take, for example, the line "first;second" with separator ';'.
The first iteration works fine, but on the second, line 15 (the strchr call) yields a null pointer and line 16 (pCopy++) increments it. Then on line 18 pCopy evaluates to true, so *pCopy is evaluated, and *(0x1) is not valid.
Jan 18, 2015 at 12:01pm
I have no idea why Duoas recommended me those functions.

That tokenizer has more features / is more generic / does error checking, but apparently you did not notice or need them.


Questions for you:

Is 0+1 true or false, and who says that memory address 0x00000001 is null?
How many words are in ""?
How many words are in ";"?
How many words are in "foobar"?
Jan 18, 2015 at 12:36pm
MiiNiPa:
I noticed the problem later, when I had finished and tried to work out why it crashes. You pointed out the NULL pointer, so I added a condition:
if (pCopy) pCopy++;
Thanks.
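For reference, here is a sketch of the strchr loop with that null check applied. It assumes the separator is passed in as a plain char instead of being read from the SETTINGS struct, and the name count_fields is made up for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-alone version of the loop above, with the null check.
 * The separator is a plain parameter here (an assumption for this sketch);
 * in the thread it comes from settings->separator. */
unsigned int count_fields(const char *str, char separator)
{
    assert(str != NULL);

    unsigned int count = 0;
    const char *p = str;                /* copies the pointer, not the string */

    do {
        p = strchr(p, separator);       /* NULL when no separator is left */
        if (p)
            p++;                        /* step past the separator only if one was found */
        ++count;
    } while (p && *p);

    return count;
}
```

With this version "first;second" gives 2 and "foobar" gives 1, but "" and ";" both give 1 as well, because the do/while body always runs at least once. Whether an empty string or a lone separator should count as one word is exactly what the questions above are getting at, so those cases may need a check before the loop.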