Implement strtok without modifying string?

Hi

strtok works by overwriting the delimiter that ends each token with a null character ('\0'), so the string argument is itself modified. Often this is an undesirable side effect.
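To make the modification concrete, here is a minimal sketch. The helper name count_tokens_in_place is made up for illustration; it only wraps the standard strtok loop so the effect on the buffer can be inspected afterwards.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenise buf in place with strtok and return the
   number of tokens found. After the call, every delimiter that ended a
   token has been overwritten with '\0', so buf no longer reads as the
   original text. */
size_t count_tokens_in_place(char *buf, const char *delims)
{
    size_t n = 0;
    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims))
        ++n;
    return n;
}
```

Called on char s[] = "hello my name" with " " as the delimiter set, it returns 3, and afterwards s compares equal to "hello" because s[5] now holds '\0' instead of the space.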

A simple solution is to copy the string and pass the copy to strtok. However, this seems inefficient and a little inelegant.

So I was wondering whether it is possible to create a function that is effectively strtok without modifying the string? I assume not as otherwise strtok would have done so!

Thanks
Sure. You can use strstr to find the hits without modification, and do what you will with the answers.

All of which you can also do with std::string. It has no direct strtok-like member, so you would have to cook it up with the find and substr functions, but it should be straightforward.

Do you need a real time solution or a C solution? What is your goal / need here?
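As a rough illustration of the strstr suggestion above: strstr searches for an exact substring, so it fits the case where fields are separated by one fixed separator string. The helper name next_field and its cursor-style interface are assumptions made for this sketch, not anything from the standard library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: report the field that starts at *cursor in a string
   whose fields are separated by the exact, non-empty substring sep.
   Stores the field length in *len, advances *cursor past the separator,
   and returns the field start. Returns NULL once the input is exhausted.
   Nothing is written to the string being scanned. */
const char *next_field(const char **cursor, const char *sep, size_t *len)
{
    const char *start = *cursor;
    if (start == NULL)
        return NULL;

    const char *hit = strstr(start, sep);
    if (hit != NULL) {
        *len = (size_t)(hit - start);
        *cursor = hit + strlen(sep);
    } else {
        *len = strlen(start);   /* last field runs to the end of the string */
        *cursor = NULL;
    }
    return start;
}
```

Because the fields are not null-terminated, the caller works with the pointer and length, for example printf("%.*s\n", (int)len, field).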

I had a go at implementing:

#include <iostream>
#include <cstring>

using namespace std;

// Copies the next delim-separated piece of str into result, which must have
// room for strlen(str) + 1 chars. Returns false (and resets itself) once the
// whole string has been consumed. str itself is never modified.
bool strsplit (const char* str, char* result, char delim)
{
    static size_t str_begin = 0;   // offset of the next piece; shared by all calls
    const size_t len = strlen(str);

    if (str_begin >= len)  //end of string ->start again and return false
    {
        str_begin = 0;
        return false;
    }

    // find the next delimiter; if there is none, copy till end of str
    const char* hit = strchr(str + str_begin, delim);
    const size_t end = hit ? static_cast<size_t>(hit - str) : len;

    memcpy (result, str + str_begin, end - str_begin);
    result[end - str_begin] = '\0';
    str_begin = hit ? end + 1 : len;   // step past the delimiter, if any
    return true;
}

int main()
{
   const char str[] = "hello my name is tom";
   char substr[sizeof str];   // big enough for any piece of str

   while (strsplit(str,substr,' '))
        cout <<"---" <<substr <<"----" <<endl;

   return 0;
}
A simple solution is to simply copy the string and input the copy to strtok. However, this does seem inefficient and a little inelegant.

Your solution makes copies of the characters in the string too, so it's (approximately) equally inefficient. If you must use C-strings and your own memory management, then strdup() and strtok() (or strtok_r() for thread safety, if available/necessary) seem fine to me.
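A minimal sketch of that copy-then-tokenise approach, assuming a POSIX system (strdup and strtok_r come from POSIX; strdup only entered ISO C with C23). The function name count_tokens_copy is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* assumption: POSIX strdup and strtok_r */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the tokens in a read-only string by
   tokenising a private copy. The caller's string is never touched.
   Returns -1 if the copy cannot be allocated. */
int count_tokens_copy(const char *str, const char *delims)
{
    char *copy = strdup(str);
    if (copy == NULL)
        return -1;

    int n = 0;
    char *save = NULL;   /* strtok_r keeps its position here, not in a static */
    for (char *tok = strtok_r(copy, delims, &save); tok != NULL;
         tok = strtok_r(NULL, delims, &save))
        ++n;

    free(copy);
    return n;
}
```

The cost is one allocation and one pass to copy the string, which is the same order of work as copying each token out individually.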
It can't be done without modifying the string at all, but the string can be restored to its original form before each search for the next token.

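#include <string.h>

// Like strtok, but the delimiter overwritten by one call is restored by the
// next call, so the string is back in its original form once tokenising is
// complete (i.e. once NULL has been returned).
char* my_str_tok( char* str, const char* delim )
{
    static _Thread_local char* curr_str = NULL ; // where the next search starts
    static _Thread_local char* last_pos = NULL ; // delimiter we overwrote, if any
    static _Thread_local char last_char = 0 ;    // its original value

    if(str) curr_str = str ; // first call
    else if(last_pos) *last_pos = last_char ; // subsequent call; restore last delimiter
    last_pos = NULL ;

    if( curr_str == NULL ) return NULL ; // nothing (left) to tokenise

    curr_str += strspn( curr_str, delim ) ; // skip leading delimiters
    if( *curr_str == 0 ) { curr_str = NULL ; return NULL ; }

    char* result = curr_str ;
    char* tok = strpbrk( curr_str, delim ) ;
    if(tok)
    {
        last_pos = tok ;
        last_char = *tok ;
        *tok = 0 ;
        curr_str = tok+1 ;
    }
    else curr_str = NULL ; // last token: it already ends at the terminator
    return result ;
}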

http://coliru.stacked-crooked.com/a/40986a1480cc20b0
For immutable strings, the alternative would be to directly use strpbrk (usually in conjunction with strspn)

#include <stdio.h>
#include <string.h>

int main()
{
    const char test[] = "\n\"Twinkle, twinkle, little bat!\nHow I wonder what you're at!\"\n- Mad Hatter\n" ;
    puts(test) ;

    const char delim[] = ", !-\"\n" ;

    int cnt = 0;
    const char* beg = test + strspn( test, delim ) ;
    const char* end = strpbrk( beg, delim ) ;
    
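    while( *beg ) // beg points at the start of a token, or at the terminator
    {
        if( end == NULL ) end = beg + strlen(beg) ; // last token runs to the end
        printf( "%d. ", ++cnt ) ;
        while( beg < end ) putchar( *beg++ ) ;
        putchar( '\n' ) ;
        beg = end + strspn( end, delim ) ; // skip the run of delimiters
        end = strpbrk( beg, delim ) ;
    }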

    puts(test) ;
}

http://coliru.stacked-crooked.com/a/6e1ef3f7c0639d14
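The same idea can be packaged as a reusable function. The sketch below uses strspn to skip delimiters and strcspn to measure the token; the name next_token and the pointer-plus-length interface are assumptions for illustration, not a standard API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the next token at or after *cursor, where any
   character in delims separates tokens. Returns the token start and stores
   its length in *len; returns NULL when no tokens remain. The scanned
   string is never written to, so it may be const or a string literal. */
const char *next_token(const char **cursor, const char *delims, size_t *len)
{
    const char *beg = *cursor + strspn(*cursor, delims);  /* skip delimiters */
    if (*beg == '\0') {
        *cursor = beg;
        return NULL;
    }
    *len = strcspn(beg, delims);   /* token runs up to the next delimiter */
    *cursor = beg + *len;
    return beg;
}
```

Unlike strtok, all state lives in the caller's cursor, so several strings can be tokenised at once and no static or thread-local storage is needed. The tokens are not null-terminated, so print them with printf("%.*s", (int)len, tok) or copy them out if a C-string is required.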