Tokenize string to char *array[]

Dec 31, 2018 at 12:36pm
Say I have this string
 
string str="name1,name2,name3";

I want to separate the names using ',' as the delimiter.
The str is user-defined, so it can have any number of names separated by commas.
How do I go about separating it and storing the names in a char* array?

The examples I have seen use string streams and vectors, which I do not want to use.

Thanks
Last edited on Dec 31, 2018 at 12:46pm
Dec 31, 2018 at 1:13pm
One way would be to 1) count the number of tokens (#commas + 1), 2) allocate a char* array that is big enough to hold all the tokens, 3) use strtok in a loop that adds each token pointer to the array.
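
A minimal sketch of those three steps, assuming the text is already in a writable char buffer (strtok modifies its input, so a std::string would have to be copied into one first). The helper name split_commas is hypothetical, not something from this thread:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: splits `text` in place on ','.
// Returns a new[]-allocated array of pointers into `text` and stores
// the number of tokens found in `count`. The caller must delete[] it.
char** split_commas(char* text, std::size_t& count)
{
    assert(text != nullptr);

    // Step 1: (#commas + 1) is an upper bound on the number of tokens.
    std::size_t capacity = 1;
    for (const char* p = text; *p != '\0'; ++p)
        if (*p == ',')
            ++capacity;

    // Step 2: allocate an array big enough for every token pointer.
    char** tokens = new char*[capacity];

    // Step 3: strtok overwrites each delimiter with '\0' and returns a
    // pointer to the start of the next token inside `text`.
    count = 0;
    for (char* t = std::strtok(text, ","); t != nullptr; t = std::strtok(nullptr, ","))
        tokens[count++] = t;

    return tokens;
}
```

The pointers refer to the original buffer, so that buffer has to stay alive for as long as the tokens are used. Only the pointer array itself needs delete[].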
Last edited on Dec 31, 2018 at 1:21pm
Jan 1, 2019 at 5:25am
Here is what I have come up with

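char gfg[100] = "name1, name2, name3";

// count delimiters, stopping at the terminating null character
int count = 0;
for (int i = 0; gfg[i] != '\0'; i++)
{
	if (gfg[i] == ',')
	{
		++count;
	}
}
char *tok[50]; // fixed size: holds at most 50 tokens
tok[0] = strtok(gfg, ","); // needs <cstring>; strtok modifies gfg in place
MessageBox(NULL, tok[0], tok[0], MB_OK); // needs <windows.h>
for (int x = 1; x < count + 1; x++)
{
	tok[x] = strtok(NULL, ",");
	if (tok[x] == NULL) break; // fewer tokens than commas + 1, e.g. "a,,b"
	MessageBox(NULL, tok[x], tok[x], MB_OK);
}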

This gives me the desired results, but I am worried about whether I am doing it right. Also, do I have to delete the array afterwards? Please help.
thanks
Last edited on Jan 1, 2019 at 6:32am
Jan 1, 2019 at 10:58am
do I have to delete the array afterwards?

No, the arrays (gfg and tok) will be destroyed automatically when they go out of scope.

I am worried If I am doing it right

It is correct for the example that you have shown, but it won't work correctly if the string can contain more than 50 tokens. The reason I mentioned counting the number of tokens first was so that you could know how large an array you should create. If the array always has the same size you don't really need to do so. You could just call strtok until it returns null. You might also want to add a check that makes sure you don't go out of bounds in case there are more tokens than array elements.

Also note that array indices start at 0 so in your code you are not making use of the first element in the tok array which means you only have room for 49 tokens instead of 50.
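
A rough sketch of that suggestion: keep the fixed-size array, call strtok until it returns null, and stop early if the array fills up. The helper name split_bounded and its parameters are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: stores at most `capacity` token pointers in
// `tokens` and returns how many were stored. `text` is modified in place.
std::size_t split_bounded(char* text, char* tokens[], std::size_t capacity)
{
    assert(text != nullptr && tokens != nullptr);

    std::size_t count = 0;
    char* t = std::strtok(text, ",");
    while (t != nullptr && count < capacity) {  // bounds check before storing
        tokens[count++] = t;
        t = std::strtok(nullptr, ",");
    }
    return count;
}
```

With char *tok[50] this would be called as split_bounded(gfg, tok, 50). No separate pass to count commas is needed, and tokens beyond the 50th are simply not stored.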
Jan 1, 2019 at 11:20am
Thanks, I really appreciate your help. I will try creating a dynamic array and come back.
I should use malloc right?
Jan 1, 2019 at 11:55am
char gfg[100] = "name1,, name3";
If you had this, would you expect to see an empty field?

char gfg[100] = "\"Flintstone, Fred\",\"Rubble, Barney\"";
If your strings are coming from a CSV file, you're likely to run into all sorts of trouble with a simple use of strtok.
Commas embedded inside fields will trip you up.
https://en.wikipedia.org/wiki/Comma-separated_values
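
On the first point: strtok treats a run of consecutive delimiters as a single delimiter, so "name1,, name3" yields only two tokens and the empty field is lost. If empty fields matter, one possible sketch is to split with strchr instead. The helper name split_keep_empty is hypothetical, and this still does not handle quoted CSV fields such as "Flintstone, Fred":

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: like the strtok loop, but keeps empty fields.
// Stores at most `capacity` pointers into `text` and returns the count.
std::size_t split_keep_empty(char* text, char* fields[], std::size_t capacity)
{
    assert(text != nullptr && fields != nullptr);

    std::size_t count = 0;
    char* start = text;
    while (count < capacity) {
        fields[count++] = start;                 // may point at an empty string
        char* comma = std::strchr(start, ',');
        if (comma == nullptr)
            break;                               // last field reached
        *comma = '\0';                           // terminate the current field
        start = comma + 1;                       // next field begins after the comma
    }
    return count;
}
```

For "name1,, name3" this stores three fields, the middle one being an empty string.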
Jan 1, 2019 at 12:34pm
I should use malloc right?

If this is C code, then yes.
If it is C++ code, then you probably want to use new.
Personally I would use std::vector, but you said you didn't want to use that.
Jan 8, 2019 at 9:29pm
not sure what the string version of strtok is?

anyway, a very simple way to do this is one loop that does everything:
copy the original string into a C-string, letter by letter.
while copying, set the first char* to the address of the first letter of the copy destination.
if the letter is a comma, write a zero in its place and set the next char* to the address of the next letter...
this is easier if the char*s are an array or vector rather than separately named variables. If you need names, name the index instead:
stringvar[thefirststring] or the like...

the whole thing is maybe 1/4 the size and complexity of the strtok approach above. It's reinventing the wheel, which is bad, but it rolls the copy, the pointer assignment, and the token splitting into a single loop.
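
One way that single loop might look, as a sketch only. The name copy_and_split and its parameters are invented here, and dest is assumed to have room for strlen(src) + 1 chars:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: copies `src` into `dest`, writing '\0' in place of
// each comma and recording where each piece starts. Stores at most
// `capacity` pointers in `pieces` and returns how many were stored.
std::size_t copy_and_split(const char* src, char* dest,
                           char* pieces[], std::size_t capacity)
{
    assert(src != nullptr && dest != nullptr && pieces != nullptr);

    std::size_t count = 0;
    if (capacity > 0)
        pieces[count++] = dest;                  // first piece starts at the first letter

    std::size_t i = 0;
    for (; src[i] != '\0'; ++i) {
        if (src[i] == ',') {
            dest[i] = '\0';                      // the comma becomes a terminator
            if (count < capacity)
                pieces[count++] = &dest[i + 1];  // next piece starts after it
        } else {
            dest[i] = src[i];                    // ordinary letter: plain copy
        }
    }
    dest[i] = '\0';                              // terminate the last piece
    return count;
}
```

Because the source is only read, this works directly on str.c_str() without modifying the std::string. Unlike strtok, it keeps empty pieces between adjacent commas.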
Last edited on Jan 8, 2019 at 9:39pm
Feb 25, 2019 at 4:16pm
closed account (E0p9LyTq)
Tokenize string to char *array[]

Are you required to write your own code to tokenize, or can you use a library?

The Boost library has the Boost Tokenizer.

https://www.boost.org/doc/libs/1_69_0/libs/tokenizer/doc/index.html