My goal is to make a function that takes a string and some delimiters, splits the string into substrings, adds them to a 2D char array, and finally returns a pointer to this 2D array.
Sorry, but for some reason code formatting is not working!
This is what I was able to do in 3 hours; I am still not very experienced.
Your problem is the terminating 0, or rather the lack thereof:
uint8_t len = strlen(str);
char str_copy[len]; // Variable length arrays are usually not a good idea (stack overflow)
strncpy(str_copy, str, len); // No terminating 0 is appended
char *token = strtok(str_copy, delimiter); // This requires the terminating 0
keysChar[counter] = (char *)malloc(strlen(token)); // It should be strlen(token) + 1
strcpy(keysChar[counter],token); // strcpy(...) appends the terminating 0 --> out of bounds
len: 35
#M
1
T
259
S
31
P
5
A
45
D
78
C
99
size: 1
0: #M
1: 1
2: T
3: 259
4: S
5: 31
6: P
7: 5
8: A
9: 45
10: D
11: 78
12: C
13: 99
14:
So, about the function: there is still a problem with the size of the array.
I can choose to set the array size to the maximum possible length at the beginning: char **keysChar = (char **)malloc(sizeof(char *) * len);
Then I can reallocate it to the value of counter: keysChar = (char **)realloc(keysChar, sizeof(char *) * counter);
Do you think this is a good idea, or should I work harder to find a way to allocate the exact size in the first malloc?
The second problem: I am not able to find a way to get the size of the resulting array. I have followed a few answers on Stack Overflow, but none of them seem to work.
Actually, on line 11 you do not copy the 0. Use strcpy(...) instead of strncpy(...).
While realloc(...) is OK in a pure C program, in your case line 26 (outside the loop) is wrong. You need to reallocate as soon as counter > len within the loop, after line 23.
This should work:
sizeof(...) does not work for dynamically allocated arrays.
To determine the end of the array, you might allocate one more element than necessary and set it to NULL, similar to the terminating 0 of a string.
Do everything in your power to minimize calls to realloc. For example, do not call it in a loop; call it before the loop and add as many elements as you will loop over (if you know this, and if not, can you find out?). Realloc can be slow, because it may have to allocate a new block and copy the old contents, and the more you do it, the more impact it will have on your program's speed. Overallocate the first time (make your best guess at a reasonable starting size), or, when you can, find out how much you need before you get memory. The exact size isn't needed, but if you must guess, guess bigger than what you expect to need.
Size of the array:
You should always know this. Always. Just track it.
It was 10; you realloc and add 3, now it's 13. You know this because you called realloc... right there in your code, so just lift the resulting size and keep it around. Even if it's a variable from reading a file in a loop, somewhere you can still count and track it.
You need to reallocate as soon as counter > len within the loop
Hopefully, if I'm not wrong (and today that happened a few times), this won't ever happen, because I set the initial malloc to the same size as the input string. I will follow jonnin's advice to
Do everything in your power to minimize calls to realloc. ... Overallocate the first time
This way I'm overallocating, then reducing the size to fit the number of substrings.
It does not work for dynamically allocated arrays
I just discovered that!
To determine the end of the array you might allocate one more line than necessary and set it to null
I am not sure I understood: if I do something like arr[counter++] = "\0"; would I then be able to use sizeof()?
#ifndef DUTHOMHAS_SPLIT_CSTRING_H
#define DUTHOMHAS_SPLIT_CSTRING_H
#ifdef __cplusplus
extern "C" {
#endif
char** split( const char* s, const char* delimiters );
/*
  function
    Split a string into substrings by delimiters.
    Consecutive delimiters are treated as a single delimiter.
    Leading and trailing delimiters do not introduce empty substrings.
  arguments
    s          - The string to split into substrings.
                 Must not be NULL.
                 Must be null-terminated.
    delimiters - Null-terminated list of characters used to split the string.
                 May be NULL, in which case the substrings are delimited by
                 whitespace.
  returns
    A NULL-terminated array of char*, one for each substring in s.
    Each substring is duplicated so s is not modified or encumbered
    by references.
    The result must be passed to free() when you are done with it.
  example
    split( " A B C ", NULL ) → { "A", "B", "C", NULL }
    split( "", ... ) → { NULL }
*/
#ifdef __cplusplus
}
#endif
#endif
#include <stdlib.h>
#include <string.h>
char** split( const char* s, const char* delimiters )
{
  char** result;
  char** r;
  char* p = (char*)s;
  size_t n = 0;
  if (!delimiters) delimiters = " \f\n\r\t\v"; // defaults to whitespace
  // First pass: count delimited segments
  p += strspn( p, delimiters );
  while (*p)
  {
    p += strcspn( p, delimiters ); // skip NOT delimiters
    p += strspn ( p, delimiters ); // skip delimiters
    n += 1;
  }
  // Allocate the result array in a single block.
  // The result is first an array of char* (n substrings plus the NULL terminator),
  // immediately followed by a copy of the source string s, which we will tokenize.
  result = r = (char**)malloc( sizeof(char*) * (n + 1) + (p - s + 1) );
  if (!result) return NULL; // out of memory
  for (size_t k = 0; k < n + 1; k++) result[k] = NULL;
  p = (char*)(result + n + 1);
  strcpy( p, s );
  // Second pass: build the result[] array and separate the substrings
  p += strspn( p, delimiters );
  while (1)
  {
    if (!*p) break; *r++ = p; p += strcspn( p, delimiters ); // skip NOT delimiters
    if (!*p) break; *p++ = '\0'; p += strspn ( p, delimiters ); // skip delimiters
  }
  return result;
}
#include <stdio.h>
#include <stdlib.h>
#include "split.h"
int main( int argc, char** argv )
{
  FILE* f;
  char* s;
  long size;
  size_t n;
  char** lines;
  if (argc == 1)
  {
    fprintf( stderr, "%s\n", "You must provide a file name!" );
    return 1;
  }
  f = fopen( argv[1], "rb" );
  if (!f)
  {
    fprintf( stderr, "Could not open \"%s\"\n", argv[1] );
    return 1;
  }
  fseek( f, 0, SEEK_END );
  size = ftell( f );
  fseek( f, 0, SEEK_SET );
  s = (size < 0) ? NULL : (char*)malloc( size + 1 ); // ftell returns -1 on error
  if (s)
  {
    size_t got = fread( s, 1, size, f ); // may read fewer bytes than requested
    s[got] = '\0';
    n = 0;
    lines = split( s, "\r\n" );
    if (lines)
    {
      for (char** line = lines; *line; ++line)
        printf( "%lu: \"%s\"\n", (unsigned long)n++, *line );
      free( lines );
    }
    free( s );
  }
  fclose( f );
  return 0;
}
Heh heh heh... :O)
You can easily modify this to do things like:
• modify the source string (as your code does) instead of a copy
• preserve empty substrings
• split on a specific delimiter sequence instead of any character in the delimiter string
Yes, I had thought about that (actually, I am doing it now, until I find a better solution), but I would like to understand how to copy sub_string into dest_array.
Though I recommend that you try to understand what @Duthomhas showed...
I did it, even though I didn't understand every step; sometimes there is too much "pointer magic" for my level. I am still learning, and this discussion helped me a lot. Really, thanks to all of you :D
#include <iostream>
#include <stdlib.h>
#include <string.h>
#define print std::cout
void splitString(char*** dest_arr, int* len_dest_arr, const char *str, const char *delimiters)
{
  size_t str_len = strlen(str) + 1; // add null terminator
  char *str_copy = (char *)malloc(str_len); // heap copy: variable length arrays are not standard C++
  strcpy(str_copy, str); // we work on a copy, because strtok modifies its input
  char **sub_string = (char **)malloc(sizeof(char *) * str_len); // over size
  int counter = 0; // int instead of uint8_t, which would wrap around after 255 tokens
  char *token = strtok(str_copy, delimiters); // split until first token
  while (token != nullptr)
  {
    sub_string[counter] = (char *)malloc(strlen(token) + 1); // add null terminator
    strcpy(sub_string[counter], token); // copy token to dest_array
    token = strtok(NULL, delimiters); // continue splitting
    counter++;
  }
  free(str_copy); // the tokens have been copied, so the scratch copy can go
  sub_string = (char **)realloc(sub_string, sizeof(char *) * counter); // reallocate the right memory
  *dest_arr = (char **)realloc(sub_string, sizeof(char *) * counter);
  *len_dest_arr = counter;
  //free (sub_string); can't do this because dest_arr points to it
}
int main()
{
  const char *data1 = "xM=1:T=259:S=31:P=5:A=45:D=78:C=99"; // const: string literals are read-only
  int sz1;
  char **sub1 = nullptr;
  splitString(&sub1, &sz1, data1, ":=");
  print << "\nsize: " << sz1 << "\n";
  for (int n = 0; n < sz1; n++)
  {
    print << n << ": " << sub1[n] << "\n";
  }
  for (int n = 0; n < sz1; n++) free(sub1[n]); // free each substring...
  free(sub1); // ...then the array of pointers
  print << "\nthanks guys\n";
}
and its output:
size: 14
0: xM
1: 1
2: T
3: 259
4: S
5: 31
6: P
7: 5
8: A
9: 45
10: D
11: 78
12: C
13: 99
thanks guys
Still, I am quite sure I can live without sub_string. I tried for the last 30 minutes to get rid of it, but without much success. At my level, a three-level pointer is a headache.
Inside the function I know that **dest_arr gives me the first element of the array, but I am not able to get to the next one.
Still I am quite sure I can live without sub_string, I tried in the last 30 minutes to get rid of it but without a lot of success.
Actually, using sub_string is fine. Just line 24 is unnecessary and should be removed.
If you don't want to use sub_string, you can replace it with (*dest_arr). Note the parentheses: they are needed because the subscript operator [] has higher precedence than the dereference operator *. See:
It is beyond doubt that I wouldn't have been able to do it without your help, and that I really need to get my hands on a good C/C++ book.
I am posting the final version of the function, since I hope it may be helpful to others who find this discussion. It should be free of memory leaks (I hope so).