About the size of the output buffer (outbuf) in iconv

Hey everyone,

I have a question about iconv. The following code works now, but there is one thing I still do not understand.


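char *ComUtil::convertUTF8ToGBK(std::string str) {
	// iconv() takes a non-const char **, hence the cast; the input bytes are not modified.
	// convert() writes into its own buffer, so the result does not point into str.
	return convert("UTF-8", "GBK", const_cast<char *>(str.c_str()));
}

// Returns a calloc'ed, null-terminated string that the caller must free(), or 0 on failure.
char *ComUtil::convert(const char *from, const char *to, char *src) {

	size_t len = strlen(src);	// byte count, NOT including the terminating null
	if(!len)
		return (0);

	iconv_t cd = iconv_open(to, from);
	if(cd == (iconv_t)-1)	// compare as iconv_t; casting cd to int is not portable
		return (0);

//	target_len = 2*len;
//	target_len = len;
	size_t target_len = len+1;	// iconv() expects size_t counts; +1 leaves room for the null
	char *target = (char *)calloc(target_len, 1);	// zero-filled buffer
	if(!target) {
		iconv_close(cd);
		return (0);
	}
	char *target_start = target;	// iconv() advances target, so remember the start

	size_t iconv_value = iconv(cd, &src, &len, &target, &target_len);
	iconv_close(cd);	// release the descriptor on every path
	if(iconv_value == (size_t)-1) {
		free(target_start);
		return (0);
	}
	return target_start;
}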


When I use "target_len = len;", the GBK result contains the right characters followed by some unrelated ones. When I use "target_len = 2*len;" or "target_len = len+1;", everything is OK.

As I understand it, UTF-8 uses 3 bytes for a Chinese character and GBK uses 2 bytes, so with "target_len = len;" the target buffer should be large enough. The result does not show that.

So can anybody help me? Thanks in advance!

Regards
Zhimin

strlen gives you the length of the string in bytes, not including the terminating null.

With "target_len = len;" the buffer has no room for a terminator whenever the output is as long as the input. The extra characters you are seeing are the values in the adjoining memory, which get read until a null byte happens to be found.

Adding 1 to the value of strlen gives you the necessary space. iconv does not write the terminating null itself, but calloc zero-fills the buffer, so the extra byte at the end stays null.

See http://www.cplusplus.com/reference/clibrary/cstring/strlen/
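
To illustrate the sizing point, here is a minimal sketch of a conversion helper that lets std::string carry the length, so no terminating null has to be reserved by hand. The name convertEncoding is hypothetical and not part of the original code, and the "4 output bytes per input byte, plus 4" bound is an assumption that is generous for UTF-8 to GBK but may not hold for every encoding pair.

```cpp
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the original post): converts `input` from
// encoding `from` to encoding `to` and returns the converted bytes.
std::string convertEncoding(const std::string &input,
                            const char *from, const char *to) {
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    // Assumed upper bound on the output size; see the note above.
    std::string out(input.size() * 4 + 4, '\0');

    // iconv() wants non-const pointers but does not modify the input bytes.
    char *src = const_cast<char *>(input.data());
    size_t srcLeft = input.size();
    char *dst = &out[0];
    size_t dstLeft = out.size();

    size_t rc = iconv(cd, &src, &srcLeft, &dst, &dstLeft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        throw std::runtime_error("iconv failed");

    // dstLeft is the unused space, so trim the string to the bytes written.
    out.resize(out.size() - dstLeft);
    return out;
}
```

A call such as convertEncoding(text, "UTF-8", "GBK") would then return the GBK bytes, assuming the platform's iconv supports GBK. The returned string knows its own length, and c_str() supplies the terminating null when a C string is needed.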
Hey JMJAtlanta:

Thanks!
I missed the basics of strlen()!
I also recall now that the problem only happened when the source characters were English, not Chinese. For a Chinese character GBK uses 2 bytes while UTF-8 uses 3, so the output is shorter than the input and a zero byte is left over at the end of the buffer. For an English character both GBK and UTF-8 use 1 byte, so the output fills the whole buffer and no room is left for the terminating null.
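
The byte arithmetic can be checked with a small sketch. The names below are hypothetical; the byte values are the UTF-8 and GBK encodings of the character 中, written as escapes so the result does not depend on the source file's encoding.

```cpp
#include <cstddef>
#include <cstring>

// The character 中 in each encoding, plus a plain ASCII letter.
const char utf8Zhong[] = "\xE4\xB8\xAD";  // 3 bytes in UTF-8
const char gbkZhong[]  = "\xD6\xD0";      // 2 bytes in GBK
const char asciiA[]    = "A";             // 1 byte in both encodings

// Hypothetical helper: how many zero bytes remain at the end of a
// zero-filled buffer of bufSize bytes after `written` bytes are stored.
std::size_t spareBytes(std::size_t bufSize, std::size_t written) {
    return bufSize > written ? bufSize - written : 0;
}
```

With target_len = len, spareBytes(strlen(utf8Zhong), strlen(gbkZhong)) is 1, so a zero byte ends the Chinese result by accident, while spareBytes(strlen(asciiA), strlen(asciiA)) is 0, so the English result has no terminator.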
Thanks again.
Best Regards
Zhimin