Optimizing string conversion and concatenation

I am looking to optimize the speed of the following code as much as possible.
I'm from Java world so thing isn't my kind of thing...
Any help or tips would be appreciated.

I would prefer to use stl string's as oppose to char*, but if there is a big performance difference I'm open to char*.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
void to_char(const wchar_t* orig, char* out) {
	size_t origsize = wcslen(orig) + 1;
	size_t convertedChars = 0;
	wcstombs_s(&convertedChars, out, origsize, orig, _TRUNCATE);
}

void ltrim(string& str) {
	string::size_type pos = 0;
	while (pos < str.size() && isspace(str[pos])) pos++;
	str.erase(0, pos);
}

void rtrim(string& str) {
	string::size_type pos = str.size();
	while (pos > 0 && isspace(str[pos - 1])) pos--;
	str.erase(pos);
}

void btrim(string& str) {
	ltrim(str);
	rtrim(str);
}

int getResult(string& resultStr) {
	int retVal = FAILURE;

	// Ignore this part.
	pRECOGWORDS2 pRecogWords = NULL;
	L_INT nWordsCount = 0;
	retVal = L_Doc2GetRecognizedWords(hDoc, 0, &pRecogWords, sizeof(RECOGWORDS2), &nWordsCount);

	if (SUCCESS == retVal) {			
// This is what I would like to optimize. It works, but seems inefficient.
		string tempResult = "";
		for (L_INT i = 0; i < nWordsCount; i++) {
			char out[1000] = {0};
			// *** Note that pRecogWords[i].szWord is type wchar_t
			to_char(pRecogWords[i].szWord, out);
			string outStr(out);

			if (outStr != "") {
				tempResult += outStr;
				tempResult += " ";
			}
		}
		
		btrim(tempResult);
		resultStr = tempResult;
	}

	L_Doc2FreeRecognizedWords(hDoc, &pRecogWords);
	return retVal;
}
if (outStr != "") is always true.

++i is faster than i++ in for (L_INT i = 0; i < nWordsCount; i++), although optimiser may fix that.

AS it stands, you're calling wcslen and wcstombs_s and std::string::operator+= twice for each iteration.

Why not build up a wide buffer with a space between the words then convert to char once at the end?
Thanks for suggestions.

New code if anyone is interested, any other suggestions are welcome.

I did have a question. Does the RESULT_BUFFER_SIZE make a difference in performance or just memory?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33

// Added this
static const size_t RESULT_BUFFER_SIZE = 1024;

int getResult(std::string& resultStr) {
	int retVal = FAILURE;

	pRECOGWORDS2 pRecogWords = NULL;
	L_INT nWordsCount = 0;
	retVal = L_Doc2GetRecognizedWords(hDoc, 0, &pRecogWords, sizeof(RECOGWORDS2), &nWordsCount);

	if (SUCCESS == retVal) {
		wchar_t buf[RESULT_BUFFER_SIZE] = {0};
		size_t wordLen = 0;
		for (L_INT i = 0; i < nWordsCount; ++i) {
			wordLen = wcslen(pRecogWords[i].szWord);
			if (wordLen > 0) {
				wcsncat_s(buf, RESULT_BUFFER_SIZE, pRecogWords[i].szWord, wordLen);
				wcsncat_s(buf, RESULT_BUFFER_SIZE, L" ", 1);
			}
		}

		// Remove the trailing space.
		buf[wcslen(buf) - 1] = 0;
		char out[RESULT_BUFFER_SIZE] = {0};
		StringUtil::to_char(buf, out);
		resultStr = out;
	}

	L_Doc2FreeRecognizedWords(hDoc, &pRecogWords);
	return retVal;
}
It would be more efficient not to zero out the entire buffer if necessary. (lines 13, 25)

It would also be more efficient to write directly to the resultStr by making the string 1024 characters
wide and then using &resultStr[0].

> and then using &resultStr[0]

That's a dangerous way to do it. The reference need not be to the internal data.

The least dangerous way to do that dangerous dangerousness is to use data() directly, as all known implementations simply return the string's internal buffer. (You'll have to const_cast<CharT*> it, though.)

But the standard doesn't guarantee that either data() or c_str() return anything you can manipulate, so it is better just to use assign() or append().
You're still calling wcslen in the loop, and wcsncat_s twice in the loop. These walk the string each time, and and the string gets longer with each iteration giving you an O(n^2) algorithm.

Maintain the end pointer yourself and copy the characters yourself.

You need to alllocate memory dynamically for the buffer and grow it when you reach the end, or start with a large-ish fixed buffer and drop out of the loop when it's full. This will yeild a linear algorithm, directly proportional to the total number of characters you need to process.
jsmith - If I do not zero it I get runtime error.
Whats my alternative?

I will stay away from using the string directly.

kbw - Could you give me some direction as to what functions calls I would use?

As for calling wcslen in the loop, whats my alternative? Should I just use:
 
wcsncat_s(buf, RESULT_BUFFER_SIZE, pRecogWords[i].szWord, RESULT_BUFFER_SIZE);


Problem is that if the word is not > 0 I do not want to add it, and I do get "words" that are not actually words, just a 0 length string.
Last edited on
You don't need functions. I've not compiled this, but it illustrates the point. It doesn't check for end of buffer either.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
	static const size_t RESULT_BUFFER_SIZE = 1024;
	wchar_t buf[RESULT_BUFFER_SIZE] = {0};
	wchar_t *bufptr = buf;

	for (int i = 0; i < nWordsCount; ++i)
	{
		// copy over string
		for (wchar_t *p = pRecogWords[i]; *p; ++p)
			*bufptr++ = *p;

		// copy over space
		if (pRecogWords[i] != p)
			*bufptr++ = ' ';
	}

	// null terminate the string
	*bufptr++ = 0;
kbw. Thanks a lot for your help!

I added in a call to rtrim (as seen in my original post) to remove trailing spaces which I still get.

Any comments on wchar_t buf[RESULT_BUFFER_SIZE] = {0};?
Is that inefficient or will this simply set buf[0] to null?

Updated version:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
 
int getResult(std::string& resultStr) {
	int retVal = FAILURE;

	pRECOGWORDS2 pRecogWords = NULL;
	L_INT nWordsCount = 0;
	retVal = L_Doc2GetRecognizedWords(hDoc, 0, &pRecogWords, sizeof(RECOGWORDS2), &nWordsCount);

	if (SUCCESS == retVal) {
		wchar_t buf[RESULT_BUFFER_SIZE] = {0};
		wchar_t *bufptr = buf;

		for (L_INT i = 0; i < nWordsCount; ++i) {			
			wchar_t *p;
			// copy over string
			for (p = pRecogWords[i].szWord; *p; ++p)
				*bufptr++ = *p;

			// copy over space
			if (pRecogWords[i].szWord != p)
				*bufptr++ = ' ';			
		}

		*bufptr++ = 0;

		char out[RESULT_BUFFER_SIZE] = {0};
		to_char(buf, out);		
		resultStr = out;
		rtrim(resultStr);
	}

	L_Doc2FreeRecognizedWords(hDoc, &pRecogWords);
	return retVal;
}
wchar_t buf[RESULT_BUFFER_SIZE] = {0} assigns zero to buf[0]. You don't need it here.
Topic archived. No new replies allowed.