How to understand this C++ function?

I have just started learning C++ and am somewhat confused by the following section of code:


1. int *getCharCountArray(char *str) {
2.     int *count = (int *)calloc(sizeof(int), NO_OF_CHARS);
3.     int i;
4.     for (i = 0; *(str+i); i++)
5.         count[*(str+i)]++;
6.     return count;
7. }

Specifically, I know that line 4 iterates over the char array pointed to by str, but I don't quite understand how this kind of pointer arithmetic works. Moreover, I do not know how line 5 works. Thanks for your help.
Line 4 is the typical C-style loop over a string that runs until it reaches \0 (the end marker of C strings).

Line 5 increments the array entry whose index is the byte value of the current character. To write it more clearly, line 5 is equivalent to:
char valueOfPositionI = str[i];
count[valueOfPositionI]++;


(Remember that "*(str+i)" is just another way of writing "str[i]".)
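
To make the loop condition and the pointer notation concrete, here is a minimal sketch. The two helper names are hypothetical and not part of the original code; both walk the string until the terminating '\0', once with *(str + i) and once with str[i].

```cpp
#include <cstddef>

// Hypothetical helper: walks the string exactly like line 4 of the
// original function and returns how many characters were visited.
std::size_t lengthByPointer(const char *str) {
    std::size_t i = 0;
    // *(str + i) is the character i positions after str. The loop stops
    // at the terminating '\0', because the value 0 counts as false.
    for (i = 0; *(str + i); i++) {
    }
    return i;
}

// The same loop in index notation: str[i] means exactly *(str + i).
std::size_t lengthByIndex(const char *str) {
    std::size_t i = 0;
    while (str[i] != '\0') {
        i++;
    }
    return i;
}
```

Calling either helper with "ABA" should visit three characters before the terminator ends the loop.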

So basically, the function counts the number of occurrences of each character in a string: a kind of histogram.
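
For reference, a compilable sketch of the whole function might look like the following. The original post does not show how NO_OF_CHARS is defined, so the value 256 is an assumption, as are the const qualifier and the null check.

```cpp
#include <cstdlib>

// Assumption: NO_OF_CHARS is not shown in the original post. 256 gives
// one slot for every possible value of an 8-bit character.
#define NO_OF_CHARS 256

int *getCharCountArray(const char *str) {
    // calloc returns zero-initialized memory, so every count starts at 0.
    int *count = (int *)calloc(NO_OF_CHARS, sizeof(int));
    if (count == nullptr) {
        return nullptr;  // allocation failed
    }
    for (int i = 0; str[i] != '\0'; i++) {
        // The character's numeric value is the index, as in the original.
        // This is only safe for plain 7-bit ASCII input.
        count[(int)str[i]]++;
    }
    return count;
}
```

With "ABA" as input, count['A'] ends up as 2, count['B'] as 1, and every other entry stays 0. The caller has to release the array with free().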

By the way: the memory is allocated with "calloc" (which behaves like malloc followed by memset(.., 0)), so you have to release the return value with "free()" and not "delete". That is another indicator that this function comes from C and not C++ ;).
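
As a small illustration of the allocation rule, the sketch below pairs calloc with free and checks that the memory really starts out zeroed. The helper name is hypothetical.

```cpp
#include <cstdlib>

// Hypothetical helper: allocates n ints with calloc, checks that all of
// them are zero, and releases the memory again.
bool callocGivesZeros(std::size_t n) {
    int *p = (int *)std::calloc(n, sizeof(int));
    if (p == nullptr) {
        return false;  // allocation failed
    }
    bool allZero = true;
    for (std::size_t i = 0; i < n; i++) {
        if (p[i] != 0) {
            allZero = false;
        }
    }
    std::free(p);  // memory from calloc/malloc is released with free, never delete
    return allZero;
}
```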

Ciao, Imi.


Thanks, imi.

I am still confused about how the "count" array works.

Here, "count" works as an integer array where each entry is supposed to store the number of occurrences of one character.

For instance, assume char *str = "ABA".
Then the program should proceed as follows:
iteration 1: *(str+0) = 'A', count['A'] = 1
iteration 2: *(str+1) = 'B', count['B'] = 1
iteration 3: *(str+2) = 'A', count['A'] = 2
Thus, in the statement "count[*(str+i)]++;", *(str+i) works as an index into the count array.

What confuses me is that, in general, an array is indexed by 1, 2, 3, etc. Here, an array is indexed by a character!
Yeah. They are using the fact that a character is implicitly convertible to an integer ('\0' becomes 0, etc.)
They are using the fact that a character is implicitly convertible to an integer

Actually, they are using the fact that a character IS an integral type. Apart from the usual promotion from char to int, which leaves the value unchanged, no conversion takes place.

That's one of the odd things in C (inherited by C++): the dual nature of char.

char c = 32; // space is ASCII 32. The int literal 32 (usually 4 bytes) gets converted by the compiler to the integral type "char" (only 1 byte)
c += 2; // now it's ASCII 34, the double quote character (")
c += '1'; // adds the value of ASCII '1', which is 49. The result is 83, or 'S'. No conversion between character and number takes place here.


This can be very confusing, especially if you come from a language like Java, where "char" is many things but not just a 1-byte integral number. ;)
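
The three statements above can be checked with a short sketch. The helper name is hypothetical, and the expected value assumes an ASCII-compatible character set.

```cpp
#include <type_traits>

// Compile-time checks of the two claims made about char.
static_assert(std::is_integral<char>::value, "char is an integral type");
static_assert(sizeof(char) == 1, "char always occupies exactly one byte");

// Hypothetical helper that repeats the three statements from the post.
char charArithmetic() {
    char c = 32;  // space in ASCII
    c += 2;       // 34, the double quote character
    c += '1';     // '1' is 49 in ASCII, so c becomes 83
    return c;
}
```

On an ASCII system the returned value compares equal to both 83 and 'S', because they are the same number.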


While we are on the topic: the code above is not portable and may crash on many machines if str contains 8-bit characters. Whether the "normal" char (without "signed" or "unsigned" in front of it) is signed or unsigned is up to the compiler. On platforms where plain char is signed, characters with values >= 128 are read as negative numbers, which leads to negative indexes into count and therefore to undefined behavior (often a crash).

In other words: the code is only guaranteed to work for plain 7-bit ASCII strings; for anything else it may crash or otherwise behave in undefined ways.
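
One common way around the problem is to convert each character to unsigned char before using it as an index. The following is a sketch of that idea, not the original author's code: the function name is hypothetical, and it uses std::vector instead of calloc so that nothing has to be freed by hand.

```cpp
#include <climits>
#include <vector>

// Hypothetical variant of getCharCountArray that also handles 8-bit
// characters. The vector has one slot per possible unsigned char value.
std::vector<int> countCharsSafe(const char *str) {
    std::vector<int> count(UCHAR_MAX + 1, 0);
    for (int i = 0; str[i] != '\0'; i++) {
        // The cast maps every char value into 0..UCHAR_MAX, so the index
        // can never be negative, whether plain char is signed or not.
        unsigned char index = static_cast<unsigned char>(str[i]);
        count[index]++;
    }
    return count;
}
```

With this version a byte such as 0xE9 is counted in count[0xE9] instead of producing a negative index.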

Ciao, Imi.

Thanks, firedraco and imi.

Based on your explanation, can I understand this issue as follows?

I will still use char *str = "ABA" as an example.
Here, each of the chars A, B and A is converted to int when the program evaluates count[*(str+i)]. This is because of how the count array was defined, i.e., int *count = (int *)calloc(sizeof(int), NO_OF_CHARS);

Thus, in count[A], the system treats A as an integer, which works as an index.

One more question: is it possible to know exactly which integer value "A" is converted to? Should A be converted to 1, since A is the first entry in the count array?
When an integral value is needed and you pass in a character, the integral value of that character is used.

One more question: is it possible to know exactly which integer value "A" is converted to? Should A be converted to 1, since A is the first entry in the count array?
I think it is 65; cout << 'A'-'\0'; will show its value.
Thanks, blackcoder41.

When you need to show its value, why do you use cout<<'A'-'\0'; rather than cout<<'A';?

Another issue: for example, if str = "AC", then count is an array with two entries. However, we have count[65] and count[67]. Are these two entries, count[65] and count[67], stored contiguously, just as we normally have for an array, like A[1] and A[2]?
When you need to show its value, why do you use cout<<'A'-'\0'; rather than cout<<'A';?
Since there is a minus operator, which expects numbers as its operands, the integral values of 'A' and '\0' are used.

For your second question, the answer is yes as far as I know.

You may also want to watch a lecture from Stanford University:
http://www.youtube.com/watch?v=jTSvthW34GU&feature=SeriesPlayList&p=9D558D49CA734A02
When you need to show its value, why do you use cout<<'A'-'\0'; rather than cout<<'A';?


'A' is of type "char". Although char is an integral type (which means you can add and subtract it like normal numbers), it is first and foremost the type called "char".

The cout iostream object has a special overload for displaying "char", which prints not its integral value but its character representation. Hence you get "A" displayed if you do cout << 'A'. The expression 'A'-'\0', on the other hand, has type int, so cout prints the number 65.

You can achieve the same with cout << (int)'A'; which is probably a bit more understandable ;-).
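
The difference between the overloads can be seen in a short sketch. The helper names are hypothetical; a std::ostringstream stands in for cout so that the output can be inspected as a string, and it picks its operator<< overloads the same way cout does.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats c the way "cout << c" would.
std::string showAsChar(char c) {
    std::ostringstream out;
    out << c;  // char overload: prints the character itself
    return out.str();
}

// Hypothetical helper: formats c the way "cout << (int)c" would.
std::string showAsInt(char c) {
    std::ostringstream out;
    out << static_cast<int>(c);  // int overload: prints the number
    return out.str();
}

// Hypothetical helper: formats c the way "cout << c - '\0'" would.
std::string showDifference(char c) {
    std::ostringstream out;
    out << c - '\0';  // both operands are promoted to int, so the result is an int
    return out.str();
}
```

On an ASCII system, passing 'A' gives "A" from the first helper and "65" from the other two.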

However, we have count[65] and count[67]. Are these two entries, count[65] and count[67], stored contiguously, just as we normally have for an array, like A[1] and A[2]?

If you mean whether the entry for ASCII 67 ('C') comes directly after the entry for ASCII 65 ('A'), then: no. They are stored at indexes 65 and 67 respectively, so there is an entry at index 66 ('B') in between. There are also a total of 65 entries before count[65] (count[0], count[1], ...). The array always has NO_OF_CHARS entries, no matter which characters the string contains.

But if your question was whether "count" is just like a normal array, then: yes. ;-)
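
The layout can be verified with pointer arithmetic. This is a minimal sketch with hypothetical helper names; it measures the distance between two entries once in array slots and once in bytes.

```cpp
#include <cstddef>

// Hypothetical helper: number of int slots from count[from] to count[to].
std::ptrdiff_t slotsBetween(const int *count, int from, int to) {
    return &count[to] - &count[from];
}

// Hypothetical helper: the same distance measured in bytes.
std::ptrdiff_t bytesBetween(const int *count, int from, int to) {
    return reinterpret_cast<const char *>(&count[to]) -
           reinterpret_cast<const char *>(&count[from]);
}
```

For indexes 65 and 67 the first helper reports 2 slots, and the second reports 2 * sizeof(int) bytes, because every entry is a full int rather than a single byte.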

If it improves your understanding, think of line 5 in your original example as written like this:

 
count[ (int)str[i] ]++;



Ciao, Imi.