Frequency Analysis Question

My program has 4 options: the first asks for a user-specified input file name, and the second stores the letters in that file into an array. The third analyzes the array and detects character (a-z) frequency, while the fourth compares the analyzed frequencies to a set of real frequencies given earlier in the program. The third part is where I'm lost:

In the third part, I'm supposed to analyze the array, detecting the frequency of each character in the array.

The characters are stored in the array coderead[i] and I know for a fact they are stored correctly because when I do a cout << coderead[i]; all the numbers from the array are displayed.

So I have this in a do-while loop from the second option:

readfile >> coderead[i]; //file is read into array
num++; //counts the number of characters in file

Outside of the do-while loop in the third option I have this:

for (int i=0; i<=num; i++) {

if (coderead[i]==1) {
count++;
}

f=count/num-1; //gives the frequency of a character in the array
}
cout << count; //checks to see if count is working

cout << count; is just spitting out 0. I'm not sure what's going on here or whether what I'm doing is right.



I'm guessing that when I check the frequencies, I need to store them into another array so I can eventually compare them with the real frequencies.

I just don't know exactly how to analyze the frequency of each character.

If you need me to be more descriptive or if you need more information to help me out just ask.

Any help would be appreciated.

Thanks!
closed account (D80DSL3A)
First off - watch out for integer division! This expression:
f=count/num-1; //gives the frequency of a character in the array
Since count < num, the integer division count/num gives 0 (e.g. 3/5 = 0, NOT 0.6), so f comes out as 0 - 1 = -1. Also, are you sure the quantity is right? Even with floating-point values, f = 3.0/6.0 - 1.0 = 0.5 - 1.0 = -0.5. Without () in the denominator that's what you'd get.
Why subtract 1? Even if you meant f = count/(num-1) I think (if I understand correctly) that this would give a wrong value.
Example: file has 6 a's in it. Then frequency of a's should be 6.0/6.0 = 1.0 not 6.0/5.0 = 1.2.
Shouldn't the sum of all frequencies = 1?
I take it that f is a float or a double value. Cast the integers to float (or double) before dividing them.
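
To make the integer-division point concrete, here is a minimal sketch. The function names truncatedRatio and frequency are made up for illustration and are not from the original program; the only assumption is that count and num are plain ints.

```cpp
#include <iostream>

// Hypothetical helper: what count/num does when both operands are int.
// The fractional part is discarded, so 3/5 gives 0.
int truncatedRatio(int count, int num) {
    return count / num;
}

// Hypothetical helper: cast one operand to double before dividing,
// so the division is done in floating point.
double frequency(int count, int num) {
    if (num == 0) {
        return 0.0;  // avoid dividing by zero for an empty file
    }
    return static_cast<double>(count) / num;
}
```

Called as truncatedRatio(3, 5) and frequency(3, 6), the first returns 0 and the second returns 0.5.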

What is this if (coderead[i]==1) for? This would test for a character with ascii value = 1 (which is a smiley-face). Aren't you looking for 'a' through 'z'?

I'm guessing that when I check the frequencies, I need to store them into another array...

That seems like a good idea. If you used:
int letterCount[26] = {0};
You could get all of the counts in a single loop through the coderead[].
Like so:
for (int i=0; i<num; i++) // this will loop num times
    if (coderead[i]>='a' && coderead[i]<='z')
        letterCount[....]++;// hint - find index by subtracting letters 

Hope this helps.
Yeah, I had the division all messed up with floats and ints. I subtracted 1 because values are stored in the 0 place in arrays, and I wasn't sure if I needed to subtract 1 or not for characters.

So I did this for the new array:

I created an array called alphabet: char alphabet[]={"abcdefghijklmnopqrstuvwxyz"};

for (int i=0; i<num; i++) {

if (coderead[i]>='a' && coderead[i]<='z') {

letterCount[coderead[i]-alphabet[x]]++; //not sure if this is what you meant by subtracting letters?
}

f=letterCount[coderead[i]-alphabet[x]]/num;

cout << f;

}


I guess I'm a little confused still, but this makes a lot more sense to me. It's definitely not spitting out the right frequencies, and I'm pretty confused about subtracting the letters.
closed account (D80DSL3A)
I don't think you need the alphabet array. What I meant by subtracting letters was:
letterCount[coderead[i]-'a']++;
This gives the array index = offset from the letter a.
Examples: 'a' - 'a' = 0, 'd' - 'a' = 3
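
The subtraction trick can be sketched as a small function. The name letterIndex is hypothetical, and the sketch assumes a character set such as ASCII in which 'a' through 'z' are contiguous.

```cpp
// Hypothetical helper: maps 'a'..'z' to the indices 0..25.
// Any other character (uppercase, digits, punctuation) yields -1.
int letterIndex(char c) {
    if (c >= 'a' && c <= 'z') {
        return c - 'a';  // offset from the letter 'a'
    }
    return -1;
}
```

So letterIndex('a') is 0, letterIndex('d') is 3, and letterIndex('z') is 25, which is why an array of 26 counters is enough.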
Try that and try casting to float in your calculation of f, which you will want to do in another loop following the one where letterCount[] is being filled.
Where you have it, the letterCount[] isn't ready yet! Not all of the characters have been counted.
After filling letterCount:
for (int i=0; i<26; i++)
{
    f = (float)letterCount[i]/(float)num; // there's just one f (not an array of them),
                                          // so do your comparing here as the values are calculated
    cout << f;
}
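
Putting the two loops together, a rough sketch of part 3 might look like the following. It is not the original program: the function name letterFrequencies is made up, the text is passed in as a std::string instead of the coderead[] array, and the frequencies are stored in an array (as the OP guessed) rather than in a single f. Like the loop above, it divides by the total number of characters, so the frequencies only sum to 1 if every character is a lowercase letter.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the frequency of each letter 'a'..'z' in text.
std::array<double, 26> letterFrequencies(const std::string& text) {
    std::array<int, 26> letterCount{};  // all counts start at 0

    // First loop: count every lowercase letter.
    for (char c : text) {
        if (c >= 'a' && c <= 'z') {
            letterCount[c - 'a']++;
        }
    }

    std::array<double, 26> freq{};  // all frequencies start at 0.0
    if (text.empty()) {
        return freq;  // nothing to divide by
    }

    // Second loop: the counts are complete now, so it is safe to divide.
    for (std::size_t i = 0; i < freq.size(); i++) {
        freq[i] = static_cast<double>(letterCount[i]) / text.size();
    }
    return freq;
}
```

For the text "abab" this gives 0.5 for 'a' (index 0), 0.5 for 'b' (index 1), and 0.0 for every other letter.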
For part 4, would you create a new array with the true values from the table of frequencies given for the assignment, and somehow match them up to one another? Such as:
float realfrequencies[27]={0,.07,.02,.04,0,.05,0,.03,.01,.06,0,0,.08,0,.09,.1,0,
			0,.11,.12,0,0,0,0,.22,0,0};   //for a to z 


Would a for loop be the best method to swap the letters in the text file?

I'm in the same class.
closed account (D80DSL3A)
OP didn't state what comparisons are to be made so I don't know about that.
He was seeking help with part 3 only.
Your array is declared to hold int values though so you can't store float values in it.
The comparisons are between the float array (sorry) and the frequencies of the letters found in part 3. If they match up (say 'a' has a frequency of .04 in part 3 and, in the true values, 't' has a frequency of .04), then the letters need to swap places to "decode the message".
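
The thread never spells out how the match is supposed to be made, so the following is only one possible sketch of that comparison. It assumes that "match up" can be read as "closest value", since testing two floating-point frequencies for exact equality is fragile. The function name closestReferenceLetter and the 26-entry reference array are hypothetical.

```cpp
#include <array>
#include <cmath>

// Hypothetical helper: given one observed frequency, returns the index (0..25)
// of the reference letter whose frequency is closest to it.
// On a tie, the earliest letter wins.
int closestReferenceLetter(double observed,
                           const std::array<double, 26>& reference) {
    int best = 0;
    for (int j = 1; j < 26; j++) {
        if (std::fabs(reference[j] - observed) <
            std::fabs(reference[best] - observed)) {
            best = j;
        }
    }
    return best;
}
```

With a reference table in which 't' (index 19) has frequency .04, an observed frequency of .04 for 'a' would return 19, suggesting that 'a' in the coded text stands for 't'.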