Counting the occurrences of different letters within a file

Feb 6, 2017 at 8:09am
Hi guys, I'm trying to make a program that can read a file, check each 'character', record the frequency of each character, and then print the character that is most frequent. I'm also aiming to make this program work for any character, not just the letters that appear in the text file.

The text file is as follows:

yy
aaa
b

So far I've managed to parse the file and store each sequence of characters in a separate element of a vector named list.

Right now I'm trying to access each element of the vector, count the occurrences of each character and compare them. However, when I run the program, I get the error message "string subscript out of range".

I believe the issue lies somewhere within my last 'for' loop, because if I change the loop to "for (int i = 0; i < size - 1; i++)" the error message goes away; however, as expected, the console then prints only "ya", completely ignoring b.

Can anyone explain the reason behind this? I'm still fairly new to programming so I apologise if this seems a bit silly.

Also, any tips on how to write this program would be appreciated, thanks :)

  std::vector<std::string> list;
	std::ifstream ifile;
	ifile.open("test.txt");
	std::string charsequence;
	char spaces[3]; //array for position of spaces
	int position = 0;
	if (ifile.fail())
	{
		std::cout << "Error opening file" << std::endl;
		exit(1);
	}
	std::string line;
	while (getline(ifile, line))
	{
		for (int i = 0; i < line.length(); i++)
		{
			if (line[i] == ' ')
			{
				spaces[position] = i;
				position++;
			}
		}

		charsequence = line.substr(0, spaces[position]);
		list.push_back(line);
	}

	int size = list.size();

	for (int i = 0; i < size; i++)
	{
		std::string seq = list[i];
		char letter = seq[i];
		std::cout << letter;

	}
	
Feb 6, 2017 at 8:18am
What is your program supposed to do? Can you give us the expected output?

Feb 6, 2017 at 9:19am
I would rather read the file char by char.
#include <iostream>
#include <fstream>

int main()
{
  std::ifstream src("filename");

  if (!src)
  {
    std::cerr << "Error opening file\n";
    return 1; // main returns int, so a bare "return;" does not compile
  }
  char ch = 0;
  while (src.get(ch)) // get() reads every character, including spaces and newlines
  {
    // use ch
  }
  return 0;
}


To record the frequency of each character, I would use a std::map<char, size_t>.
Feb 6, 2017 at 10:26am
Yes, the expected output of my current code should be: yab.
Currently, I'm trying to loop through the elements of the vector named 'list', which contains these strings:
list[0] = yy,
list[1] = aaa,
list[2] = b.


For each element, I want to store its repeated char in the variable 'letter' and then print it.
Feb 6, 2017 at 10:32am
Is this what you wanted?

for (int i = 0; i < size; i++)
{
	std::string seq = list[i];
	char letter = seq[0]; // first character of the i-th string
	std::cout << letter;
}


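Line 33 (char letter = seq[i];): i is the index of the vector element, not a position inside the string. When i reaches 2, seq is "b", which has only one character, so seq[2] is out of range. That is also why stopping at size - 1 printed "ya": it read "yy"[0] and "aaa"[1]. Accessing the first character, seq[0], is always safe here because every string in list is non-empty.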
Feb 6, 2017 at 10:58am
The std::map could be nice for recording a {char,count} table.
http://www.cplusplus.com/reference/map/map/find/
Feb 6, 2017 at 11:20am
#include <iostream>
#include <limits>
#include <fstream>
#include <cctype>
#include <algorithm>

int main()
{
    constexpr std::size_t N = std::numeric_limits< unsigned char >::max() + 1 ;
    int counts[N] {} ; // letter frequency counts; initialise to all zeroes

    //@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    std::ifstream file( __FILE__ /*"myfile.txt"*/ ) ; // this file
    file >> std::noskipws ; // read all characters, including white-space

    //@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    char c ;
    while( file >> c ) { const unsigned char u = c ; ++counts[u] ; }

    //@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    const auto most_frequent = std::max_element( counts, counts+N ) - counts ;
    std::cout << "(one of the) most frequently occurring byte(s) - \nvalue: " << int(most_frequent)
              << "  frequency: " << counts[most_frequent] ;
    if( std::isprint(most_frequent) ) std::cout << "  printable char: '" << char(most_frequent) << "'\n" ;
    else std::cout << "  it is not a printable character\n" ;

    //@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
}

http://coliru.stacked-crooked.com/a/5e188b823589ba56
Feb 6, 2017 at 12:41pm
closed account (48T7M4Gy)
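#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> list(256); // one counter for every possible byte value (0..255)
    
    std::ifstream ifile;
    char ch;
    int highest_frequency = 0;
    int index_of_most_frequent = 0;
    
    ifile.open("????.txt");
    
    if (ifile.fail())
    {
        std::cout << "Error opening file" << std::endl;
        std::exit(1);
    }
    
    // operator>> skips whitespace, so spaces and newlines are not counted
    while (ifile >> ch)
    {
        // convert through unsigned char: a plain char may be negative
        list[static_cast<unsigned char>(ch)]++;
    }
    
    // LIST OUT RESULTS OF COUNTING (only characters that actually occurred)
    for (std::size_t i = 0; i < list.size(); i++)
    {
        if (list[i] == 0)
            continue;
        std::cout << i << ' ' << (char)i << ' ' << list[i] << '\n';
        if (list[i] > highest_frequency)
        {
            highest_frequency = list[i];
            index_of_most_frequent = static_cast<int>(i);
        }
    }
    
    std::cout
    << "Most frequent character " << (char)index_of_most_frequent
    << " frequency = " << highest_frequency << '\n';
    
    return 0;
}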
Feb 6, 2017 at 1:13pm
do you know what a bucket sort is?

int counter[256] = {0}; // one slot per byte value: fine for ASCII, Unicode is much more troublesome.

char c;
while (file.get(c)) // file is an open std::ifstream
{
    counter[static_cast<unsigned char>(c)]++;
}

Done. Want to know how many times the letter X appeared?
cout << counter[(int)'X'];


Edit: looks like someone beat me to it.
Last edited on Feb 6, 2017 at 1:16pm
Feb 6, 2017 at 3:32pm
closed account (48T7M4Gy)
http://stackoverflow.com/questions/9317248/writing-bucket-sort-in-c

google "c++ bucket sort" if that one is no use.
Feb 8, 2017 at 11:17am
Thanks for the advice, guys :) Particularly Mantorr22: that fixed the issue I was encountering.