I'm working on a program that reads a text file and reports what percentage of the text each distinct word accounts for. I've finished all of the functions, but I'm having trouble removing numbers and punctuation. The text file looks like this: "1 guy is writing his shitty code, he got 6 functions ready but stuck at "reading into the array". he is so stupid."
#include <iostream>
#include <fstream>
#include <string>
#include <cctype>

// return a copy of str containing only its alphabetic characters
std::string remove_non_alpha( const std::string& str ) {
    std::string result ;
    // std::isalpha requires a value representable as unsigned char;
    // passing a negative char (e.g. a non-ASCII byte) is undefined behaviour
    for( char c : str )
        if( std::isalpha( static_cast<unsigned char>(c) ) ) result += c ;
    return result ;
}

int main() {
    std::string fileName ;
    std::cout << "Please input the filename: " ;
    std::cin >> fileName ;

    if( std::ifstream fin{fileName} ) { // if the file was opened for input
        const std::size_t MAX_WORDS = 2'000 ;
        std::string wordList[MAX_WORDS] ;
        std::size_t num_words = 0 ; // actual number of valid words read

        // read up to a maximum of MAX_WORDS
        std::string word ;
        while( num_words < MAX_WORDS && fin >> word ) {
            word = remove_non_alpha(word) ; // remove punct, digits etc
            if( !word.empty() ) wordList[num_words++] = word ;
        }

        // do something with the words that were read. for instance, print them out:
        std::cout << "\nvalid words read from file '" << fileName << "'\n--------------------------------\n" ;
        for( std::size_t i = 0 ; i < num_words ; ++i )
            std::cout << i << ". " << wordList[i] << '\n' ;
    }
    else {
        std::cerr << "error opening file '" << fileName << "'\n" ;
        return 1 ;
    }
}
Thanks a lot, it works, partially.
I created a new text file and your code handles that one, but it does not work with my original txt file (which contains some Unicode letters, Greek I think?)
Here's the link to my weird file: https://drive.google.com/open?id=1LhdACxKU-BCa8B8tiv8iPc3cywyvcGV6
It also appears to have a mixture of line endings: some lines use CRLF (the Windows and internet-standard convention) and others use a bare LF (the Unix convention).
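If you want to check this yourself, a rough sketch like the one below counts both kinds of line ending and the number of non-ASCII bytes in a file. The names read_all, inspect_bytes and FileStats are made up for this illustration. The file has to be opened in binary mode, otherwise a Windows build would translate "\r\n" to "\n" before the code ever sees it.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: what the raw bytes of a file contain.
struct FileStats {
    std::size_t crlf = 0 ;      // line endings of the form "\r\n"
    std::size_t lf = 0 ;        // line endings that are a bare "\n"
    std::size_t non_ascii = 0 ; // bytes with the high bit set
};

// Read the whole file as raw bytes (binary mode: no newline translation).
// Returns an empty string if the file could not be opened.
std::string read_all( const std::string& path ) {
    std::ifstream fin( path, std::ios::binary ) ;
    std::ostringstream buffer ;
    buffer << fin.rdbuf() ;
    return buffer.str() ;
}

// Count CRLF endings, bare LF endings and non-ASCII bytes.
FileStats inspect_bytes( const std::string& bytes ) {
    FileStats stats ;
    for( std::size_t i = 0 ; i < bytes.size() ; ++i ) {
        const unsigned char c = static_cast<unsigned char>( bytes[i] ) ;
        if( c >= 0x80 ) ++stats.non_ascii ;
        if( c == '\n' ) {
            if( i > 0 && bytes[i-1] == '\r' ) ++stats.crlf ;
            else ++stats.lf ;
        }
    }
    return stats ;
}
```

Calling inspect_bytes( read_all(fileName) ) and printing the three counters should show at a glance whether the endings are mixed and whether the file contains anything outside plain ASCII.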
The simplest solution would be to select and copy the text from Google Drive (in the browser) and then use a text editor to save it as a plain text file.
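If re-saving the file is not an option, one possible workaround is sketched below. It assumes the file is UTF-8 (I have not verified that for your file), and remove_non_alpha_utf8 is a hypothetical variant of remove_non_alpha. In the default "C" locale std::isalpha only recognises ASCII letters, so the bytes that make up Greek letters get dropped. This version keeps every byte with the high bit set, so multi-byte UTF-8 characters survive intact.

```cpp
#include <cctype>
#include <string>

// Hypothetical variant of remove_non_alpha, assuming UTF-8 input:
// keeps ASCII letters and every non-ASCII byte, drops ASCII digits,
// punctuation and whitespace.
std::string remove_non_alpha_utf8( const std::string& str ) {
    std::string result ;
    for( char ch : str ) {
        const unsigned char c = static_cast<unsigned char>(ch) ;
        // bytes >= 0x80 are part of a multi-byte UTF-8 character
        if( c >= 0x80 || std::isalpha(c) ) result += ch ;
    }
    return result ;
}
```

The trade-off is that non-ASCII punctuation (curly quotes, for example) is kept too, because the function does not decode the characters. Telling letters from punctuation properly would need real Unicode handling, which is beyond this sketch.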