Reading a file: deleting numbers and punctuation

I am writing a program that reads a document and keeps track of each word, the line numbers it appears on, and how many times it appears in the document. Then I have to output that. I created the program and it runs smoothly, but I avoided testing it on a document with numbers or punctuation because I was having problems and wanted to move on. What I have is below. If I use continue, the program still outputs the number or punctuation. If I replace the character with a space, it outputs the space along with the line the space is on (because it used to be something).
Hoping someone can help.

ifstream input_file("test_file.txt");
string line;
if (input_file.is_open()) {
    int counter = 0; // current line number
    // Loop on getline itself so a failed read at end of file is not processed
    while (getline(input_file, line)) {
        counter++;
        stringstream readword(line);
        string word;
        while (readword >> word) {
            for (size_t i = 0; i < word.length(); i++) {
                // continue only skips to the next i; word itself is never modified
                if (isdigit(word[i]))
                    continue;
                if (ispunct(word[i]))
                    continue;
            }
            UpperToLower(word);
            ListOFdata.Add(word, counter); // adds word to my list
        } // end while words
    } // end while lines
} // end if open
       
It would have been nice to see the text file, to know what kind of input you have to deal with: for example, whether the numbers and punctuation are mixed in with other characters inside a word. Instead of the for loop, you could do something like this. (It uses a C++11 lambda, but you can always make that a separate function if needed.)

auto it = std::remove_if(word.begin(), word.end(), [](char c) {return ispunct(c) || isdigit(c); });
word.erase(it, word.end());


Then don't add the word to your list if the string ends up with size 0. This could be easier, depending on how the text file looks.
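To make the suggestion above concrete, here is a minimal sketch of the erase-remove approach combined with the empty-word check. The helper names stripDigitsAndPunct and cleanWords are made up for illustration and are not part of the original program; the lambda takes an unsigned char because the <cctype> functions expect a value in that range.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: strips digits and punctuation from a word in place.
void stripDigitsAndPunct(std::string& word) {
    // remove_if shifts the characters to keep toward the front and returns
    // an iterator to the new logical end; erase then drops the leftover tail.
    std::string::iterator it = std::remove_if(word.begin(), word.end(),
        [](unsigned char c) { return std::ispunct(c) || std::isdigit(c); });
    word.erase(it, word.end());
}

// Hypothetical helper: splits a line into cleaned, non-empty words.
std::vector<std::string> cleanWords(const std::string& line) {
    std::vector<std::string> result;
    std::stringstream readword(line);
    std::string word;
    while (readword >> word) {
        stripDigitsAndPunct(word);
        if (!word.empty())          // a token like "47" becomes empty: skip it
            result.push_back(word);
    }
    return result;
}
```

In the original loop, the call to stripDigitsAndPunct(word) would replace the for loop, and the Add call would be guarded by the same !word.empty() check.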
Thanks for your response.
To test our program, our prof told us to use any document, or even text from a wiki page.
I gather there wouldn't be something like this:
H1 I a!m gr8t today.
Rather:
Hi! there are 47 people on line at the back.

What do you think I should use? I think I need something that works before C++11. Also, is 'c' a variable?
auto it = std::remove_if(word.begin(), word.end(), [](char c) {return ispunct(c) || isdigit(c); });
word.erase(it, word.end());


I will use this from the computer science wiki page:

Computer science is the scientific and practical approach to computation and its applications. It is the systematic study of the feasibility, structure, expression, and mechanization of the methodical procedures (or algorithms) that underlie the acquisition, representation, processing, storage, communication of, and access to information, whether such information is encoded as bits in a computer memory or transcribed in genes and protein structures in a biological cell.[1] An alternate, more succinct definition of computer science is the study of automating algorithmic processes that scale. A computer scientist specializes in the theory of computation and the design of computational systems.
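
On the two questions above: in the lambda, c is simply the name of its parameter, the single character that std::remove_if hands over for each position in the word. For a compiler without C++11, the lambda can be swapped for an ordinary named function. A minimal sketch, assuming the names isDigitOrPunct and removeDigitsAndPunct (both invented here for illustration):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Named predicate that plays the role of the lambda. `c` is just the
// parameter name for the character std::remove_if passes in, one at a time.
bool isDigitOrPunct(char c) {
    // The <cctype> functions expect a value representable as unsigned char.
    unsigned char uc = static_cast<unsigned char>(c);
    return std::ispunct(uc) || std::isdigit(uc);
}

// Hypothetical helper name; uses no C++11 features.
void removeDigitsAndPunct(std::string& word) {
    std::string::iterator it =
        std::remove_if(word.begin(), word.end(), isDigitOrPunct);
    word.erase(it, word.end());
}
```

With text like the paragraph above, a token such as "cell.[1]" would be reduced to "cell", and a token made only of digits or punctuation would end up empty, so it should be skipped rather than added to the list.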