Sorting a vector with a bool function (overloading?)

Hey there. I currently have a program that takes words from a text file and prints out how many times each one appears.

I currently have two problems. The first is that the output is sorted alphabetically instead of by the number of times each word appears. The second is that the same word shows up as several entries because of characters like . , and !; for example, it says the word "you" appears 410 times and the word "you," 25 times.

So I am looking for a good way to get the list sorted, but I am at a standstill as to how I should write the bool function that sorts my list by the number of times each word appears.

I also have no idea how to remove the unwanted characters from the words, so any suggestions for that would be really helpful. Also, please explain as if you were talking to a little child, because I am so new to this that a lot of the time I don't understand the explanations given to me.

As you can see, I am not sure how to write my bool function.

#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <vector>
#include <algorithm>

using namespace std;

ifstream book;
string tempbook;
map<string, int> words;
vector<pair<string, int> > wordsVec;

bool sorting(?????)
{
   ?????
}

int main()
{
  // Open the file
  book.open("hitchhikersguide.txt");

  // Check that the file was opened correctly.
  if(!book.is_open())
    cout << "Filen öppnades inte korrekt." << endl;

  // Read each word from book into tempbook and check whether it is
  // already in the map words. If it is not found, add it with the value 1.
  // If it is found, add 1 to the word's count.
  while (book >> tempbook)
  {
    if (words.find(tempbook) == words.end())
      words[tempbook] = 1;
    else
      words[tempbook]++;
  }

  // Copy the words and their counts as pairs into a vector.
  for (map<string, int>::iterator it = words.begin(); it != words.end(); ++it)
  {
    wordsVec.push_back(*it);
  }

  // Print our vector.
  for (vector<pair<string, int> >::iterator i = wordsVec.begin(); i != wordsVec.end(); ++i)
  {
    cout << i->first << " finns i texten " << i->second << " gånger." << endl;
  }

  sort(wordsVec.begin(), wordsVec.end(), sorting);

  return 0;

}
#include <cctype>
#include <iostream>
#include <fstream>
#include <functional>
#include <string>
#include <map>
#include <vector>
#include <algorithm>

using namespace std;


// comparison function for sort:
bool by_frequency(const pair<string, int>& a, const pair<string, int>& b)
{
    return a.second < b.second;
}

// sanitize a string.  Keep alphanumeric characters, make it all the same case.
string sanitize(string txt)
{
    txt.erase(remove_if(txt.begin(), txt.end(), not1(isalnum)), txt.end());
    transform(txt.begin(), txt.end(), txt.begin(), tolower);
    return txt;
}

int main()
{
    // Open the file
    std::ifstream book("hitchhikersguide.txt");     // prefer to define objects near first use.

    // Check that the file was opened correctly.
    if (!book.is_open())
        cout << "Filen öppnades inte korrekt." << endl;

    // Read each word from book into tempbook and check whether it is
    // already in the map words. If it is not found, add it with the value 1.
    // If it is found, add 1 to the word's count.
    string tempbook;                    // prefer to define objects near first use
    map<string, int> words;
    while (book >> tempbook)
    {
        tempbook = sanitize(tempbook);
        //if (words.find(tempbook) == words.end())
        //    words[tempbook] = 1;
        //else
        //    words[tempbook]++;
        // mapped values are value-initialized (set to 0 in the case of int), which means
        // the following will be equivalent to the code commented out above:
        ++words[tempbook];
    }

    // Copy the words and their counts as pairs into a vector.
    //for (map<string, int>::iterator it = words.begin(); it != words.end(); ++it)
    //{
    //    wordsVec.push_back(*it);
    //}
    vector<pair<string,int>> wordsVec(words.begin(), words.end());  // prefer to define objects near first use.

    sort(wordsVec.begin(), wordsVec.end(), by_frequency);

    // Print our vector.
    for (vector<pair<string,int>>::iterator i = wordsVec.begin(); i != wordsVec.end(); ++i)
    {
        cout << i->first << " finns i texten " << i->second << " gånger." << endl;
    }
}
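A small sketch building on by_frequency above: that version sorts from least to most frequent, so if you want the most common words first, the comparison can be flipped. The names WordCount, by_frequency_desc, most_frequent_first and check_sort_order are made up for this example, and the alphabetical tie-break is an assumption of mine, not something the original code does.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias for one (word, count) entry.
typedef std::pair<std::string, int> WordCount;

// Returns true when a should come before b:
// higher count first, alphabetical order when the counts are equal.
bool by_frequency_desc(const WordCount& a, const WordCount& b)
{
    if (a.second != b.second)
        return a.second > b.second;
    return a.first < b.first;
}

// Sorts a copy of the entries so the most frequent words come first.
std::vector<WordCount> most_frequent_first(std::vector<WordCount> counts)
{
    std::sort(counts.begin(), counts.end(), by_frequency_desc);
    return counts;
}

// Quick self-check using the counts mentioned in the question.
void check_sort_order()
{
    std::vector<WordCount> counts;
    counts.push_back(WordCount("you", 410));
    counts.push_back(WordCount("the", 1000));
    counts.push_back(WordCount("and", 410));

    std::vector<WordCount> sorted = most_frequent_first(counts);
    assert(sorted[0].first == "the");
    assert(sorted[1].first == "and");   // tie with "you", alphabetical wins
    assert(sorted[2].first == "you");
}
```

The comparator only has to answer one question, "should a come before b?", and sort calls it as often as it needs to.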
Thanks a lot for the help.

I used your bool function for sorting and it worked like a charm, and I am trying to use your erase and transform code. I did get the transform to work after a little googling, because I got an error.

I think tolower was being picked up from the wrong library, so after I added "::" before tolower it worked perfectly.

However, the erase part is not working for me. I think it has something to do with picking up the wrong library, but I am not sure. I have tried so many different things to get it to work, but I can't.

The error I am getting from the compiler right now is:
labb5.cpp: In function ‘int main()’:
labb5.cpp:44:76: error: no matching function for call to ‘not1(<unresolved overloaded function type>)’
     tempbook.erase(remove_if(tempbook.begin(), tempbook.end(), not1(isalnum)), tempbook.end());
                                                                            ^
labb5.cpp:44:76: note: candidate is:
In file included from /usr/include/c++/4.8/functional:49:0,
                 from labb5.cpp:2:
/usr/include/c++/4.8/bits/stl_function.h:369:5: note: template<class _Predicate> std::unary_negate<_Predicate> std::not1(const _Predicate&)
     not1(const _Predicate& __pred)
     ^
/usr/include/c++/4.8/bits/stl_function.h:369:5: note:   template argument deduction/substitution failed:
labb5.cpp:44:76: note:   couldn't deduce template parameter ‘_Predicate’
     tempbook.erase(remove_if(tempbook.begin(), tempbook.end(), not1(isalnum)), tempbook.end()); 


I am reading something about the candidate being in functional. Should it not be cctype, and how can I change that? Also, as you can see, while I am hunting for the error I prefer to work outside a function, so the string code is currently inside my while loop.

#include <cctype>
#include <functional>
#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <vector>
#include <algorithm>

using namespace std;

ifstream book;
string tempbook;
map<string, int> words;
vector<pair<string, int> > wordsVec;

// Comparison function that orders the words by the number of times they appear in the text.
bool SortByFreq(const pair<string, int> a, const pair<string, int> b)
{
    return a.second < b.second;
}

int main()
{
  // Open the file
  book.open("hitchhikersguide.txt");

  // Check that the file was opened correctly.
  if(!book.is_open())
    cout << "Filen öppnades inte korrekt." << endl;

  // Read each word from book into tempbook and check whether it is
  // already in the map words. If it is not found, add it with the value 1.
  // If it is found, add 1 to the word's count.
  while (book >> tempbook)
  {
    tempbook.erase(remove_if(tempbook.begin(), tempbook.end(), not1(isalnum)), tempbook.end());
    transform(tempbook.begin(), tempbook.end(), tempbook.begin(), ::tolower);

    if (words.find(tempbook) == words.end())
      words[tempbook] = 1;
    else
      words[tempbook]++;
  }
  // Copy the words and their counts as pairs into a vector.
  for (map<string, int>::iterator it = words.begin(); it != words.end(); ++it)
  {
    wordsVec.push_back(*it);
  }

  sort(wordsVec.begin(), wordsVec.end(), SortByFreq);

  // Print our vector.
  for (vector<pair<string, int> >::iterator i = wordsVec.begin(); i != wordsVec.end(); ++i)
  {
    cout << i->first << " finns i texten " << i->second << " gånger." << endl;
  }

  return 0;

}


The first line below is the one giving me the error.
The second line I got to work, but that way I can only remove one character that I already know about. It is also not the best way, so I would prefer to use isalnum somehow.
    tempbook.erase(remove_if(tempbook.begin(), tempbook.end(), not1(isalnum)), tempbook.end());
    tempbook.erase(remove(tempbook.begin(), tempbook.end(), '.'), tempbook.end());


I think tolower was being picked up from the wrong library, so after I added "::" before tolower it worked perfectly.

Sounds like your using namespace std; had some unintended consequences. It's probably the same issue you're experiencing with isalnum. Did you try qualifying the function? ::isalnum or std::isalnum?
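To make that a bit more concrete, here is a rough sketch of one way to sidestep the ambiguity. The standard library also declares a two-argument tolower template in <locale>, so once everything in std is visible, the bare name tolower can refer to more than one function and the compiler cannot pick one when it is passed as an argument. Wrapping the call in a function with a single signature avoids that. The names to_lower_char, lower_copy and check_lower_copy are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// A wrapper with exactly one signature, so there is nothing for the
// compiler to choose between when it is passed to transform.
// The cast to unsigned char keeps std::tolower well defined for
// characters with negative char values.
char to_lower_char(char ch)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}

// Returns a lower-case copy of txt; other characters are left alone.
std::string lower_copy(std::string txt)
{
    std::transform(txt.begin(), txt.end(), txt.begin(), to_lower_char);
    return txt;
}

// Quick self-check.
void check_lower_copy()
{
    assert(lower_copy("You,") == "you,");
    assert(lower_copy("DON'T PANIC") == "don't panic");
}
```

Writing ::tolower, as you did, picks the C version from the global namespace and usually works too; the wrapper is just a way to be explicit about it.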
I did get it to work when I tried this; however, now it removes all the alphanumeric characters and keeps characters like "." "," "!" and so on. So I am trying to make it do the opposite of this.


//#include <functional>
 tempbook.erase(remove_if(tempbook.begin(), tempbook.end(), ::isalnum), tempbook.end());
What are you trying to erase? If you want to remove punctuation, there is ispunct, which you may want to try. And remember that these functions are in both the std namespace and the global namespace, so if you use the "using namespace std;" directive you need to specify which one you want by using the scope resolution operator (::).

Look up the cctype header file documentation for more information as to the functions available.
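As a rough illustration of the ispunct suggestion, the sketch below erases punctuation and keeps everything else. The helper names is_punct_char, strip_punct and check_strip_punct are made up here, and the checks assume the default "C" locale, where the apostrophe counts as punctuation.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// True for punctuation characters. The cast to unsigned char keeps
// std::ispunct well defined for characters with negative char values.
bool is_punct_char(char ch)
{
    return std::ispunct(static_cast<unsigned char>(ch)) != 0;
}

// Returns a copy of txt with every punctuation character removed.
std::string strip_punct(std::string txt)
{
    // remove_if moves the kept characters to the front and returns the
    // new logical end; erase then chops off the leftover tail.
    txt.erase(std::remove_if(txt.begin(), txt.end(), is_punct_char), txt.end());
    return txt;
}

// Quick self-check (assumes the default "C" locale).
void check_strip_punct()
{
    assert(strip_punct("you,") == "you");
    assert(strip_punct("he's") == "hes");   // the apostrophe is punctuation too
    assert(strip_punct("42!") == "42");
}
```

Note that remove_if erases the characters for which the predicate is true, so the predicate has to describe what you want to get rid of, not what you want to keep.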
Aha. std::not1 has some requirements I didn't expect, and compilation with VC++ didn't bark at me for some reason.

The simplest thing to do, if you're using C++11, is to provide a lambda that does the work:

auto not_alnum = [](char ch) { return !std::isalnum(ch); };
tempbook.erase(remove_if(tempbook.begin(), tempbook.end(), not_alnum), tempbook.end());
Thanks cire!

I ended up using your code.

However, I am guessing the auto part is the reason I need to compile as C++11.
If you have time to explain the not_alnum function (a lambda?) to me, that would be awesome, because I can see that it works, which is really good, but I also want to learn how and why it works.

Thanks :)

Also, I noticed it returns an empty space one time, like this.

"the" appears 1000 times
"and" appears 1001 times
" " appears 1002 times
"how" appears 1003 times.

This is just an example. I am mostly wondering what the empty space is. I don't think I had it before using your solution.

And I just thought of words like he's. I think they disappear now. Oh no.. :(
However, I am guessing the auto part is the reason I need to compile as C++11.
auto and the lambda itself. Both were introduced in C++11.
http://www.drdobbs.com/cpp/lambdas-in-c11/240168241
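As a rough way to picture what the lambda is: it behaves much like a small struct with an operator(), which the compiler writes for you. The sketch below puts a hand-written version next to the lambda so you can see they do the same job. NotAlnum, keep_alnum_functor, keep_alnum_lambda and check_same_result are names invented for this example, and the lambda half needs C++11.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hand-written function object: calling an instance like a function
// runs operator(). This is roughly what a lambda expands to.
struct NotAlnum
{
    bool operator()(char ch) const
    {
        return !std::isalnum(static_cast<unsigned char>(ch));
    }
};

// Keeps letters and digits, using the hand-written function object.
std::string keep_alnum_functor(std::string txt)
{
    txt.erase(std::remove_if(txt.begin(), txt.end(), NotAlnum()), txt.end());
    return txt;
}

// Same thing with a lambda. [] means "captures nothing", (char ch) is
// the parameter list, and the body returns true for characters to drop.
std::string keep_alnum_lambda(std::string txt)
{
    auto not_alnum = [](char ch) {
        return !std::isalnum(static_cast<unsigned char>(ch));
    };
    txt.erase(std::remove_if(txt.begin(), txt.end(), not_alnum), txt.end());
    return txt;
}

// Quick self-check: both versions give the same result.
void check_same_result()
{
    assert(keep_alnum_functor("you,") == "you");
    assert(keep_alnum_lambda("you,") == "you");
    assert(keep_alnum_functor("Don't-Panic!") == keep_alnum_lambda("Don't-Panic!"));
}
```

auto is needed for the lambda because its type is generated by the compiler and has no name you could write yourself.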


Also, I noticed it returns an empty space one time, like this.

I would guess that is some anomaly in the file. book >> tempbook should extract no whitespace at all. You might want to print out the integer value of the first character in that string to see if it actually is a space.
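One plausible cause, offered as a guess since I have not seen the file: a token made up entirely of punctuation, such as "-" or "...", turns into an empty string once the non-alphanumeric characters are erased, and that empty string then gets counted like any other word. The sketch below shows a helper for printing the character codes and a counting loop that skips empty results. sanitize here is a stand-in for the cleanup code in this thread, and char_codes, count_words and check_empty_tokens are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

bool not_alnum_char(char ch)
{
    return !std::isalnum(static_cast<unsigned char>(ch));
}

char lower_char(char ch)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}

// Keeps letters and digits only and lower-cases the result.
std::string sanitize(std::string txt)
{
    txt.erase(std::remove_if(txt.begin(), txt.end(), not_alnum_char), txt.end());
    std::transform(txt.begin(), txt.end(), txt.begin(), lower_char);
    return txt;
}

// Returns the integer value of every character, separated by spaces.
// An empty result means the string has no characters at all.
std::string char_codes(const std::string& s)
{
    std::ostringstream out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (i != 0)
            out << ' ';
        out << static_cast<int>(static_cast<unsigned char>(s[i]));
    }
    return out.str();
}

// Counts sanitized words and ignores tokens that end up empty.
std::map<std::string, int> count_words(std::istream& in)
{
    std::map<std::string, int> words;
    std::string token;
    while (in >> token) {
        token = sanitize(token);
        if (token.empty())
            continue;               // was punctuation only, nothing to count
        ++words[token];
    }
    return words;
}

// Quick self-check.
void check_empty_tokens()
{
    std::istringstream text("you you, - ... You!");
    std::map<std::string, int> words = count_words(text);
    assert(words.size() == 1);
    assert(words["you"] == 3);
    assert(char_codes(sanitize("...")).empty());
    assert(char_codes(" ") == "32");
}
```

If char_codes prints nothing for the mystery entry, it is an empty string and not a real space.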


And I just thought of words like he's. I think they disappear now. Oh no.. :(

Well, it is sanitized to "hes", but if that's not acceptable then adjust the lambda:
auto not_alnum = [](char ch) { return !std::isalnum(ch) && ch != '\''; };
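A slightly fuller sketch along the same lines, in case it helps: keeping every apostrophe also keeps the ones used as quote marks around a word, so this version trims apostrophes from both ends after the cleanup. That trimming step is my own addition, and drop_char, lower_char, sanitize_keep_apostrophe and check_apostrophes are invented names. It only handles the plain ASCII apostrophe; a typographic one in the text file would need separate handling.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// True for characters to erase: anything that is neither a letter,
// a digit, nor the plain ASCII apostrophe.
bool drop_char(char ch)
{
    return !std::isalnum(static_cast<unsigned char>(ch)) && ch != '\'';
}

char lower_char(char ch)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}

// Cleans one token: "He's," becomes "he's" and "'quoted'" becomes "quoted".
std::string sanitize_keep_apostrophe(std::string txt)
{
    txt.erase(std::remove_if(txt.begin(), txt.end(), drop_char), txt.end());

    // Trim apostrophes at both ends so quote marks do not create
    // separate entries; apostrophes inside the word are kept.
    std::string::size_type first = txt.find_first_not_of('\'');
    if (first == std::string::npos)
        return std::string();       // nothing but apostrophes (or empty)
    std::string::size_type last = txt.find_last_not_of('\'');
    txt = txt.substr(first, last - first + 1);

    std::transform(txt.begin(), txt.end(), txt.begin(), lower_char);
    return txt;
}

// Quick self-check.
void check_apostrophes()
{
    assert(sanitize_keep_apostrophe("He's,") == "he's");
    assert(sanitize_keep_apostrophe("'quoted'") == "quoted");
    assert(sanitize_keep_apostrophe("'''") == "");
    assert(sanitize_keep_apostrophe("you!") == "you");
}
```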
Thanks for all the help, cire! I got my program working and I understand the whole code behind it.

(Just a quick note: "he's" did not become "hes"; in fact, the word never showed up in the count.)

Cheers!
- Zorac