Maps and Bigram Probabilities

Hey everyone.

First just take a minute to read what the assignment is: https://drive.google.com/file/d/0B-t5ghDb_TCqVzRvNFkycTRjUzQ/view?pref=2&pli=1

I have written all the code. I expected it to work, but my probabilities always print out as 0.

Here is my full code below:

#include <iostream>
#include <fstream>
#include <map>
#include <utility>

using namespace std;

void CalcBigrams(ifstream&, string, string);
int main()
{
    ifstream file("Whitman-Leaves.txt");
    if(!file)
    {
        cout << "FILE NOT OPENED";
        return 0;
    }

    cout << "Enter two words: ";
    string w1, w2;
    cin >> w1 >> w2;

    CalcBigrams(file, w1, w2);

    return 0;
}

void CalcBigrams(ifstream& file, string w1, string w2)
{
    string word1, word2;

    map<pair<string, string>, int> m1;
    map<string, int> m2;
    map<pair<string, string>, double> m3;

    map<pair<string, string>, int>::iterator it1;
    map<string, int>::iterator it2;

    pair<string, string> p;

    // while loop reads in words until end of file
    file >> word1;
    while(file)
    {
        file >> word2;
        p = make_pair(word1, word2);
        m1[p]++; // insert to m1, or increment int value if key already exists
        m2[word1]++; // insert to m2, or increment int value if key already exists
        word1 = word2; // word2 needs to be re-used as word1 in the next pair
    }

    it1 = m1.begin();
    it2 = m2.begin();
    double prob;

    // keep loop until iterators reach the end
    while(it1 != m1.end() && it2 != m2.end())
    {
        prob = (1.0*it1->second)/it2->second; //int value of both maps divided.
                                            // multiply by 1.0 to make it a double
        m3[it1->first] = prob; // use the pair of words as the key and prob as the value, i.e. fill m3 with the probabilities
        it1++;
        it2++;
    }

    // this cout statement shows that m1 and m2 are being filled with the correct counts
    p = make_pair(w1, w2);
    cout << m1[p] << " / " << m2[w1] << " = " << 1.0*m1[p]/m2[w1] << endl; // prints a decimal number

    cout << m3[p] << endl; // prints 0 always. Why??

}


That last cout statement should print the probability for the pair of words 'p'. Instead it always just prints 0.

I know m1 and m2 are filled correctly because their int values print fine and dividing them gives me the correct decimal number, as shown by the second-to-last cout statement.

I even know that m3 is filled up correctly because I iterated through it and printed the decimal numbers in it (which are the probabilities).

So why does m3[p] print 0 no matter what pair of words I enter?
Imagine that your input contains only 10 different words, but every possible pair of those 10 words appears in it. Then m1 will contain 100 elements and m2 will contain 10. The loop on line 56 (the while loop that fills m3) advances both iterators together and stops as soon as either map has been completely traversed. In this example, m3 ends up with only 10 elements instead of 100, so most pairs you look up are missing, and m3[p] inserts a default value of 0 for them.
Can you figure out what the fix is?
I tried changing my while condition so it only exits when it1 has reached m1.end(). The program just crashed.

Don't know what to try anymore. Any hints?
AHA!

I figured it out.

Here is my updated while loop:

    while(it1 != m1.end())
    {
        prob = (1.0*it1->second)/m2[it1->first.first]; //int value of pair / int value of word1 for pair in m1
                                            // multiply by 1.0 to make it a double
        m3[it1->first] = prob;  // insert pair of words into map, set the value part to prob
        it1++;
    }


This seems to work. The probabilities in m3 now match the values you get when dividing the int values of m1 and m2.

Yep. That's it.