Program isn't working with big numbers (or maybe small) + floating point exception?

Hello,

I've been messing around with a binomial probability program. I've been learning A-level stats (S1), but the book I have doesn't have a statistical table. Since i've been messing around with C++ too, I thought this would be the perfect opportunity to test what I know. Unfortunately, I keep running into problems!

So far, this has demonstrated that I know the basics of C++, but I seem to fall down on the details.

This is what i've written:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
 /*

Calculates the probabilities of getting exactly i things in a sample size of n, if there is p chance of getting one of the things in the first place.

(ie. if there are 3 lightbulbs, and there is a 0.25 chance of a bulb
being broken, then outputs:

P(x=0)
P(x=1)
P(x=2)
P(x=3)

)

Outputs single probabilities and cumulative.

*/

#include <iostream>
#include <cmath>
#include<iomanip>

using namespace std;

int fact(int n);
int C(int n, int r);

int main(){ 

			int n;
			cout << "Please enter the sample size: ";
			cin >> n;

			double p;
			cout << "Enter the probability of a certain behaviour occurring: ";
			cin >> p;

			double k = 0;
			int i;
			double j;

					for(i = 0; i <= n; i++){
	
					j = C(n,i)*pow(p,i)*pow(1-p,n-i);
					
					cout << "P(x = " << i <<  ")=" << j; 
					cout << setw(30);
					cout << "P(x <= " << i << ")=" << j+k;
					cout << '\n';

					k = j+k;

			} 

	return 0;

}

int fact(int n){

	int t, answer;
	
	answer = 1;
	for(t=1;t<=n;t++) answer = answer * t;
	
	return(answer);
}

int C(int n, int r){

		int a;

		a = fact(n) / (fact(r)*fact(n - r) );

		return a;

}


I've posted this program before here:

http://www.cplusplus.com/forum/beginner/171139/

but since this is a new problem, I thought i'd make a new topic.

I've changed the program to output the individual probabilities on the left, and the cumulative probabilities on the right.

Here's an example of what I get:


name@computer ~/Documents $ ./binomial
Please enter the sample size: 8
Enter the probability of a certain behaviour occurring: 0.5
P(x = 0)=0.00390625                       P(x <= 0)=0.00390625
P(x = 1)=0.03125                       P(x <= 1)=0.0351562
P(x = 2)=0.109375                       P(x <= 2)=0.144531
P(x = 3)=0.21875                       P(x <= 3)=0.363281
P(x = 4)=0.273438                       P(x <= 4)=0.636719
P(x = 5)=0.21875                       P(x <= 5)=0.855469
P(x = 6)=0.109375                       P(x <= 6)=0.964844
P(x = 7)=0.03125                       P(x <= 7)=0.996094
P(x = 8)=0.00390625                       P(x <= 8)=1
 


This is what I should get. The individual probabilities on the left are all positive, and they add up to 1.

You can see this on a statistical binomial table found here:

http://i.stack.imgur.com/2Blui.png

(look at n=8 on the left hand column, and 0.5 on the top).

However, let's try a "big" number like 20:


name@computer ~/Documents $ ./binomial
Please enter the sample size: 20
Enter the probability of a certain behaviour occurring: 0.5
P(x = 0)=9.53674e-07                       P(x <= 0)=9.53674e-07
P(x = 1)=-1.81198e-05                       P(x <= 1)=-1.71661e-05
P(x = 2)=9.53674e-07                       P(x <= 2)=-1.62125e-05
P(x = 3)=9.53674e-07                       P(x <= 3)=-1.52588e-05
P(x = 4)=-1.90735e-06                       P(x <= 4)=-1.71661e-05
P(x = 5)=0.00207329                       P(x <= 5)=0.00205612
P(x = 6)=-9.53674e-07                       P(x <= 6)=0.00205517
P(x = 7)=-1.90735e-06                       P(x <= 7)=0.00205326
P(x = 8)=9.53674e-07                       P(x <= 8)=0.00205421
P(x = 9)=9.53674e-07                       P(x <= 9)=0.00205517
P(x = 10)=1.04904e-05                       P(x <= 10)=0.00206566
P(x = 11)=9.53674e-07                       P(x <= 11)=0.00206661
P(x = 12)=9.53674e-07                       P(x <= 12)=0.00206757
P(x = 13)=-1.90735e-06                       P(x <= 13)=0.00206566
P(x = 14)=-9.53674e-07                       P(x <= 14)=0.0020647
P(x = 15)=0.00207329                       P(x <= 15)=0.00413799
P(x = 16)=-1.90735e-06                       P(x <= 16)=0.00413609
P(x = 17)=9.53674e-07                       P(x <= 17)=0.00413704
P(x = 18)=9.53674e-07                       P(x <= 18)=0.00413799
P(x = 19)=-1.81198e-05                       P(x <= 19)=0.00411987
P(x = 20)=9.53674e-07                       P(x <= 20)=0.00412083 


This is going wrong! For some reason we're getting negative numbers on the left. This has a knock on effect when they're added meaning the individual probabilities don't add up to 1.

I have a feeling that the problem may have to do with small probabilities. By this I mean that the probabilities get so small, they no longer count as doubles.

Also, if you put an n value in that is bigger than 32, you get:


name@computer ~/Documents $ ./binomial
Please enter the sample size: 50
Enter the probability of a certain behaviour occurring: 0.1
Floating point exception 


I'm used to the compiler picking up errors, what's this on about?
Last edited on
A signed integer can only hold 31 bits plus 1 bit for the sign of the number.
1310! = 1011100110010100011001100000000002
This is already over the limit. So your factorial function is limited by this since it returns integers.
Ah! I thought it was something like that.

I've run into a problem like this before, and the solution was to use long long int and long double.

With a little messing about I get this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
/*

Calculates the probabilities of getting exactly i things in a sample size of n, if there is p chance of getting one of the things in the first place.

(ie. if there are 3 lightbulbs, and there is a 0.25 chance of a bulb
being broken, then outputs:

P(x=0)
P(x=1)
P(x=2)
P(x=3)

)

Outputs single probabilities and cumulative.

*/

#include <iostream>
#include <cmath>
#include<iomanip>

using namespace std;

long long int fact(int n);
long long int C(int n, int r);

int main(){ 

			int n;
			cout << "Please enter the sample size: ";
			cin >> n;

			double p;
			cout << "Enter the probability of a certain behaviour occurring: ";
			cin >> p;

			long double k = 0;
			int i;
			long double j;

					for(i = 0; i <= n; i++){
	
					j = C(n,i)*pow(p,i)*pow(1-p,n-i);
					
					cout << "P(x = " << i <<  ")=" << j; 
					cout << setw(30);
					cout << "P(x <= " << i << ")=" << j+k;
					cout << '\n';

					k = j+k;

			} 

	return 0;

}

long long int fact(int n){

	int t;
	long long int answer;
	
	answer = 1;
	for(t=1;t<=n;t++) answer = answer * t;
	
	return(answer);
}

long long int C(int n, int r){

		long long int a;

		a = fact(n) / (fact(r)*fact(n - r) );

		return a;

} 


..and, with n=20 I get:


name@computer ~/Documents $ ./binomial
Please enter the sample size: 20 
Enter the probability of a certain behaviour occurring: 0.1
P(x = 0)=0.121577                       P(x <= 0)=0.121577
P(x = 1)=0.27017                       P(x <= 1)=0.391747
P(x = 2)=0.28518                       P(x <= 2)=0.676927
P(x = 3)=0.19012                       P(x <= 3)=0.867047
P(x = 4)=0.0897788                       P(x <= 4)=0.956826
P(x = 5)=0.0319214                       P(x <= 5)=0.988747
P(x = 6)=0.00886704                       P(x <= 6)=0.997614
P(x = 7)=0.00197045                       P(x <= 7)=0.999584
P(x = 8)=0.000355776                       P(x <= 8)=0.99994
P(x = 9)=5.27076e-05                       P(x <= 9)=0.999993
P(x = 10)=6.44204e-06                       P(x <= 10)=0.999999
P(x = 11)=6.50711e-07                       P(x <= 11)=1
P(x = 12)=5.4226e-08                       P(x <= 12)=1
P(x = 13)=3.70776e-09                       P(x <= 13)=1
P(x = 14)=2.05987e-10                       P(x <= 14)=1
P(x = 15)=9.15496e-12                       P(x <= 15)=1
P(x = 16)=3.1788e-13                       P(x <= 16)=1
P(x = 17)=8.3106e-15                       P(x <= 17)=1
P(x = 18)=1.539e-16                       P(x <= 18)=1
P(x = 19)=1.8e-18                       P(x <= 19)=1
P(x = 20)=1e-20                       P(x <= 20)=1
 


Wahey! Out of the park!

However, this has just kicked the can down the road:


name@computer ~/Documents $ ./binomial
Please enter the sample size: 40
Enter the probability of a certain behaviour occurring: 0.5
P(x = 0)=9.09495e-13                       P(x <= 0)=9.09495e-13
P(x = 1)=0                       P(x <= 1)=9.09495e-13
P(x = 2)=0                       P(x <= 2)=9.09495e-13
P(x = 3)=0                       P(x <= 3)=9.09495e-13
P(x = 4)=0                       P(x <= 4)=9.09495e-13
P(x = 5)=0                       P(x <= 5)=9.09495e-13
P(x = 6)=0                       P(x <= 6)=9.09495e-13
P(x = 7)=1.81899e-12                       P(x <= 7)=2.72848e-12
P(x = 8)=0                       P(x <= 8)=2.72848e-12
P(x = 9)=0                       P(x <= 9)=2.72848e-12
P(x = 10)=0                       P(x <= 10)=2.72848e-12
P(x = 11)=0                       P(x <= 11)=2.72848e-12
P(x = 12)=0                       P(x <= 12)=2.72848e-12
P(x = 13)=0                       P(x <= 13)=2.72848e-12
P(x = 14)=0                       P(x <= 14)=2.72848e-12
P(x = 15)=0                       P(x <= 15)=2.72848e-12
P(x = 16)=0                       P(x <= 16)=2.72848e-12
P(x = 17)=0                       P(x <= 17)=2.72848e-12
P(x = 18)=0                       P(x <= 18)=2.72848e-12
P(x = 19)=0                       P(x <= 19)=2.72848e-12
P(x = 20)=0                       P(x <= 20)=2.72848e-12
P(x = 21)=0                       P(x <= 21)=2.72848e-12
P(x = 22)=0                       P(x <= 22)=2.72848e-12
P(x = 23)=0                       P(x <= 23)=2.72848e-12
P(x = 24)=0                       P(x <= 24)=2.72848e-12
P(x = 25)=0                       P(x <= 25)=2.72848e-12
P(x = 26)=0                       P(x <= 26)=2.72848e-12
P(x = 27)=0                       P(x <= 27)=2.72848e-12
P(x = 28)=0                       P(x <= 28)=2.72848e-12
P(x = 29)=0                       P(x <= 29)=2.72848e-12
P(x = 30)=0                       P(x <= 30)=2.72848e-12
P(x = 31)=0                       P(x <= 31)=2.72848e-12
P(x = 32)=0                       P(x <= 32)=2.72848e-12
P(x = 33)=1.81899e-12                       P(x <= 33)=4.54747e-12
P(x = 34)=0                       P(x <= 34)=4.54747e-12
P(x = 35)=0                       P(x <= 35)=4.54747e-12
P(x = 36)=0                       P(x <= 36)=4.54747e-12
P(x = 37)=0                       P(x <= 37)=4.54747e-12
P(x = 38)=0                       P(x <= 38)=4.54747e-12
P(x = 39)=0                       P(x <= 39)=4.54747e-12
P(x = 40)=9.09495e-13                       P(x <= 40)=5.45697e-12 


So what's the industry standard for getting around ever increasing numbers?

Do I just keep writing "long long long long long int"? It seems a little clumsy.
Re: runtime error.

Slightly changed your program for debugging purposes and this is what I get in debugger:

1
2
3
4
5
6
7
int C(int n, int r)
{
    int num = fact(n);
    int den = (fact(r)*fact(n - r) );
/**/int a = num / den; //Program stopped here
    return a;
}
Function arguments
    n         50
    r         0
Local variables
    a         16777216
    num       0
    den       0
http://puu.sh/juO6p/351ac81b46.png
As you can see, you are trying to divide by 0.

Why the result of factorial is 0? No idea and do not even want to know. Probably there is enough 2's to make total to have 32 0 in least significant bits, or it is byproduct of signed overflow UB.

To somewhat postpone the problem use [unsigned] long long as data type. To postpone it even more, do not use naive function and optimise it at least slightly: 100! / 102! == 101*102, there is no need to calculate both factorials completely.


EDIT:
So what's the industry standard for getting around ever increasing numbers?
Get a biginteger library, or optimise formulas so those large numbers will not exist in the first place.
Last edited on
My solution for doing large factorials is J
   ! 120x
6689502913449127057588118054090372586752746333138029810295671352301633557244962989366874165271984981308157637893214090552534408589408121859898481114389650005964960521256960000000000000000000000000000
   200 ! 300x   NB. 300C200
4158251463258564744783383526326405580280466005743648708663033657304756328324008620

http://jsoftware.com
Slightly changed your program for debugging purposes


You make it sound like i've written a highly professional program!

To somewhat postpone the problem use [unsigned] long long as data type. To postpone it even more, do not use naive function and optimise it at least slightly: 100! / 102! == 101*102, there is no need to calculate both factorials completely.


I thought that too but realised, like you say, that it only kicks the can even further down the road. Ultimately, I want this program to work for n=1000 so I can compare it's accuracy with a normal distribution approximation to the binomial distribution.

My solution for doing large factorials is J
! 120x
6689502913449127057588118054090372586752746333138029810295671352301633557244962989366874165271984981308157637893214090552534408589408121859898481114389650005964960521256960000000000000000000000000000
200 ! 300x NB. 300C200
4158251463258564744783383526326405580280466005743648708663033657304756328324008620

http://jsoftware.com


This is interesting! Most of the programming I do is for maths.

I'll dabble in J after I get reacquainted with C++. At the moment i'd rather be better in a more general programming language than one that focuses on a particular area.
Do not use naive approach using factorials then. There are other ways to calculate NCR, and some of them are specially optimized to work fast on computers.

Look there for example: http://stackoverflow.com/a/9331125
fumbles22 wrote:
I'll dabble in J after I get reacquainted with C++. At the moment i'd rather be better in a more general programming language than one that focuses on a particular area.
jsoftware wrote:
J (J language) is a high-level, general-purpose, high-performance programming language.

I have always felt that learning other languages has benefited1 me as a programmer. It is possible to learn a new perspective and a different way to approach a problem based on the available language tools. In the industry languages come and go, and there are times where you may need to communicate or modify an older program written in a different language. My number one reason that I like learning new languages is, I become bored quickly if I work my mind on the same thing for too long.

1There are a few exceptions to this. Some languages are just meant to be entertaining(Shakespeare) or puzzling(Malbolge). https://en.wikipedia.org/wiki/Esoteric_programming_language
Last edited on
Topic archived. No new replies allowed.