What's wrong with my standard deviation code?

#include <iostream>
#include <fstream>
#include <cmath>

using namespace std;

int main()
{
    double x;
    ifstream Input;
    ofstream Output;

    Input.open("population.txt");

    if(Input.fail())
    {
        cout << "Error opening input file."<<endl;
        return 1;
    }

    Output.open("output.txt");

    if(Output.fail())
    {
        cout << "Error opening output file.."<<endl;
        return 0;
    }

    // declaring variables
    double total=0;
    double count=0;
    double sum = 0;
    double average=0;
    double standardDeviation = 0;

    while(!Input.eof())
    {
        Input>>x;
        total=total+x;
        count++;
    }

    //formulas
    average= total / count;
    standardDeviation = sqrt((( count * pow(sum,2.0)) -(( 1.0/count) * (count *pow(sum,2.0)))) / (count));

    //cout<<"The average:" << average;
    cout<<"sum "<<total<<" count "<<count<<" average "<<average;
    cout << " The standard deviation is:  " << standardDeviation << endl;
    Output<<average;

    Input.close();
    Output.close();

    return 0;
}


I get 0 because I initialized it. What's the correct formula?

Looks like you've either combined several different methods of computing standard deviation, or confused individual arithmetic operations with summations.

The popular standard deviation formula is: square root of ( Σ(x - average)^2 / count )
with x being each individual data point.

So, for each data point x, subtract the average from it, square that difference, and sum up all those squared differences. Divide that sum by the total number of data points, take the square root of the result, and you have your standard deviation.
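The steps above could be sketched roughly like this. It is only an illustration: the function name populationStdDev and the choice to return 0.0 for empty input are my own assumptions, and it assumes the data points have already been read into a vector.

```cpp
#include <cmath>
#include <vector>

// Two-pass population standard deviation:
//   sqrt( sum((x - average)^2) / count )
// Returns 0.0 for empty input (an arbitrary choice for this sketch).
double populationStdDev(const std::vector<double>& data)
{
    if (data.empty())
        return 0.0;

    // Pass 1: compute the average.
    double total = 0.0;
    for (double x : data)
        total += x;
    const double average = total / data.size();

    // Pass 2: sum the squared differences from the average.
    double sumSquaredDiffs = 0.0;
    for (double x : data)
    {
        const double diff = x - average;
        sumSquaredDiffs += diff * diff;
    }

    return std::sqrt(sumSquaredDiffs / data.size());
}
```

For the data points 2, 4, 4, 4, 5, 5, 7, 9 the average is 5 and the squared differences add up to 32, so this returns sqrt(32 / 8) = 2.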

You can also divide by count-1 instead of count to get a sample standard deviation. General rule of thumb: if your data is only a sample drawn from a larger pool, use count-1. If you are calculating a standard deviation for a complete set of data, as opposed to only a sampling, use count.
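To make the count versus count-1 choice concrete, here is a small sketch of just the final step. The helper name stdDevFromSquaredDiffs and the isSample flag are hypothetical, and returning 0.0 when there are too few data points is simply a choice made for this example.

```cpp
#include <cmath>
#include <cstddef>

// Turns an already-computed sum of squared differences into a standard deviation.
//   isSample == true  -> divide by count - 1 (sample)
//   isSample == false -> divide by count     (complete data set)
double stdDevFromSquaredDiffs(double sumSquaredDiffs, std::size_t count, bool isSample)
{
    // Too few points to divide safely: 0 in general, or 1 for the sample form.
    if (count == 0 || (isSample && count < 2))
        return 0.0;

    const std::size_t divisor = isSample ? count - 1 : count;
    return std::sqrt(sumSquaredDiffs / divisor);
}
```

With 8 data points whose squared differences sum to 32, the complete-set form gives sqrt(32 / 8) = 2, while the sample form gives sqrt(32 / 7), which is about 2.138. The gap shrinks as count grows.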

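Alternatively, for easier programmatic calculation, the following formula could be used (to avoid having to retain all the data points):

square root of [ (Σ(x^2) - (Σx)^2 / count) / (count - 1) ]

This is the sample (count-1) form; divide by count instead of count-1 for a complete set of data.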

Again, don't mix the individual operations with the summations. For the first sum, you square each data point, then add it to the sum. For the second sum, you add up all the individual data points, then square that total.
Then there's the fact that your current equation is sqrt((( count * pow(sum,2.0)) -(( 1.0/count) * (count *pow(sum,2.0)))) / (count)), but you never alter sum. You declare it as sum=0 and then never mention it again until the equation (your loop adds each value to total instead).

So both terms are multiplied by pow(0,2.0) = 0, and standardDeviation comes out as sqrt(0) = 0.