variance & std deviation

I've come across two methods for calculating variance & std deviation.
When I apply a single method to both sets of test figures, it does not reproduce the expected result for the other method's figures.
But when I apply each method to its own test figures, both come out correct.

The 2 methods are outlined as follows:
method 1: http://www.mathsisfun.com/data/standard-deviation.html
method 2: http://www.statisticshowto.com/how-to-find-the-sample-variance-and-standard-deviation-in-statistics/

Expected results are:
method 1: test{600,470,170,430, 300} variance = 21,704 std dev = 147.32..
method 2: test{3,21,98,203,17,9} variance = 6,219.9 std dev = 78.86..

Could somebody suggest which method to use?

code:
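#include <iostream>     // std::cout
#include <vector>       // std::vector
#include <numeric>      // std::accumulate, std::inner_product
#include <algorithm>    // std::transform
#include <functional>   // std::minus
#include <cmath>        // sqrt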

double calc_sum_of_squares1(std::vector<double> & vld);
double calc_sum_of_squares2(std::vector<double> & vld);
double sq(double c, double d) {return c+d*d;} // accumulate helper: running sum c plus d squared (double, so nothing is truncated)

int main () {
  std::vector<double> vld1 { 600,470,170,430, 300};
  std::vector<double> vld2 { 3, 21, 98, 203, 17, 9};

  double sum_of_squares = calc_sum_of_squares1(vld1); // http://www.mathsisfun.com/data/standard-deviation.html
  std::cout << "Variance 1: " << sum_of_squares/vld1.size() << " Std Deviation 1: " << sqrt(sum_of_squares/vld1.size()) << std::endl;

  sum_of_squares = calc_sum_of_squares2(vld2); // http://www.statisticshowto.com/how-to-find-the-sample-variance-and-standard-deviation-in-statistics/
  std::cout << "Variance 2: " << sum_of_squares/(vld2.size()-1) << " Std Deviation 2: " << sqrt(sum_of_squares/(vld2.size()-1)) << std::endl;

  return 0;
}

// http://www.mathsisfun.com/data/standard-deviation.html
double calc_sum_of_squares1(std::vector<double> & vld) {
  double mean = std::accumulate(begin(vld), end(vld), 0.0) / vld.size();
  std::vector<double> diff(vld.size());
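  // subtract the mean from every value (std::bind2nd was removed in C++17, so use a lambda)
  std::transform(begin(vld), end(vld), begin(diff), [mean](double x) { return x - mean; });
  // sum of squared deviations about the mean
  return std::inner_product(begin(diff), end(diff), begin(diff), 0.0);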
}

// http://www.statisticshowto.com/how-to-find-the-sample-variance-and-standard-deviation-in-statistics/
double calc_sum_of_squares2(std::vector<double> & vld) {
  double sum = std::accumulate(begin(vld), end(vld), 0.0); // step1
  sum = (sum*sum) / vld.size(); // step 2
  sum = std::accumulate(begin(vld), end(vld), 0.0,  sq) - sum; // step3&4
  return sum; // step5&6 occur after return
}
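By the look of it, variance 1 is the variance of the data set itself (which involves dividing by N); this is usually called the population variance.
Variance 2 is an unbiased estimate of the variance of a larger population, computed from a sample of it (which involves dividing by N-1); this is usually called the sample variance.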

This is a statistics issue - not a C++ one: it depends what you are trying to do. If you intend to do any of the standard statistical tests (normal distribution or t-test) you will probably be using the latter.

Either method of calculating the sum of squared deviations about the mean will do. It's whether you subsequently divide by N or N-1 that produces the difference. If your sample consisted of the whole population then you would divide by N. If, as is more usual, you have a representative sample of a much larger population then you would divide by N-1.
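As a minimal sketch of that point (the function names here are my own, not taken from the code above or from either linked page), the sum of squared deviations can be computed once and then divided by N or by N-1 depending on what the data represents:

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Sum of squared deviations about the mean (hypothetical helper name).
double sum_sq_dev(const std::vector<double>& v) {
    assert(!v.empty());
    const double mean = std::accumulate(v.begin(), v.end(), 0.0) / v.size();
    double ss = 0.0;
    for (double x : v) {
        ss += (x - mean) * (x - mean);
    }
    return ss;
}

// Divide by N: the data is the whole population.
double population_variance(const std::vector<double>& v) {
    return sum_sq_dev(v) / v.size();
}

// Divide by N-1: the data is a sample drawn from a larger population.
double sample_variance(const std::vector<double>& v) {
    assert(v.size() > 1);
    return sum_sq_dev(v) / (v.size() - 1);
}

// Standard deviations are the square roots of the corresponding variances.
double population_stddev(const std::vector<double>& v) {
    return std::sqrt(population_variance(v));
}

double sample_stddev(const std::vector<double>& v) {
    return std::sqrt(sample_variance(v));
}
```

With the first test set {600,470,170,430,300}, population_variance gives 21,704, matching method 1; with the second set {3,21,98,203,17,9}, sample_variance gives 6,219.9, matching method 2. Swapping the divisor on either set gives a different figure, which is why one method does not reproduce the other's expected result.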

If you live in the United States then, as found out earlier this month, most statistical theory might as well be thrown out of the window ...
Yes, you are correct. I've just re-verified the figures using https://www.easycalculation.com/statistics/standard-deviation.php and both methods are in fact correct, but the second method applies to a sample and the first to the whole population.

Thanks