Standard deviation function refuses to work :(

Sep 6, 2012 at 2:50pm
I have already spent about an hour on this piece of code. There must be some sneaky bugs lurking somewhere. I implore you, o Wise One, please enlighten this ignorant newbie...


A bunch of integers are given as a vector, and their mean is known. I want to calculate their standard deviation.



double sd(const std::vector<int> &results, double mean)
{
    int n = results.size();   // getting the sample size


//  summing over  (deviation)^2

    double temp = 0;

    for(int i = 0; i < n; i++)
    {
        temp += ((double) results[i] - mean)*( (double)results[i] - mean);
    };


//  this is the variance

    double temp2 = temp / (double)(n-1);


//  take sqrt for standard deviation

    return std::sqrt(temp2);
};




This code gave rubbish when I plugged it into my main program. I then tested it using random integers in the range [0, 10], and it always returns 1 no matter what...

(The mean has been fluctuating around 5, so I know the random number is working.)
Sep 6, 2012 at 3:24pm
Can you give an example? What output do you get and what output did you expect?
Sep 6, 2012 at 3:32pm
Looks fine to me. Perhaps you're not calling the function correctly, or storing the output in an int, or some other such error.

#include <vector>
#include <cmath>
#include <iostream>

using namespace std;


double sd(const std::vector<int> &results, double mean)
{
    int n = results.size();   // getting the sample size


//  summing over  (deviation)^2

    double temp = 0;

    for(int i = 0; i < n; i++)
    {
        temp += ((double) results[i] - mean)*( (double)results[i] - mean);
    };


//  this is the variance

    double temp2 = temp / (double)(n-1);


//  take sqrt for standard deviation

    return std::sqrt(temp2);
};

int main()
{
  vector<int> input(8);
  input[0] = 0; input[1] = 1; input[2] = 2; input[3] = 3; input[4] = 4; input[5] = 5; input[6] = 6; input[7] = 7;
  cout << sd(input, 3.5);
}



Sep 6, 2012 at 3:38pm
I do not see any problem with the code, although it could be written more simply. For example:

double sd( const std::vector<int> &results, double mean )
{
	const std::vector<int>::size_type n = results.size();

	double temp = 0;

	for( std::vector<int>::size_type i = 0; i < n; i++ )
	{
		temp += ( results[i] - mean ) * ( results[i] - mean );
	}


//  take sqrt for standard deviation

	return std::sqrt( temp / ( n - 1 ) );
}


You should 1) look at the values of the vector; 2) look at the return value.

For example
1)
for ( auto x : results ) std::cout << x << ' ';
std::cout << std::endl;

2)
std::cout << std::sqrt( temp / ( n - 1 ) ) << std::endl;

I think that the problem is somewhere else in your program.
Last edited on Sep 6, 2012 at 3:43pm
Sep 6, 2012 at 3:46pm
1)
for ( auto x : results ) std::cout << x << ' ';
std::cout << std::endl;


That's if his compiler supports range-based for loops.
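
If range-based for is not available, the same check can be written with a plain index loop. This is only a sketch: dump_values is a hypothetical helper name, and it returns the text as a string instead of printing it so the result is easy to inspect.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: lists the vector's values separated by spaces,
// using an index loop that also compiles before C++11.
std::string dump_values(const std::vector<int> &results)
{
    std::ostringstream out;
    for (std::vector<int>::size_type i = 0; i < results.size(); ++i)
    {
        out << results[i] << ' ';
    }
    return out.str();
}
```

Calling std::cout << dump_values(results) << std::endl; inside sd would show what the function actually receives.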
Sep 6, 2012 at 3:52pm
I generate random integers in the range [0, max - 1] using:


    vector<int> samples;

    for(int i = 0; i < sample_size; i++)
    {
        samples.push_back( ( rand()%max ) );
    };



The mean is calculated with:


double mean(const std::vector<int> &samples)
{
    // get sample size
    int n = samples.size();


    // add everything
    int temp = 0;

    for(int i = 0; i < n; i++)
    {
        temp += results[i];
    };


    // taking average
    return ((double) temp / n);
};



Regardless of what I use for max and sample_size, I always get 1 for standard deviation.


My expectation? Isn't it common knowledge that:

standard deviation ~ max / sqrt(sample_size) ?
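
A side note on that expectation, offered as a sketch and not a verdict: assuming the integers are drawn uniformly from [0, max], the standard deviation of the samples themselves approaches sqrt(max * (max + 2) / 12) and does not shrink as sample_size grows. The 1 / sqrt(sample_size) scaling applies to the standard error of the mean. Both helper names below are hypothetical.

```cpp
#include <cmath>

// Theoretical standard deviation of an integer drawn uniformly from [0, max].
// Hypothetical helper, assuming every value in the range is equally likely.
double uniform_int_sd(int max)
{
    return std::sqrt(max * (max + 2.0) / 12.0);
}

// Standard error of the mean of n independent samples whose
// standard deviation is sd.
double standard_error(double sd, int n)
{
    return sd / std::sqrt(static_cast<double>(n));
}
```

For max = 10 the first function gives sqrt(10), about 3.16, whatever the sample size is.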
Sep 6, 2012 at 3:54pm
Can you show how you call the sd function?
Sep 6, 2012 at 4:04pm
--edit--

Code is fixed.

The renaming of variables was to clean up some less-than-decent language included in the code out of my frustration. And well that inevitably introduced typos...

--edit--


Here's my main():



int main()
{
    int max;

    cout << "Random integer variable between 0 and: " << endl;
    cin >> max;

    max++;  // i want the range to be [0, max]



    srand( time(0) );



    int sample_size;

    cout << "sample size?" << endl;
    cin >> sample_size;



// generating the sample

    vector<int> samples;

    for(int i = 0; i < sample_size; i++)
    {
        samples.push_back( ( rand()%max ) );
    };


//  calling the functions

    double avg = mean(samples);
    double error = sd(samples, avg);



// output to console

    cout << "Mean is " << avg << endl;
    cout << "Uncertaity is " << sd << endl;

    return 0;

};
Last edited on Sep 6, 2012 at 4:20pm
Sep 6, 2012 at 4:11pm
Please post real code. Yours gives compiler errors. You use results instead of samples in mean, and in main you do i < samples.
Last edited on Sep 6, 2012 at 4:11pm
Sep 6, 2012 at 4:32pm
@wohtp


Did you read what I wrote?! Please do not bother the forum any more until you have checked the values of the vector and the return value of your function.

I suspect that the values of the vector are not what you are assuming. Maybe you should replace std::vector<int> with std::vector<double>.
Last edited on Sep 6, 2012 at 4:37pm
Sep 6, 2012 at 8:27pm
cout << "Uncertaity is " << sd << endl;

sd is a function. I think you meant to print the error variable.


Also, you don't need a semicolon after function definitions.
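
As for why the output was always 1: a function name used in an expression decays to a function pointer, and the stream insertion overload that accepts it is the one taking bool, so a non-null pointer is printed as 1. The sketch below reproduces this in isolation. sd_stub and the two helper functions are hypothetical names made up for the illustration, and the result assumes a compiler that follows the standard conversions here, such as GCC or Clang (some compilers may print an address instead, as an extension).

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for the real sd(); only its name matters here.
double sd_stub() { return 2.5; }

// Streams the function itself (no call), as in: cout << sd
std::string stream_function_name()
{
    double (*fp)() = sd_stub;   // the function name decays to a pointer
    std::ostringstream out;
    out << fp;                  // pointer converts to bool, so "1" is written
    return out.str();
}

// Streams the value returned by actually calling the function.
std::string stream_function_result()
{
    std::ostringstream out;
    out << sd_stub();           // writes the double, "2.5"
    return out.str();
}
```

In the posted main() the equivalent fix would be to print error (or call sd(samples, avg)) instead of the bare name sd.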
Last edited on Sep 6, 2012 at 8:29pm