Creating two normal random variables with a specified correlation coefficient (rho)

Hello everyone

I have a question about generating correlated random variables. Is there a way to generate two normal variables x1 ~ N(0, 1) and x2 ~ N(0, 1) with rho = 0, or two normals x3 ~ N(0, 1) and x4 ~ N(0, 1) with rho = 0.75 or some other value?

I have tried the following so far.

1- Independent normal generator:
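#include <random>
#include <vector>
using namespace std;

// Returns n independent draws from N(m, s^2).
vector<double> uncorr_normal(double m, double s, int n)
{
	random_device seed;
	mt19937 gen{ seed() };

	normal_distribution<> dist{ m, s };

	vector<double> samples;
	samples.reserve(n);
	for (int i = 0; i < n; i++)
	{
		samples.push_back(dist(gen));
	}
	return samples;
}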
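The generator above seeds a fresh mt19937 from random_device on every call. A minimal sketch of an alternative, assuming you would rather own a single engine and pass it in (the name uncorr_normal_shared is hypothetical): all samples then come from one stream, and a fixed seed makes a run reproducible, which helps when comparing correlation estimates between runs.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical variant of uncorr_normal: the caller owns the engine, so every
// sample comes from one stream and a fixed seed reproduces the same values.
std::vector<double> uncorr_normal_shared(std::mt19937& gen, double m, double s, int n)
{
    assert(n >= 0 && s > 0.0);
    std::normal_distribution<> dist{ m, s };

    std::vector<double> samples;
    samples.reserve(n);
    for (int i = 0; i < n; ++i)
        samples.push_back(dist(gen));
    return samples;
}
```

Usage: create one `std::mt19937 gen{ 42 };` in main and pass it to every call.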


2- Dependent normal generator:
#include <cmath>
#include <utility>

// Returns n pairs (x, y) with x ~ N(m1, s1^2), y ~ N(m2, s2^2) and corr(x, y) = rho.
pair<vector<double>, vector<double>>
corr_normal(double m1, double s1, double m2, double s2, double rho, int n)
{
	vector<double> X;
	vector<double> Y;

	random_device seed;
	mt19937 gen{ seed() };

	// Draw standard normals and scale afterwards: mixing dist1/dist2 draws
	// directly only gives rho when s1 == s2, and y would not have mean m2 / sd s2.
	normal_distribution<> std_normal{ 0.0, 1.0 };

	for (int i = 0; i < n; i++)
	{
		double z1 = std_normal(gen);
		double z2 = std_normal(gen);
		X.push_back(m1 + s1 * z1);
		Y.push_back(m2 + s2 * (rho * z1 + sqrt(1 - rho * rho) * z2));
	}
	return { X, Y };
}
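The mixing formula used for Y can be checked by hand. A short derivation, assuming Z1 and Z2 are independent N(0, 1) draws: it shows why the second weight is sqrt(1 - rho^2), and that the correlation is exactly rho when both inputs are standard normal, which is the (0, 1) case in the question.

```latex
\begin{aligned}
X &= Z_1, \qquad Y = \rho Z_1 + \sqrt{1-\rho^2}\, Z_2, \qquad Z_1, Z_2 \overset{\text{iid}}{\sim} \mathcal{N}(0,1)\\
\operatorname{Var}(Y) &= \rho^2 \operatorname{Var}(Z_1) + (1-\rho^2)\operatorname{Var}(Z_2) = \rho^2 + 1 - \rho^2 = 1\\
\operatorname{Cov}(X, Y) &= \rho \operatorname{Var}(Z_1) + \sqrt{1-\rho^2}\,\operatorname{Cov}(Z_1, Z_2) = \rho\\
\operatorname{corr}(X, Y) &= \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{\rho}{1 \cdot 1} = \rho\\
X' &= m_1 + s_1 X, \quad Y' = m_2 + s_2 Y \;\Rightarrow\; \operatorname{corr}(X', Y') = \operatorname{corr}(X, Y) = \rho \quad (s_1, s_2 > 0)
\end{aligned}
```

Shifting and positive scaling do not change a correlation coefficient, so drawing standard normals first and transforming afterwards keeps rho and gives Y the requested mean and standard deviation.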


I measure the correlation coefficient with the following function:
#include <algorithm>
#include <cmath>

// Pearson correlation coefficient of X and Y (one-pass formula).
double rho(const vector<double>& X, const vector<double>& Y)
{
	double sum_X = 0, sum_Y = 0, sum_XY = 0;
	double squareSum_X = 0, squareSum_Y = 0;
	//------------------------------------------
	// use only as many elements as both vectors have (max would read past the shorter one)
	size_t n = min(X.size(), Y.size());
	//------------------------------------------
	for (size_t i = 0; i < n; i++)
	{
		// sum of elements of array X.
		sum_X = sum_X + X[i];

		// sum of elements of array Y.
		sum_Y = sum_Y + Y[i];

		// sum of X[i] * Y[i].
		sum_XY = sum_XY + X[i] * Y[i];

		// sum of squares of array elements.
		squareSum_X = squareSum_X + X[i] * X[i];
		squareSum_Y = squareSum_Y + Y[i] * Y[i];
	}

	// use formula for calculating correlation coefficient.
	double corr = (n * sum_XY - sum_X * sum_Y)
		/ sqrt((n * squareSum_X - sum_X * sum_X)
			* (n * squareSum_Y - sum_Y * sum_Y));

	//------------------------------------------
	return corr;
}
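The one-pass formula in rho works well for zero-mean data, but n*sum_XY - sum_X*sum_Y subtracts two large, nearly equal numbers when the means are large compared with the spread. A minimal sketch of the common two-pass alternative (the name pearson_two_pass is hypothetical, and unlike rho it asserts equal-length inputs instead of truncating): compute the means first, then accumulate centred sums.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical two-pass Pearson correlation: pass 1 computes the means,
// pass 2 accumulates centred sums, avoiding cancellation for large means.
double pearson_two_pass(const std::vector<double>& X, const std::vector<double>& Y)
{
    assert(X.size() == Y.size() && X.size() >= 2);
    const std::size_t n = X.size();

    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        mean_x += X[i];
        mean_y += Y[i];
    }
    mean_x /= n;
    mean_y /= n;

    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = X[i] - mean_x;
        const double dy = Y[i] - mean_y;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);
}
```

For example, {1, 2, 3, 4} against {2, 4, 6, 8} gives 1, and against {8, 6, 4, 2} gives -1.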


However, if I generate two uncorrelated random variables and test them with the rho function, I don't get rho = 0.

And in the correlated case, if I plug in the correlated vectors, I don't get the specified rho either.

Can you help me with this, please?

Best regards
In the uncorrelated case, how big is the sample, and what value of the correlation coefficient do you actually get? It is highly unlikely to be exactly 0, just small. There are statistical tests on the correlation coefficient to determine whether it is significantly different from zero.
I tested it with 100 and 1000 samples and got values between 0 and 0.15.
I don't expect exactly 0, but something in [0, 0.05]. What about the correlated case? That is where my main problem is. Could you please show me how to generate two normals with rho = r, or how to transform the existing normals to have rho = r?
I've checked your expression for Y in the correlated case and as far as I can see it is correct.

Sampling error goes like 1/sqrt(n): for the sample correlation coefficient the standard deviation is roughly (1 - rho^2)/sqrt(n), which falls off quite slowly with n. So you can expect quite a lot of scatter for n = 100 or 1000.
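To see that scatter directly, a minimal sketch, assuming a fixed seed and 500 repetitions (both arbitrary choices; the helper names sample_r and spread_of_r are hypothetical): it repeatedly draws two independent standard-normal samples of size n, computes r each time, and returns the standard deviation of those r values. For truly uncorrelated data this should come out near 1/sqrt(n), about 0.1 for n = 100 and about 0.03 for n = 1000.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: Pearson r via centred sums.
double sample_r(const std::vector<double>& x, const std::vector<double>& y)
{
    const std::size_t n = x.size();
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    return sxy / std::sqrt(sxx * syy);
}

// Hypothetical experiment: `trials` independent samples of size n with
// true rho = 0; returns the standard deviation of the observed r values.
double spread_of_r(int n, int trials, unsigned seed)
{
    assert(n >= 3 && trials >= 2);
    std::mt19937 gen{ seed };
    std::normal_distribution<> z{ 0.0, 1.0 };
    std::vector<double> x(n), y(n);
    double sum = 0.0, sum_sq = 0.0;
    for (int t = 0; t < trials; ++t) {
        for (int i = 0; i < n; ++i) { x[i] = z(gen); y[i] = z(gen); }
        const double r = sample_r(x, y);
        sum += r;
        sum_sq += r * r;
    }
    const double mean = sum / trials;
    return std::sqrt((sum_sq - trials * mean * mean) / (trials - 1));
}
```

At n = 100, a single r of 0.15 is therefore only about 1.5 standard deviations away from zero.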

Why don't you try with a decent-sized sample, say n = 1000000? As far as I can see, your maths is correct.
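A minimal sketch of that check, assuming a fixed seed so the run is repeatable (the names correlated_pairs and sample_r are hypothetical): it draws standard normals, applies x = m1 + s1*z1 and y = m2 + s2*(rho*z1 + sqrt(1 - rho^2)*z2), and measures r on 1,000,000 pairs. With a scatter of roughly (1 - rho^2)/sqrt(n), the measured r for rho = 0.75 should be well within 0.01 of 0.75.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical generator: n pairs with x ~ N(m1, s1^2), y ~ N(m2, s2^2),
// corr(x, y) = rho, built from two independent standard normals.
std::pair<std::vector<double>, std::vector<double>>
correlated_pairs(double m1, double s1, double m2, double s2, double rho,
                 std::size_t n, unsigned seed)
{
    assert(rho >= -1.0 && rho <= 1.0 && s1 > 0.0 && s2 > 0.0);
    std::mt19937 gen{ seed };
    std::normal_distribution<> z{ 0.0, 1.0 };
    const double c = std::sqrt(1.0 - rho * rho);
    std::vector<double> x, y;
    x.reserve(n);
    y.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double z1 = z(gen);
        const double z2 = z(gen);
        x.push_back(m1 + s1 * z1);
        y.push_back(m2 + s2 * (rho * z1 + c * z2));
    }
    return { std::move(x), std::move(y) };
}

// Hypothetical helper: Pearson r via centred sums.
double sample_r(const std::vector<double>& x, const std::vector<double>& y)
{
    const std::size_t n = x.size();
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
        syy += (y[i] - my) * (y[i] - my);
    }
    return sxy / std::sqrt(sxx * syy);
}
```

Usage: `auto p = correlated_pairs(0.0, 1.0, 0.0, 1.0, 0.75, 1000000, 123u);` then `sample_r(p.first, p.second)`.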


According to
https://onlinecourses.science.psu.edu/stat501/node/259/
the relevant test statistic is
r*sqrt(n-2)/sqrt(1-r^2)
which you compare with the t distribution with n-2 degrees of freedom.
For the uncorrelated case, if r=0.15 and n=100, that will be about 1.5, which is nowhere near big enough to reject the null hypothesis of no correlation at any reasonable confidence level.
Basically, your sample size n is too small. You need sqrt(n) to be quite large to check statistical significance.
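The test above as a minimal sketch (both function names are hypothetical). corr_t_statistic computes t = r*sqrt(n-2)/sqrt(1-r^2); critical_r inverts it, r = t/sqrt(n - 2 + t^2), to give the smallest |r| that reaches a chosen critical t. The critical t itself has to come from a table or a statistics library, since the C++ standard library has no t-distribution quantile function; for a two-sided 5% test it is close to 2 for n around 100 and about 1.96 for large n.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical: t statistic for H0 "true correlation is 0", n - 2 degrees of freedom.
double corr_t_statistic(double r, int n)
{
    assert(n > 2 && std::fabs(r) < 1.0);
    return r * std::sqrt(static_cast<double>(n - 2)) / std::sqrt(1.0 - r * r);
}

// Hypothetical: smallest |r| whose t statistic reaches t_crit for sample size n.
double critical_r(double t_crit, int n)
{
    assert(n > 2 && t_crit > 0.0);
    return t_crit / std::sqrt((n - 2) + t_crit * t_crit);
}
```

With t = 2 and n = 100, critical_r gives about 0.2, so r = 0.15 is not significant; with n = 1000000 and t = 1.96 it drops to about 0.002, which is why sqrt(n) needs to be large.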