Normal distribution random values sum to 1

Hello everyone,
I need a little help. I need to generate random values between 0 and 1. They must be placed in consecutive positions. Let me give you an example. If the starting index is 3 and the ending index is 5, then in an array with 10 elements they must be placed like this:

A=[0 0 0.2 0.3 0.5 0 0 0 0 0]

These values have to be randomly generated according to the normal distribution and their sum must be 1. How can I do that? I am really struggling at this point. I would be very happy if you could help me.

[I seem to remember that this question has been asked before in this forum]
Yes, I asked it before, but that time no specific probability distribution was defined. This time the difficulty for me comes from the normal distribution.

Thank you so much
You just sample a normal distribution and divide each sample by the sum of all samples.
randomly generated according to the normal distribution and their sum must be 1


The normal distribution has two parameters - mean and standard deviation. You haven't specified either.

They won't be random if you force their sum to 1.0. If you generate three standard-normal variates (mean=0, sigma=1) and their sum happens to be 0 you will run into difficulty. You could set the mean equal to 1/(number_of_samples) but any rescaling will change the intended standard deviation and you still don't rule out the possibility that the sum will be 0 or negative.


Please post the original problem that you have been set, not your paraphrasing of it.
Thank you @mbozzi and @lastchance. @lastchance, actually, I am not paraphrasing it. I do not know how to choose the mean and standard deviation of the normal distribution to generate such a list of values. You answered this question previously:

#include <iostream>
#include <valarray>
#include <random>
#include <ctime>
using namespace std;

mt19937 gen( time( 0 ) );
uniform_real_distribution<double> dist( 0.0, 1.0 );

valarray<double> getValues( int N )
{
   valarray<double> V(N);
   for ( double &e : V ) e = dist( gen );
   return V / V.sum();
}

int main()
{
   int N;
   cout << "How many numbers? ";   cin >> N;
   valarray<double> V = getValues( N );
   for ( double e : V ) cout << e << ' ';
}
but here the values are uniformly distributed. @mbozzi, do you mean what we did here? How would I do it for the normal distribution? Can't we say, for example, that if we need 5 values then, since their sum has to be 1, their mean is 0.2? We can choose a value for the standard deviation. Isn't it possible to generate such a list in this way? Thank you so much
@learner999,
Your post crossed with my editing. Note:
You could set the mean equal to 1/(number_of_samples) but any rescaling will change the intended standard deviation and you still don't rule out the possibility that the sum will be 0 or negative.


An alternative would be to ADD a correction (of (1-sum)/number_of_samples ) to each one, or, equivalently, set the original mean to 1/number_of_samples and then correct additively.

The problem with the normal distribution is that such a random variable can have any value between minus infinity and plus infinity. This isn't true of a uniform distribution, which must be constrained to lie between two limits.

However, whether you are forcing a sum multiplicatively or additively you are changing the distribution post event. I'm not convinced that is legitimately random. It is reducing the number of degrees of freedom by 1. And you still haven't told us what standard deviation to use.
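
To make the effect of the additive correction concrete, here is a short derivation. It assumes the N raw samples X_i are independent draws from N(mu, sigma^2) and writes \bar{X} for their sample mean; these symbols are introduced here purely for illustration.

```latex
\begin{align*}
X_i' &= X_i + \frac{1 - \sum_{j=1}^{N} X_j}{N} = X_i - \bar{X} + \frac{1}{N}, \\
\mathbb{E}[X_i'] &= \frac{1}{N}, \qquad
\operatorname{Var}(X_i') = \sigma^2\left(1 - \frac{1}{N}\right), \qquad
\operatorname{Cov}(X_i', X_j') = -\frac{\sigma^2}{N} \quad (i \ne j).
\end{align*}
```

So, under those assumptions, the corrected values are still normally distributed, but their standard deviation shrinks by a factor sqrt(1 - 1/N) and they are no longer independent: that negative covariance is the lost degree of freedom.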


#include <iostream>
#include <valarray>
#include <random>
#include <ctime>
using namespace std;

mt19937 gen( time( 0 ) );
normal_distribution<double> Z( 0.0, 1.0 );     // Standard normal Z~N(0,1)

valarray<double> getValues( int N )
{
   valarray<double> V(N);
   for ( double &e : V ) e = Z( gen );
   return V + ( 1 - V.sum() ) / N;
}

int main()
{
   int N;
   cout << "How many numbers? ";   cin >> N;
   valarray<double> V = getValues( N );
   for ( double e : V ) cout << e << ' ';
}


How many numbers? 3
0.632521 -0.731 1.09848


Note the negatives. Note also that this code is a quick illustration of one means of generating the requisite numbers, not how to splice them into a subrange of your array. Also, if you want them in ascending order you will have to use std::sort() on them.
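
For the splicing and sorting mentioned above, a minimal sketch might look like the following. The function name spliceSorted, its parameters and the 1-based index convention are assumptions made for illustration, as is the choice of mean 1/n with a user-supplied sigma. Nothing here prevents negative values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: returns an array of length `size` that is zero everywhere
// except positions first..last (1-based, inclusive), which hold ascending values
// that sum to 1.
std::vector<double> spliceSorted( std::size_t size, std::size_t first, std::size_t last,
                                  double sigma, std::mt19937 &gen )
{
   assert( 1 <= first && first <= last && last <= size );
   assert( sigma > 0.0 );

   const std::size_t n = last - first + 1;
   std::normal_distribution<double> dist( 1.0 / static_cast<double>( n ), sigma );   // assumed mean 1/n

   std::vector<double> values( n );
   for ( double &v : values ) v = dist( gen );

   // Additive correction so that the values sum to 1
   const double shift = ( 1.0 - std::accumulate( values.begin(), values.end(), 0.0 ) )
                        / static_cast<double>( n );
   for ( double &v : values ) v += shift;

   std::sort( values.begin(), values.end() );   // ascending order

   // Copy into the requested subrange; everything else stays 0
   std::vector<double> A( size, 0.0 );
   std::copy( values.begin(), values.end(), A.begin() + static_cast<std::ptrdiff_t>( first - 1 ) );
   return A;
}
```

With a generator such as std::mt19937 gen( 42 ), calling spliceSorted( 10, 3, 5, 0.1, gen ) would fill the third to fifth elements of a 10-element array, matching the layout in the original question.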


I repeat: please post your original problem verbatim.


Incidentally, is this you or one of your pals?
https://python-forum.io/thread-35126.html
@lastchance. Yes, it is me. Whether it is in Python or C++ makes no difference. I just want to understand how it can be done. As I said, there is no original question. Even with the standard deviation information you ask for, I do not know how to do that. Thank you so much
Well, "uniform distribution" (without fixed parameters) made some sort of sense, although fixing the sum reduces the number of degrees of freedom by one.

But since the normal distribution can go to plus or minus infinity it doesn't make as much sense.

What prompted you to try to do this, then, if it isn't a set exercise?
@lastchance. Thank you. Actually I am working on a probability problem. I know that an event will happen between 13:00 and 18:00. I need to define a probability value for each hour. So the sum of the values must be 1 and the probability distribution is the normal distribution.
Ah, I see!!!

You are NOT trying to produce a set of normally-distributed points!

You need the cumulative normal distribution, which you can get from the error function (erf(x)).
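
Written out, the idea is roughly as follows. The symbols Phi for the cumulative distribution function, h_1 and h_2 for the first and last hour, and p_i for the probability assigned to hour i are notation introduced here for illustration.

```latex
\begin{align*}
\Phi(x) &= \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right], \\
p_i &= \frac{\Phi(i+1) - \Phi(i)}{\Phi(h_2 + 1) - \Phi(h_1)}, \qquad i = h_1, \dots, h_2 .
\end{align*}
```

The numerators telescope when summed over i, so the p_i add up to 1 by construction, whatever mean and standard deviation are chosen.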
#include <iostream>
#include <iomanip>
#include <vector>
#include <cmath>
using namespace std;

// Cumulative distribution function for the normal distribution N(mu,sigma)
double CDFnormal( double x, double mu, double sigma )
{
   return 0.5 * ( 1.0 + erf( (x-mu)/sigma/sqrt(2.0) ) );
}

int main()
{
   const int hour1 = 13, hour2 = 17;        // First and last hour (time goes up to hour2+1)
   const double mean = 15.5, stddev = 1.0;  // For example

   vector<double> ProbHour(24,0);

   // Integrated probability (then normalised to 1)
   double sum = CDFnormal( hour2 + 1.0, mean, stddev ) - CDFnormal( hour1, mean, stddev );
   for ( int i = hour1; i <= hour2; i++ ) 
   {
      ProbHour[i] = ( CDFnormal( i + 1.0, mean, stddev ) - CDFnormal( i, mean, stddev ) ) / sum;
   }

   for ( int i = 0; i < 24; i++ )
   {
      cout << setw(2) << setfill('0') << i << ":00 - " << setw(2) << setfill('0') << i + 1 << ":00     "
           << ProbHour[i] << '\n';
   }
}


00:00 - 01:00     0
01:00 - 02:00     0
02:00 - 03:00     0
03:00 - 04:00     0
04:00 - 05:00     0
05:00 - 06:00     0
06:00 - 07:00     0
07:00 - 08:00     0
08:00 - 09:00     0
09:00 - 10:00     0
10:00 - 11:00     0
11:00 - 12:00     0
12:00 - 13:00     0
13:00 - 14:00     0.0613596
14:00 - 15:00     0.24477
15:00 - 16:00     0.38774
16:00 - 17:00     0.24477
17:00 - 18:00     0.0613596
18:00 - 19:00     0
19:00 - 20:00     0
20:00 - 21:00     0
21:00 - 22:00     0
22:00 - 23:00     0
23:00 - 24:00     0


Thanks @lastchance, you just taught me quite a bit.
@lastchance, a million thanks. You taught me a lot as well. I understood it well. Thank you so much once more, you saved me. Maybe this is asking too much, but I was wondering whether I can keep only two decimal places while keeping the sum equal to 1. I tried to add setprecision(2) but apparently it did not work.
Hello, @learner999.
Maybe I should go on a psychology course to learn how to ask you the right questions - it took a long time to work out what you were trying to do.


keep only the two decimal after the point by keeping sum equal to 1

Well, you sort of can - see below. However, each floating point value will be rounded independently, so once you have cut them off at 2 dp (you will need both fixed and setprecision) there is no guarantee that the rounded values will add to exactly 1.00. (For example, the independently rounded values below add up to 0.99). That is the nature of rounding. Note that this is a feature of how you present the output - the original array retains its much higher floating-point accuracy.
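
If the rounded values really must add to exactly 1.00, one option not used in this thread is the largest-remainder method: round everything down to whole hundredths, then hand the missing hundredths to the entries with the largest fractional parts. The sketch below is only an illustration; the name roundToHundredths is hypothetical, and it assumes the inputs are non-negative and already sum to 1 (to within floating-point error).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: converts probabilities to whole hundredths that add up
// to exactly 100. Assumes p is non-negative and sums to (approximately) 1.
std::vector<int> roundToHundredths( const std::vector<double> &p )
{
   std::vector<int> cents( p.size() );
   std::vector<double> frac( p.size() );
   int total = 0;
   for ( std::size_t i = 0; i < p.size(); i++ )
   {
      assert( p[i] >= 0.0 );
      const double scaled = 100.0 * p[i];
      cents[i] = static_cast<int>( std::floor( scaled ) );   // round down first
      frac[i] = scaled - cents[i];                           // what was cut off
      total += cents[i];
   }

   // Indices ordered by descending fractional part (ties keep their original order)
   std::vector<std::size_t> order( p.size() );
   std::iota( order.begin(), order.end(), std::size_t( 0 ) );
   std::stable_sort( order.begin(), order.end(),
                     [&frac]( std::size_t a, std::size_t b ) { return frac[a] > frac[b]; } );

   // Give one extra hundredth to the largest remainders until the total is 100
   for ( std::size_t k = 0; k < order.size() && total < 100; k++ )
   {
      cents[order[k]]++;
      total++;
   }
   return cents;
}
```

Each entry can then be printed as cents[i] / 100.0 with fixed and setprecision(2). The price is that two equal probabilities may end up displayed differently (for instance 0.25 and 0.24), because only one of them receives the extra hundredth.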

In return, are you going to tell us what these hour-by-hour probabilities are actually to represent?

#include <iostream>
#include <iomanip>
#include <vector>
#include <cmath>
using namespace std;

// Cumulative distribution function for the normal distribution N(mu,sigma)
double CDFnormal( double x, double mu, double sigma )
{
   return 0.5 * ( 1.0 + erf( (x-mu)/sigma/sqrt(2.0) ) );
}

int main()
{
   const int hour1 = 13, hour2 = 17;        // First and last hour (time goes up to hour2+1)
   const double mean = 15.5, stddev = 1.0;  // For example

   vector<double> ProbHour(24,0);

   // Integrated probability (then normalised to 1)
   double sum = CDFnormal( hour2 + 1.0, mean, stddev ) - CDFnormal( hour1, mean, stddev );
   for ( int i = hour1; i <= hour2; i++ ) 
   {
      ProbHour[i] = ( CDFnormal( i + 1.0, mean, stddev ) - CDFnormal( i, mean, stddev ) ) / sum;
   }

   for ( int i = 0; i < 24; i++ )
   {
      cout << setw(2) << setfill('0') << i << ":00 - " << setw(2) << setfill('0') << i + 1 << ":00     "
           << fixed << setprecision(2) << ProbHour[i] << '\n';         // <===== 
   }
}


00:00 - 01:00     0.00
01:00 - 02:00     0.00
02:00 - 03:00     0.00
03:00 - 04:00     0.00
04:00 - 05:00     0.00
05:00 - 06:00     0.00
06:00 - 07:00     0.00
07:00 - 08:00     0.00
08:00 - 09:00     0.00
09:00 - 10:00     0.00
10:00 - 11:00     0.00
11:00 - 12:00     0.00
12:00 - 13:00     0.00
13:00 - 14:00     0.06
14:00 - 15:00     0.24
15:00 - 16:00     0.39
16:00 - 17:00     0.24
17:00 - 18:00     0.06
18:00 - 19:00     0.00
19:00 - 20:00     0.00
20:00 - 21:00     0.00
21:00 - 22:00     0.00
22:00 - 23:00     0.00
23:00 - 24:00     0.00
@lastchance, no, it is not you, it is me who could not explain things clearly. Thank you so much for your understanding and patience. I had been on it for a day and it had really stressed me out. I have learnt many things from you since I joined this forum, because I really am a learner.

I see, the sum might not be equal to 1. I had guessed as much, but I wanted to get your ideas and those of the others to be sure. I knew about setprecision; I realised that I had misplaced it and forgotten to write "fixed" in the code, which is why it did not work when I tried it.

Actually, we are trying to simulate a bottleneck for a call center, and we expect a given number of calls, with some probabilities, between 13:00 and 18:00. For now we do not have real data, which is why I had to generate some probability values.
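
As a rough idea of how such hourly probabilities could feed a simulation, the sketch below assigns each call to an hour using std::discrete_distribution. The function name simulateCalls and the parameter numCalls are hypothetical, and it assumes probHour holds one non-negative weight per hour, as in the ProbHour array earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draws an hour for each of numCalls calls, where
// probHour[h] is the (assumed) probability that a call arrives in hour h.
// Returns the number of calls that landed in each hour.
std::vector<int> simulateCalls( const std::vector<double> &probHour, int numCalls, std::mt19937 &gen )
{
   assert( numCalls >= 0 );
   assert( !probHour.empty() );

   // discrete_distribution returns index h with probability proportional to probHour[h]
   std::discrete_distribution<int> pickHour( probHour.begin(), probHour.end() );

   std::vector<int> callsPerHour( probHour.size(), 0 );
   for ( int c = 0; c < numCalls; c++ )
   {
      callsPerHour[ static_cast<std::size_t>( pickHour( gen ) ) ]++;
   }
   return callsPerHour;
}
```

Because std::discrete_distribution normalises its weights internally, the input does not have to sum to exactly 1, so rounded probabilities would work as well.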