Random numbers

Hello. I have no questions today, but I need some explanation about this method so as to generate a random number. I made a little test using this code - an alternative to randomize a number :

1
2
3
4
5
std::pair <int, int> n(1, 10);
std::random_device seeder;
std::mt19937 engine(seeder());
std::uniform_int_distribution<int> d(n.first, n.second);
int nn = d(engine);


As you can understand, it generates a random number (Mersenne Twister 19937 generator) between two integers which I set in as a pair. It works as expected - and when "I throw my dice" 100 times, I have a real randomized output which I display this way showing numbers occurrences for 100 generations :


*************
*************
**************
**
*****
***********
*******
************
*************
**********


However, the more I increase the number of randomized processes, the more the output becomes flat. Mathematically it's very interesting. Do you have some explanation? I guess that there is an algorithm to explain this leveling. Take a look at my code and its output with a large computation - one million ++

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
#include <iostream>
#include <random>
#include <vector>

std::pair <int, int> n(1, 10);
std::vector<int> numbers;
int loop = 1000000; // before try with 100

int freq(std::vector<int> v, int i)
{
    return (std::count(v.begin(), v.end(), i) / (loop/100));
}

int main()
{
    std::random_device seeder;
    std::mt19937 engine(seeder());
    std::uniform_int_distribution<int> d(n.first, n.second);
    // polulate my vector with random numbers
    for (int i = 0; i < loop; i++)
    {
        int nn = d(engine);
        numbers.push_back(nn);
        //std::cout << nn << std::endl;
    }

    for (int j = n.first; j <= n.second; j++)
    {
        for (int k = 0; k < freq(numbers, j); k++)
            std::cout << '*';

        std::cout << std::endl;
    }
}



**********
**********
*********
**********
**********
*********
*********
*********
*********
**********


Last edited on
May I say that with a very big amount of processes (x billions) I could have a really flat output ?
The clue is in the name - uniform.
https://en.cppreference.com/w/cpp/numeric/random/uniform_int_distribution

If you want a different shape, then use a different distribution.
https://en.cppreference.com/w/cpp/named_req/RandomNumberDistribution

@Peter87. Thank you for the link - I guess that I made an interesting application of the main principle. I have an interesting paper to read for my evening :)

@salem c. Thank you for the link too. There are many many interesting alternatives which I did not know. Gamma_distribution is really interesting.

https://en.cppreference.com/w/cpp/numeric/random/piecewise_linear_distribution

piecewise_linear_distribution is what I expected about randomized numbers, but it seems to me very superficial because of its settings ++
Last edited on
Geckoo wrote:
May I say that with a very big amount of processes (x billions) I could have a really flat output ?

No, not with that program, because if we assume loop is 1 billion then the expected value for each count would be 100 000 000. If the count is slightly greater (or equal) it will display 10 stars. If the count is slightly less (e.g. 99 999 999) it will display only 9 stars. The probability of these two are about the same so you are very likely to get 9 or 10 stars on each row, but whether you get 9 or 10 is basically fifty-fifty.

This is basically just an artefact caused by the way you translate the counts to stars. Rounding (e.g. displaying all numbers in the range [95 000 000, 105 000 000) as 10 stars) and/or using a higher resolution (e.g. 100 stars instead of 10) would help in making the result look flatter.
Last edited on
Back when with rand() or hand-rolled RNG before <random> you had to DIY. There are probably examples out on the web of how to do that, or off in C code places, that may help you understand.

a really simple one..
int distro[] {1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,6,6,6,6,6,7,7,7,7,8,8,8,9,9,10};
if I did that right 1/30 for 1 and 10,
1/15 for for 2 and 9,
1/10 for 3 and 8,
and so on ... a pretty steep bell curve.
distro[rand()%30] is your number...
thankfully we don't have to do that anymore: you can see from the list of standard distros or all the crap in a statistics book how many there are to deal with (though several are more or less identical in practice) not counting oddball ones for specific needs/data...
uniform, using that idea and assuming rand() actually worked right, would just be
distro[] {1,2,3,4,5,6,7,8,9,10} so each one has equal chances...

(random isn't populating an array and picking from it, you can do the above using math instead of picking from hard lists. hard lists are faster and how it was done frequently with ints or int like data, though, just for performance).
Last edited on
@geckoo,

You divide your range into a fixed number of bins.

For any individual bin, the probability that a uniformly-distributed integer lands in it is p, where p=1/(number of bins). A "good" RNG should hit that, but that isn't the question being asked here.

This is a Bernoulli trial. If you ran it n times you would get a binomial distribution (for that particular bin) with mean np and variance npq (where q=1-p). The standard deviation is sqrt(npq).

The variability of your output depends on the ratio of standard deviation to mean. This is
sqrt(npq)/(np) or sqrt(q/p)/sqrt(n)
Importantly, this is INVERSELY PROPORTIONAL TO SQRT(n). Bigger n means smaller RELATIVE variation.

As an example, if you had 10 bins, then relative variation (standard deviation/mean) for any one particular bin is 3/sqrt(n).

Note that 1/sqrt(n) doesn't actually fall off all that fast - that is why Monte-Carlo methods aren't quite as great as one might have hoped. It is also why you would not expect perfect "flatness" for your output graph. With the 10-bin example above, the relative variation for one particular bin is still 0.003 even when n is one million. If you had 101 bins then the relative variation for a particular bin would be 0.01 when n was 1 million. So, if you want more "flatness" ... use less bins!

Last edited on
Thank you for all your clever explanations. LLN is a very interesting case of study. I quickly coded another example which uses only two possibilities - heads or tails. The demonstration is more evidente. After this one, I did the same code, but using only rand() in order to generate a randomized number. In fact, what is the best method? rand() or mt19937 ? Which do you prefer?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
#include <iostream>
#include <random>

void coins(int loop)
{
    int nTails = 0;
    int nHeads = 0;
    std::pair <int, int> n(0, 1);

    std::random_device seeder;
    std::mt19937 engine(seeder());
    std::uniform_int_distribution<int> d(n.first, n.second);

    for (int i = 0; i < loop; i++)
    {
        if (d(engine) == 0)
            nHeads++;
        else
            nTails++;
    }

    std::cout << "LLN test with " << loop << " coins" << std::endl;
    std::cout << "Heads -> " << nHeads << std::endl;
    std::cout << "Tails -> " << nTails << std::endl;

    long float disp;
    if ((nHeads >= nTails) ? disp = (long float(nHeads) / long float(nTails)) 
                           : disp = (long float(nTails) / long float(nHeads)));

    std::cout << "Dispersion % : " << (disp - 1) * 100 << std::endl;
    std::cout << std::endl;
}

int main()
{
    coins(100);
    coins(10000);
    coins(1000000);

    return 0;
}



LLN test with 100 coins
Heads -> 53
Tails -> 47
Dispersion % : 12.766

LLN test with 10000 coins
Heads -> 5022
Tails -> 4978
Dispersion % : 0.883889

LLN test with 1000000 coins
Heads -> 500246
Tails -> 499754
Dispersion % : 0.0984484
Last edited on
The same, but with std::uniform_real_distribution :/

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
#include <iostream>
#include <random>

void coins(int loop)
{
    int nTails = 0;
    int nHeads = 0;
    std::pair <float, float> n(0, 1);

    std::random_device seeder;
    std::mt19937 engine(seeder());
    std::uniform_real_distribution<float> d(n.first, n.second);

    for (int i = 0; i < loop; ++i)
    {
        if (d(engine) > 0.5)
            nHeads++;
        else
            nTails++;
    }

    std::cout << "LLN test with " << loop << " coins" << std::endl;
    std::cout << "Heads -> " << nHeads << std::endl;
    std::cout << "Tails -> " << nTails << std::endl;

    long float disp;
    if ((nHeads >= nTails) ? disp = (long float(nHeads) / long float(nTails)) 
                           : disp = (long float(nTails) / long float(nHeads)));

    std::cout << "Dispersion % : " << (disp - 1) * 100 << std::endl;
    std::cout << std::endl;
}

int main()
{
    coins(100);
    coins(10000);
    coins(1000000);

    return 0;
}
As you can understand, it generates a random number (Mersenne Twister 19937 generator) between two integers which I set in as a pair. It works as expected - and when "I throw my dice" 100 times, I have a real randomized output which I display this way showing numbers occurrences for 100 generations :

*************
*************
**************
**
*****
***********
*******
************
*************
**********


This is because you are looking at only one sample of "100 dice rolls" ;-)

In fact, if we look at a single sample of "100 dice rolls", then it is perfectly possible (although unlikely) to get a sample where the number "1" occurred 100 times in a row and all other numbers did not occur at all!

If, instead, we compute the average (mean) number of occurrences from many samples of "100 dice rolls", then you will see that the average number of occurrences of each of the numbers quickly approaches 10.0:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
int main()
{
    const std::pair <int, int> n(0, 9);
    std::random_device seeder;

    double means[100];
    memset(means, 0, sizeof(means));

    for (size_t round = 1; round < SIZE_MAX; ++round)
    {
        // Generate next sample of size 100
        std::mt19937 engine(seeder());
        std::uniform_int_distribution<int> dist(n.first, n.second);
        size_t counters[100];
        memset(counters, 0, sizeof(counters));
        for (int i = 0; i < 100; ++i)
        {
            const int value = dist(engine);
            ++counters[value];
        }

        // Update the mean number of occurrences (by using Welford's online algorithm)
        for (int i = 0; i < 10; ++i)
        {
            const double delta = static_cast<double>(counters[i]) - means[i];
            means[i] += delta / static_cast<double>(round);
        }

        // Print the current mean values (every 999983 rounds)
        if ((round % 999983U) == 1U)
        {
            printf("Means after %zu samples:\n", round);
            for (int i = 0; i < 10; ++i)
            {
                printf("means[%d] = %7.4f\n", i, means[i]);
            }
            puts("");
        }
    }
}
Means after 1 samples:
means[0] = 10.0000
means[1] =  6.0000
means[2] = 10.0000
means[3] =  7.0000
means[4] = 10.0000
means[5] =  8.0000
means[6] = 14.0000
means[7] = 14.0000
means[8] = 14.0000
means[9] =  7.0000

Means after 999984 samples:
means[0] =  9.9994
means[1] = 10.0041
means[2] =  9.9965
means[3] =  9.9978
means[4] =  9.9996
means[5] =  9.9962
means[6] = 10.0023
means[7] = 10.0016
means[8] = 10.0008
means[9] = 10.0018

[...]

Means after 2372959660 samples:
means[0] =  9.9999
means[1] =  9.9999
means[2] = 10.0000
means[3] = 10.0000
means[4] = 10.0001
means[5] = 10.0000
means[6] =  9.9999
means[7] = 10.0001
means[8] = 10.0000
means[9] = 10.0000


...and that is because with a uniform distribution of n numbers and with a sample size of k, the expected number of occurrences of each number is k/n. In your example, we obviously get: 100 / 10 = 10
Last edited on
Another thing to consider when using either the C or C++ random function there is only one "true" non-deterministic random generator in the lot. All others are pseudo (deterministic) random generators.

The one "true" non-deterministic generator (up to the implementation, of course) is std::random_device.

The pseudo random generators can have their sequences seeded, usually with some variant of the current local PC time or using std::random_device. Naturally, use the same seed and you get the same sequence of numbers.

C++ <random> offers a seed sequence so a PRNG can use more than a single seed.

New C++ code should never use the C library PRNG functions, even the C standard recommends if possible not using srand/rand.

https://web.archive.org/web/20180123103235/http://cpp.indi.frih.net/blog/2014/12/the-bell-has-tolled-for-rand/

C++ has an embarrassment of riches for generating random numbers.

The <random> library isn't designed for hard-core cryptography work.
George P wrote:
use the same seed and you get the same sequence of numbers

Note that if you want to rely on this you should not use the standard distributions (e.g. std::uniform_int_distribution) because it's not specified how they should work exactly so they might give different results on different implementations.

George P wrote:
C++ has an embarrassment of riches for generating random numbers.

I think what it lacks is a small and fast generator of relatively good quality that is quick to seed (e.g. PCG, Xorshift, etc.). At the moment it only really has std::mt19937 as a good alternative but that one is huge and is slow to seed.

Not being able to rely on the standard distributions to work the same everywhere is also a big issue for some use cases such as procedural generation (https://en.wikipedia.org/wiki/Procedural_generation) where you often want the same seed to always give the same result.

I'm not sure the standard should try to address these issues but as long as it doesn't it means there are many good reasons why people choose not to use the standard library random number facility.
Last edited on
Is seed time that critical? Its usually done very few times. If you want to repeat a small sequence, save the sequence in a vector. Longer sequences, re-seed isn't that slow?

but yes, a faster quality generator would be nice.
Registered users can post here. Sign in or register to post.