Random numbers

Pages: 12 3

Hello. I have no questions today, but I need some explanation about this method so as to generate a random number. I made a little test using this code - an alternative to randomize a number :

std::pair <int, int> n(1, 10);
std::random_device seeder;
std::mt19937 engine(seeder());
std::uniform_int_distribution<int> d(n.first, n.second);
int nn = d(engine);

As you can understand, it generates a random number (Mersenne Twister 19937 generator) between two integers which I set in as a pair. It works as expected - and when "I throw my dice" 100 times, I have a real randomized output which I display this way showing numbers occurrences for 100 generations :


*************
*************
**************
**
*****
***********
*******
************
*************
**********

However, the more I increase the number of randomized processes, the more the output becomes flat. Mathematically it's very interesting. Do you have some explanation? I guess that there is an algorithm to explain this leveling. Take a look at my code and its output with a large computation - one million ++

#include <iostream>
#include <random>
#include <vector>

std::pair <int, int> n(1, 10);
std::vector<int> numbers;
int loop = 1000000; // before try with 100

int freq(std::vector<int> v, int i)
{
    return (std::count(v.begin(), v.end(), i) / (loop/100));
}

int main()
{
    std::random_device seeder;
    std::mt19937 engine(seeder());
    std::uniform_int_distribution<int> d(n.first, n.second);
    // polulate my vector with random numbers
    for (int i = 0; i < loop; i++)
    {
        int nn = d(engine);
        numbers.push_back(nn);
        //std::cout << nn << std::endl;
    }

    for (int j = n.first; j <= n.second; j++)
    {
        for (int k = 0; k < freq(numbers, j); k++)
            std::cout << '*';

        std::cout << std::endl;
    }
}


**********
**********
*********
**********
**********
*********
*********
*********
*********
**********

Last edited on

Geckoo (362)

May I say that with a very big amount of processes (x billions) I could have a really flat output ?

Peter87 (11264)

https://en.wikipedia.org/wiki/Law_of_large_numbers

salem c (3726)

The clue is in the name - uniform.
https://en.cppreference.com/w/cpp/numeric/random/uniform_int_distribution

If you want a different shape, then use a different distribution.
https://en.cppreference.com/w/cpp/named_req/RandomNumberDistribution

Geckoo (362)

@Peter87. Thank you for the link - I guess that I made an interesting application of the main principle. I have an interesting paper to read for my evening :)

@salem c. Thank you for the link too. There are many many interesting alternatives which I did not know. Gamma_distribution is really interesting.

https://en.cppreference.com/w/cpp/numeric/random/piecewise_linear_distribution

piecewise_linear_distribution is what I expected about randomized numbers, but it seems to me very superficial because of its settings ++

Last edited on

Peter87 (11264)

Geckoo wrote:
May I say that with a very big amount of processes (x billions) I could have a really flat output ?

No, not with that program, because if we assume loop is 1 billion then the expected value for each count would be 100 000 000. If the count is slightly greater (or equal) it will display 10 stars. If the count is slightly less (e.g. 99 999 999) it will display only 9 stars. The probability of these two are about the same so you are very likely to get 9 or 10 stars on each row, but whether you get 9 or 10 is basically fifty-fifty.

This is basically just an artefact caused by the way you translate the counts to stars. Rounding (e.g. displaying all numbers in the range [95 000 000, 105 000 000) as 10 stars) and/or using a higher resolution (e.g. 100 stars instead of 10) would help in making the result look flatter.

Last edited on

jonnin (11497)

Back when with rand() or hand-rolled RNG before <random> you had to DIY. There are probably examples out on the web of how to do that, or off in C code places, that may help you understand.

a really simple one..
int distro[] {1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,6,6,6,6,6,7,7,7,7,8,8,8,9,9,10};
if I did that right 1/30 for 1 and 10,
1/15 for for 2 and 9,
1/10 for 3 and 8,
and so on ... a pretty steep bell curve.
distro[rand()%30] is your number...
thankfully we don't have to do that anymore: you can see from the list of standard distros or all the crap in a statistics book how many there are to deal with (though several are more or less identical in practice) not counting oddball ones for specific needs/data...
uniform, using that idea and assuming rand() actually worked right, would just be
distro[] {1,2,3,4,5,6,7,8,9,10} so each one has equal chances...

(random isn't populating an array and picking from it, you can do the above using math instead of picking from hard lists. hard lists are faster and how it was done frequently with ints or int like data, though, just for performance).

Last edited on

lastchance (6980)

@geckoo,

You divide your range into a fixed number of bins.

For any individual bin, the probability that a uniformly-distributed integer lands in it is p, where p=1/(number of bins). A "good" RNG should hit that, but that isn't the question being asked here.

This is a Bernoulli trial. If you ran it n times you would get a binomial distribution (for that particular bin) with mean np and variance npq (where q=1-p). The standard deviation is sqrt(npq).

The variability of your output depends on the ratio of standard deviation to mean. This is
sqrt(npq)/(np) or sqrt(q/p)/sqrt(n)
Importantly, this is INVERSELY PROPORTIONAL TO SQRT(n). Bigger n means smaller RELATIVE variation.

As an example, if you had 10 bins, then relative variation (standard deviation/mean) for any one particular bin is 3/sqrt(n).

Note that 1/sqrt(n) doesn't actually fall off all that fast - that is why Monte-Carlo methods aren't quite as great as one might have hoped. It is also why you would not expect perfect "flatness" for your output graph. With the 10-bin example above, the relative variation for one particular bin is still 0.003 even when n is one million. If you had 101 bins then the relative variation for a particular bin would be 0.01 when n was 1 million. So, if you want more "flatness" ... use less bins!

Last edited on

Geckoo (362)

Thank you for all your clever explanations. LLN is a very interesting case of study. I quickly coded another example which uses only two possibilities - heads or tails. The demonstration is more evidente. After this one, I did the same code, but using only rand() in order to generate a randomized number. In fact, what is the best method? rand() or mt19937 ? Which do you prefer?

#include <iostream>
#include <random>

void coins(int loop)
{
    int nTails = 0;
    int nHeads = 0;
    std::pair <int, int> n(0, 1);

    std::random_device seeder;
    std::mt19937 engine(seeder());
    std::uniform_int_distribution<int> d(n.first, n.second);

    for (int i = 0; i < loop; i++)
    {
        if (d(engine) == 0)
            nHeads++;
        else
            nTails++;
    }

    std::cout << "LLN test with " << loop << " coins" << std::endl;
    std::cout << "Heads -> " << nHeads << std::endl;
    std::cout << "Tails -> " << nTails << std::endl;

    long float disp;
    if ((nHeads >= nTails) ? disp = (long float(nHeads) / long float(nTails)) 
                           : disp = (long float(nTails) / long float(nHeads)));

    std::cout << "Dispersion % : " << (disp - 1) * 100 << std::endl;
    std::cout << std::endl;
}

int main()
{
    coins(100);
    coins(10000);
    coins(1000000);

    return 0;
}


LLN test with 100 coins
Heads -> 53
Tails -> 47
Dispersion % : 12.766

LLN test with 10000 coins
Heads -> 5022
Tails -> 4978
Dispersion % : 0.883889

LLN test with 1000000 coins
Heads -> 500246
Tails -> 499754
Dispersion % : 0.0984484

Last edited on

Geckoo (362)

The same, but with std::uniform_real_distribution :/

#include <iostream>
#include <random>

void coins(int loop)
{
    int nTails = 0;
    int nHeads = 0;
    std::pair <float, float> n(0, 1);

    std::random_device seeder;
    std::mt19937 engine(seeder());
    std::uniform_real_distribution<float> d(n.first, n.second);

    for (int i = 0; i < loop; ++i)
    {
        if (d(engine) > 0.5)
            nHeads++;
        else
            nTails++;
    }

    std::cout << "LLN test with " << loop << " coins" << std::endl;
    std::cout << "Heads -> " << nHeads << std::endl;
    std::cout << "Tails -> " << nTails << std::endl;

    long float disp;
    if ((nHeads >= nTails) ? disp = (long float(nHeads) / long float(nTails)) 
                           : disp = (long float(nTails) / long float(nHeads)));

    std::cout << "Dispersion % : " << (disp - 1) * 100 << std::endl;
    std::cout << std::endl;
}

int main()
{
    coins(100);
    coins(10000);
    coins(1000000);

    return 0;
}

kigar64551 (853)

*************
*************
**************
**
*****
***********
*******
************
*************
**********

This is because you are looking at only one sample of "100 dice rolls" ;-)

In fact, if we look at a single sample of "100 dice rolls", then it is perfectly possible (although unlikely) to get a sample where the number "1" occurred 100 times in a row and all other numbers did not occur at all!

If, instead, we compute the average (mean) number of occurrences from many samples of "100 dice rolls", then you will see that the average number of occurrences of each of the numbers quickly approaches 10.0:

int main()
{
    const std::pair <int, int> n(0, 9);
    std::random_device seeder;

    double means[100];
    memset(means, 0, sizeof(means));

    for (size_t round = 1; round < SIZE_MAX; ++round)
    {
        // Generate next sample of size 100
        std::mt19937 engine(seeder());
        std::uniform_int_distribution<int> dist(n.first, n.second);
        size_t counters[100];
        memset(counters, 0, sizeof(counters));
        for (int i = 0; i < 100; ++i)
        {
            const int value = dist(engine);
            ++counters[value];
        }

        // Update the mean number of occurrences (by using Welford's online algorithm)
        for (int i = 0; i < 10; ++i)
        {
            const double delta = static_cast<double>(counters[i]) - means[i];
            means[i] += delta / static_cast<double>(round);
        }

        // Print the current mean values (every 999983 rounds)
        if ((round % 999983U) == 1U)
        {
            printf("Means after %zu samples:\n", round);
            for (int i = 0; i < 10; ++i)
            {
                printf("means[%d] = %7.4f\n", i, means[i]);
            }
            puts("");
        }
    }
}

Means after 1 samples:
means[0] = 10.0000
means[1] =  6.0000
means[2] = 10.0000
means[3] =  7.0000
means[4] = 10.0000
means[5] =  8.0000
means[6] = 14.0000
means[7] = 14.0000
means[8] = 14.0000
means[9] =  7.0000

Means after 999984 samples:
means[0] =  9.9994
means[1] = 10.0041
means[2] =  9.9965
means[3] =  9.9978
means[4] =  9.9996
means[5] =  9.9962
means[6] = 10.0023
means[7] = 10.0016
means[8] = 10.0008
means[9] = 10.0018

[...]

Means after 2372959660 samples:
means[0] =  9.9999
means[1] =  9.9999
means[2] = 10.0000
means[3] = 10.0000
means[4] = 10.0001
means[5] = 10.0000
means[6] =  9.9999
means[7] = 10.0001
means[8] = 10.0000
means[9] = 10.0000

...and that is because with a uniform distribution of n numbers and with a sample size of k, the expected number of occurrences of each number is k/n. In your example, we obviously get: 100 / 10 = 10

Last edited on

deleted account xyzzy (5768)

Another thing to consider when using either the C or C++ random function there is only one "true" non-deterministic random generator in the lot. All others are pseudo (deterministic) random generators.

The one "true" non-deterministic generator (up to the implementation, of course) is std::random_device.

The pseudo random generators can have their sequences seeded, usually with some variant of the current local PC time or using std::random_device. Naturally, use the same seed and you get the same sequence of numbers.

C++ <random> offers a seed sequence so a PRNG can use more than a single seed.

New C++ code should never use the C library PRNG functions, even the C standard recommends if possible not using srand/rand.

https://web.archive.org/web/20180123103235/http://cpp.indi.frih.net/blog/2014/12/the-bell-has-tolled-for-rand/

C++ has an embarrassment of riches for generating random numbers.

The <random> library isn't designed for hard-core cryptography work.

Peter87 (11264)

George P wrote:
use the same seed and you get the same sequence of numbers

Note that if you want to rely on this you should not use the standard distributions (e.g. std::uniform_int_distribution) because it's not specified how they should work exactly so they might give different results on different implementations.

George P wrote:
C++ has an embarrassment of riches for generating random numbers.

I think what it lacks is a small and fast generator of relatively good quality that is quick to seed (e.g. PCG, Xorshift, etc.). At the moment it only really has std::mt19937 as a good alternative but that one is huge and is slow to seed.

Not being able to rely on the standard distributions to work the same everywhere is also a big issue for some use cases such as procedural generation (https://en.wikipedia.org/wiki/Procedural_generation) where you often want the same seed to always give the same result.

I'm not sure the standard should try to address these issues but as long as it doesn't it means there are many good reasons why people choose not to use the standard library random number facility.

Last edited on

jonnin (11497)

Is seed time that critical? Its usually done very few times. If you want to repeat a small sequence, save the sequence in a vector. Longer sequences, re-seed isn't that slow?

but yes, a faster quality generator would be nice.

Geckoo (362)

The next code works as expected generating some random values. It seems to me clever and efficient for little projects without any solid pretension, but I would like to know what you think about it...

#include <iostream>
#include <chrono>

int pseudo_rand(int min, int max) 
{
    auto x = std::chrono::steady_clock::now().time_since_epoch().count();
    return (x % ((max - min) + 1)) + min;
}

int main()
{
    std::cout << "Pseudo random Generator" << std::endl;

    for (int i = 0; i < 10; ++i)
        std::cout << "Number : " << pseudo_rand(0, 10) << std::endl;

    return 0;
}


Pseudo random Generator
Number : 2
Number : 10
Number : 8
Number : 3
Number : 1
Number : 6
Number : 3
Number : 7
Number : 0
Number : 0

Last edited on

jonnin (11497)

Ok, its terrible :)
First, modulo is known to give poorish results for randomized values. That alone is a concern, but
- your clock may not increment on some platforms between calls, so you could get the same value each iteration 2, 3 times in a row on some systems and...
- x is an incrementing sequence. Its not a great seed.
- this is basically a linear congruential random number generator except, not using primes and known good values, its prone to bad results.
- its slower than it has to be for its simplicity, because of getting the clock every time.

across the board, its not going to hold up well in the grand scheme of random number theory and advanced scientific testing.
however, in practice, its probably fine for a quick and dirty toy for less than scientific results.
Don't feel bad about the criticism ... writing a good PRNG is extremely difficult, and these pitfalls are only known because other people had the same ideas as you and tried them, only to have someone come behind them and study the flaws and look for a better way..

kigar64551 (853)

If you have a source that gives you uniformly distributed "random" numbers in the range from 0 to UINT32_MAX (or UINT64_MAX), then using the modulo function (%) to map those numbers to the 0 to N-1 range is fine – as long a N is significantly smaller than UINT32_MAX.

If N is close to UINT32_MAX, and if UINT32_MAX is not an exact multiple of N, then there will be a certain "bias", yes. But, again, as long as your N is significantly smaller than UINT32_MAX, you don't have to worry about the bias, as the bias would be negligible.

________

Your problem is a different one: The clock (seconds elapsed since "epoch") is not even close to being a good "random" source! 😱

It's even very possible that consecutive calls to your function pseudo_rand() give the exactly same result, because the clock has not advanced in between these calls! Keep in mind that the clock has a limited resolution and only "ticks" every so often!

________

If you need a simple PRNG that still produces "high quality"* random numbers, have a look at:

Middle Square Weyl Sequence PRNG (MSWS PRNG)
https://arxiv.org/pdf/1704.00358.pdf

32-Bit version:

#include <stdint.h>

uint64_t x = 0, w = 0, s = 0xb5ad4eceda1ce2a9;

inline static uint32_t msws32() {
   x *= x; x += (w += s); return x = (x>>32) | (x<<32);
}

64-Bit version:

#include <stdint.h>

uint64_t x1 = 0, w1 = 0, s1 = 0xb5ad4eceda1ce2a9;
uint64_t x2 = 0, w2 = 0, s2 = 0x278c5a4d8419fe6b;

inline static uint64_t msws64() {
   uint64_t xx;

   x1 *= x1; xx = x1 += (w1 += s1); x1 = (x1 >> 32) | (x1 << 32);
   x2 *= x2;      x2 += (w2 += s2); x2 = (x2 >> 32) | (x2 << 32);

   return xx ^ x2;
}

* in the sense that it passes scientific randomness tests, such as the TestU01 "Big Crush" suite

Last edited on

Peter87 (11264)

kigar64551 wrote:
Your problem here is a different one: The clock (seconds elapsed since "epoch") is not even close to being a "random" source !!!

The resolution of std::chrono::steady_clock is unlikely to be that bad.

But I agree, it's not a good random source.

To demonstrate I increased the maximum number to 1 000 000 ...

#include <iostream>
#include <chrono>

int pseudo_rand(int min, int max) 
{
	auto x = std::chrono::steady_clock::now().time_since_epoch().count();
	return (x % ((max - min) + 1)) + min;
}

int main()
{
	std::cout << "Pseudo random Generator" << std::endl;
	
	for (int i = 0; i < 10; ++i)
	{
		std::cout << "Number : " << pseudo_rand(0, 1000000) << std::endl;
	}
}

... and then I ran the program three times and got the following outputs:

Number : 460923
Number : 479778
Number : 484588
Number : 488453
Number : 491815
Number : 496048
Number : 500407
Number : 504606
Number : 508759
Number : 513074

Number : 633083
Number : 649724
Number : 651975
Number : 653616
Number : 655477
Number : 657054
Number : 658610
Number : 659978
Number : 662026
Number : 663529

Number : 969272
Number : 987384
Number : 989496
Number : 990778
Number : 992479
Number : 993877
Number : 995210
Number : 996550
Number : 997887
Number : 999910

Geckoo, do you see the problem?

Last edited on

Geckoo (362)

Hello. All your explanations are clever - and I cannot argue against you. It's bad - that's all. The clock gives a simple value which will be incremented if the range between min and max is larger than ticks. I think that for a little project like a rock paper scissors game, it fits with the goal - no more ++

kigar64551 (853)

But why use a solution that is already known to be bad, when you can have a good solution so easily?

E.g., the MSWS32 PRNG that I posted above can be implemented in one line of code and produces really "high quality" random numbers...

(It's also much faster than querying the system clock a zillion of times)

Heck, even a simple rand() % N would work a lot better than your time_since_epoch().count() % N 😏

Last edited on

Pages: 12 3