Ok, I'm back with more cool stuff! :D
===========================================================
-m4ster r0shi, rand()%n is evil... What should I do?...
-Fear not, young programmer! Just increase your RAND_MAX...
===========================================================
The problem with rand()%n is that the distribution it gives (from 0 to n-1) is NOT uniform, when RAND_MAX+1 is not divisible by n. The problem becomes greater as n gets closer to RAND_MAX. And of course if n is greater than RAND_MAX+1 it's a total disaster... But let's take a closer look and see why this happens...
Suppose your rand() function draws numbers from the following sequence:
{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
Here, we have a period of 16 and a RAND_MAX of 15.
The above sequence modulo 3 gives:
{0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0}
The frequencies of 0,1,2 are:
0 -> 6 times
1 -> 5 times
2 -> 5 times
As you see, we have a problem here because 0 appears more frequently than the other two values. The higher/lower frequency ratio is 6/5=1.2
Now, suppose that our rand() function draws numbers from this sequence:
{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}
Here we have a period of 26 and a RAND_MAX of 25.
The above sequence modulo 3 gives:
{0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1,2,0,1}
The frequencies are:
0 -> 9
1 -> 9
2 -> 8
The hi/lo frequency ratio here is 9/8=1.125
As you can see, the problem is bigger when n is close to RAND_MAX. One can solve this problem in various ways. But I'll post here one I came up with recently and I find very simple and effective. What I actually do is wrap the given rand() function in another one (that I call improved_rand() :P ) in a way that increases the initial rand()'s RAND_MAX value. But let's see how exactly this can be done.
Given the initial sequence rand() draws numbers from,
{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
I create 4 sequences like this:
sub_seq_1 -> {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
sub_seq_2 -> {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15} + 16 (i.e. add 16 to each element)
sub_seq_3 -> {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15} + 2*16 (same here...)
sub_seq_4 -> {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15} + 3*16 (...and here)
Now, in every call to improved_rand() I make two calls to rand(). One to pick the sequence I'll use (rand()%4, notice that we don't have a problem here, since 4*4=16=RAND_MAX+1), and one to pick a random number from the chosen sequence. But ofc, in order to not "break" the distribution of the rand() function doing this, I use 5 ints to keep track of the 5 different seeds of these processes (1 for the sub-sequence selection process and 4 for the sub-sequences themselves).
I made a simple program demonstrating this theory in action ;) Plz, take a look and tell me what you think. I've been also wondering if something like this would look nice in the article section... Of course if I do it I'll also put other ways of solving this problem and compare them to each other.
Here comes the source code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86
|
#include <iostream.h>
#include <stdlib.h>
void my_srand(unsigned int s);
unsigned int my_rand();
unsigned int my_seed;
const unsigned int MY_RAND_MAX=31;
unsigned int my_seq[]=
{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31};
unsigned int my_improved_rand();
const unsigned int count=1000;
int main()
{
unsigned int n;
cout << "enter n (<=" << MY_RAND_MAX+1 << "): ";
cin >> n;
unsigned int * arr1=new unsigned int[n];
unsigned int * arr2=new unsigned int[n];
int i;
for (i=0; i<n; i++) arr1[i]=arr2[i]=0;
my_srand(0);
for (i=0; i<count; i++)
arr1[my_rand()%n]++;
my_srand(0);
for (i=0; i<count; i++)
arr2[my_improved_rand()%n]++;
cout << "method 1:\n";
for (i=0; i<n; i++)
cout << i << " -> " << arr1[i] << " time(s)\n";
cout << endl;
cout << "method 2:\n";
for (i=0; i<n; i++)
cout << i << " -> " << arr2[i] << " time(s)\n";
cout << endl;
return 0;
}
void my_srand(unsigned int s)
{
my_seed=s;
my_seed%=MY_RAND_MAX+1;
}
unsigned int my_rand()
{
unsigned int result;
result=my_seq[my_seed++];
my_seed%=MY_RAND_MAX+1;
return result;
}
unsigned int my_improved_rand()
{
static unsigned int main_seed=0;
static unsigned int sub_seeds[4]={0};
int i;
my_srand(main_seed++);
main_seed%=MY_RAND_MAX+1;
i=my_rand()%4;
my_srand(sub_seeds[i]++);
sub_seeds[i]%=MY_RAND_MAX+1;
return my_rand()+i*(MY_RAND_MAX+1);
}
|