Help with this shuffling question!

Consider the following algorithm, which generates a (not necessarily uniformly) random permutation of numbers 1 through N:
P := [1, 2, ..., N]
for i in 1..N do
    j := rand(1, N)
    swap(P[i], P[j])

Here, rand(1, N) returns a uniformly random integer between 1 and N inclusive. Let p(P) denote the probability that this algorithm generates the permutation P.

Find a permutation P1 such that p(P1) is as large as possible, and a permutation P2 such that p(P2) is as small as possible.
Duplicate of topic posted two days ago: http://www.cplusplus.com/forum/general/242239/

It's an interesting question, i.e. in what way are certain permutations more likely than others, since different swap sequences can produce the same end-result permutation. The algorithm makes N independent uniform choices, so there are N^N equally likely swap sequences spread over only N! permutations, and since N^N is not divisible by N! for any N >= 3, the permutations cannot all come out equally likely.

Personally, since I'm not sure where to start from a mathematics point of view, I'd write a simulation that estimates the distribution of each of the N! permutations (not necessarily brute force for higher N), then figure out the expected values from those frequencies. I'm not sure how you'd generalize that, but maybe some pattern will emerge. For small N you could even skip the sampling and enumerate every swap sequence exactly; see the sketch below.
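
Here's a minimal sketch of that exact-enumeration idea (my own addition, so treat names like enumerate_swaps as hypothetical and double-check it yourself): it recursively tries every j at every step i, walking all N^N equally likely swap sequences and tallying an exact count for each final permutation.

#include <iostream>
#include <map>
#include <utility>
#include <vector>

using Perm = std::vector<int>;

// Visit every one of the N^N equally likely swap sequences exactly once:
// at step i, try every j, recurse, then undo the swap (backtracking).
void enumerate_swaps(Perm & p, std::size_t i, std::map<Perm, long long> & counts)
{
    if (i == p.size()) { ++counts[p]; return; }
    for (std::size_t j = 0; j < p.size(); ++j)
    {
        std::swap(p[i], p[j]);
        enumerate_swaps(p, i + 1, counts);
        std::swap(p[i], p[j]);   // backtrack
    }
}

int main()
{
    constexpr std::size_t N = 3;   // keep N small: the walk visits N^N states
    Perm p(N);
    for (std::size_t i = 0; i < N; ++i)
        p[i] = static_cast<int>(i) + 1;

    std::map<Perm, long long> counts;
    enumerate_swaps(p, 0, counts);

    // count / N^N is the exact p(P). For N = 3 there are 27 sequences:
    // three permutations show up 5 times each and three show up 4 times
    // each, so the shuffle is provably non-uniform.
    for (const auto & kv : counts)
    {
        for (int v : kv.first) std::cout << v << ' ';
        std::cout << ": " << kv.second << '\n';
    }
}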
PS: It doesn't answer your question, but here's a program I wrote that plots the distribution of each possible permutation, lexicographically ordered via std::next_permutation. The output is interesting:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <chrono>
#include <random>
#include <map>
#include <utility>

unsigned seed = static_cast<unsigned>(std::chrono::system_clock::now().time_since_epoch().count());
std::mt19937 gen(seed);

template <size_t Size>
size_t random_index()
{
    static std::uniform_int_distribution<size_t> dist(0, Size-1);
    return dist(gen);
}

using Data = std::vector<size_t>;

template <size_t Size>
Data trial()
{
    //P := [1, 2, ..., N]
    //for i in 1..N do
    //    j := rand(1, N)
    //    swap(P[i], P[j])

    // [1, 2, .... N]
    Data values(Size);
    for (size_t i = 0; i < Size; i++)
    {
        values[i] = i + 1;
    }
    
    for (size_t i = 0; i < Size; i++)
    {
        size_t j = random_index<Size>();
        std::swap(values[i], values[j]);
    }
    
    return values;
}

int main()
{
    constexpr size_t N = 6;
    Data values(N);
    for (size_t i = 0; i < N; i++)
    {
        values[i] = i + 1;
    }
   
    // Initialize map
    std::map<Data, size_t> map;   
    do {
        map[values] = 0;
    } while (std::next_permutation(values.begin(), values.end()));
    
    std::cout << "N: " << N << "\n";
    std::cout << "size of map: " << map.size() << "\n";

    // fill map with trial data
    for (int i = 0; i < 2'000'000; i++)
    {
        Data data = trial<N>();
        map[data]++;  
    }

    std::cout << "Printing...\n";
    
    // max element of map
    using pair_type = decltype(map)::value_type;
    auto max_pair = std::max_element
    (
        std::begin(map), std::end(map),
        [] (const pair_type & p1, const pair_type & p2) {
            return p1.second < p2.second;
        }
    );
    size_t max_value = max_pair->second;
        
    // Reset to sorted order for printing (next_permutation already left
    // values sorted when it wrapped around, but be explicit)
    for (size_t i = 0; i < N; i++)
    {
        values[i] = i + 1;
    }
    do {
        // Scale each bar so the most frequent permutation gets 160 '#'s
        size_t width = static_cast<size_t>(map[values] / static_cast<double>(max_value) * 160.0);
        std::cout << std::string(width, '#') << '\n';
    } while (std::next_permutation(values.begin(), values.end()));
}


If you look at the output, the longest bar does seem to land in about the same place (proportionally) regardless of N, but I have no idea whether that generalizes.
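
If you want the program above to answer the original question directly, a small addition at the end of main() (my own sketch, reusing the map and pair_type already defined there) would print the permutations with the highest and lowest sampled counts as estimates of P1 and P2:

    // Estimate P1 and P2 from the sampled counts (append inside main())
    auto by_count = [] (const pair_type & a, const pair_type & b) {
        return a.second < b.second;
    };
    auto max_it = std::max_element(map.begin(), map.end(), by_count);
    auto min_it = std::min_element(map.begin(), map.end(), by_count);

    std::cout << "most frequent (P1 estimate):  ";
    for (size_t v : max_it->first) std::cout << v << ' ';
    std::cout << "(" << max_it->second << " hits)\n";

    std::cout << "least frequent (P2 estimate): ";
    for (size_t v : min_it->first) std::cout << v << ' ';
    std::cout << "(" << min_it->second << " hits)\n";

With 2'000'000 samples spread over N! cells the extremes are noisy, so the estimates are only trustworthy for small N.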
Hey @Ganado!

The question states that N <= 17, but on my PC the code won't work for N > 10. Is there any way to optimise it?
My program will not work for N == 17: pre-filling the map with all 17! ≈ 3.6 × 10^14 permutations would take around 355,687 GB (times some per-entry constant) just to hold the keys. I'm sure it could be optimized, given how much the arrays repeat, but it would probably still be orders of magnitude too slow at such large N.
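
That said, if the goal is just to push the sampler somewhat past N == 10, here's a sketch of two memory savers (my own suggestion, untested): skip pre-filling the map with all N! permutations, and pack each sampled permutation into a single 64-bit key via its Lehmer-code rank (valid for N <= 20, since 20! < 2^63):

#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Lexicographic rank of a permutation of 1..N (factorial number system).
// O(N^2), which is fine for N <= 20; the result fits in 64 bits.
std::uint64_t lehmer_rank(const std::vector<std::size_t> & p)
{
    const std::size_t n = p.size();
    std::uint64_t rank = 0;
    std::uint64_t fact = 1;                // holds (n-1-i)! for the current i
    for (std::size_t i = n; i-- > 0; )
    {
        std::size_t smaller = 0;           // elements right of i that are smaller
        for (std::size_t j = i + 1; j < n; ++j)
            if (p[j] < p[i]) ++smaller;
        rank += smaller * fact;
        fact *= static_cast<std::uint64_t>(n - i);
    }
    return rank;
}

// Usage idea, replacing the pre-filled std::map in the program above:
//     std::unordered_map<std::uint64_t, std::size_t> counts;
//     for (int t = 0; t < 2'000'000; t++)
//         counts[lehmer_rank(trial<N>())]++;

This only makes the sampling cheaper per permutation; it does nothing about the fact that 17! distinct outcomes can never be covered by any realistic number of samples.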

My program wasn't meant as a solution anyway, just a way to show what appears to be an emerging pattern that could be investigated further with actual probability theory.

I don't know probability very well (I've only had one class on it), so your problem seems quite difficult to me. Maybe math.stackexchange.com would be a good place to ask, but be thorough in your question or it will probably get downvoted.