Detect the "order" (arrangement) of an array

Suppose you have an array of length n, containing some "random" numbers. There are n! possible orders in which these numbers can be arranged. For example, if n=4, there are 4! = 24 possible orders/arrangements: The biggest element can be in any of the 4 positions, the second biggest element can be in one of the 3 remaining positions, the third biggest element can be in one of the 2 remaining positions, and the last (smallest) element has only one position left. So, we get 4×3×2×1 = 24 possible orders/arrangements.
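A quick way to check the n! count is to let `std::next_permutation` step through every arrangement of n distinct values and count the steps. This is a small sketch. The function name `count_arrangements` is hypothetical, and the sketch assumes the values are distinct, because duplicates would give fewer distinct arrangements.

```cpp
#include <algorithm>  // std::sort, std::adjacent_find, std::next_permutation
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts how many arrangements std::next_permutation visits, starting from
// the sorted (first) arrangement. For n distinct values this is n!.
std::size_t count_arrangements(std::vector<std::uint32_t> values)
{
    std::sort(values.begin(), values.end());
    assert(std::adjacent_find(values.begin(), values.end()) == values.end() &&
           "values must be distinct");

    std::size_t count = 0;
    do {
        ++count;  // one more arrangement seen
    } while (std::next_permutation(values.begin(), values.end()));
    return count;
}
```

For example, `count_arrangements({5, 9, 2, 7})` returns 24, matching 4! = 4×3×2×1.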

Now, I want to write a function that takes as input an array of length n and detects its order/arrangement. The return value should be a single number in the 0 to k-1 range, where k=n! is the total number of possible orders/arrangements.

My basic idea was to create a secondary array and initialize it with the numbers from 0 to n-1. Then I sort the primary array (i.e. the given input array) using a simple sorting algorithm, e.g. Bubble Sort. Whenever two elements of the primary array are swapped by the sorting algorithm, the corresponding elements of the secondary array are swapped too. In the end, after the primary array has been sorted, the secondary array holds a unique arrangement of the numbers from 0 to n-1 for each possible order of the input numbers.
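A minimal sketch of this idea for arbitrary n, assuming plain Bubble Sort and distinct input values. The name `sorted_positions` is made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>  // std::swap
#include <vector>

// Bubble-sorts a copy of the input and mirrors every swap in an index array
// v = {0, 1, ..., n-1}. Afterwards v[r] is the original position of the r-th
// smallest value, so v depends only on the order of the input.
std::vector<std::size_t> sorted_positions(std::vector<std::uint32_t> u)
{
    const std::size_t n = u.size();
    std::vector<std::size_t> v(n);
    for (std::size_t i = 0; i < n; ++i) v[i] = i;

    for (std::size_t pass = 0; pass + 1 < n; ++pass) {
        for (std::size_t i = 0; i + 1 < n - pass; ++i) {
            if (u[i] > u[i + 1]) {
                std::swap(u[i], u[i + 1]);  // sort the primary array...
                std::swap(v[i], v[i + 1]);  // ...and apply the same swap to the secondary one
            }
        }
    }
    return v;
}
```

For the input {30, 10, 40, 20} the secondary array ends up as {1, 3, 0, 2}: the smallest value was at position 1, the next smallest at position 3, and so on. Any two inputs with the same order produce the same secondary array.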

For n=4 it looks like this:
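#include <cstddef>
#include <cstdint>
#include <utility>   // std::swap
using std::swap;

static size_t get_order4(const uint32_t a, const uint32_t b, const uint32_t c, const uint32_t d)
{
    uint32_t u[] = { a, b, c, d };   /* primary array: the values to sort */
    uint32_t v[] = { 0, 1, 2, 3 };   /* secondary array: original positions */

    /* 4-element sorting network; every swap in u is mirrored in v */
    if(u[0]>u[1]) { swap(u[0], u[1]); swap(v[0], v[1]); }
    if(u[2]>u[3]) { swap(u[2], u[3]); swap(v[2], v[3]); }
    if(u[0]>u[2]) { swap(u[0], u[2]); swap(v[0], v[2]); }
    if(u[1]>u[3]) { swap(u[1], u[3]); swap(v[1], v[3]); }
    if(u[1]>u[2]) { swap(u[1], u[2]); swap(v[1], v[2]); }

    /* now v[r] is the original position of the r-th smallest value */
    return ???;   /* the open question: how to map v[] to a number in 0..k-1 */
}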

Question: How do I convert the final state of the secondary array to a single number in the 0 to k-1 range to get my result?

________

I could use some bit shifts and a giant switch statement, but that doesn't scale so well...

static size_t get_index4(const uint32_t *const v)
{
    switch (((v[0U] & 0x3) << 6) | ((v[1U] & 0x3) << 4) | ((v[2U] & 0x3) << 2) | (v[3U] & 0x3))
    {
        case 0x1B: return  0U;
        case 0x1E: return  1U;
        case 0x27: return  2U;
        case 0x2D: return  3U;
        case 0x36: return  4U;
        case 0x39: return  5U;
        [...]
    }
}
So, are you asking for a unique ID for the pattern your data has? E.g. if N is 200, you want one value from 1 to 200! = 788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000?

If so, this is not really viable for N of any real size.
You would be better off getting a hashed value of the data, like an enterprise ID number or a SHA-like value. That can be done, and it tells you whether two arrangements are equal with a very, very high degree of probability, but it can't tell you which pattern they are in.

The swapping idea with a modified parallel sort could work, but you should use a stable sort. To convert the result to a number, use base 256 to generate bytes. So if you had array locations 0,1,2,3,4,5, sorted them and got 054213, convert 54213 into bytes by dividing it by 256 repeatedly: 54213 = 211 × 256 + 197, so the low byte is 197 (0xC5) and the high byte is 211 (0xD3). You want D3C5 in hex out of it, the most compact form you can get among the common formats. 256 isn't the only choice; you can use any base, 256 just happens to map nicely onto hex. At the end of the day you have to turn the data in the parallel array into 'digits' in some base. I'm going to stop here and assume you see how to do that, but if not, we can do more.
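The byte arithmetic above can be checked with a few lines of repeated division. `to_base256` is a hypothetical helper for this sketch, and it emits the least significant byte first.

```cpp
#include <cstdint>
#include <vector>

// Splits x into base-256 digits (bytes) by repeated division,
// least significant byte first.
std::vector<std::uint8_t> to_base256(std::uint64_t x)
{
    std::vector<std::uint8_t> bytes;
    do {
        bytes.push_back(static_cast<std::uint8_t>(x % 256));  // current low byte
        x /= 256;                                             // drop it
    } while (x != 0);
    return bytes;
}
```

`to_base256(54213)` yields {197, 211}, i.e. 0xC5 followed by 0xD3, so written most significant byte first the value is D3C5.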

This seems like a lot of work compared to a known hash. I think your approach rules out collisions (two different orders can never get the same key), and it would let you re-create the ordering, but the generated key takes up more space. I am still puzzling out whether your sort idea can be done with just the work of a sort, e.g. in O(n). I think it can be, but it's hard to visualize this...

Anyway, do you care which pattern they are in, i.e. do you need to re-create it?
Maybe something like this?

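#include <algorithm>  // count_if, next_permutation
#include <cstddef>    // size_t
#include <cstdint>    // uint32_t
#include <iostream>
#include <iterator>   // begin, end, distance

using namespace std;

constexpr size_t factorial(int x)
{
  size_t result = 1;
  while (x > 1) result *= x--;
  return result;
}

// Rank of the order of [curr, end): for each element, count the later
// elements that are smaller and weight that count by the factorial of
// the number of later elements.
size_t get_order(uint32_t const* curr, uint32_t const* end)
{
  if (curr == end) return 0;

  size_t result = 0;
  for (uint32_t const* prev = curr++; curr != end; ++prev, ++curr)
  {
    auto const f = [=](auto const& e){ return e < *prev; };
    auto const n = count_if(curr, end, f);          // later elements smaller than *prev
    auto const m = factorial(distance(curr, end));  // orders of the remaining tail

    result += n * m;
  }

  return result;
}

int main()
{
  uint32_t xs[] { 1, 2, 3, 4 };

  // visits all 24 orders in lexicographic order; prints 0 1 2 ... 23
  do
    cout << get_order(begin(xs), end(xs)) << ' ';
  while (next_permutation(begin(xs), end(xs)));

  cout << '\n';
}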
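What `get_order` computes can be written as a formula; it is the usual Lehmer-code (factorial number system) ranking. The symbols c_i and rank below are introduced here for illustration only, with c_i counting the later elements that are smaller than a_i:

```latex
\operatorname{rank}(a) = \sum_{i=0}^{n-1} c_i \,(n-1-i)!,
\qquad c_i = \#\{\, j > i : a_j < a_i \,\}

% Range: 0 \le c_i \le n-1-i, and
% \sum_{m=0}^{n-1} m \cdot m! = \sum_{m=0}^{n-1} \big((m+1)! - m!\big) = n! - 1,
% so every rank lies in 0 \dots n!-1.

% Example, a = (3, 1, 4, 2): c = (2, 0, 1, 0)
\operatorname{rank}(3,1,4,2) = 2\cdot 3! + 0\cdot 2! + 1\cdot 1! + 0\cdot 0! = 13
```

Because the largest possible sum is exactly n! - 1, the ranks fill the range 0 to n!-1 without holes, and they follow lexicographic order, which is why the program prints 0 through 23.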
so, are you asking for a unique ID for what pattern your data has, eg if N is 200, you want one value from 1 to 788657... ?

Well, yes. But only the order of the numbers (i.e. how are the biggest number, second biggest number, third biggest number, and so on, arranged in the array) needs to be encoded. Not the concrete numbers! For example, input [1,2,3] would be the same order as input [1,10,100] or [7,42,666] and should give the same return value, whereas input [1,3,2] would obviously be a different order.
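One way to make "only the order matters" concrete is to replace each value by its rank among all values; inputs with the same order then become literally the same array. This is a sketch assuming distinct values, and `rank_pattern` is a hypothetical name.

```cpp
#include <algorithm>  // std::sort
#include <cstddef>
#include <cstdint>
#include <vector>

// Replaces every value by its rank (0 = smallest) among all values.
// Assumes distinct values; inputs with the same order give the same result.
std::vector<std::size_t> rank_pattern(const std::vector<std::uint32_t>& a)
{
    const std::size_t n = a.size();
    std::vector<std::size_t> pos(n);
    for (std::size_t i = 0; i < n; ++i) pos[i] = i;

    // pos[r] = position of the r-th smallest value
    std::sort(pos.begin(), pos.end(),
              [&a](std::size_t x, std::size_t y) { return a[x] < a[y]; });

    std::vector<std::size_t> rank(n);
    for (std::size_t r = 0; r < n; ++r) rank[pos[r]] = r;  // invert the permutation
    return rank;
}
```

{1,10,100} and {7,42,666} both become {0,1,2}, while {1,3,2} becomes {0,2,1}. This array is the inverse of the secondary array produced by the parallel-swap sorting idea.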

Also we can assume that n is relatively small, so that the return value (encoding) fits in a uint64_t.
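How small is "relatively small"? Assuming the result type is `std::uint64_t`, the bound can be found by multiplying up until the next factor would overflow. `max_n_for_uint64` is a hypothetical helper for this sketch.

```cpp
#include <cstdint>
#include <limits>

// Largest n such that n! still fits into std::uint64_t.
unsigned max_n_for_uint64()
{
    std::uint64_t fact = 1;  // holds n!
    unsigned n = 1;
    // (n+1)! fits exactly when n! <= max / (n+1)
    while (fact <= std::numeric_limits<std::uint64_t>::max() / (n + 1)) {
        fact *= n + 1;
        ++n;
    }
    return n;
}
```

This returns 20: 20! = 2432902008176640000 still fits, while 21! exceeds 2^64 - 1.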

To convert that to a number, use base 256 to generate bytes. [...] At the end of the day you have to turn your data in the parallel array into 'digits' in some base.

Yes, I can convert the array containing the indices of the concrete "order", e.g. [1,0,3,2] or [3,0,2,1], into a single number by encoding each array element as a byte (or nibble) and then shifting and ORing them together. So [1,0,3,2] becomes 0x1032, and [3,0,2,1] becomes 0x3021. But the problem is that the resulting numbers will not be in a contiguous range from 0 to k-1 (with k=n!), as I need them to be! 😳

For example, 0x0123, 0x1023 and 0x1203 are possible results. But numbers like 0x1123, 0x1223 or 0x1233 cannot appear as a result, because each index can occur only once in a concrete order! So there are many "holes" in the output range.
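The holes are easy to count for n = 4 by packing each permutation of {0,1,2,3} into one byte, 2 bits per index, in the same way as the switch key in `get_index4`. `pack4` and `all_keys4` are hypothetical names for this sketch.

```cpp
#include <algorithm>  // std::next_permutation
#include <cstdint>
#include <set>

// Packs a permutation of {0,1,2,3} into one byte, 2 bits per index.
std::uint8_t pack4(const std::uint32_t v[4])
{
    return static_cast<std::uint8_t>(((v[0] & 0x3) << 6) | ((v[1] & 0x3) << 4) |
                                     ((v[2] & 0x3) << 2) | (v[3] & 0x3));
}

// Collects the packed keys of all 24 permutations of {0,1,2,3}.
std::set<std::uint8_t> all_keys4()
{
    std::uint32_t v[4] = {0, 1, 2, 3};
    std::set<std::uint8_t> keys;
    do {
        keys.insert(pack4(v));
    } while (std::next_permutation(v, v + 4));
    return keys;
}
```

Only 24 of the 256 possible byte values occur, ranging from 0x1B for {0,1,2,3} to 0xE4 for {3,2,1,0}; every other value is a hole.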

Any idea how I can map those "encoded" numbers to a contiguous range from 0 to k-1?

As mentioned above, I can implement an explicit mapping, by using a giant switch statement. But that's not very elegant and doesn't scale well. I would have to generate a separate explicit mapping for each n. For the n=4 case, it would look like:
static size_t get_index4(const uint32_t *const v)
{
    /* encode each index in the array as two bits (sufficient for n=4, as indices are in 0 to 3 range) */
    /* ...then map the result to a number in 0 to k-1 range */
    switch (((v[0U] & 0x3) << 6) | ((v[1U] & 0x3) << 4) | ((v[2U] & 0x3) << 2) | (v[3U] & 0x3))
    {
        case 0x1B: return  0U;
        case 0x1E: return  1U;
        case 0x27: return  2U;
        [...]

Can you think of a "smarter" and more scalable way to do this?

this seems like a lot of work vs a known hash.

I don't think that a "hash" function can help me here. Also, I don't want to build a "hash" function.

A "hash" function will, with very high probability, not give the same hash value for the input arrays [1,2,3] and [1,10,100]. But I need all orders that are considered the same to produce the same return value, and different orders to produce different return values.

Also, the return value needs to be in 0 to k-1 range, so that each possible return value represents a different order.
Assumes:
- "order" as in the list of tests below (essentially the "ascending" order you would get by reading the elements as the digits of a number)
- N! is within range of unsigned long long
- all elements of array are distinct
- counts from 0 (but you can easily change that below)


#include <iostream>
#include <vector>
using namespace std;

using INT = unsigned long long;

//==========================================================

INT Factorial( int N ){ return N <= 1 ? 1 : N * Factorial( N - 1 ); }   // 0! = 1! = 1

//==========================================================

INT order( const vector<int> &A )
{
   INT result = 0;
   int N = A.size();
   INT nFactorial = Factorial( N );

   for ( int p = 0; p < N - 1; p++ )
   {
       nFactorial /= ( N - p );        // how many permutations of REMAINING elements after p
       int position = 0;               // rank of A[p] among A[p], ... , A[N-1]
       for ( int j = p + 1; j < N; j++ ) if ( A[p] > A[j] ) position++;
       result += position * nFactorial;
   }

   return result;
}

//==========================================================

int main()
{
   bool oneBased = false;              // change to true if you want to count from 1
   // Order assumed as following tests
   vector< vector<int> > tests = { {0,1,2,3}, {0,1,3,2}, {0,2,1,3}, {0,2,3,1}, {0,3,1,2}, {0,3,2,1},
                                   {1,0,2,3}, {1,0,3,2}, {1,2,0,3}, {1,2,3,0}, {1,3,0,2}, {1,3,2,0},
                                   {2,0,1,3}, {2,0,3,1}, {2,1,0,3}, {2,1,3,0}, {2,3,0,1}, {2,3,1,0},
                                   {3,0,1,2}, {3,0,2,1}, {3,1,0,2}, {3,1,2,0}, {3,2,0,1}, {3,2,1,0} };

   for ( const vector<int> &A : tests ) cout << order( A ) + oneBased << '\n';
}
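If the pattern ever has to be re-created from the number (as asked earlier in the thread), the ranking can be inverted by peeling off one factorial-weighted digit at a time. This is a sketch, not code from the thread: `unrank` is a hypothetical name, it returns the pattern as ranks 0..n-1, and it assumes n! fits in `unsigned long long`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Turns a rank r in [0, n!) back into the pattern (ranks 0..n-1) it encodes.
std::vector<int> unrank(unsigned long long r, int n)
{
    std::vector<unsigned long long> fact(static_cast<std::size_t>(n) + 1, 1ULL);
    for (int m = 1; m <= n; ++m) fact[m] = fact[m - 1] * m;  // fact[m] = m!
    assert(r < fact[n] && "rank out of range");

    std::vector<int> unused;  // values not placed yet, in ascending order
    for (int v = 0; v < n; ++v) unused.push_back(v);

    std::vector<int> result;
    for (int p = 0; p < n; ++p) {
        unsigned long long block = fact[n - 1 - p];  // orders of the remaining tail
        int c = static_cast<int>(r / block);         // later elements smaller than this one
        r %= block;
        result.push_back(unused[c]);                 // so take the c-th smallest unused value
        unused.erase(unused.begin() + c);
    }
    return result;
}
```

`unrank(13, 4)` returns {2,0,3,1}, which is entry 13 in the test list above, and `unrank(23, 4)` returns {3,2,1,0}.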

@lastchance
Thanks, this approach seems to work great and it avoids some of the headaches of the "sorting"-based approach! 👌

I think by using a LUT to store the pre-computed factorial values, we can even simplify this a bit:
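#include <stddef.h>
#include <stdint.h>

/* Rank (0 .. n!-1) of the order of values[0..n-1]; the table limits n to 9. */
static size_t get_order(const uint32_t *const values, const size_t n)
{
    static const size_t FACTORIAL[] = { 1U, 1U, 2U, 6U, 24U, 120U, 720U, 5040U, 40320U };
    size_t i, j, rank, fact, result;
    if (n < 2U)
        return 0U;   /* also avoids n - 1U wrapping around for n == 0 */
    for (i = 0U, result = 0U, fact = n; i < n - 1U; ++i)
    {
        for (j = i + 1U, rank = 0U; j < n; j++)
        {
            if (values[i] > values[j])
                ++rank;
        }
        result += rank * FACTORIAL[--fact];   /* fact == n - 1 - i */
    }
    return result;
}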
Actually, @kigar64551, I've just realised that my approach is essentially the same as @mbozzi's earlier:
https://cplusplus.com/forum/general/284725/#msg1233843
so I think he has priority here.

But I definitely prefer my variable names!



Yes, you can pre-compute the factorials if you wish, but N is never going to be all that big, or N! would overflow. As a fairly minor detail, if you get the first factorial then you can find successively lower ones by knocking off the biggest factor sequentially, as in
nFactorial /= ( N - p );
in my code.
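Taking that idea one step further, the factorials can be avoided entirely by evaluating the sum in mixed-radix Horner form: multiply the running result by the number of remaining positions before adding each new digit. This is a variant not proposed in the thread, and `get_order_horner` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// rank = (...((c0*(n-1) + c1)*(n-2) + c2)...)*1 + c_{n-1},
// which expands to sum of c_i * (n-1-i)!, with no factorial table needed.
std::uint64_t get_order_horner(const std::uint32_t* values, std::size_t n)
{
    std::uint64_t result = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint64_t c = 0;  // later elements smaller than values[i]
        for (std::size_t j = i + 1; j < n; ++j)
            if (values[i] > values[j]) ++c;
        result = result * (n - i) + c;  // shift in one mixed-radix digit
    }
    return result;
}
```

For {30, 10, 40, 20} it returns 13, the same rank the factorial-based versions give for that order.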