I think you can use any proper hash function to produce a (pseudo) "random" value from a given input message, in such a way that different inputs will produce different (pseudo) "random" outputs (with extremely high probability), while the same input is always guaranteed to produce the same output. Well, that's exactly what hash functions are designed to do.
The input data somehow needs to be encoded as a byte array, in an unambiguous way, so that it can be fed into the hash function as the input message. For example, if you have 3 floating-point values, then you can just concatenate their IEEE 754 representations (e.g. float64 or float32) in a specific order (e.g. "x || y || z"). But you can really choose any encoding here, as long as it is unambiguous/reversible.
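To make the "x || y || z" encoding concrete, here is a minimal sketch. The helper name encode_point is made up for illustration. It assumes that double is an IEEE 754 binary64 value and it uses the host byte order, so the bytes (and therefore the hash) would differ between little-endian and big-endian machines. It also assumes you want -0.0 and +0.0 to encode identically, which is a design choice and not something the encoding requires.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// Hypothetical helper: packs x, y, z into 24 bytes in the fixed order "x || y || z".
// Assumes an 8-byte IEEE 754 double; bytes are written in host byte order.
std::array<unsigned char, 3 * sizeof(double)> encode_point(double x, double y, double z)
{
    static_assert(sizeof(double) == 8, "this sketch expects a 64-bit double");
    const double values[3] = { x, y, z };
    std::array<unsigned char, 3 * sizeof(double)> bytes{};
    for (std::size_t i = 0; i < 3; ++i) {
        double v = values[i];
        if (v == 0.0) {
            v = 0.0; // -0.0 compares equal to 0.0 but has a different bit pattern; store +0.0
        }
        std::memcpy(bytes.data() + i * sizeof(double), &v, sizeof(double));
    }
    return bytes;
}
```

The resulting array can be passed to the hash function as bytes.data() and bytes.size(). NaN inputs are not handled specially here.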
I don't think you need a cryptographic hash function for your application (as far as I can tell), so something "fast" with good hashing properties will do. Candidates include xxHash, MurmurHash (version 3), as well as CityHash or FarmHash. Maybe even something as simple as FNV64 will do. Of course, you could always just pick a cryptographic hash function like SHA-1, SHA-256 or SHA3/SHAKE, which protect against pre-image attacks, but they are also way slower than the aforementioned non-cryptographic hash functions!
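To show how little code the simplest candidate needs, here is a sketch of the 64-bit FNV-1a variant. "FNV64" above does not say which FNV variant is meant, so picking FNV-1a is an assumption. The function name fnv1a_64 is made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// FNV-1a, 64-bit: XOR each input byte into the state, then multiply by the FNV prime.
std::uint64_t fnv1a_64(const void *data, std::size_t len)
{
    const unsigned char *bytes = static_cast<const unsigned char *>(data);
    std::uint64_t hash = 0xcbf29ce484222325ULL; // 64-bit FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        hash ^= bytes[i];
        hash *= 0x100000001b3ULL; // 64-bit FNV prime; unsigned overflow wraps modulo 2^64
    }
    return hash;
}
```

For anything performance-critical, one of the other candidates (e.g. xxHash) is probably the better choice. This is only meant to illustrate the "byte array in, 64-bit value out" interface.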
Mapping the output of the hash function, i.e. the hash value, to a number in the [0, 1) range is easy. Most non-cryptographic hash functions produce a 64-Bit or 128-Bit hash value as their output. Since every possible bit string should appear with the same probability as output of the hash function, you can just interpret it as a 64-Bit (or 128-Bit) unsigned integer. That number, converted to floating-point, e.g. float64 (aka double), and then divided by 2^64 (UINT64_MAX + 1) or by 2^128, respectively, will give you a value in the [0, 1) range.
Here's another option: Use the hash value produced by the hash function as the "seed" for the PRNG (pseudo-random number generator) of your choice and, after seeding the PRNG with the hash value, call nextDouble() (or its equivalent) to get a floating-point number in the [0, 1) range. In C++, this can be achieved, for example, by using std::mt19937_64 in combination with std::uniform_real_distribution<> dis(0.0, 1.0). In most examples you will see that std::mt19937_64 is seeded from the system's entropy source (e.g. std::random_device), but you can just seed it with the hash value computed from your original input (coordinates) instead in order to get what you want.
[EDIT]
Mapping the hash values from uint64_t to a double value in the [0,1) range is actually a bit tricky. Even though double can represent numbers way bigger than UINT64_MAX, it cannot represent all integers from 0 up to and including UINT64_MAX, because of how floating-point numbers work. This means that hash values that are distinct in uint64_t would be mapped (rounded) to the same double value when we convert them! The biggest number x such that all integers from zero up to and including x can be represented as double (without loss) is 2^53. So, I think, a proper solution is to truncate the 64-Bit hash value to 53 bits, then convert to double and finally divide by 2^53.
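The rounding problem can be seen with a small sketch that puts the naive mapping next to the 53-bit one. The function names are made up for illustration. The hexadecimal floating-point literals need C++17 (or C99), and the default round-to-nearest mode is assumed.

```cpp
#include <cstdint>

// Naive mapping: convert all 64 bits to double, then scale by 2^-64.
// Values close to UINT64_MAX round up to 2^64 during the conversion.
double naive_to_unit(std::uint64_t h)
{
    return static_cast<double>(h) * 0x1.0p-64;
}

// 53-bit mapping: keep only the top 53 bits (exactly representable), then scale by 2^-53.
double top53_to_unit(std::uint64_t h)
{
    return static_cast<double>(h >> 11) * 0x1.0p-53;
}
```

With these definitions, naive_to_unit(UINT64_MAX) comes out as exactly 1.0, which is outside of [0, 1), whereas top53_to_unit(UINT64_MAX) is (2^53 - 1) / 2^53 and therefore stays below 1.0.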
____________________________________
Here is an example:
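#include "xxh64.h"
#include <cmath>

/* Map the 64-Bit hash of the key to a double in [0, 1), using the top 53 bits of the hash. */
/* Note: hexadecimal floating-point literals require C++17 (or C99). */
double hash_to_double(const void *const key, const size_t len)
{
    return (XXH64(key, len, 0U) >> 11) * 0x1.0p-53; /* (hash ≫ 11) × (1.0 ÷ (1L ≪ 53)) */
}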
#include <cstdio>  /* printf */
#include <cstring> /* strlen */

typedef struct
{
    double x, y, z;
}
point3_t;

int main()
{
    /* Hash a few strings */
    const char *const KEY_1 = "Foo";
    const char *const KEY_2 = "Bar";
    const char *const KEY_3 = "Baz";
    const char *const KEY_4 = "Qux";
    printf("%.16f\n", hash_to_double(KEY_1, strlen(KEY_1)));
    printf("%.16f\n", hash_to_double(KEY_2, strlen(KEY_2)));
    printf("%.16f\n", hash_to_double(KEY_3, strlen(KEY_3)));
    printf("%.16f\n", hash_to_double(KEY_4, strlen(KEY_4)));

    /* Hash the raw bytes of a few points that differ in a single coordinate */
    const point3_t point_1 = { 1.0, 42.0, 666.0 };
    const point3_t point_2 = { 1.0, 42.0, 667.0 };
    const point3_t point_3 = { 1.0, 42.1, 666.0 };
    const point3_t point_4 = { 0.9, 42.0, 666.0 };
    printf("%.16f\n", hash_to_double(&point_1, sizeof(point3_t)));
    printf("%.16f\n", hash_to_double(&point_2, sizeof(point3_t)));
    printf("%.16f\n", hash_to_double(&point_3, sizeof(point3_t)));
    printf("%.16f\n", hash_to_double(&point_4, sizeof(point3_t)));
}
See here for details:
https://gist.github.com/dEajL3kA/1e8f374653eaac0298d5eea63874bb10
____________________________________
Alternatively, we can use std::uniform_real_distribution<>, but it's probably quite a bit slower:
#include "xxh64.h"
#include <cmath>
#include <random>

double hash_to_double(const void *const key, const size_t len)
{
    std::mt19937_64 mt(XXH64(key, len, 0U));              /* seed the PRNG with the hash value */
    std::uniform_real_distribution<double> dis(0.0, 1.0); /* uniform over [0, 1) */
    return dis(mt);
}
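
One situation where the PRNG route might pay off is when you need several values per input instead of just one. Here is a rough sketch, under a few assumptions: the helper name seeded_doubles is made up, the seed parameter stands in for the 64-Bit hash of the input, and the raw engine output is mapped with the same 53-bit trick instead of going through std::uniform_real_distribution. As far as I know, the C++ standard pins down the exact output sequence of std::mt19937_64, but leaves the algorithm of the distribution classes to the implementation, so skipping the distribution should make the results reproducible across standard libraries.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: `seed` stands in for the 64-bit hash value of the input.
// Returns `count` deterministic values in [0, 1) derived from that seed.
std::vector<double> seeded_doubles(std::uint64_t seed, std::size_t count)
{
    std::mt19937_64 engine(seed);
    std::vector<double> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        // Top 53 bits of the raw engine output, scaled by 2^-53 (C++17 hex float literal)
        out.push_back(static_cast<double>(engine() >> 11) * 0x1.0p-53);
    }
    return out;
}
```

If you only ever need a single value per input, mapping the hash value directly is simpler and avoids the cost of initializing the Mersenne Twister state.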