How to create white noise random hashing

How to create white noise random hashing function?

I'm trying to figure out how to build a random hashing function that emphasizes speed and repeatability. Basically, for input it will have an unsigned int seed and 3 floats, representing a point in 3D space. It will return a value on [0, 1). This function needs to always return the same value if given the same inputs; however, over different inputs it should return a uniform random distribution of values.

I was thinking I could turn the input values into an array of bytes, hash it, and then use the hash value as a lookup in a random function, but I am getting lost. Between new C++ and old C++, I'm not even sure what the right way is to turn ints and floats into an array of bytes anymore. The random functions also all seem to be pseudorandom iterators rather than anything that maps a hash value to a repeatable spot on a uniform distribution.

Any tips on how I can solve this?

kigar64551 (853)

I think you can use any proper hash function to produce a (pseudo) "random" value from a given input message – in such a way that different inputs will produce different (pseudo) "random" outputs (with extremely high probability), while also having the guarantee that the same input is always going to produce the same output. Well, that's exactly what hash functions are designed to do 😏

The input data somehow needs to be encoded as a byte array, in an unambiguous way, so that it can be fed into the hash function as the input message. For example, if you have 3 floating-point values, then you can just concatenate their IEEE 754 representations (e.g. float64 or float32) in a specific order (e.g. "x || y || z"). But you can really choose any encoding here, as long as it is unambiguous/reversible.
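As a rough sketch of such an encoding for the inputs from the question (an unsigned seed plus three floats), std::memcpy is a well-defined way to copy the object representation into a byte buffer. The name encode_key and the "seed || x || y || z" layout are just assumptions for illustration. The bytes are in native byte order, so the resulting keys would differ between little-endian and big-endian machines.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: packs a seed and a 3D point into a fixed 16-byte key.
// Layout (an assumption of this sketch): seed || x || y || z, native byte order.
std::array<unsigned char, 16> encode_key(std::uint32_t seed, float x, float y, float z)
{
    static_assert(sizeof(float) == 4, "this sketch assumes 32-bit floats");

    float coords[3] = { x, y, z };
    for (float &c : coords) {
        // -0.0f compares equal to 0.0f but has a different bit pattern;
        // store +0.0f so that both zeros produce the same key.
        if (c == 0.0f) c = 0.0f;
    }

    std::array<unsigned char, 16> key{};
    std::memcpy(key.data(), &seed, sizeof seed);
    std::memcpy(key.data() + sizeof seed, coords, sizeof coords);
    return key;
}
```

The zero normalization is optional. It only matters if you want the points (0, y, z) and (-0, y, z) to hash to the same value.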

I don't think you need a cryptographic hash function for your application (as far as I can tell), so something "fast" with good hashing properties will do. Candidates include xxHash, MurmurHash (version 3), as well as CityHash or FarmHash. Maybe even something as simple as 64-bit FNV-1a will do. Of course, you could always just pick a cryptographic hash function like SHA-1, SHA-256 or SHA3/SHAKE, which protect against pre-image attacks, but they are also way slower than the aforementioned non-cryptographic hash functions!
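To give an idea of how small such a non-cryptographic hash can be, here is a minimal sketch of 64-bit FNV-1a. The function name fnv1a64 is my own choice. The two constants are the standard FNV 64-bit offset basis and prime.

```cpp
#include <cstddef>
#include <cstdint>

// 64-bit FNV-1a: XOR in one byte, then multiply by the FNV prime.
std::uint64_t fnv1a64(const void *data, std::size_t len)
{
    const unsigned char *bytes = static_cast<const unsigned char *>(data);
    std::uint64_t hash = 0xcbf29ce484222325ULL; // FNV 64-bit offset basis
    for (std::size_t i = 0; i < len; ++i) {
        hash ^= bytes[i];
        hash *= 0x100000001b3ULL;               // FNV 64-bit prime
    }
    return hash;
}
```

For example, fnv1a64("a", 1) yields 0xaf63dc4c8601ec8c. FNV-1a mixes the bits less thoroughly than xxHash or MurmurHash3, so treat it as a starting point and check the output quality for your use case.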

Mapping the output of the hash function, i.e. the hash value, to a number in the [0, 1) range is easy. Most non-cryptographic hash functions produce a 64-Bit or 128-Bit hash value as their output. Since every possible bit-string should appear with the same probability as output of the hash function, you can just interpret it as a 64-Bit (or 128-Bit) unsigned integral number. That number, converted to floating-point, e.g. float64 (aka double), and then divided by 2^64 (that is, UINT64_MAX + 1) or by 2^128, respectively, will give you a value in the [0, 1) range, at least in principle. See the edit below for a rounding caveat.

Here's another option: Use the hash value produced by the hash function as the "seed" for the PRNG (pseudo-random number generator) of your choice and, after seeding the PRNG with the hash value, call nextDouble() to get a floating-point number in the [0, 1) range. In C++, this can be achieved, for example, by using std::mt19937_64 in combination with std::uniform_real_distribution<> dis(0.0, 1.0). In most examples you will see that std::mt19937_64 is seeded from the system's entropy source (e.g. std::random_device), but you can just seed it with the hash value computed from your original input (coordinates) instead in order to get what you want 😎

[EDIT]

Mapping the hash values from uint64_t to a double value in the [0,1) range is actually a bit tricky. Even though double can represent numbers way bigger than UINT64_MAX, it cannot represent every integer from 0 up to and including UINT64_MAX, because a double only has a 53-bit significand. This means that hash values that are distinct in uint64_t would be mapped (rounded) to the same double value when we convert them! Worse, hash values close to UINT64_MAX get rounded up to 2^64, so the division would yield exactly 1.0, which is outside of the [0,1) range. The biggest number x such that all integers from zero up to and including x can be represented as double (without loss) is 2^53. So, I think, a proper solution is to truncate the 64-Bit hash value to 53-Bit (keeping the upper bits), then convert to double and finally divide by 2^53.
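The difference between the two mappings can be sketched as follows. The function names naive_unit and top53_unit are made up for this illustration, and std::ldexp is used for the scaling so that no C++17 hexadecimal floating-point literal is needed.

```cpp
#include <cmath>
#include <cstdint>

// Naive mapping: convert all 64 bits, then scale by 2^-64.
// The conversion to double has to round, so large inputs can reach 1.0.
double naive_unit(std::uint64_t h)
{
    return static_cast<double>(h) * std::ldexp(1.0, -64);
}

// Keep only the upper 53 bits, which convert to double without rounding,
// then scale by 2^-53. The result always lies in [0, 1).
double top53_unit(std::uint64_t h)
{
    return std::ldexp(static_cast<double>(h >> 11), -53);
}
```

With the default round-to-nearest mode, naive_unit(UINT64_MAX) comes out as exactly 1.0, whereas top53_unit(UINT64_MAX) is 1 - 2^-53, the largest double below 1.0.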

____________________________________

Here is an example:

#include "xxh64.h"
#include <cstddef>
#include <cstdio>
#include <cstring>

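// Maps the hash to [0, 1): keep the upper 53 bits, then scale by 2^-53.
// Note: the hexadecimal floating-point literal requires C++17.
double hash_to_double(const void *const key, const size_t len)
{
	return (XXH64(key, len, 0U) >> 11) * 0x1.0p-53; /* (hash ⋙ 11) × (1.0 ÷ (1L ≪ 53)) */
}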

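// Three doubles leave no padding bytes on common platforms,
// so hashing the raw bytes of the struct is fine here.
struct point3_t
{
	double x, y, z;
};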

int main()
{
	const char *const KEY_1 = "Foo";
	const char *const KEY_2 = "Bar";
	const char *const KEY_3 = "Baz";
	const char *const KEY_4 = "Qux";

	printf("%.16f\n", hash_to_double(KEY_1, strlen(KEY_1)));
	printf("%.16f\n", hash_to_double(KEY_2, strlen(KEY_2)));
	printf("%.16f\n", hash_to_double(KEY_3, strlen(KEY_3)));
	printf("%.16f\n", hash_to_double(KEY_4, strlen(KEY_4)));

	const point3_t point_1 = { 1.0, 42.0, 666.0 };
	const point3_t point_2 = { 1.0, 42.0, 667.0 };
	const point3_t point_3 = { 1.0, 42.1, 666.0 };
	const point3_t point_4 = { 0.9, 42.0, 666.0 };

	printf("%.16f\n", hash_to_double(&point_1, sizeof(point3_t)));
	printf("%.16f\n", hash_to_double(&point_2, sizeof(point3_t)));
	printf("%.16f\n", hash_to_double(&point_3, sizeof(point3_t)));
	printf("%.16f\n", hash_to_double(&point_4, sizeof(point3_t)));
}

See here for details:
https://gist.github.com/dEajL3kA/1e8f374653eaac0298d5eea63874bb10

____________________________________

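Alternatively, we can use std::uniform_real_distribution<>, but it's probably quite a bit slower, because every call has to initialize the full std::mt19937_64 state (312 64-bit words) from the seed. Also keep in mind that the standard specifies the exact output of the std::mt19937_64 engine, but not the algorithm used by the distributions, so the resulting values may differ between standard library implementations: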

#include "xxh64.h"
#include <cstddef>
#include <random>

double hash_to_double(const void *const key, const size_t len)
{
	std::mt19937_64 mt(XXH64(key, len, 0U));
	std::uniform_real_distribution<double> dis(0.0, 1.0);
	return dis(mt);
}
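For the exact signature from the original question (an unsigned seed plus three floats), the pieces discussed in this thread could be combined roughly like this. This is only a sketch: white_noise is a hypothetical name, 64-bit FNV-1a stands in for XXH64 so that the snippet has no external dependency, and the key uses native byte order, so results are repeatable on one platform but not necessarily across platforms with different endianness.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical function: deterministic value in [0, 1) for (seed, x, y, z).
double white_noise(std::uint32_t seed, float x, float y, float z)
{
    static_assert(sizeof(float) == 4, "this sketch assumes 32-bit floats");

    // Encode the inputs as "seed || x || y || z" (native byte order).
    const float coords[3] = { x, y, z };
    unsigned char key[sizeof seed + sizeof coords];
    std::memcpy(key, &seed, sizeof seed);
    std::memcpy(key + sizeof seed, coords, sizeof coords);

    // Hash the key with 64-bit FNV-1a (stand-in for XXH64).
    std::uint64_t hash = 0xcbf29ce484222325ULL;
    for (unsigned char byte : key) {
        hash ^= byte;
        hash *= 0x100000001b3ULL;
    }

    // Keep the upper 53 bits and scale by 2^-53.
    return std::ldexp(static_cast<double>(hash >> 11), -53);
}
```

Calling white_noise(42u, 1.0f, 2.0f, 3.0f) twice gives the same value both times, and changing the seed or any coordinate gives an unrelated one. Swapping the FNV-1a loop for XXH64(key, sizeof key, 0U) should only require changing those few lines.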
