Yes, at least in most of the tests I have run, there is a 5-35% performance improvement using erlang:random_integer/1 compared with the erlang:phash2(erlang:unique_integer(), Range) trick, and a 60-80% improvement compared with rand:uniform/1.
I also neglected to share the performance results from erlang:random_integer/0 in my initial comment.
Single-process test with erlang:random_integer/0:
erlperf -c 1 'erlang:random_integer().' 'erlang:unique_integer().' 'rand:uniform().'
Code                           ||        QPS    Rel
erlang:random_integer().        1   40138 Ki   100%
erlang:unique_integer().        1   36524 Ki    91%
rand:uniform().                 1   10316 Ki    26%
64-process test with erlang:random_integer/0 (matching the number of online schedulers):
erlperf -c 64 'erlang:random_integer().' 'erlang:unique_integer().' 'rand:uniform().'
Code                           ||        QPS    Rel
erlang:random_integer().       64   42728 Ki   100%
erlang:unique_integer().       64   36211 Ki    84%
rand:uniform().                64   16375 Ki    38%
Yeah, I agree: this would not be a full replacement for rand and would primarily be useful in very high-QPS use-cases (for example, random routing or load balancing).
That makes sense, but would this mostly be for API completeness, or do you have specific use-cases where something like erlang:random_binary/1 would be a perfect fit and crypto:strong_rand_bytes/1 or rand:bytes/1 are sub-optimal today?
Also makes sense for 32-bit. I figured we would have to address the 32-bit issue before this could be accepted. Right now I’m just blindly casting to Uint (all integers are just 64-bit everywhere anyway, right?).
Right now, I’m mimicking the rand:uniform/1 behavior that requires Range >= 1, while also enforcing Range =< (1 bsl 58) - 1. I’m using v != 0 to terminate the loop, which accomplishes something similar to the “V in the truncated top range” case in the ?uniform_range macro in rand.erl.
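To make the range handling concrete, here is a rough C sketch of one way to map a raw 64-bit generator output onto [1, Range] with a retry loop. Note this shows plain rejection sampling, not the exact v != 0 loop in the PR, and next64() is just a placeholder name for the per-scheduler generator:

#include <stdint.h>

/* Placeholder for the per-scheduler generator's raw 64-bit output. */
extern uint64_t next64(void);

/* Uniform integer in [1, range] for 1 =< range =< 2^58 - 1, using
 * rejection sampling on 58 bits to avoid modulo bias. */
uint64_t random_integer(uint64_t range)
{
    const uint64_t limit =
        (UINT64_C(1) << 58) - ((UINT64_C(1) << 58) % range);
    uint64_t v;
    do {
        v = next64() >> 6;      /* keep 58 bits of the raw output */
    } while (v >= limit);       /* reject the truncated top range */
    return (v % range) + 1;     /* 1-based, like rand:uniform/1 */
}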
One (probably very rare) case that isn’t currently covered: if, for whatever reason, the scheduler’s state gets set to two zeros, every call to erlang:random_integer/1 will return 1 forever. The same “no recovery from zero state” behavior can be simulated with rand using rand:uniform_s(1000, {element(1, rand:seed(exrop)), [0|0]}). Regardless of the input N, the resulting output is always 1 and the state never changes.
Another issue is the initial value for splitmix64_seed. I assumed we could use some low-grade “entropy” (like system time, number of schedulers, etc., similar to rand:seed/1) that could be overridden with something like erl +randseed 123456.
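As a sketch of what I had in mind (placeholder names, not the actual PR code), each scheduler’s state could be expanded from that seed with splitmix64:

#include <stdint.h>

/* Standard splitmix64 step (constants from Vigna's reference splitmix64.c). */
static uint64_t splitmix64_next(uint64_t *seed)
{
    uint64_t z = (*seed += UINT64_C(0x9E3779B97F4A7C15));
    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

/* Hypothetical per-scheduler seeding: mix a base seed (low-grade entropy,
 * or an operator-supplied value such as a +randseed emulator flag) with the
 * scheduler id, then expand it into the generator's state words. */
static void seed_scheduler_state(uint64_t state[], int nwords,
                                 uint64_t base_seed, int scheduler_id)
{
    uint64_t seed = base_seed ^ (uint64_t)scheduler_id;
    for (int i = 0; i < nwords; i++)
        state[i] = splitmix64_next(&seed);
}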
Context for my following questions: I don’t know a ton about PRNGs or the details of the pros/cons of the various algorithms, so I’m legitimately curious.
Why Xoshiro256++ versus Xoroshiro116+? If I’m understanding correctly, I think the cycles/B is roughly double, unless we were able to get the AVX2/NEON vectorized version to work correctly. If we split Xoshiro256++ for integers and Xoshiro256+ for floats, would the idea be that separate state would be kept on each scheduler for each type of output? If so, I think that would be 32 bytes (or 256 bytes for the SIMD version) stored twice, so either 64 bytes or 512 bytes per scheduler.
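For reference on the state size: the xoshiro256++ reference implementation keeps four 64-bit words (32 bytes) per generator. A sketch based on Vigna’s public-domain code, where the struct and per-scheduler layout are just my assumption:

#include <stdint.h>

static inline uint64_t rotl(uint64_t x, int k)
{
    return (x << k) | (x >> (64 - k));
}

/* xoshiro256++: four 64-bit words = 32 bytes of state per generator. */
typedef struct { uint64_t s[4]; } xoshiro256_state;

static uint64_t xoshiro256pp_next(xoshiro256_state *st)
{
    uint64_t *s = st->s;
    /* xoshiro256+ would use (s[0] + s[3]) as the output here instead;
     * the state transition below is identical for both variants. */
    const uint64_t result = rotl(s[0] + s[3], 23) + s[0];
    const uint64_t t = s[1] << 17;

    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl(s[3], 45);

    return result;
}

Note that an all-zero state maps back onto itself and keeps producing 0, which is the same “no recovery from zero state” situation I mentioned above.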