Yes, at least in most of the tests I have run, there is a 5-35% performance improvement using erlang:random_integer/1 compared with the erlang:phash2(erlang:unique_integer(), Range) trick, and a 60-80% performance improvement compared with rand:uniform/1.
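For reference, the phash2 trick I'm comparing against looks roughly like this (random_int is just an illustrative wrapper name; the + 1 is only there to match rand:uniform/1's 1..Range convention):

%% Illustrative wrapper around the phash2 trick; erlang:phash2(Term, Range)
%% returns a value in 0..Range-1, so add 1 to match rand:uniform/1.
%% Note that phash2/2's Range is limited to 1..2^32.
random_int(Range) when is_integer(Range), Range >= 1 ->
    erlang:phash2(erlang:unique_integer(), Range) + 1.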
I also neglected to share the performance results from erlang:random_integer/0 in my initial comment.
Single process test with erlang:random_integer/0:
erlperf -c 1 'erlang:random_integer().' 'erlang:unique_integer().' 'rand:uniform().'
Code                         ||        QPS     Rel
erlang:random_integer().      1    40138 Ki    100%
erlang:unique_integer().      1    36524 Ki     91%
rand:uniform().               1    10316 Ki     26%
64 process test with erlang:random_integer/0 (matches the number of online schedulers):
erlperf -c 64 'erlang:random_integer().' 'erlang:unique_integer().' 'rand:uniform().'
Code                         ||        QPS     Rel
erlang:random_integer().     64    42728 Ki    100%
erlang:unique_integer().     64    36211 Ki     84%
rand:uniform().              64    16375 Ki     38%
Yeah, I agree: this would not be a full replacement for rand and would primarily be useful in very high QPS use-cases (for example: random routing/load balancing).
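A hypothetical sketch of that kind of use-case (pick_worker is just an illustrative name), where the per-call cost of the random pick itself starts to show up at high message rates:

%% Pick one of N workers per request; at very high QPS the cost of the
%% random pick becomes measurable.
pick_worker(Workers) when is_tuple(Workers), tuple_size(Workers) > 0 ->
    N = tuple_size(Workers),
    %% Today: rand:uniform(N) or the phash2 trick; the proposed BIF would
    %% be a drop-in replacement for this one line.
    element(rand:uniform(N), Workers).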
That makes sense, but would this mostly be for API completeness, or do you have specific use-cases where something like erlang:random_binary/1 would be a perfect fit and where crypto:strong_rand_bytes/1 or rand:bytes/1 are sub-optimal today?
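For reference, the two existing calls I had in mind:

%% Cryptographically strong bytes (keys, tokens, nonces), via the crypto NIF:
Strong = crypto:strong_rand_bytes(16),
%% Non-cryptographic bytes from the per-process rand state (OTP 24+):
Fast = rand:bytes(16).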
Also makes sense for 32-bit. I figured we would have to address the 32-bit issue before this could be accepted. Right now I'm just blindly casting to Uint (all integers are just 64-bit everywhere, anyway, right?).
Right now I'm mimicking the rand:uniform/1 behavior that requires Range >= 1, while also enforcing Range =< (1 bsl 58) - 1. I'm using v != 0 to terminate the loop, which accomplishes something similar to the "V in the truncated top range" case in the ?uniform_range macro in rand.erl.
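In Erlang terms, the range-reduction idea is roughly the following rejection loop. This is an illustration only, not the actual BIF loop: it uses rand:uniform/1 as a stand-in for the raw 58-bit generator word, and a generic "retry on the biased top slice" condition rather than the v != 0 condition described above.

%% Generic rejection-style range reduction: draw a raw 58-bit value and
%% retry when it lands in the biased "truncated top" slice, so the result
%% is uniform in 1..Range.
uniform_int(Range) when is_integer(Range), Range >= 1,
                        Range =< (1 bsl 58) - 1 ->
    V = rand:uniform(1 bsl 58) - 1,              %% stand-in for the raw generator word
    Limit = (1 bsl 58) - ((1 bsl 58) rem Range), %% largest multiple of Range =< 2^58
    if
        V < Limit -> (V rem Range) + 1;
        true      -> uniform_int(Range)          %% biased top slice: retry
    end.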
One (probably very rare) case that isn't currently covered: if, for whatever reason, the scheduler's state gets set to two zeros, every call to erlang:random_integer/1 will return 1 forever. The same "no recovery from zero state" behavior can be simulated with rand using rand:uniform_s(1000, {element(1, rand:seed(exrop)), [0|0]}). Regardless of the range argument, the resulting output is always 1 and the state never changes.
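To make that concrete, here is the shell-level demonstration (the {AlgHandler, [0|0]} shape relies on exrop's internal state representation):

%% Construct an exrop state whose two words are both zero, then observe
%% that every draw returns 1 and the state never advances.
Zero = {element(1, rand:seed(exrop)), [0|0]},
{1, Zero1} = rand:uniform_s(1000, Zero),
{1, _} = rand:uniform_s(1000, Zero1).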
Another issue is the initial value for splitmix64_seed: I assumed we could seed it with some low-grade "entropy" (like system time, number of schedulers, etc.; similar to rand:seed/1), which could then be overridden with something like erl +randseed 123456.
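As a sketch, the kind of low-grade seed material I have in mind (illustrative only; rand's default seeding mixes similar sources):

%% Mix a few cheap, non-cryptographic sources into an initial seed value.
%% erlang:phash2/1 only yields 0..2^27-1, so the real implementation would
%% need a wider mix; this just shows the spirit of it.
Seed = erlang:phash2({erlang:system_time(),
                      erlang:unique_integer(),
                      erlang:system_info(schedulers_online)}).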
Context for my following questions: I don't know a ton about PRNGs or the details of the pros/cons of the various algorithms, so I'm legitimately curious.
Why Xoshiro256++ versus Xoroshiro116+? If I'm understanding correctly, I think the cycles/B is roughly double, unless we were able to get the AVX2/NEON vectorized version to work correctly. If we split Xoshiro256++ for integers and Xoshiro256+ for floats, would the idea be that separate state would be kept on each scheduler for each type of output? If so, I think that would be 32 bytes (or 256 bytes for the SIMD version) stored twice, so either 64 or 512 bytes per scheduler.