Looking for a faster RNG

Interesting speed figures!

Now, a NIF call overhead should not be much larger (if even) than
a BIF call overhead. So if this could well enough be implemented
in a NIF - that would be very much preferred.

Before getting deeper into the discussion of whether this is an
extension of the VM that we want we must explore implementing this
as a NIF.

I have tried once implementing one of the Xoroshiro generators
with a NIF, and did not get much performance improvement, but that
attempt was adhering to the framework of rand, meaning passing the state
back and forth between Erlang and the NIF.

Storing the state in thread specific data should have much less overhead.

:

That makes sense, but would this mostly be for API complete-ness or do you have any specific use-cases that something like erlang:random_binary/1 would be a perfect fit for that crypto:strong_rand_bytes/1 or rand:bytes/1 are sub-optimal for today?

That suggestion I guess was for API completeness.
I have no good use cases - often, I guess, there would be other
requirements so generating binaries would be a too narrow problem.

But I think there are use cases for fast random floats.

Also makes sense for 32-bit. I figured we would have to address the 32-bit issue before this could be accepted. Right now I’m just blindly casting to Uint (all integers are just 64-bit everywhere, anyway, right? :slight_smile:).

Let’s postpone this problem for later.

:
:

One (probably very rare) case that isn’t currently covered is if, for whatever reason, the scheduler’s state gets set to two zeros, any call to erlang:random_integer/1 will always return 1 forever. The same “no recovery from zero state” behavior can be simulated with rand using rand:uniform_s(1000, {element(1, rand:seed(exrop)), [0|0]}).. Regardless of the input N, the resulting output is always 1 and the state never changes.

That should never happen unless you seed it to [0|0], which must be avoided.
[0|0] is not part of the generator’s sequence, or rather; this class of
generators have two sequences; one is the singular [0|0] and the one used
is a full sequence of all other state values.

Another issue is the initial value for splitmix64_seed, which I assumed we could either use some low-grade “entropy” (like system time, number of schedulers, etc; similar to rand:seed/1) that could be overridden with something like erl +randseed 123456.

Possibly, but auto seed only would be prefered.

Also, the Splitmix64 state updates need to be thread safe, right?

Context for my following questions: I don’t know a ton about PRNG and the details of the pros/cons of the various algorithms. So I’m legitimately curious.

Why Xoshiro256++ versus Xoroshiro116+? If I’m understanding correctly, I think the cycles/B is roughly double, unless we were able to get the AVX2/NEON vectorization version to work correctly. If we split Xoshiro256++ for integers and Xoshiro256+, would the idea be that separate state would be kept on each scheduler for each type of output? If so, I think that would be 32-bytes (or 256-bytes for the SIMD version) stored twice, so either 64-bytes or 512-bytes per scheduler.

It is just that Xoshiro256 is recommended by the creator(s) as an
all-purpose generator. Its state is 32 bytes which I do not think
is much to store per scheduler. The jump distance is long: 2^128
compared to 2^64 for Xoroshiro128. If you run a VM very long it is
maybe even possible to use 2^64 random numbers in a scheduler.

The difference between Xhoshiro256++ and Xoshiro256+ is in
the pre state update output scrambling (++ vs.+) so my idea was to use
the same state update and scramble differently between integers and floats
since the + scrambler is slightly faster and its small disadvantage
vs. the ++ scrambler can be hidden when generating floats.

I also suspect that getting down from about 1 ns generation time
per 64-bit integer to a fraction of it will be buried in BIF or NIF
call overhead so there is not much point in looking at e.g. vectorization.
The same reasoning applies between Xoshiro256 vs. Xoroshiro128.

Cheers

3 Likes