Looking for a faster RNG

potatosalad · March 15, 2022, 5:04pm

I threw together a NIF here: GitHub - potatosalad/erlang-fastrandom

I also started down the path of storing the state in a reference(), but found that the performance was nearly identical to pure-Erlang rand when it was written as a rand “plugin”.

The NIF supports Xoroshiro116+, Xoshiro256+, Xoshiro256+X8, Xoshiro256++, and Xoshiro256++X8. SplitMix64 is used along with enif_tsd_get to store thread-specific data, similar to how the BIF works.
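To make the idea concrete, here is a minimal C sketch of per-thread generator state seeded through SplitMix64. It is not the code from the repository. C11 `_Thread_local` stands in for the `enif_tsd_get` lookup, and the names `xoshiro256p_state`, `tls_state`, `demo_uniform`, and the fixed seed constant are all hypothetical. Returning a value in 1..bound mirrors `rand:uniform/1` and is an assumption about the NIF's range.

```c
#include <assert.h>
#include <stdint.h>

/* SplitMix64 step: advances *state and returns a mixed 64-bit output. */
static uint64_t splitmix64_next(uint64_t *state) {
    uint64_t z = (*state += UINT64_C(0x9E3779B97F4A7C15));
    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

/* Hypothetical per-thread state for xoshiro256+. */
typedef struct {
    uint64_t s[4];
    int seeded;
} xoshiro256p_state;

/* Stand-in for thread-specific data; a NIF would use enif_tsd_get instead. */
static _Thread_local xoshiro256p_state tls_state;

static uint64_t rotl64(uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

/* Fill the four state words from a SplitMix64 stream. */
static void xoshiro256p_seed(xoshiro256p_state *st, uint64_t seed) {
    for (int i = 0; i < 4; i++)
        st->s[i] = splitmix64_next(&seed);
    st->seeded = 1;
}

/* One xoshiro256+ step. */
static uint64_t xoshiro256p_next(xoshiro256p_state *st) {
    uint64_t result = st->s[0] + st->s[3];
    uint64_t t = st->s[1] << 17;
    st->s[2] ^= st->s[0];
    st->s[3] ^= st->s[1];
    st->s[1] ^= st->s[2];
    st->s[0] ^= st->s[3];
    st->s[2] ^= t;
    st->s[3] = rotl64(st->s[3], 45);
    return result;
}

/* Returns a value in [1, bound] from the calling thread's generator.
 * The fixed seed is for illustration only; real code would seed from
 * an entropy source. The modulo reduction is slightly biased. */
static uint64_t demo_uniform(uint64_t bound) {
    assert(bound > 0);
    if (!tls_state.seeded)
        xoshiro256p_seed(&tls_state, UINT64_C(0x123456789ABCDEF));
    /* Drop the low bits, which are the weakest in xoshiro256+. */
    return 1 + ((xoshiro256p_next(&tls_state) >> 11) % bound);
}
```

Because each thread owns its state, `demo_uniform(1000)` needs no lock on the hot path. The only cost beyond the generator step is the thread-local lookup.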

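However, in its current form, the BIF implementation is still considerably faster than the NIF implementation: in the benchmarks below, the NIF variants reach roughly 50% of the BIF's throughput in a single process and between 37% and 71% across 64 processes.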

Single process performance:

Code                                                  ||        QPS     Rel
erlang:random_integer(1000).                           1   34123 Ki    100%
erlang:phash2(erlang:unique_integer(), 1000).          1   22562 Ki     66%
fastrandom_nif:xoroshiro116p_next(1000).               1   17359 Ki     51%
fastrandom_nif:xoshiro256p_next(1000).                 1   17319 Ki     51%
fastrandom_nif:xoshiro256px8_next(1000).               1   16886 Ki     50%
fastrandom_nif:xoshiro256ppx8_next(1000).              1   16861 Ki     49%
fastrandom_nif:xoshiro256pp_next(1000).                1   16662 Ki     48%
rand:uniform(1000).                                    1   10465 Ki     30%

64 process performance:

Code                                                  ||        QPS     Rel
erlang:random_integer(1000).                          64   36578 Ki    100%
fastrandom_nif:xoroshiro116p_next(1000).              64   26174 Ki     71%
fastrandom_nif:xoshiro256p_next(1000).                64   20460 Ki     56%
erlang:phash2(erlang:unique_integer(), 1000).         64   18774 Ki     51%
fastrandom_nif:xoshiro256pp_next(1000).               64   15208 Ki     41%
fastrandom_nif:xoshiro256ppx8_next(1000).             64   15018 Ki     41%
fastrandom_nif:xoshiro256px8_next(1000).              64   13834 Ki     37%
rand:uniform(1000).                                   64    9105 Ki     24%

That makes sense, thanks for the explanation.

Normally: yes. However, at first glance, it looked like erts_init_scheduling calls init_scheduler_data in a for loop, which would result in erts_sched_bif_unique_init being called sequentially on a single thread. So I think SplitMix64 is only ever called on a single thread. However, if I am misinterpreting the code there, then yes: some sort of lock would need to be added.
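The reasoning above can be sketched in C as follows. This is an illustration of the pattern only, not the ERTS source: `sched_rng_state` and `init_all_sched_rng` are hypothetical names, and the sketch assumes that a single SplitMix64 stream seeds every scheduler's state from one thread before any scheduler starts.

```c
#include <stddef.h>
#include <stdint.h>

/* SplitMix64 step: advances *state and returns a mixed 64-bit output. */
static uint64_t splitmix64_next(uint64_t *state) {
    uint64_t z = (*state += UINT64_C(0x9E3779B97F4A7C15));
    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

/* Hypothetical per-scheduler generator state (four 64-bit words). */
typedef struct {
    uint64_t s[4];
} sched_rng_state;

/* Seeds n scheduler states from one shared SplitMix64 stream.
 * The shared `seed` is mutated on every call, so this is only safe
 * because the loop runs sequentially on a single thread. If several
 * threads ran it concurrently, `seed` would need a lock or an atomic. */
static void init_all_sched_rng(sched_rng_state *states, size_t n, uint64_t seed) {
    for (size_t i = 0; i < n; i++) {
        for (int j = 0; j < 4; j++) {
            states[i].s[j] = splitmix64_next(&seed);
        }
    }
}
```

After initialization each scheduler only touches its own `sched_rng_state`, so the shared SplitMix64 state is never accessed again.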

That makes sense, thank you for the explanation.

What are your thoughts on the direction of the experiments so far? I think given the fairly small performance difference between the phash2/unique integer trick and the NIF, we’ll likely stick with the former for the time being. However, the BIF implementation could potentially be useful to us for very high QPS random-based load balancing workloads.