I threw together a NIF here: GitHub - potatosalad/erlang-fastrandom
I also started down the path of storing the state in a reference(), but found that the performance was nearly identical to pure-Erlang rand when it was written as a rand “plugin”.
The NIF has support for Xoroshiro116+, Xoshiro256+, Xoshiro256+X8, Xoshiro256++, and Xoshiro256++X8. SplitMix64 is used along with enif_tsd_get to store thread-specific data, similar to how the BIF functions.
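As a rough illustration of the idea (not the NIF's actual code), here is a minimal standalone C sketch of per-thread Xoshiro256+ state seeded from SplitMix64. It uses C11 `_Thread_local` as a stand-in for `enif_tsd_get`, and the names `example_thread_state` and `example_next` are hypothetical. The `% n` range reduction is a simplification with a slight modulo bias; how the NIF actually maps to a range is not shown here.

```c
#include <stdint.h>

/* SplitMix64 step: advances *state and returns the next 64-bit output. */
static uint64_t splitmix64_next(uint64_t *state) {
    uint64_t z = (*state += UINT64_C(0x9E3779B97F4A7C15));
    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

/* Per-thread generator state (hypothetical layout for this sketch). */
typedef struct {
    uint64_t s[4];
    int seeded;
} xoshiro256_state;

/* Assumption: C11 thread-local storage stands in for enif_tsd_get. */
static _Thread_local xoshiro256_state tls_state;

static inline uint64_t rotl64(uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

/* One step of xoshiro256+, following the public reference algorithm. */
static uint64_t xoshiro256p_next_raw(xoshiro256_state *st) {
    const uint64_t result = st->s[0] + st->s[3];
    const uint64_t t = st->s[1] << 17;
    st->s[2] ^= st->s[0];
    st->s[3] ^= st->s[1];
    st->s[1] ^= st->s[2];
    st->s[0] ^= st->s[3];
    st->s[2] ^= t;
    st->s[3] = rotl64(st->s[3], 45);
    return result;
}

/* Returns this thread's state, seeding it via SplitMix64 on first use. */
static xoshiro256_state *example_thread_state(uint64_t seed) {
    if (!tls_state.seeded) {
        uint64_t sm = seed;
        for (int i = 0; i < 4; i++) {
            tls_state.s[i] = splitmix64_next(&sm);
        }
        tls_state.seeded = 1;
    }
    return &tls_state;
}

/* Returns a value in [0, n); n must be non-zero. Simple modulo reduction. */
static uint64_t example_next(uint64_t seed, uint64_t n) {
    return xoshiro256p_next_raw(example_thread_state(seed)) % n;
}
```

Because each thread only ever touches its own state, the hot path needs no locking; the seed argument only matters on a thread's first call.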
However, in its current form, the BIF implementation is still roughly 30-35% more efficient compared to the NIF implementation.
Single process performance:
Code                                              ||         QPS    Rel
erlang:random_integer(1000).                       1    34123 Ki   100%
erlang:phash2(erlang:unique_integer(), 1000).      1    22562 Ki    66%
fastrandom_nif:xoroshiro116p_next(1000).           1    17359 Ki    51%
fastrandom_nif:xoshiro256p_next(1000).             1    17319 Ki    51%
fastrandom_nif:xoshiro256px8_next(1000).           1    16886 Ki    50%
fastrandom_nif:xoshiro256ppx8_next(1000).          1    16861 Ki    49%
fastrandom_nif:xoshiro256pp_next(1000).            1    16662 Ki    48%
rand:uniform(1000).                                1    10465 Ki    30%
64 process performance:
Code                                              ||         QPS    Rel
erlang:random_integer(1000).                      64    36578 Ki   100%
fastrandom_nif:xoroshiro116p_next(1000).          64    26174 Ki    71%
fastrandom_nif:xoshiro256p_next(1000).            64    20460 Ki    56%
erlang:phash2(erlang:unique_integer(), 1000).     64    18774 Ki    51%
fastrandom_nif:xoshiro256pp_next(1000).           64    15208 Ki    41%
fastrandom_nif:xoshiro256ppx8_next(1000).         64    15018 Ki    41%
fastrandom_nif:xoshiro256px8_next(1000).          64    13834 Ki    37%
rand:uniform(1000).                               64     9105 Ki    24%
That makes sense, thanks for the explanation.
Normally: yes. However, at first glance, it looked like erts_init_scheduling calls init_scheduler_data in a for loop, which would result in erts_sched_bif_unique_init being called sequentially on a single thread. So…I think SplitMix64 is only ever called on a single thread. However, if I am misinterpreting the code there, then yes: some sort of lock would need to be added.
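To make the reasoning concrete, here is a small C sketch of the pattern I am describing. It is not the ERTS code: `example_sched_rng`, `example_seed_stream`, `example_init_sched_rng`, and `example_init_all` are hypothetical names. One shared SplitMix64 stream seeds every scheduler's state from a plain for loop, so the shared seed is only safe without a lock as long as that loop runs on a single thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-scheduler generator state. */
typedef struct {
    uint64_t s[4];
} example_sched_rng;

/* Shared SplitMix64 seed state. Deliberately unlocked: this is only
 * safe because every call below happens on the one initializing thread. */
static uint64_t example_seed_stream;

/* SplitMix64 step over the shared seed state. */
static uint64_t example_splitmix64(void) {
    uint64_t z = (example_seed_stream += UINT64_C(0x9E3779B97F4A7C15));
    z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);
    return z ^ (z >> 31);
}

/* Seeds one scheduler's state from the shared stream. */
static void example_init_sched_rng(example_sched_rng *rng) {
    for (int i = 0; i < 4; i++) {
        rng->s[i] = example_splitmix64();
    }
}

/* Sequential initialization: one thread walks all schedulers in order,
 * so each scheduler gets a distinct slice of the SplitMix64 stream. */
static void example_init_all(example_sched_rng *rngs, size_t n, uint64_t seed) {
    example_seed_stream = seed;
    for (size_t i = 0; i < n; i++) {
        example_init_sched_rng(&rngs[i]);
    }
}
```

If the per-scheduler init were instead run by each scheduler thread at startup, the read-modify-write on `example_seed_stream` would race, and that is where a lock or an atomic add would be needed.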
That makes sense, thank you for the explanation.
What are your thoughts on the direction of the experiments so far? I think given the fairly small performance improvement between the phash2/unique integer trick and the NIF, we’ll likely stick with the former for the time being. However, the BIF implementation could potentially be useful to us for very high QPS random-based load balancing workloads.