Random Sort - should be included in the lists module?

For what it’s worth, I implemented three list shuffling algorithms.
(1) the sort based version, which is a cute one-liner;
(2) a recursive splitting version that divides a list using 2 random bits per element among four lists without trying to balanced the lengths;
(3) the recursive splitting version based on selecting N div 2 and N - N div 2 elements using rand:uniform(N) =< K.
While all three do O(N log N) list processing work, method (1) calls the RNG roughly N times, method (2) roughly 1/2 N log(N) times, and method (3) roughly N log(N) times.
It should be no surprise that in my crude bnchmarks method (1) was by far the fastest, and method (2) about twice as fast as method (3).
Everything depends on the cost of generating a random number, and it is clearly high.
The same relative speeds were observed in Smalltalk and Erlang.

4 Likes

There is also a hidden GC cost. (1) creates garbage for the attached random numbers, which has to be handled during the sort, and bias from duplicate numbers must be handled by burying them in extra PRNG bits, which is not currently a real problem.

For (2) the constant factor can be reduced by caching bits so the actual PRNG is called about once every 25 splits.

For (2) and (3) it is possible to cut a lot of leaves in the recursion tree by permuting lists of 2,3 and 4 elements directly from 1,5 and 7 random bits. I guess this also reduces the constant factor.

With these optimizations I get (2) as the clearly fastest for list lengths my laptop can handle less than (2^28).

3 Likes

Both (2) and (3) in my test handled length 2 specially. All three methods create O(N log N) garbage, and that’s with append avoided.

1 Like

Can you show some pseudo code - I have not understood this algorithm yet…

Merged into ‘master’ for OTP 29.0.

3 Likes