Benchmarking of sofs:relation_to_family/1 and a variant of the proposed lists:groupby

bjorng · February 25, 2022, 3:20am

I just realized that the way I generated the test data favored the sofs-based implementations. My function for generating data looked like this:

gen_data(Limit, Limit, Acc) ->
    Acc;
gen_data(N, Limit, Acc) ->
    Item = {N div 100,rand:uniform(Limit)},
    gen_data(N+1, Limit, [Item|Acc]).

(for 1 percent unique keys)

The problem is that the generated tuples are already sorted (in reverse order). The sorting that sofs will need to do will be much faster compared to sorting a list with random elements.

So I changed my data generation to generate keys in a more random way:

gen_data(Limit, Limit, Acc) ->
    Acc;
gen_data(N, Limit, Acc) ->
    Item = {erlang:phash2(N div 100),rand:uniform(Limit)},
    gen_data(N+1, Limit, [Item|Acc]).

Here are the new results:

generated 2000000 two-tuples with 1% unique keys
groupby/1           : 0.318817
sofs_groupby/1      : 1.329506
groupby_as_list/1   : 0.351737
sofs_groupby_as_list: 1.306519

generated 2000000 two-tuples with 50% unique keys
groupby/1           : 1.111122
sofs_groupby/1      : 1.967342
groupby_as_list/1   : 1.116259
sofs_groupby_as_list: 1.782708

generated 2000000 two-tuples with 99% unique keys
groupby/1           : 1.616968
sofs_groupby/1      : 2.390517
groupby_as_list/1   : 1.762403
sofs_groupby_as_list: 1.971433

That is, the map-based groupby functions are consistently faster than the sofs-based functions for unordered input data.