UPDATE: This post has been edited because the uniq_all_at_once/1
is broken. See below.
I wanted to share the results of some benchmarking I’ve done as part of evaluating the proposed lists:uniq/1 function.
First I compared two functions that create a map in two different ways. This one creates a map by adding one element at a time to an initially empty map:
one_by_one(Data) ->
one_by_one(Data, #{}).
one_by_one([Key|T], Map) ->
one_by_one(T, Map#{Key => {Key, Key+1}});
one_by_one([], Map) ->
Map.
The other one creates a list of {Key, Value}
pairs and then turns it into a map using maps:from_list/1
:
all_at_once(Data) ->
all_at_once(Data, []).
all_at_once([Key|T], Acc) ->
all_at_once(T, [{Key, Key+1}|Acc]);
all_at_once([], Acc) ->
maps:from_list(Acc).
So which one is fastest? Here is the result from the benchmark run (using Erlang/OTP 24.2):
1> mbench:run().
generated 2000000 data items
one_by_one/1 : 1.825983
all_at_once/1 : 0.504215
.
.
.
ok
Before all of you start rewriting of all map updating code , note that the map in the benchmark had two million elements. Unless your code regularly have to handle that many elements or if part of a frequently used library or the Erlang compiler, you probably don’t have to worry about the performance difference.
I then turned to benchmarking two possible implementations of lists:uniq/1
. Here is the first, similar to the one proposed by @cons in the pull request:
uniq_one_by_one(L) ->
uniq_one_by_one(L, #{}).
uniq_one_by_one([X|Xs], M) ->
case is_map_key(X, M) of
true ->
uniq_one_by_one(Xs, M);
false ->
[X|uniq_one_by_one(Xs, M#{X => true})]
end;
uniq_one_by_one([], _) ->
[].
The other one constructs the map by using maps:from_keys/1
in one go (UPDATED: but it is wrong):
uniq_all_at_once(L) ->
M = maps:from_keys(L, true),
uniq_all_at_once(L, M).
uniq_all_at_once([X|Xs], M) ->
case is_map_key(X, M) of
true ->
uniq_all_at_once(Xs, M);
false ->
[X|uniq_all_at_once(Xs, M)]
end;
uniq_all_at_once([], _) ->
[].
(UPDATED: is_map_key/1
will always return true
, so the result will always be []
.)
Based on the previous benchmarking of map creation, it is no surprise that uniq_all_at_once/1
is almost three times as fast (UPDATE: an incorrect program can run arbitrarily fast):
uniq_one_by_one/1 : 1.942674
uniq_all_at_once/1 : 0.736359
However, those results were produced with an input list where all items are unique. If I change the input list to two million identical elements, uniq_one_by_one/1
is the winner:
uniq_one_by_one/1 : 0.007935
uniq_all_at_once/1 : 0.085007
Therefore, in my opinion, if lists:uniq/1
is to be included in the lists
module, the implementation should be based on uniq_one_by_one/1
.
Finally, I wanted to know how those implementations compared to lists:usort/1
, which can be used if you don’t care about the order of the elements. So I added this function to the benchmark:
uniq_by_usort(L) ->
lists:usort(L).
Here are the results:
uniq_one_by_one/1 : 1.942674
uniq_by_usort/1 : 0.030334
Seeing that sorting can make the unique operation an order of magnitude faster, I then tried to implement uniq/1
by sorting while still keeping the order:
uniq_by_sofs(L0) ->
L = number(L0, 0),
S0 = sofs:relation(L),
S1 = sofs:relation_to_family(S0),
S2 = sofs:converse(S1),
S3 = sofs:to_external(S2),
[E || {_,E} <- S3].
number([H|T], N) ->
[{H,N}|number(T, N+1)];
number([], _) ->
[].
When all elements in the input list are unique, uniq_by_sofs/1
turned out to be the fastest function:
uniq_one_by_one/1 : 1.934827
uniq_by_sofs/1 : 0.534989
When all elements are identical, uniq_one_by_one/1
is again the winner:
uniq_one_by_one/1 : 0.007935
uniq_by_sofs/1 : 0.349756
Here is my benchmark module: