# Benchmarking of maps and the proposed lists:uniq/1

UPDATE: This post has been edited because the `uniq_all_at_once/1` is broken. See below.

I wanted to share the results of some benchmarking I’ve done as part of evaluating the proposed lists:uniq/1 function.

First I compared two functions that create a map in two different ways. This one creates a map by adding one element at a time to an initially empty map:

``````one_by_one(Data) ->
one_by_one(Data, #{}).

one_by_one([Key|T], Map) ->
one_by_one(T, Map#{Key => {Key, Key+1}});
one_by_one([], Map) ->
Map.
``````

The other one creates a list of `{Key, Value}` pairs and then turns it into a map using `maps:from_list/1`:

``````all_at_once(Data) ->
all_at_once(Data, []).

all_at_once([Key|T], Acc) ->
all_at_once(T, [{Key, Key+1}|Acc]);
all_at_once([], Acc) ->
maps:from_list(Acc).
``````

So which one is fastest? Here is the result from the benchmark run (using Erlang/OTP 24.2):

``````1> mbench:run().
generated 2000000 data items
one_by_one/1        : 1.825983
all_at_once/1       : 0.504215
.
.
.
ok
``````

Before all of you start rewriting of all map updating code , note that the map in the benchmark had two million elements. Unless your code regularly have to handle that many elements or if part of a frequently used library or the Erlang compiler, you probably don’t have to worry about the performance difference.

I then turned to benchmarking two possible implementations of `lists:uniq/1`. Here is the first, similar to the one proposed by @cons in the pull request:

``````uniq_one_by_one(L) ->
uniq_one_by_one(L, #{}).

uniq_one_by_one([X|Xs], M) ->
case is_map_key(X, M) of
true ->
uniq_one_by_one(Xs, M);
false ->
[X|uniq_one_by_one(Xs, M#{X => true})]
end;
uniq_one_by_one([], _) ->
[].
``````

The other one constructs the map by using `maps:from_keys/1` in one go (UPDATED: but it is wrong):

``````uniq_all_at_once(L) ->
M = maps:from_keys(L, true),
uniq_all_at_once(L, M).

uniq_all_at_once([X|Xs], M) ->
case is_map_key(X, M) of
true ->
uniq_all_at_once(Xs, M);
false ->
[X|uniq_all_at_once(Xs, M)]
end;
uniq_all_at_once([], _) ->
[].
``````

(UPDATED: `is_map_key/1` will always return `true`, so the result will always be `[]`.)

Based on the previous benchmarking of map creation, it is no surprise that `uniq_all_at_once/1` is almost three times as fast (UPDATE: an incorrect program can run arbitrarily fast):

``````uniq_one_by_one/1   : 1.942674
uniq_all_at_once/1  : 0.736359
``````

However, those results were produced with an input list where all items are unique. If I change the input list to two million identical elements, `uniq_one_by_one/1` is the winner:

``````uniq_one_by_one/1   : 0.007935
uniq_all_at_once/1  : 0.085007
``````

Therefore, in my opinion, if `lists:uniq/1` is to be included in the `lists` module, the implementation should be based on `uniq_one_by_one/1`.

Finally, I wanted to know how those implementations compared to `lists:usort/1`, which can be used if you don’t care about the order of the elements. So I added this function to the benchmark:

``````uniq_by_usort(L) ->
lists:usort(L).
``````

Here are the results:

``````uniq_one_by_one/1   : 1.942674
uniq_by_usort/1     : 0.030334
``````

Seeing that sorting can make the unique operation an order of magnitude faster, I then tried to implement `uniq/1` by sorting while still keeping the order:

``````uniq_by_sofs(L0) ->
L = number(L0, 0),
S0 = sofs:relation(L),
S1 = sofs:relation_to_family(S0),
S2 = sofs:converse(S1),
S3 = sofs:to_external(S2),
[E || {_,E} <- S3].

number([H|T], N) ->
[{H,N}|number(T, N+1)];
number([], _) ->
[].
``````

When all elements in the input list are unique, `uniq_by_sofs/1` turned out to be the fastest function:

``````uniq_one_by_one/1   : 1.934827
uniq_by_sofs/1      : 0.534989
``````

When all elements are identical, `uniq_one_by_one/1` is again the winner:

``````uniq_one_by_one/1   : 0.007935
uniq_by_sofs/1      : 0.349756
``````

Here is my benchmark module:

5 Likes

Hummm… I don’t think `uniq_all_at_once/1` will work as expected.
I bet it will always return `[]` since `is_map_key(X, M)` will always return `true`.

4 Likes

Yes, you are right, of course.

Note to self: Always add test cases for the functions I benchmark. If it isn’t tested, it doesn’t work.

4 Likes

I wonder if this approach could work:

1. Pair all elements in the list with their index. So `[a, b, c]` becomes `[[a | 0], [b | 1], [c | 2]]`

2. Perform a usort on the head of the list

3. Perform a sort on the tail of the list

4. Now traverse the original list matching indexes on the sorted list. Something like:

`````` keep_indexed([H | T], I, [[_ | I] | Is]) ->
[H | keep_indexed(T, I + 1, Is)];
keep_indexed([_ | T], I, Is) ->
keep_indexed(T, I + 1, Is);
keep_indexed([], _I, []) ->
[].
``````

Perhaps this is what sofs is doing?

Step 1 is going to add additional processing but it could be handled by a `usort_index` implementation that adds the indexes as you traverse, instead of requiring a pre-pass.

Another approach is to have a slightly modified `usort_indexes` that returns the indexes of the removed elements. The benefit is that you can do a quick `Removed =:= []` to check if you should return the original list as is.

4 Likes

Here is a naive implementation:

``````uniq_indexes(L0) ->
L2 = lists:usort(fun([A|_], [B|_]) -> A =< B end, L1),
L3 = lists:sort(fun([_|A], [_|B]) -> A =< B end, L2),
take_indexed(L0, 0, L3).

add_index([H | T], I) -> [[H | I] | add_index(T, I + 1)];

take_indexed([H | T], I, [[_ | I] | Is]) ->
[H | take_indexed(T, I + 1, Is)];
take_indexed([_ | T], I, Is) ->
take_indexed(T, I + 1, Is);
take_indexed([], _I, []) ->
[].
``````

Here are the results:

``````benchmarking 2000000 data items
uniq_indexes/1      : 0.629356
uniq_one_by_one/1   : 2.724498
uniq_by_sofs/1      : 0.740294
uniq_by_usort/1     : 0.042347
uniq_by_usort/2     : 0.077485
``````

`uniq_by_usort/2` is calling `usort/2` cause I wanted to measure the cost of the anonymous function.

If all elements are equal, then it behaves poorly:

``````benchmarking 2000000 data items
uniq_indexes/1      : 0.449931
uniq_one_by_one/1   : 0.009655
uniq_by_sofs/1      : 0.410562
uniq_by_usort/1     : 0.010609
uniq_by_usort/2     : 0.026661
``````

It even performs worse than `sofs` but it is much better than `sofs` for small collections.

I have also added a benchmark where half of the elements are different and half is unique:

``````uniq_indexes/1      : 0.871175
uniq_one_by_one/1   : 2.727809
uniq_by_sofs/1      : 2.753117
uniq_by_usort/1     : 0.042967
uniq_by_usort/2     : 0.084409
``````

Some basic profiling shows that 1/6 of the time is spent on `add_index`, 2/6 on `take_indexes`, and the rest in the sort operations. Part of the 1/6 can likely be squeezed by reimplementing the initial steps of `usort` to sort and index, and then delegating to the rest to `ufmergel` and `rufmergel`.

4 Likes

Interesting. In the “half/half” case `uniq_indexes` is much faster than the other two. I guess that is because it throws away so many elements in the first sorting pass, so the second sorting pass has much less work to do.

2 Likes

Here is the merging of add_indexed with the first pass of usort in `uniq_indexes`:

``````benchmarking 2000000 data items
uniq_indexes/1      : 0.420352
uniq_one_by_one/1   : 2.808513
uniq_by_sofs/1      : 0.752575
uniq_by_usort/1     : 0.034249
uniq_by_usort/2     : 0.07497
``````

All equal:

``````benchmarking 2000000 data items
uniq_indexes/1      : 0.009649
uniq_one_by_one/1   : 0.00919
uniq_by_sofs/1      : 0.501969
uniq_by_usort/1     : 0.011112
uniq_by_usort/2     : 0.035831
``````

Implementation here (to be added into the lists module as it uses private functions).

``````%% uniq/1

uniq([X, Y | L] = L0) when X < Y ->
case L of
[] ->
L0;
[Z] when X == Z; Y == Z ->
[X, Y];
[Z] ->
[X, Y, Z];
_ ->
uniq_indexed(L0, uisplit_1([X|0], [Y|1], L, [], [], 2))
end;
uniq([X, Y | L] = L0) when X > Y ->
case L of
[] ->
[X, Y];
[Z] when X == Z; Y == Z ->
[X, Y];
[Z] ->
[X, Y, Z];
_ ->
uniq_indexed(L0, uisplit_2([X|0], [Y|1], L, [], [], 2))
end;
uniq([X, _Y | L] = L0) ->
uniq_indexed(L0, uisort_1(X, L, 2));
uniq([X]) ->
[X];
uniq([]) ->
[].

uniq_indexed(L0, Indexed) ->
Sorted = sort(fun([_|A], [_|B]) -> A =< B end, Indexed),
take_indexed(L0, 0, Sorted).

take_indexed([H | T], I, [[_ | I] | Is]) ->
[H | take_indexed(T, I + 1, Is)];
take_indexed([_ | T], I, Is) ->
take_indexed(T, I + 1, Is);
take_indexed([], _I, []) ->
[].

uisort_1(X, [Y | L], I) when X == Y ->
uisort_1(X, L, I + 1);
uisort_1(X, [Y | L], I) when X < Y ->
uisplit_1([X|0], [Y|I], L, [], [], I + 1);
uisort_1(X, [Y | L], I) ->
uisplit_2([X|0], [Y|I], L, [], [], I + 1);
uisort_1(X, [], _I) ->
[[X|0]].

%% This is a version of usort that splits and indexes at once.

uisplit_1(X, Y, [Z | L], R, Rs, I) when Z > hd(Y) ->
uisplit_1(Y, [Z | I], L, [X | R], Rs, I + 1);
uisplit_1(X, Y, [Z | L], R, Rs, I) when Z == hd(Y) ->
uisplit_1(X, Y, L, R, Rs, I + 1);
uisplit_1(X, Y, [Z | L], R, Rs, I) when Z > hd(X) ->
uisplit_1([Z | I], Y, L, [X | R], Rs, I + 1);
uisplit_1(X, Y, [Z | L], R, Rs, I) when Z == hd(X) ->
uisplit_1(X, Y, L, R, Rs, I + 1);
uisplit_1(X, Y, [Z | L], [], Rs, I) ->
uisplit_1(X, Y, L, [[Z | I]], Rs, I + 1);
uisplit_1(X, Y, [Z | L], R, Rs, I) ->
uisplit_1_1(X, Y, L, R, Rs, [Z | I], I + 1);
uisplit_1(X, Y, [], R, Rs, _I) ->
rufmergel([[Y, X | R] | Rs], [], fun([A|_], [B|_]) -> A =< B end).

uisplit_1_1(X, Y, [Z | L], R, Rs, S, I) when Z > hd(Y) ->
uisplit_1_1(Y, [Z | I], L, [X | R], Rs, S, I + 1);
uisplit_1_1(X, Y, [Z | L], R, Rs, S, I) when Z == hd(Y) ->
uisplit_1_1(X, Y, L, R, Rs, S, I + 1);
uisplit_1_1(X, Y, [Z | L], R, Rs, S, I) when Z > hd(X) ->
uisplit_1_1([Z | I], Y, L, [X | R], Rs, S, I + 1);
uisplit_1_1(X, Y, [Z | L], R, Rs, S, I) when Z == hd(X) ->
uisplit_1_1(X, Y, L, R, Rs, S, I + 1);
uisplit_1_1(X, Y, [Z | L], R, Rs, S, I) when Z > hd(S) ->
uisplit_1(S, [Z | I], L, [], [[Y, X | R] | Rs], I + 1);
uisplit_1_1(X, Y, [Z | L], R, Rs, S, I) when Z == hd(S) ->
uisplit_1_1(X, Y, L, R, Rs, S, I + 1);
uisplit_1_1(X, Y, [Z | L], R, Rs, S, I) ->
uisplit_1([Z | I], S, L, [], [[Y, X | R] | Rs], I + 1);
uisplit_1_1(X, Y, [], R, Rs, S, _I) ->
rufmergel([[S], [Y, X | R] | Rs], [], fun([A|_], [B|_]) -> A =< B end).

%% Descending.
uisplit_2(X, Y, [Z | L], R, Rs, I) when Z < hd(Y) ->
uisplit_2(Y, [Z | I], L, [X | R], Rs, I + 1);
uisplit_2(X, Y, [Z | L], R, Rs, I) when Z == hd(Y) ->
uisplit_2(X, Y, L, R, Rs, I + 1);
uisplit_2(X, Y, [Z | L], R, Rs, I) when Z < hd(X) ->
uisplit_2([Z | I], Y, L, [X | R], Rs, I + 1);
uisplit_2(X, Y, [Z | L], R, Rs, I) when Z == hd(X) ->
uisplit_2(X, Y, L, R, Rs, I + 1);
uisplit_2(X, Y, [Z | L], [], Rs, I) ->
uisplit_2(X, Y, L, [[Z | I]], Rs, I + 1);
uisplit_2(X, Y, [Z | L], R, Rs, I) ->
uisplit_2_1(X, Y, L, R, Rs, [Z | I], I + 1);
uisplit_2(X, Y, [], R, Rs, _I) ->
ufmergel([[Y, X | R] | Rs], [], fun([A|_], [B|_]) -> A =< B end).

uisplit_2_1(X, Y, [Z | L], R, Rs, S, I) when Z < hd(Y) ->
uisplit_2_1(Y, [Z | I], L, [X | R], Rs, S, I + 1);
uisplit_2_1(X, Y, [Z | L], R, Rs, S, I) when Z == hd(Y) ->
uisplit_2_1(X, Y, L, R, Rs, S, I + 1);
uisplit_2_1(X, Y, [Z | L], R, Rs, S, I) when Z < hd(X) ->
uisplit_2_1([Z | I], Y, L, [X | R], Rs, S, I + 1);
uisplit_2_1(X, Y, [Z | L], R, Rs, S, I) when Z == hd(X) ->
uisplit_2_1(X, Y, L, R, Rs, S, I + 1);
uisplit_2_1(X, Y, [Z | L], R, Rs, S, I) when Z < hd(S) ->
uisplit_2(S, [Z | I], L, [], [[Y, X | R] | Rs], I + 1);
uisplit_2_1(X, Y, [Z | L], R, Rs, S, I) when Z == hd(S) ->
uisplit_2_1(X, Y, L, R, Rs, S, I + 1);
uisplit_2_1(X, Y, [Z | L], R, Rs, S, I) ->
uisplit_2([Z | I], S, L, [], [[Y, X | R] | Rs], I + 1);
uisplit_2_1(X, Y, [], R, Rs, S, _I) ->
ufmergel([[S], [Y, X | R] | Rs], [], fun([A|_], [B|_]) -> A =< B end).
``````

Disclaimer: I have only ran very limited tests! There are probably more optimizations but duplicating all of `usort` is likely too much.

4 Likes

Very nice speed up of the sort-based `uniq`, but I am not sure that we would to add that much extra code the `lists` module. I couldn’t resist thinking about whether some of the pathological cases could be eliminated using less code and came up with the following (now added to my `mbench` module):

``````uniq_indexes(L0) ->
L2 = lists:usort(fun([_|A], [_|B]) -> A =< B end, L1),
L3 = lists:sort(L2),
take_indexed(L0, 0, L3).

add_index([H1 | T], I, [[_ | H2] | _]=Acc) when H1 == H2 ->
add_index([H | T], I, Acc) ->
add_index(T, I + 1, [[I | H] | Acc]);
Acc.

take_indexed([H | T], I, [[I | _] | Is]) ->
[H | take_indexed(T, I + 1, Is)];
take_indexed([_ | T], I, [_ | _]=Is) ->
take_indexed(T, I + 1, Is);
take_indexed([_ | _], _I, []) ->
[];
take_indexed([], _I, []) ->
[].
``````

I did the following modifications:

• I modified `add_index` to remove consecutive duplicate elements.

• I placed the index in the head so that the second sort could be done without using a sorting fun.

• I modified `take_indexed` to stop immeditely when the second list becomes empty.

Here are the results when all elements are equal:

``````uniq_one_by_one/1   : 0.007664
uniq_by_sofs/1      : 0.37169
uniq_indexes/1      : 0.00515
uniq_by_usort/1     : 0.007006
``````

Unfortunately, my benchmark was broken in a way that favored the sort-based `uniq`. See my next post.

2 Likes

Unfortunately, my benchmark module generated a list sorted in descending order. That favored the sort-based implementations because the sorting became much cheaper (O(N) instead of O(N * log N)).

I have now updated the data generation to increase the randomness:

``````gen_data(Limit, Limit, Acc) ->
Acc;
gen_data(N, Limit, Acc) ->
Item = erlang:phash2(N),
gen_data(N+1, Limit, [Item|Acc]).
``````

Here are the new results:

``````uniq_one_by_one/1   : 1.970418
uniq_by_sofs/1      : 4.147654
uniq_indexes/1      : 3.922919
uniq_by_usort/1     : 0.462431
``````

That is, the map-based implementation is about twice as fast as the sort-based implementations.

I also added @josevalim’s implementation to the `lists` module (on the `master` branch):

``````generated 2000000 data items
one_by_one/1        : 1.878218
all_at_once/1       : 0.582533
uniq_one_by_one/1   : 2.009027
uniq_by_sofs/1      : 3.837489
uniq_indexes/1      : 3.702631
lists:uniq/1        : 3.429415
uniq_by_usort/1     : 0.453432
``````
3 Likes

@bjorng I posted a suggestion for an augmentation of the `uniq` function in the use cases thread: Uses cases? for proposed new lists functions - #10 by Maria-12648430. I think something like this is necessary to make `uniq` more than just superficially useful. At the same time, I fear that it will take a lot of the steam out of the pre-filled map approaches discussed here, or make them impossible even (sorry ).

3 Likes

Nothing to be sorry for. After many twists and turns, the simple map-based solution (`uniq_one_by_one/1` in the first post) turned out to be fastest implementation in general and that implementation can easily be extended to take a fun.

4 Likes