Hi all
A problem arose when I was looking at ‘erl_maps.c’.
Why do you keep order when adding new elements to the flatmap, but still find the data by traversing instead of using a binary search, is it cache optimization?
We keep the keys sorted to optimize matching out multiple elements at once from a map. For example:
#{key1 := Value, key2 := Value2, key3 := Value} = Map
With sorted keys, a single linear traversal can match out all values. The code is found in beam_jit_map_elements() in beam_jit_common.c.
A similar optimization is possible when updating multiple elements at once. For example:
Map#{key1 := NewValue1, key2 := NewValue2}
(The update code is found in beam_common.c)
The reason that we don’t do a binary search when looking up a single key is that testing two atoms for equality is faster than comparing whether an atom is smaller than another atom. It would probably be slower to do a binary search even for the largest possible small map (with 32 elements).
I assume multiple key matches in code are sorted at compile time?
They are actually sorted by the loader.
The reason is that sorting using the standard term order will not work because an integer can compare equal to a float:
1> 42 == 42.0.
true
and therefore we will not get a consistent key order if we sort the keys in the standard term order:
2> lists:sort([42.0,42]).
[42.0,42]
3> lists:sort([42,42.0]).
[42,42.0]
(lists:sort/1
uses a stable sorting algorithm, that is, when two elements compare equal, their original order is preserved.)
When the loader sorts the map keys, is uses a different comparison function that considers an integer to be smaller than the corresponding float. That can be seen if we construct a map with the keys 42
and 42.0
:
4> #{42.0 => float, 42 => integer}.
#{42 => integer,42.0 => float}
5> #{42 => integer, 42.0 => float}.
#{42 => integer,42.0 => float}