Why is finding data in flatmap traversal?

Hi all
A problem arose when I was looking at ‘erl_maps.c’.
Why do you keep order when adding new elements to the flatmap, but still find the data by traversing instead of using a binary search, is it cache optimization?

2 Likes

We keep the keys sorted to optimize matching out multiple elements at once from a map. For example:

#{key1 := Value, key2 := Value2, key3 := Value} = Map

With sorted keys, a single linear traversal can match out all values. The code is found in beam_jit_map_elements() in beam_jit_common.c.

A similar optimization is possible when updating multiple elements at once. For example:

Map#{key1 := NewValue1, key2 := NewValue2} 

(The update code is found in beam_common.c)

The reason that we don’t do a binary search when looking up a single key is that testing two atoms for equality is faster than comparing whether an atom is smaller than another atom. It would probably be slower to do a binary search even for the largest possible small map (with 32 elements).

3 Likes

I assume multiple key matches in code are sorted at compile time?

2 Likes

They are actually sorted by the loader.

The reason is that sorting using the standard term order will not work because an integer can compare equal to a float:

1> 42 == 42.0.
true

and therefore we will not get a consistent key order if we sort the keys in the standard term order:

2> lists:sort([42.0,42]).
[42.0,42]
3> lists:sort([42,42.0]).  
[42,42.0]

(lists:sort/1 uses a stable sorting algorithm, that is, when two elements compare equal, their original order is preserved.)

When the loader sorts the map keys, is uses a different comparison function that considers an integer to be smaller than the corresponding float. That can be seen if we construct a map with the keys 42 and 42.0:

4> #{42.0 => float, 42 => integer}.
#{42 => integer,42.0 => float}
5> #{42 => integer, 42.0 => float}.
#{42 => integer,42.0 => float}
2 Likes