Erlang efficiency guide Accidental Copying and Loss of Sharing - how can it determine which values to copy?

Erlang -- Common Caveats says

accidental2(State) ->
    spawn(fun() ->
                  io:format("~p\n", [map_get(info, State)])
          end).

will cause the whole State to be copied into the new process, while

fixed_accidental2(State) ->
    Info = map_get(info, State),
    spawn(fun() ->
                  io:format("~p\n", [Info])
          end).

will only copy Info but not the whole State.

The question is, how can it determine which values to copy? Does it keep a list of all the referenced variables in the function body in order to do so?

Sort of. In the BEAM instruction that creates the fun there is a list of the values that need to be part of the fun’s environment. To demonstrate the concept of the environment, I have created the following function in module t:

incrementer(Inc) ->
    fun(Value) -> Value + Inc end.

I can use it like this:

1> I1 = t:incrementer(1). 
2> I2 = t:incrementer(999).
3> I1(7).
4> I2(7).
5> erlang:fun_info(I1, env).
6> erlang:fun_info(I2, env).

That is, references to any variables from the function containing the fun are stored in the environment for the fun. When storing a value in the environment, no deep copy is done. If a term in the environment is, for example, a tuple, only a tagged pointer to the contents of the tuple is stored in the environment.
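
For reference, here is a minimal sketch of what the shell session above should show, written as pattern matches. The module t and incrementer/1 are taken from the post; demo/0 is a hypothetical helper added only for illustration.

```erlang
-module(t).
-export([incrementer/1, demo/0]).

%% Returns a fun whose only free variable is Inc.
incrementer(Inc) ->
    fun(Value) -> Value + Inc end.

%% demo/0 is a hypothetical helper, not part of the original post.
demo() ->
    I1 = incrementer(1),
    I2 = incrementer(999),
    8 = I1(7),
    1006 = I2(7),
    %% The environment of each fun holds the captured value of Inc.
    {env, [1]} = erlang:fun_info(I1, env),
    {env, [999]} = erlang:fun_info(I2, env),
    ok.
```

Calling t:demo() returns ok if every match holds, so each fun carries its own copy of the captured Inc value.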

When a fun is passed to spawn/1, as part of copying the fun, all values in the fun’s environment will be deep-copied into the heap of the newly spawned process (that is, tagged pointers will be followed and what they point to will be copied).
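
Applied to the two functions from the question, the difference can be seen by inspecting the environments directly. This is only a sketch: the module name env_demo and the function envs/1 are hypothetical, and the code inspects what each fun captures rather than measuring the copy made by spawn/1.

```erlang
-module(env_demo).
-export([envs/1]).

%% Builds the same two funs as accidental2/1 and fixed_accidental2/1,
%% minus the spawn and the io:format call, and returns their environments.
envs(State) ->
    %% References State inside the fun, so the whole map is captured.
    Accidental = fun() -> map_get(info, State) end,
    %% Extracts the value first, so only Info is captured.
    Info = map_get(info, State),
    Fixed = fun() -> Info end,
    {erlang:fun_info(Accidental, env), erlang:fun_info(Fixed, env)}.
```

The first environment contains the whole State map and the second contains only the value stored under info, which is what would be deep-copied if each fun were passed to spawn/1.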


Thanks for the detailed explanation. That helps a lot in understanding how funs work in Erlang. A follow-up question: since all this information is parsed and stored, could the compiler analyze it further to avoid the accidental copying automatically? In the given example, we can tell from map_get(info, State) that only Info is needed, so we could store just Info instead of the whole State. That way, even if the code is written as in accidental2, the compiler would still optimize it to avoid unnecessary copying.

Define an extractable expression as follows:

  • it is a constant
  • or a variable from the enclosing environment
  • or a ‘safe’ function applied to extractable
    expressions, where map_get/2 and + and so on
    might count as safe.

Let a Fun contain an extractable expression E
that contains every occurrence of a non-local
variable V in Fun. Then the value of E can be
computed and stored in Fun’s environment instead
of V.

This is a well known compiler optimisation, called
“code motion out of loops”. (Well, it’s basically
that optimisation wearing a Voltaire mask and
whistling a secret spy tune. It’s what code motion
out of loops would look like in Erlang.)

The problem is that there are no safe functions,
and in particular, map_get/2 isn’t one.

Risk 1: evaluating an extractable expression might
result in an exception being raised that would NOT
otherwise have been raised.

Risk 2: when Fun does use the value of E, and an
exception is raised, it will be raised in the wrong
process, not just the wrong function.

Risk 3: the value of E might be bigger than V.

It’s only safe to hoist map_get(info, State) out of
the fun+spawn when you KNOW that State has an
info slot. Type checking can help with that, and in
the presence of type information this optimisation might
actually be useful.
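
To make the first two risks concrete, here is a small sketch under the assumption that State may lack the info key. The module name hoist_demo and both function names are hypothetical; the spawn is left out so that the difference shows up deterministically.

```erlang
-module(hoist_demo).
-export([lazy/1, hoisted/1]).

%% As written by the programmer: map_get/2 runs only when the fun
%% is called, so building the fun never fails.
lazy(State) ->
    fun() -> map_get(info, State) end.

%% After the proposed hoisting: map_get/2 runs while the fun is being
%% built, in the caller, even if the fun is never called.
hoisted(State) ->
    Info = map_get(info, State),
    fun() -> Info end.
```

With an empty map, hoist_demo:lazy(#{}) returns a fun and only calling that fun raises {badkey,info}, whereas hoist_demo:hoisted(#{}) raises {badkey,info} straight away in the calling process. With spawn in the picture, that moves the failure from the child process to the parent.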


It is hard to get the optimization right. But the compiler could at least issue a warning in places where such accidental copying may be happening. "The hunt for the cluster-killer Erlang bug" by Dániel Szoboszlay (Klarna Engineering) and the fix in "Fix the exponential growth of the producer buffer" by dszoboszlay (Pull Request #452, kafka4beam/brod on GitHub) are a good example. A compiler warning in such situations would help with debugging or preventing such an issue.