Why does the efficiency guide not mention how expensive exported functions can be for compilation times

I was only explaining why my code triggered the bottleneck while @josevalim’s didn’t. It was not meant as a generalization. To make it true in general, some more context and nuance is needed.

So let’s take at look at the details beginning with the following example:

björn(1) -> true;
björn(2) -> true;
björn(3) -> true;
björn(4) -> true.

In the beam_core_to_ssa pass in the compiler, when compiling a case, there is an optimization to shares identical bodies. That optimization will be applied to this function, causing this BEAM code to be emitted:

{function, björn, 1, 2}.
  {label,1}.
    {line,[{location,"t.erl",5}]}.
    {func_info,{atom,t},{atom,björn},1}.
  {label,2}.
    {select_val,{x,0},
                {f,1},
                {list,[{integer,1},
                       {f,3},
                       {integer,2},
                       {f,3},
                       {integer,3},
                       {f,3},
                       {integer,4},
                       {f,3}]}}.
  {label,3}.
    {move,{atom,true},{x,0}}.
    return.

When there was a huge number of clauses, this optimization would become very slow. That was the bottleneck I was talking about, and which is fixed in my pull request.

For your example, the optimization is not applied and the bottleneck is not triggered. To explain why, let’s look a simplified version of your code:

bart(1, 1) -> true;
bart(2, 2) -> true;
bart(3, 3) -> true;
bart(4, 4) -> true.

Internally in this compiler, the compiler rewrites this to:

bart(A, B) ->
    case A of
        1 ->
            case B of
                1 -> true;
                _ -> error(function_clause, [A,B])
            end;
        2 ->
            case B of
                2 -> true;
                _ -> error(function_clause, [A,B])
            end;
        3 ->
            case B of
                3 -> true;
                _ -> error(function_clause, [A,B])
            end;
        4 ->
            case B of
                4 -> true;
                _ -> error(function_clause, [A,B])
            end;
        _ ->
            error(function_clause, [A,B])
    end.

(This is a simplification; the compiler does not do this rewrite on the Erlang level but on a lower-level intermediate representation.)

There are no longer any identical bodies, so the optimization will not be applied. The resulting BEAM code looks like this:

{function, bart, 2, 5}.
  {label,4}.
    {line,[{location,"t.erl",10}]}.
    {func_info,{atom,t},{atom,bart},2}.
  {label,5}.
    {select_val,{x,0},
                {f,4},
                {list,[{integer,1},
                       {f,9},
                       {integer,2},
                       {f,8},
                       {integer,3},
                       {f,7},
                       {integer,4},
                       {f,6}]}}.
  {label,6}.
    {test,is_eq_exact,{f,4},[{x,1},{integer,4}]}.
    {jump,{f,10}}.
  {label,7}.
    {test,is_eq_exact,{f,4},[{x,1},{integer,3}]}.
    {jump,{f,10}}.
  {label,8}.
    {test,is_eq_exact,{f,4},[{x,1},{integer,2}]}.
    {jump,{f,10}}.
  {label,9}.
    {test,is_eq_exact,{f,4},[{x,1},{integer,1}]}.
  {label,10}.
    {move,{atom,true},{x,0}}.
    return.

To make it possible to apply the optimization (and reduce the code size), your example can be written like this:

bar(1=X, X) -> true;
bar(2=X, X) -> true;
bar(3=X, X) -> true;
bar(4=X, X) -> true.

Internally, the compiler rewrites that to:

bar(A, B) ->
    case A of
        1 when A =:= B -> true;
        2 when A =:= B -> true;
        3 when A =:= B -> true;
        4 when A =:= B -> true;
        _ -> error(function_clause, [A,B])
    end.

which results in the following BEAM code:

{function, bar, 2, 21}.
  {label,20}.
    {line,[{location,"t.erl",41}]}.
    {func_info,{atom,t},{atom,bar},2}.
  {label,21}.
    {select_val,{x,0},
                {f,20},
                {list,[{integer,1},
                       {f,22},
                       {integer,2},
                       {f,22},
                       {integer,3},
                       {f,22},
                       {integer,4},
                       {f,22}]}}.
  {label,22}.
    {test,is_eq_exact,{f,20},[{x,1},{tr,{x,0},{t_integer,{1,4}}}]}.
    {move,{atom,true},{x,0}}.
    return.
4 Likes

Really appreciate your effort to explain. It all makes sense now and I can apply the logic to other cases (pun unintended).

Glad to see the thread resulted in a speed fix and some lessons how to reduce AST; the ‘true/false’ approach being a prime example.

2 Likes