I was only explaining why my code triggered the bottleneck while @josevalim’s didn’t. It was not meant as a generalization. To make it true in general, some more context and nuance is needed.
So let’s take at look at the details beginning with the following example:
björn(1) -> true;
björn(2) -> true;
björn(3) -> true;
björn(4) -> true.
In the beam_core_to_ssa pass in the compiler, when compiling a case, there is an optimization to shares identical bodies. That optimization will be applied to this function, causing this BEAM code to be emitted:
{function, björn, 1, 2}.
{label,1}.
{line,[{location,"t.erl",5}]}.
{func_info,{atom,t},{atom,björn},1}.
{label,2}.
{select_val,{x,0},
{f,1},
{list,[{integer,1},
{f,3},
{integer,2},
{f,3},
{integer,3},
{f,3},
{integer,4},
{f,3}]}}.
{label,3}.
{move,{atom,true},{x,0}}.
return.
When there was a huge number of clauses, this optimization would become very slow. That was the bottleneck I was talking about, and which is fixed in my pull request.
For your example, the optimization is not applied and the bottleneck is not triggered. To explain why, let’s look a simplified version of your code:
bart(1, 1) -> true;
bart(2, 2) -> true;
bart(3, 3) -> true;
bart(4, 4) -> true.
Internally in this compiler, the compiler rewrites this to:
bart(A, B) ->
case A of
1 ->
case B of
1 -> true;
_ -> error(function_clause, [A,B])
end;
2 ->
case B of
2 -> true;
_ -> error(function_clause, [A,B])
end;
3 ->
case B of
3 -> true;
_ -> error(function_clause, [A,B])
end;
4 ->
case B of
4 -> true;
_ -> error(function_clause, [A,B])
end;
_ ->
error(function_clause, [A,B])
end.
(This is a simplification; the compiler does not do this rewrite on the Erlang level but on a lower-level intermediate representation.)
There are no longer any identical bodies, so the optimization will not be applied. The resulting BEAM code looks like this:
{function, bart, 2, 5}.
{label,4}.
{line,[{location,"t.erl",10}]}.
{func_info,{atom,t},{atom,bart},2}.
{label,5}.
{select_val,{x,0},
{f,4},
{list,[{integer,1},
{f,9},
{integer,2},
{f,8},
{integer,3},
{f,7},
{integer,4},
{f,6}]}}.
{label,6}.
{test,is_eq_exact,{f,4},[{x,1},{integer,4}]}.
{jump,{f,10}}.
{label,7}.
{test,is_eq_exact,{f,4},[{x,1},{integer,3}]}.
{jump,{f,10}}.
{label,8}.
{test,is_eq_exact,{f,4},[{x,1},{integer,2}]}.
{jump,{f,10}}.
{label,9}.
{test,is_eq_exact,{f,4},[{x,1},{integer,1}]}.
{label,10}.
{move,{atom,true},{x,0}}.
return.
To make it possible to apply the optimization (and reduce the code size), your example can be written like this:
bar(1=X, X) -> true;
bar(2=X, X) -> true;
bar(3=X, X) -> true;
bar(4=X, X) -> true.
Internally, the compiler rewrites that to:
bar(A, B) ->
case A of
1 when A =:= B -> true;
2 when A =:= B -> true;
3 when A =:= B -> true;
4 when A =:= B -> true;
_ -> error(function_clause, [A,B])
end.
which results in the following BEAM code:
{function, bar, 2, 21}.
{label,20}.
{line,[{location,"t.erl",41}]}.
{func_info,{atom,t},{atom,bar},2}.
{label,21}.
{select_val,{x,0},
{f,20},
{list,[{integer,1},
{f,22},
{integer,2},
{f,22},
{integer,3},
{f,22},
{integer,4},
{f,22}]}}.
{label,22}.
{test,is_eq_exact,{f,20},[{x,1},{tr,{x,0},{t_integer,{1,4}}}]}.
{move,{atom,true},{x,0}}.
return.