Will/does adding type-based guards make code faster with the JIT?

I recall reading that more type-based enhancements are coming with the ongoing work on the JIT.

Will adding guards to functions so that Erlang knows what types things will be make any impact on performance?

5 Likes

Yes, adding guards may improve performance. For more details, see the blog post Type-Based Optimizations in the JIT.

11 Likes

Has anyone ever looked into writing a parse transform that enforces specs as guards on simple functions? That could be really helpful in this case; I don’t want to litter my code with two sets of type definitions if not necessary. A compound type check would be nice for this, by the way, something like is_tuple_with_types.
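For reference, here is a hand-written sketch (hypothetical module and function names) of the kind of guard such a parse transform would need to emit for a simple spec like `-spec area({number(), number()}) -> number().` — note that no `is_tuple_with_types` guard BIF exists today, so the check has to be spelled out element by element:

```erlang
%% Hypothetical sketch: guards a spec-enforcing parse transform
%% might generate from -spec area({number(), number()}) -> number().
-module(spec_guard_sketch).
-export([area/1]).

area(Rect) when is_tuple(Rect), tuple_size(Rect) =:= 2,
                is_number(element(1, Rect)),
                is_number(element(2, Rect)) ->
    element(1, Rect) * element(2, Rect).
```

`tuple_size/1` and `element/2` are both allowed in guards, so this stays a pure guard sequence with no body-level checks.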

3 Likes

@bjorng will have to correct me if I’m wrong, but I think in a majority of cases, adding guards will not lead to any performance benefits. There are cases where they do help, but if you automatically add guards to all your functions I think you will see a performance loss rather than gain.

6 Likes

Yes, that is correct.

I should have expressed myself more clearly. Adding guards may improve performance, but it may also reduce performance. It all depends on how expensive the guards are, how much useful type information they provide to the compiler, and how much optimisation the compiler and/or JIT can do with the help of it. The most likely outcome of adding guards is that there will be no noticeable impact on performance.

The way you phrase it, I get the impression that you write specs for both local and exported functions. That is generally not necessary. If you write specs for only the exported functions, Dialyzer will in most cases be able to infer the types for the local functions. Similarly, the compiler will infer types for local functions based on the patterns and guards of the exported functions.
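As a small illustration of that inference (hypothetical module and function names), a single guard on the exported function can be enough; the local helper needs no guard of its own:

```erlang
%% Sketch: the compiler can propagate type information from the
%% exported function's guard and patterns into the local helper.
-module(infer_sketch).
-export([sum/1]).

sum(List) when is_list(List) ->
    sum(List, 0).

%% No guard needed here: sum/2 is only ever called with a list,
%% which the compiler can infer from sum/1 above.
sum([H | T], Acc) ->
    sum(T, Acc + H);
sum([], Acc) ->
    Acc.
```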

7 Likes

I ran some math-heavy test benchmarks with 25.0-rc2 a couple of days ago for fun. Basically, it’s a ton faster than Python and “almost” as fast as PyPy (about 50% slower, give or take). Not at all bad. :slight_smile:

6 Likes

Is there a way to add type information for the JIT without the runtime overhead? Languages such as Gleam, PureScript, and Hamler have near perfect type information and could provide a lot of hints.

5 Likes

No, the Erlang compiler and the JIT will not accept hints and will not generate unsafe code.

You will have to use your type information in Gleam, PureScript, or Hamler to generate actual guard tests in strategic places in the code. For example, if you place type tests in all exported functions, the compiler will in principle propagate the type information to all local functions within the module. You could also place guard tests that establish ranges before sequences of arithmetic instructions (because those operations are the ones that gain the most from having good type information).
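A minimal sketch (hypothetical module and function names) of a guard that establishes both type and range before a run of arithmetic; how much the JIT can exploit the range information, and in which form the range test must be written, depends on the OTP version:

```erlang
%% Sketch: the guard pins A and B to small non-negative integers,
%% so the arithmetic in the body cannot overflow a small integer
%% and may be compiled without overflow/type checks.
-module(range_sketch).
-export([mix/2]).

mix(A, B) when is_integer(A), is_integer(B),
               A >= 0, A < 256, B >= 0, B < 256 ->
    (A * 256 + B) * 2 + 1.
```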

5 Likes

Brilliant, thank you.

Is there any documentation on how to effectively use guards with the JIT? Something similar to the Erlang efficiency guide perhaps?

6 Likes

No, and we don’t have any plans to document it for OTP 25, because we expect most of the advice would be out of date for OTP 26 (especially the advice that one must use band instead of relational operators to establish ranges).
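To make the band point concrete, here is a sketch (hypothetical module and function names) of the two ways to establish that an argument fits in a byte; per the answer above, in OTP 25 the band form was the one the JIT could use for range information, while relational guards gained that ability later. Both are valid guards in any release:

```erlang
-module(band_sketch).
-export([double_lo/1, double_hi/1]).

%% Range via band: X band 255 =:= X implies X is an integer
%% in 0..255 (band fails the guard for non-integers).
double_lo(X) when X band 255 =:= X ->
    X * 2.

%% The same range via relational operators.
double_hi(X) when is_integer(X), X >= 0, X =< 255 ->
    X * 2.
```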

5 Likes

Adding guards may improve performance, but it may also reduce performance. It all depends on how expensive the guards are, how much useful type information they provide to the compiler, and how much optimisation the compiler and/or JIT can do with the help of it. The most likely outcome of adding guards is that there will be no noticeable impact on performance.

Any thoughts on whether OTP 26 with the new JIT optimizations changes this answer / general guidance? Would you still expect guards to have a negligible impact on performance in most cases?

3 Likes

Yes, I think that answer still stands, although it is now more likely that a guard will provide useful type information, with which the compiler and/or JIT can do some useful optimization.

Still, in most cases, I expect guards to have negligible impact on performance.

The exception where guards can make a difference is when they are used before entering a loop. In my blog post, I gave this example:

-export([expand/1]).

expand(Bin) when is_binary(Bin) ->
    expand(Bin, <<>>).

expand(<<B:8,T/binary>>, Acc) ->
    expand(T, <<Acc/binary,B:16>>);
expand(<<>>, Acc) ->
    Acc.

It turns out that the is_binary(Bin) guard in expand/1 is beneficial because the binary matching in expand/2 can be done more cheaply (there is no need to test whether it is a binary or whether its size in bits is a multiple of 8). See The benefit of is_binary/1 guards in the blog post for more details.

Here is another example. lists:foldl/3 used to be implemented like this:

foldl(F, Accu, [Hd|Tail]) ->
    foldl(F, F(Hd, Accu), Tail);
foldl(F, Accu, []) when is_function(F, 2) -> Accu.

In OTP 25, it was rewritten like this:

foldl(F, Accu, List) when is_function(F, 2) ->
    case List of
        [Hd | Tail] -> foldl_1(F, F(Hd, Accu), Tail);
        [] -> Accu
    end.

foldl_1(F, Accu, [Hd | Tail]) ->
    foldl_1(F, F(Hd, Accu), Tail);
foldl_1(_F, Accu, []) ->
    Accu.

This is faster because the JIT can optimize the call F(Hd, Accu) (there is no need to test each time that F is a fun of the correct arity). On my computer (an M1 MacBook Pro) I got the following benchmark results:

$ erlperf --init_runner_all 'lists:seq(1, 1000).' \
'r(L) -> lists:foldl(fun(_, Acc) -> Acc end, 0, L).' \
'r(L) -> t:foldl(fun(_, Acc) -> Acc end, 0, L).'
Code                                                       ||        QPS       Time   Rel
r(L) -> lists:foldl(fun(_, Acc) -> Acc end, 0, L).          1     184 Ki    5430 ns  100%
r(L) -> t:foldl(fun(_, Acc) -> Acc end, 0, L).              1     160 Ki    6268 ns   87%

As a final example showing the benefit of moving tests out of loops, see the following pull request by @Maria-12648430: Improve lists:nth/2 and nthtail/2

11 Likes

Interestingly enough, this is similar to the way we went about trying to use specs as dynamic type tests (as @filmor asked about earlier in this thread) in the HiPE group some 15 years ago.

Using type specs as guards works reasonably well for simple types (e.g., atom(), integer(), etc.), but it breaks down for recursive types. This is true even for lists, where you need to traverse the list to the end to find out whether it is terminated by [].
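To illustrate why (hypothetical module and function names): checking a recursive type such as list(integer()) is not expressible as a guard test; it requires a full traversal of the value, which costs O(n) on every call where a guard should be O(1):

```erlang
%% Sketch: the body-level check that a spec like
%% -spec f([integer()]) -> ... would require. There is no guard
%% BIF for this; is_list/1 only checks the outermost constructor.
-module(recursive_sketch).
-export([is_int_list/1]).

is_int_list([H | T]) when is_integer(H) ->
    is_int_list(T);
is_int_list([]) ->
    true;
is_int_list(_) ->
    false.
```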

We generated a copy of the module with a module name derived from the original one, so that a qualified call would end up in the original code; but once the first type test had been done, we could call the generated version, from which the type tests could be removed.

I guess this could be a way forward for the JIT as well. While rewriting the code to be more JIT-friendly is all good, the example transformation above would be relatively easy to do in an automated way.

4 Likes

Indeed, the tricky part is that it breaks tracing if we try to apply it generally. We’re trying to find good ways to make tracing follow arbitrary transformations but it’ll take a while before we get all the way there.

3 Likes