When exactly does the Erlang compiler inline code?

Qqwy · April 9, 2022, 10:50am

Hi there!

The compile documentation contains some information about inlining.

However, it is a bit vague in one particular area of interest: Will the compiler ever inline functions from other modules (i.e. ‘remote’ functions)?
There is the separate inline_list_funcs option, but I wonder if it is possible to convince the compiler to do something similar for other modules in certain situations.

It is rather common to have Erlang (or Elixir) code pertaining to different data structures separated into different modules for human readability. But if the code were inlined, some overhead (like wrapping and immediately unwrapping certain maps, tuples or other data structures) might be eliminated by the compiler.

Thanks for your help!

hackerjones · April 10, 2022, 2:00am

While I am no expert on the compiler, I don’t believe remote calls could be inlined. How would hot code upgrades work?

mat-hek · April 10, 2022, 8:32pm

That’s my experience as well, and though I’m not really into hot upgrades, I wonder whether a module is always the best boundary for them. Maybe the ability to group a bunch of modules that always have to be reloaded together could be a solution? The compiler could then do a lot of optimisations while hot upgrades wouldn’t suffer, since reloading tightly related modules separately seems hard and not really beneficial.

bjorng · April 11, 2022, 5:16am

No, the compiler will never inline code from other modules. When the inline_list_funcs option is given, the compiler uses hard-coded knowledge of a few of the functions in the lists module.

Hot code upgrades would be much more complicated if inlining from remote modules were allowed. One way to handle it would be for the loader to load and unload all modules in an application (or other groups of modules) all at once. Another way would be for the compiler to generate code both with and without inlining and let the loader make sure that the correct version of the code would be used.

We have not tried to implement either of those ways to handle inlining combined with hot code upgrading, because all experiments we have done so far with inlining code between modules in applications such as compiler and dialyzer have been disappointing. The issue seems to be that function calls in BEAM are very fast, and thus inlining will only gain performance if the inlined code can be simplified. If the code cannot be simplified, inlining will increase the code size, which usually results in worse performance.
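To make the "only if the inlined code can be simplified" point concrete, here is a minimal sketch of inlining within a single module, which is the case the compiler does support. The module and function names (inline_demo, pair/2, sum_pair/2, sum_doubled/1) are hypothetical and chosen only for illustration. Whether the intermediate tuple is actually optimised away depends on the compiler version, so treat this as something to verify rather than a guarantee.

```erlang
-module(inline_demo).
-export([sum_pair/2, sum_doubled/1]).

%% Ask the compiler to inline the local function pair/2 at its call sites.
-compile({inline, [pair/2]}).
%% Let the compiler apply its hard-coded knowledge of some lists functions.
-compile(inline_list_funcs).

%% Builds a tuple that the caller takes apart again right away.
pair(A, B) -> {pair, A, B}.

%% Once pair/2 is inlined, construction and match sit in the same function
%% body, so the compiler has a chance to simplify them.
sum_pair(A, B) ->
    {pair, X, Y} = pair(A, B),
    X + Y.

%% A lists:foldl/3 call that inline_list_funcs is allowed to expand.
sum_doubled(Xs) ->
    lists:foldl(fun(X, Acc) -> Acc + 2 * X end, 0, Xs).
```

Compiling with erlc -S inline_demo.erl writes a BEAM assembly listing to inline_demo.S, which can be inspected to see whether a call to pair/2 remains in sum_pair/2.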

Qqwy · April 11, 2022, 11:44am

@bjorng thank you very much for your in-depth reply! I wonder whether the code size argument still holds as strongly now that the code is compiled to native code.
And besides this, I do think that cases in which the code can be simplified are rather common. The main case that comes to mind is the one where records, tuples or maps (or Elixir structs) are built on one side of a remote call and immediately deconstructed on the other side.

Of course, it would definitely make hot-code upgrades more difficult, if it were to be expected to work everywhere.
Maybe it is possible to create something similar to inline_list_funcs that works for a user-specified module using a compiler option. Or maybe this could be emulated fully in user-code with a clever use of parse transforms (or Elixir macros).
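As a rough illustration of the "emulate it in user code" idea, here is a sketch that uses a plain preprocessor macro. This is a simpler technique than a parse transform and is swapped in here only to keep the example short. The names macro_demo, PAIR and sum/2 are hypothetical. In real use the macro would live in a shared .hrl file included by several modules; it is defined inline here so that the example is self-contained.

```erlang
-module(macro_demo).
-export([sum/2]).

%% Assumption: in a real project this definition would sit in a shared
%% header (.hrl) file that every participating module includes.
-define(PAIR(A, B), {pair, A, B}).

%% The macro is expanded at compile time, so construction and pattern match
%% end up in the same function body, with no remote call in between.
sum(A, B) ->
    ?PAIR(X, Y) = ?PAIR(A, B),
    X + Y.
```

The trade-off is the same coupling discussed above: every module that includes the header has to be recompiled and reloaded whenever the macro changes.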

SisMaker · April 18, 2022, 1:34am

I just tested it and found that if you want to see whether a function has been inlined, you can run erlc -S xxx.erl (note the capital S) and check the output. The resulting .S file is a listing of the BEAM assembly code generated for the module. It is interesting to look at, but there doesn’t seem to be a full description of the assembly instructions.

williamthome · October 28, 2023, 1:44pm

How/when exactly can the code size result in worse performance? Is there a good practice regarding the number of lines or the amount of code in a module? Also, can the number of exported functions result in worse performance?

jhogberg · October 28, 2023, 1:54pm

When the hot parts of your code no longer fit well into the instruction cache, for instance.

It varies so much from case to case that we can’t give any advice other than to measure performance religiously.

No.