Core Erlang: backward and forward compatibility?

Hi!

As part of my work on Horus, I’m studying Core Erlang and I’m wondering about its compatibility between Erlang versions.

For context, Horus is a library that attempts to extract functions, including anonymous functions, from a BEAM module and generate a standalone module out of them. The goal is to be able to store a function and all its “dependencies” (i.e. the other functions it calls), or transmit them to another Erlang node, so that the function can be executed regardless of whether the module that hosted it is different or unavailable at the time of execution. So far, Horus works with the assembly instructions, for various reasons at the time of the design. However, this has a big drawback: the produced assembly may not be backward- or forward-compatible with other versions of the Erlang runtime.

Now, I’m looking at the Erlang Abstract Format and Core Erlang. I have a working prototype with the Erlang Abstract Format, but there is one problem: the variables that were defined outside of an anonymous function and passed in its environment. The compiler sorts them using ordsets, but they might have been renamed internally. That’s why I’m looking at the Core Erlang code, which has variables with their final names.

I see that Core Erlang is specified, but it can also change. I read the series of blog posts about what Core Erlang is. I also read that Björn Gustavsson recommends using the Erlang Abstract Format, in “Core Erlang: Why have <e_1,...,e_n>?” (#9 by bjorng). But it is still unclear to me what I should and should not expect.

  1. If I’m working with Core Erlang code (in its Erlang term format, not the textual one), is it OK to pass Core Erlang code produced by the compiler of Erlang N as input to the compiler of Erlang N+1? And the opposite direction, if the original Erlang source code is backward-compatible?
  2. When writing a core_transform module using the cerl API, is it possible to have the same module work with any version of the compiler? Or will it need to be adapted?

I would love an official statement :) Thank you!

An official statement can be found in the documentation for the compiler in OTP 29:

It is usually OK in both directions. We only make changes to how we use Core Erlang when we have a good reason.

However, going from OTP 29 to OTP 30 or vice versa does not work directly. In OTP 29 and earlier, the records representing Core Erlang constructs are tuple records, while in OTP 30 they will be native records.
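For reference, here is a minimal sketch of the round trip being discussed, run inside a single OTP release: abstract forms are compiled to Core Erlang in term form with the `to_core` option, and that term is handed back to the compiler with `from_core`. The module name `core_roundtrip` and its function names are made up for illustration. It says nothing about a term produced by one release and consumed by another, which is exactly the case that does not work directly between OTP 29 and OTP 30.

```erlang
-module(core_roundtrip).
-export([to_core/1, from_core/1, roundtrip/1]).

%% Compile Erlang abstract forms to a Core Erlang module in term form.
to_core(Forms) ->
    {ok, _Mod, Core} = compile:forms(Forms, [to_core, binary, return_errors]),
    Core.

%% Hand a Core Erlang module (term form) back to the compiler.
from_core(Core) ->
    {ok, Mod, Beam} = compile:forms(Core, [from_core, binary, return_errors]),
    {Mod, Beam}.

%% Do both steps in the same runtime, then load the resulting module.
roundtrip(Forms) ->
    {Mod, Beam} = from_core(to_core(Forms)),
    {module, Mod} = code:load_binary(Mod, "nofile", Beam),
    Mod.
```

If the Core Erlang term is stored and recompiled later, the release that produced it is probably worth storing next to it, so that a mismatch can be detected instead of failing inside the compiler.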

Within reason, yes. In any case, it should be more portable than directly using the records. However, the cerl API doesn’t automatically protect you against all possible changes, such as the changes in PR-2521.
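To illustrate what “using the cerl API rather than the records” can look like, here is a small sketch that lists the functions defined in a Core Erlang module through accessor functions only, without including `core_parse.hrl` or matching on `#c_module{}`. The module name `cerl_portable` is hypothetical, and the sketch only shows the style; it cannot guard against semantic changes in how the compiler uses Core Erlang.

```erlang
-module(cerl_portable).
-export([defined_functions/1]).

%% Return [{Name, Arity}] for every function defined in a Core Erlang
%% module, using only cerl accessors (no record definitions needed).
defined_functions(Core) ->
    [{cerl:fname_id(FName), cerl:fname_arity(FName)}
     || {FName, _Fun} <- cerl:module_defs(Core)].
```

The result is expected to also contain the `module_info` functions that the compiler adds to every module.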

Thank you! Indeed, I didn’t read the “Recommendations for language implementors”.

Given what is written in this section of the documentation, plus what you added here, it might not be safe or future-proof to rely entirely on Core Erlang code for my use case, even though it would be way easier and more straightforward to work with Core Erlang. I’m thinking of anonymous functions extracted as Core Erlang with e.g. Erlang 29, but recompiled/executed with Erlang 30: it might be a problem during upgrades of Erlang.

Therefore, the next questions that I have are:

How to map an anonymous function reference back to the function definition in the Erlang Abstract Format

For now:

  1. I compile the abstract code obtained from the module debug_info to assembly

  2. Locate the function -my_function/0-fun-0-

  3. Get the line of the function and the line of the first executable instruction in this assembly

  4. Find this location in the abstract code.

I believe I can have a more reliable way of doing it via Core Erlang which already contains the name of the anonymous function and a precise location (with line and column).
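A rough sketch of that idea, under assumptions: it compiles the abstract code to Core Erlang and collects, for each `fun` node, the name found in its `id` annotation together with the first location found in the annotation list. The `{id, {_, _, Name}}` annotation shape is what I observe in the compiler output, not something I found documented, so it may differ between releases. The module name `anon_fun_names` is made up.

```erlang
-module(anon_fun_names).
-export([list/1]).

%% Return [{FunName, Location}] for the anonymous functions found in the
%% Core Erlang generated from the given abstract forms.
list(Forms) ->
    {ok, _Mod, Core} = compile:forms(Forms, [to_core, binary, return_errors]),
    cerl_trees:fold(fun collect/2, [], Core).

collect(Node, Acc) ->
    case cerl:type(Node) of
        'fun' ->
            Ann = cerl:get_ann(Node),
            %% Assumption: anonymous functions carry an `id' annotation;
            %% top-level function definitions do not.
            case lists:keyfind(id, 1, Ann) of
                {id, {_Index, _Uniq, Name}} -> [{Name, location(Ann)} | Acc];
                _ -> Acc
            end;
        _ ->
            Acc
    end.

%% The location is either a line or a {Line, Column} pair in the annotations.
location([L | _]) when is_integer(L) -> L;
location([{L, C} | _]) when is_integer(L), is_integer(C) -> {L, C};
location([_ | Rest]) -> location(Rest);
location([]) -> undefined.
```

Columns only show up if the abstract code was produced with column information in the first place.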

Would all versions of the compiler compute the same anonymous function name?

How to map the environment variables passed to the anonymous function to variable definitions in the abstract code before the function definition

Some context: because I want the anonymous function to be standalone, I want to rewrite:

Var = receive M -> M end,
fun() ->
    do_something(Var)
end

to:

%% The value of `VarFromEnvironment' is taken from the function info and stored with
%% the extracted function/generated module.
fun(VarFromEnvironment) ->
    do_something(VarFromEnvironment)
end
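To make the “taken from the function info” part concrete, here is a small sketch with a made-up module `env_demo`: `erlang:fun_info/2` with the `env` item returns the values captured by a fun created in compiled code, and those values can then be passed as arguments to the rewritten, standalone version.

```erlang
-module(env_demo).
-export([make/1, env/1, call_standalone/2]).

%% A fun that captures `Var' in its environment.
make(Var) ->
    fun() -> {got, Var} end.

%% The captured values, as reported by the runtime.
env(Fun) ->
    {env, Env} = erlang:fun_info(Fun, env),
    Env.

%% Call the rewritten fun, passing the stored environment as arguments.
call_standalone(Standalone, Env) ->
    erlang:apply(Standalone, Env).
```

For example, `env_demo:call_standalone(fun(V) -> {got, V} end, env_demo:env(env_demo:make(42)))` gives the same result as calling the original fun. The list only holds values, not names, which is why the order of the variables matters.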

Based on the new (to me) information from your answer, I should be able to:

  1. Compile the abstract code to Core Erlang
  2. Walk through it to see if variables were renamed, building a map of “SourceCodeName” → “_internalname” in the process
  3. Use this to sort environment variables; also filtering variables that were inlined because they were literals.
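A possible starting point for those steps, as a sketch only: for each anonymous function in the Core Erlang, list its free variables with `cerl_trees:free_variables/1`, dropping the `{Name, Arity}` entries that stand for local function names. The names are the ones used in Core Erlang after optimization, so they may differ from the source names, and I’m not assuming anything here about how their order relates to the order of the values in the fun’s environment. The module name `fun_env_vars` and the reliance on the `id` annotation are assumptions.

```erlang
-module(fun_env_vars).
-export([free_vars/1]).

%% Return [{FunName, [VarName]}] for the anonymous functions found in the
%% Core Erlang generated from the given abstract forms.
free_vars(Forms) ->
    {ok, _Mod, Core} = compile:forms(Forms, [to_core, binary, return_errors]),
    cerl_trees:fold(fun collect/2, [], Core).

collect(Node, Acc) ->
    case cerl:type(Node) of
        'fun' ->
            %% Assumption: only anonymous functions carry an `id' annotation.
            case lists:keyfind(id, 1, cerl:get_ann(Node)) of
                {id, {_Index, _Uniq, Name}} ->
                    %% Function names are {Atom, Arity} tuples; keep only
                    %% ordinary variables.
                    Vars = [V || V <- cerl_trees:free_variables(Node),
                                 not is_tuple(V)],
                    [{Name, Vars} | Acc];
                _ ->
                    Acc
            end;
        _ ->
            Acc
    end.
```

Variables whose values were literals known at compile time are expected to be absent from the list, since the compiler can inline them.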

Would all versions of the compiler rename variables the same way?

I understand these two problems would not exist if I worked directly and only with Core Erlang, as Core Erlang already knows the anonymous function name, has the renamed variables and even supports a single clause.

Does it mean that Core Erlang code produced by Erlang 29 can still be passed to the Erlang 30 compiler? Or, because Core Erlang in Erlang 30 will use native records to represent e.g. #c_var{} and so on, will the Erlang 30 compiler no longer understand and support Core Erlang code based on tuple records?