Proposed changes to the Erlang archives - are you using them?

I know of a similar thing for .NET, but haven’t seen many other implementations. There are enough caveats with that approach compared to the existing one. It won’t be easy to move such binaries between machines (especially when the JIT is smart enough to leverage CPU instruction support). Even on the same machine I can imagine a failure mode where an OS or system component update renders the precompiled binary unusable.

Hence I’d be more interested in leveraging multi-core CPUs to their full extent. In my experience, while Erlang shines at executing concurrent code, startup and shutdown are still mostly single-threaded.

1 Like

Sorry, I didn’t get this part. Do you mean application_controller can load apps concurrently today? If so, what would be the API?

1 Like

Do you mean application_controller can load apps concurrently today

It can start applications concurrently (with application:start/1). Although it’s not documented, application_controller internally has a list of starting applications. So if myapp1 and myapp2 do not have any mutual dependencies, you can run

spawn(fun () -> application:start(myapp1) end),
spawn(fun () -> application:start(myapp2) end),
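
Those two spawns are fire-and-forget; if the caller needs to know when both start calls have returned, an untested sketch using spawn_monitor could look like this:

%% Untested sketch: start independent applications concurrently and
%% wait for both application:start/1 calls to return.
Starts = [spawn_monitor(fun() -> application:start(App) end)
          || App <- [myapp1, myapp2]],
[receive {'DOWN', Ref, process, _, _} -> ok end || {_Pid, Ref} <- Starts],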

Unfortunately, the boot script does not leverage that (as it is sequential).

(From my reading of the code, this non-blocking startup was implemented for dist_ac cases, hence my speculation about “a different reason” rather than speeding up the boot sequence.)

2 Likes

The two low-level (undocumented) BIFs used by code:atomic_load/1 and by the init module are erlang:prepare_loading/2 and erlang:finish_loading/1.

erlang:prepare_loading(ModuleName, Beam) will prepare a module for loading, returning a “magic” term. Calls to prepare_loading can be made in parallel (in different Erlang processes), because it does not update any system tables (except, I think, the atom table). prepare_loading does all the heavy work of parsing all chunks in the BEAM binary, including code generation for the JIT or instruction loading for the traditional BEAM interpreter.

erlang:finish_loading/1 will take a list of those prepared “magic” terms and finish the loading for all of them at once, doing all necessary updates to system tables.
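
For illustration, here is a minimal sketch of how those two BIFs combine, assuming Beams is a list of {Module, BeamBinary} pairs already read from disk (error handling and on_load modules are ignored):

%% Sketch only: prepare modules in parallel, commit them in one call.
parallel_load(Beams) ->
    Parent = self(),
    Pids = [spawn_link(fun() ->
                %% Heavy work (chunk parsing, code generation) happens
                %% here, concurrently, without touching system tables.
                Parent ! {self(), erlang:prepare_loading(Mod, Bin)}
            end) || {Mod, Bin} <- Beams],
    Prepared = [receive {Pid, Prep} -> Prep end || Pid <- Pids],
    %% All system-table updates happen in this single call.
    erlang:finish_loading(Prepared).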

4 Likes

Thank you! If that’s the case, it should be even easier (I hope I am not being too optimistic), because it should be a matter of adding code to application that builds the DAG and runs it. Then we can expose it for use in releases. I will carve out some time to play with this too.
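
For illustration, the edges such a DAG could be built from are available via the documented application:get_key/2 (a rough sketch; included_applications and error handling are ignored):

%% Rough sketch: dependency edges for one application, taken from its
%% .app spec. These edges are what a concurrent starter's DAG would be
%% built from.
deps(App) ->
    _ = application:load(App),   % ignore {error, {already_loaded, _}}
    {ok, Deps} = application:get_key(App, applications),
    [{App, Dep} || Dep <- Deps].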

Oh, this is very interesting! Could code:ensure_loaded/1 then use the same functionality as code:ensure_modules_loaded/1 and code:atomic_load/1? This would reduce the amount of work and blocking done by the code server and enable more concurrency when loading modules too (especially in interactive mode via the error handler). Or are there pitfalls here, especially in relation to on_load?

1 Like

Not sure what you mean exactly. code:ensure_loaded/1 is only given one module, so I don’t see how using prepare_loading and finish_loading directly would increase the concurrency. code:ensure_loaded/1 already uses them indirectly, because the old traditional erlang:load_module/2 BIF is now implemented in terms of them:

load_module(Mod, Code) ->
    try
        Allowed =
            case erlang:module_loaded(erl_features) of
                true ->
                    erl_features:load_allowed(Code);
                false -> ok
            end,
        case Allowed of
            {not_allowed, NotEnabled} ->
                {error, {features_not_allowed, NotEnabled}};
            ok ->
                case erlang:prepare_loading(Mod, Code) of
                    {error,_}=Error ->
                        Error;
                    Prep when erlang:is_reference(Prep) ->
                        case erlang:finish_loading([Prep]) of
                            ok ->
                                {module,Mod};
                            {Error,[Mod]} ->
                                {error,Error}
                        end
                end
        end
    catch
        error:Reason -> error_with_info(Reason, [Mod, Code])
    end.

Yes, there are always pitfalls with on_load :grinning:, but not when loading a single module. erlang:finish_loading/1 will refuse to handle a list of more than one prepared BEAM module if any of them has an on_load function. There is another function to help with handling on_load: erlang:has_prepared_code_on_load(Prepared).
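
For illustration, a batch loader might use it along these lines (a sketch, not actual OTP code; it assumes Beams is a list of {Module, Filename, Binary} tuples, ignores error handling, and falls back to code:load_binary/3 for on_load modules, as mentioned further down the thread):

%% Sketch: commit plain modules in one atomic batch; hand modules with
%% an on_load function to the code server, which knows how to run them.
load_batch(Beams) ->
    Tagged = [{erlang:prepare_loading(Mod, Bin), Mod, File, Bin}
              || {Mod, File, Bin} <- Beams],
    {OnLoad, Plain} =
        lists:partition(fun({Prep, _, _, _}) ->
                                erlang:has_prepared_code_on_load(Prep)
                        end, Tagged),
    ok = erlang:finish_loading([Prep || {Prep, _, _, _} <- Plain]),
    _ = [code:load_binary(Mod, File, Bin) || {_, Mod, File, Bin} <- OnLoad],
    ok.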

2 Likes

My thought was to execute most of the module loading on the client and only do the finish loading on the code server, in the hope that this would increase concurrency.
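
Conceptually something like the following (a hypothetical gen_server sketch, not the real code_server protocol; the function and message names are made up):

%% The caller does the expensive preparation in its own process; only
%% the cheap, table-updating commit is serialized through the server.
load_from_client(Server, Mod, Bin) ->
    Prep = erlang:prepare_loading(Mod, Bin),      % heavy work, in the caller
    gen_server:call(Server, {finish_load, Prep}). % quick commit

%% In the server:
handle_call({finish_load, Prep}, _From, State) ->
    {reply, erlang:finish_loading([Prep]), State}.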

2 Likes

Yes, good idea. I didn’t think of that.

1 Like

That, combined with loading modules in parallel, plus boot script re-processing, is exactly what the code I linked is doing. It also takes into account problems with on_load (which fails atomic_load and therefore should be retried with a non-atomic load_binary call).

It works well for us, but the amount of hacks I put in there is quite ridiculous. For example, concurrent shutdown (traversing the DAG in the opposite direction) hacks the internal state of the application controller. :face_with_open_eyes_and_hand_over_mouth:

That is why I think it would be best to refactor application_controller to include official support for all of these features. Also, I wonder if there are any real users of dist_ac (and distributed applications in general). I'd really love to deprecate that feature, as it introduces a lot of complexity that blocks an application_controller replacement.

2 Likes

Well, these are the same caveats that apply to any compiled C program (which the Erlang runtime itself also is). OS component updates can only render binaries unusable if one uses shared libraries (but usually several versions of each shared library are available on an OS). The other case would be incompatible changes to the system call interface (which are almost never made, precisely because they break everything).

Portability of the binaries would be exactly the same as for the Erlang Runtime System, so if you can transplant an Erlang release that contains the runtime, you could also transplant an Erlang runtime with baked-in, statically linked BEAM files.

But this is just another positive side effect of statically linked BEAM files and not my main reason for building them.

BTW, shipping compiled binaries was the sole way commercial software was distributed in the past, so it usually works better than people think.

These are orthogonal use cases. Statically linked BEAM files are not aimed at the interactive development use case, but at code that is in production and needs to start fast (either because it is started many times or because it needs to start quickly on embedded systems).

I’m looking into this mainly for embedded systems, where the expectation is that they are functional 5 seconds after switching them on (and there are many things to do before an Erlang runtime even starts). On smaller systems one usually has only one (or a few) cores; crunching at full speed on multiple cores would often use too much energy. Also, the filesystem and/or storage is often quite slow, especially for looking up many small files in nested directories.

On normal operating systems, the complete combination of runtime and BEAM files would be mmapped, and if the same release is run multiple times in parallel or frequently, all pages would stay in RAM and be shareable. This opens up new use cases, since the common caveat that Erlang does not start fast enough for building command-line tools is then gone.

There is also an intermediate use case when building unikernel-based Erlang releases, which can be started very quickly directly on a hypervisor.

Generally it doesn’t hurt if, after module loading and JITting have been skipped entirely, application startup is done in parallel. So the two approaches can be combined for even better startup performance.

3 Likes

I have finally submitted a pull request with the initial sketch of this: Add code path caching by josevalim · Pull Request #6729 · erlang/otp · GitHub

1 Like

This is very compelling for CLIs indeed. Relatively slow boot time is part of the problem. Another part is actually deploying these CLIs on people’s machines. Tools like GitHub - burrito-elixir/burrito: Wrap your application in a BEAM Burrito! exist to create executables that are self-extracting archives of app releases. If we could instead have a single executable that is the app release, that would go a long way. Statically linking BEAM modules would simplify deploying to another target: iOS. Thank you for looking into this @peerst.

3 Likes

Cool. I will submit a PR for this as well. I was also thinking we could apply “similar” ideas to purge/delete. The erts_code_purger is already a separate process, so I don’t think we need to block the code_server waiting on it. The idea is to check if the module is sticky on the code server, go back to the client, and contact the erts_code_purger.
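
Roughly along these lines (a hypothetical sketch only; erts_code_purger is an internal, undocumented module, and a real change would have to keep the sticky check and the purge free of races):

%% Hypothetical sketch of the idea: only the sticky check relies on the
%% code server's knowledge; the purge request itself is issued from the
%% client process.
client_purge(Mod) ->
    case code:is_sticky(Mod) of
        true  -> {error, sticky_module};           % made-up error term
        false -> erts_code_purger:purge(Mod)       % internal API, illustration only
    end.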

2 Likes

@bjorng, I wrote a PR that moved the purging to the client, but now the following test gets stuck:

ts:run(kernel, code_SUITE, on_load_deleted, [batch]).

I think we may run into races if we allow purging while loading is happening. In any case, I am dropping the purging changes for now. You don’t need to take a look; I am only commenting for completeness. :slight_smile:

2 Likes

@max-au here is a pull request that does concurrent application load: Start children applications concurrently by josevalim · Pull Request #6737 · erlang/otp · GitHub - I saw 5% benefits here on a relatively small app. However, I haven’t modified releases to use this new function. I will leave that as an exercise for someone else. :smiley:

2 Likes