We have tried to optimize erlc for fast startup; when doing so we use the flags +sbtu +A0 -mode minimal. We also make sure to load exactly the modules that we know we will need from the compiler and stdlib, so that we pay neither for the interactive code-path lookup nor for loading modules that we don't need.
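For reference, an equivalent one-shot invocation might look like the sketch below (the flags are taken from the quote above; exact behavior depends on your OTP version, and the `-eval` body is just a placeholder):

```shell
# Sketch: start the emulator tuned for fast one-shot startup.
#   +sbtu          - leave schedulers unbound from the CPU topology
#   +A0            - no async thread pool
#   -mode minimal  - load only the modules explicitly requested
erl +sbtu +A0 -mode minimal -noshell -eval 'halt(0).'
```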
@garazdawi I tested +sbtu +A0 but saw no substantial boot-time improvement.
Following @starbelly's code pointers, I also tried optimizing the module-loading logic in load_modules/2 to eliminate list comprehensions and reduce intermediate allocations:
do_load_modules(Mods, F, Init) ->
    case erl_prim_loader:get_modules(Mods, F) of
        {ok,{Prep0,[]}} ->
            filter_modules(Prep0, Init);
        {ok,{_,[_|_]=Errors}} ->
            Ms = [M || {M,_} <- Errors],
            exit({load_failed,Ms})
    end.

filter_modules(Prep0, Init) ->
    filter_modules(Prep0, [], [], [], Init).

filter_modules([{Mod,{prepared,Code,Full}}|Rest], Prepared, Loaded, OnLoad, Init) ->
    filter_modules(Rest, [Code|Prepared], [{Mod,Full}|Loaded], OnLoad, Init);
filter_modules([{Mod,{on_load,Beam,Full}}|Rest], Prepared, Loaded, OnLoad, Init) ->
    filter_modules(Rest, Prepared, [{Mod,Full}|Loaded], [{Mod,Beam,Full}|OnLoad], Init);
filter_modules([], Prepared, Loaded, OnLoad, Init) ->
    ok = erlang:finish_loading(Prepared),
    load_rest(OnLoad, Init, Loaded).

load_rest([{Mod,Beam,Full}|T], Init, Acc) ->
    do_load_module(Mod, Beam),
    load_rest(T, Init, [{Mod,Full}|Acc]);
load_rest([], _Init, []) ->
    ok;
load_rest([], Init, Acc) ->
    Init ! {self(),loaded,Acc},
    ok.
Result: no meaningful boot-time reduction (still 30ms - 40ms) despite the cleaner code paths.
The bottleneck appears to be erlang:finish_loading/1 and the per-module processing rather than the list operations. I'm not sure, though; input from OTP experts would be welcome.
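One way to test that hypothesis would be to time erlang:finish_loading/1 in isolation from the prepare step. A rough sketch (the helper name time_finish_loading/1 is mine, and it assumes the modules' .beam files are on the code path and the modules are not yet loaded):

```erlang
%% Hypothetical helper: measure only the finish_loading/1 phase for a
%% list of module names. Each module's object code is fetched and
%% prepared first, so the timed call covers just the commit step.
time_finish_loading(Mods) ->
    Prepared =
        [begin
             {Mod, Code, _Path} = code:get_object_code(Mod),
             %% prepare_loading/2 returns prepared code (a magic ref)
             erlang:prepare_loading(Mod, Code)
         end || Mod <- Mods],
    {Micros, ok} = timer:tc(erlang, finish_loading, [Prepared]),
    Micros.
```

If the finish phase dominates, that would support the idea that the list handling around it is not worth optimizing further.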
Oh, are we at the start of a "multithreaded module loading" initiative?
Development vs Production Boot Time Analysis
When running rebar3 shell with -init_debug on the development machine (Apple M4 16GB), the kernel application starts significantly faster:
$ ERL_FLAGS="-args_file ./config/vm.args -init_debug" rebar3 shell
{progress,preloaded}
{path,["$ROOT/lib/kernel-10.3/ebin","$ROOT/lib/stdlib-7.0/ebin"]}
{done_in_microseconds,7}
{primLoad,[error_handler,application,application_controller,application_master,code,code_server,erl_eval,erl_lint,erl_parse,error_logger,ets,file,filename,file_server,file_io_server,gen,gen_event,gen_server,heart,kernel,logger,logger_filters,logger_server,logger_backend,logger_config,logger_simple_h,lists,proc_lib,supervisor]}
...
{apply,{application,start_boot,[kernel,permanent]}}
{done_in_microseconds,31907} <============
{apply,{application,start_boot,[stdlib,permanent]}}
{done_in_microseconds,15}
...
Key Observations:
- Kernel boot: ~32ms (dev) vs ~3000ms (production)
- 100x performance difference suggests fundamental differences in:
- Module loading strategy (interactive vs embedded mode)
- Available system resources
- Container overhead
- Release packaging differences
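Regarding the first bullet, the loading mode is usually pinned in the release's vm.args: interactive mode loads modules lazily on first call, while embedded mode loads everything listed in the boot script up front during startup. A fragment for comparison (illustrative):

```
## vm.args fragment (illustrative)
-mode embedded
```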
Questions:
- Are there significant differences in the number of preloaded modules between environments?
- Could container resource constraints be impacting boot performance?
I tried to replicate this using Docker on my laptop, but could not, at least not consistently. There were a few times, when a container was first initialized, that the boot was slow, but not reproducibly so. That would of course seem to indicate the issue was not directly related to Erlang. I also tested with a release that does nothing but boot.
I think it’d be helpful to know more about what production looks like.
Of course. Also, did you test locally and in production using the same code base? Once again, it would be helpful to know what production looks like, and even more so to have a way to reproduce this.
Problem definitely solved! @dgud's networking intuition was spot on: kernel does network setup during boot.
Root Cause: Stale Configuration
CI/CD config had a copy-paste from an old project with a dead node reference:
% PROBLEMATIC (causing the 13-second, then 3-second, delay)
{kernel, [ {shell_history, enabled}
...
, {sync_nodes_optional, [ 'hpi_45842' ]}
]}
The Fix
Simply commented out the dead node reference:
% FIXED (boot time now ~800ms)
{kernel, [ {shell_history, enabled}
...
%% , {sync_nodes_optional, [ ]}
]}
What Happened
sync_nodes_optional made kernel wait for the non-existent node 'hpi_45842', retrying the connection until the timeout expired before boot could proceed.
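For anyone who does want synchronized startup, kernel's sync_nodes_timeout parameter bounds how long boot will wait for the listed nodes. A sys.config sketch (node names and timeout are illustrative, not from this deployment):

```erlang
%% sys.config fragment: wait at most 2 seconds for optional nodes.
[{kernel,
  [{sync_nodes_optional, ['node_a@host1', 'node_b@host2']},
   {sync_nodes_timeout, 2000}   % milliseconds; 'infinity' is also allowed
  ]}].
```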
Performance Impact:
- Before: ~13s
- After: ~800ms
- Improvement: ~94% faster boot (roughly 16x)
Note to self: Always audit config files when migrating between projects. A single line of stale configuration can create major operational overhead, especially in containerized deployments where fast startup is critical.
Huge thanks to @michalmuskala @dgud @starbelly @garazdawi @Led @maxlapshin for all the debugging guidance and insights throughout this investigation!