We have tried to optimize erlc for fast startup; when doing so we use the flags +sbtu +A0 -mode minimal. We also make sure to load exactly the modules that we know we will need from the compiler and stdlib, so that we pay neither for the interactive code-path lookup nor for loading modules that we don't need.
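For reference, an equivalent one-shot invocation might look like the sketch below (the flags are taken from the quote above; exact behavior depends on your OTP version, and the `-eval` body is just a placeholder):

```shell
# Sketch: start the emulator tuned for fast one-shot startup.
#   +sbtu          - leave schedulers unbound from the CPU topology
#   +A0            - no async thread pool
#   -mode minimal  - load only the modules explicitly requested
erl +sbtu +A0 -mode minimal -noshell -eval 'halt(0).'
```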
@garazdawi I tested +sbtu +A0 but saw no substantial boot-time improvement.
Following @starbelly's code pointers, I also tried optimizing the module-loading logic in load_modules/2 to eliminate list comprehensions and reduce intermediate allocations:
do_load_modules(Mods, F, Init) ->
    case erl_prim_loader:get_modules(Mods, F) of
        {ok,{Prep0,[]}} ->
            filter_modules(Prep0, Init);
        {ok,{_,[_|_]=Errors}} ->
            Ms = [M || {M,_} <- Errors],
            exit({load_failed,Ms})
    end.

filter_modules(Prep0, Init) ->
    filter_modules(Prep0, [], [], [], Init).

filter_modules([{Mod,{prepared,Code,Full}}|Rest], Prepared, Loaded, OnLoad, Init) ->
    filter_modules(Rest, [Code|Prepared], [{Mod,Full}|Loaded], OnLoad, Init);
filter_modules([{Mod,{on_load,Beam,Full}}|Rest], Prepared, Loaded, OnLoad, Init) ->
    filter_modules(Rest, Prepared, [{Mod,Full}|Loaded], [{Mod,Beam,Full}|OnLoad], Init);
filter_modules([], Prepared, Loaded, OnLoad, Init) ->
    ok = erlang:finish_loading(Prepared),
    load_rest(OnLoad, Init, Loaded).

load_rest([{Mod,Beam,Full}|T], Init, Acc) ->
    do_load_module(Mod, Beam),
    load_rest(T, Init, [{Mod,Full}|Acc]);
load_rest([], _Init, []) ->
    ok;
load_rest([], Init, Acc) ->
    Init ! {self(),loaded,Acc},
    ok.
Result: no meaningful boot-time reduction (still 30ms - 40ms) despite the cleaner code paths.
The bottleneck appears to be erlang:finish_loading/1 and the per-module processing rather than the list operations. I'm not sure, though; input from OTP experts would be welcome.
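One way to test that hypothesis would be to time erlang:finish_loading/1 in isolation from the prepare step. A rough sketch (the helper name time_finish_loading/1 is mine, and it assumes the modules' .beam files are on the code path and the modules are not yet loaded):

```erlang
%% Hypothetical helper: measure only the finish_loading/1 phase for a
%% list of module names. Each module's object code is fetched and
%% prepared first, so the timed call covers just the commit step.
time_finish_loading(Mods) ->
    Prepared =
        [begin
             {Mod, Code, _Path} = code:get_object_code(Mod),
             %% prepare_loading/2 returns prepared code (a magic ref)
             erlang:prepare_loading(Mod, Code)
         end || Mod <- Mods],
    {Micros, ok} = timer:tc(erlang, finish_loading, [Prepared]),
    Micros.
```

If the finish phase dominates, that would support the idea that the list handling around it is not worth optimizing further.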
Oh, are we at the start of a "multithreaded module loading" initiative?
Development vs Production Boot Time Analysis
When running rebar3 shell with -init_debug on the development machine (Apple M4 16GB), the kernel application starts significantly faster:
$ ERL_FLAGS="-args_file ./config/vm.args -init_debug" rebar3 shell
{progress,preloaded}
{path,["$ROOT/lib/kernel-10.3/ebin","$ROOT/lib/stdlib-7.0/ebin"]}
{done_in_microseconds,7}
{primLoad,[error_handler,application,application_controller,application_master,code,code_server,erl_eval,erl_lint,erl_parse,error_logger,ets,file,filename,file_server,file_io_server,gen,gen_event,gen_server,heart,kernel,logger,logger_filters,logger_server,logger_backend,logger_config,logger_simple_h,lists,proc_lib,supervisor]}
...
{apply,{application,start_boot,[kernel,permanent]}}
{done_in_microseconds,31907} <============
{apply,{application,start_boot,[stdlib,permanent]}}
{done_in_microseconds,15}
...
Key Observations:
- Kernel boot: ~32ms (dev) vs ~3000ms (production)
- 100x performance difference suggests fundamental differences in:
- Module loading strategy (interactive vs embedded mode)
- Available system resources
- Container overhead
- Release packaging differences
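Regarding the first bullet, the loading mode is usually pinned in the release's vm.args: interactive mode loads modules lazily on first call, while embedded mode loads everything listed in the boot script up front during startup. A fragment for comparison (illustrative):

```
## vm.args fragment (illustrative)
-mode embedded
```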
Questions:
- Are there significant differences in the number of preloaded modules between environments?
- Could container resource constraints be impacting boot performance?
I tried to replicate this using Docker on my laptop, but could not, at least not consistently. There were a few times, when a container was first initialized, that the boot was slow, but not reproducibly so. That would of course seem to indicate the issue was not directly related to Erlang. I also tested with a release that does nothing but boot.
I think it’d be helpful to know more about what production looks like.
Of course. Also, did you test locally and in production using the same code base? Once again, it would be helpful to know what production looks like, and even more so to have a way to reproduce this.
Problem definitely solved! @dgud's networking intuition was spot on: kernel does network setup during boot.
Root Cause: Stale Configuration
CI/CD config had a copy-paste from an old project with a dead node reference:
% PROBLEMATIC (causing the 13-second, then 3-second, delay)
{kernel, [ {shell_history, enabled}
...
, {sync_nodes_optional, [ 'hpi_45842' ]}
]}
The Fix
Simply commented out the dead node reference:
% FIXED (boot time now ~800ms)
{kernel, [ {shell_history, enabled}
...
%% , {sync_nodes_optional, [ ]}
]}
What Happened
sync_nodes_optional made kernel wait for the non-existent node 'hpi_45842', retrying the connection until the timeout expired before boot could proceed.
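For anyone who does want synchronized startup, kernel's sync_nodes_timeout parameter bounds how long boot will wait for the listed nodes. A sys.config sketch (node names and timeout are illustrative, not from this deployment):

```erlang
%% sys.config fragment: wait at most 2 seconds for optional nodes.
[{kernel,
  [{sync_nodes_optional, ['node_a@host1', 'node_b@host2']},
   {sync_nodes_timeout, 2000}   % milliseconds; 'infinity' is also allowed
  ]}].
```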
Performance Impact:
- Before: ~13s
- After: ~800ms
- Improvement: ~94% faster boot (roughly 16x)
Note to self: Always audit config files when migrating between projects. A single line of stale configuration can create major operational overhead, especially in containerized deployments where fast startup is critical.
Huge thanks to @michalmuskala @dgud @starbelly @garazdawi @Led @maxlapshin for all the debugging guidance and insights throughout this investigation!