Investigating Slow Boot Time in Small Erlang Release - Seeking Optimization Strategies

It is not intentionally complicated. It is the way it is because no-one has contributed that functionality yet.

If you want only the plain (non-JIT) emulator you can pass --disable-jit.

1 Like

I’m sorry, maybe I was wrong.

I know, but I would like both:)

You’re welcome :smiley: A few things to note however :

  1. You can absolutely use short names, there’s no problem with that. Depending on your use case, short names may be preferable. For example, if you use foo@127.0.0.1 as the node name in Tristan’s epmdlessless example project, you will not be able to connect from one node to another. However, if you don’t plan on connecting nodes like that, then using 127.0.0.1 is fine. Otherwise, you need short names, long names whose IP addresses are distinct and reachable from the nodes that will connect to others, or a long name with a resolvable host part.

  2. The issue that I ran into, and I believe you were running into, was simply that the host you were attempting to connect from did not have an entry in /etc/hosts (or similar). When bin/app eval ... is called, the extended_bin script will try to determine the host name to use if a long name is not used or provided as an argument to the script. As such, it may end up with a non-FQDN as the name to use for erl_call (such as foo or foo.local), not the name of the node you’re connecting to. Given that such a host entry does not exist in /etc/hosts, resolution will fail.

  3. When using a long name such as foo@127.0.0.1, no binding happens, it’s literally just a name. However, when trying to connect to a node with this name from, say, another node, the connection attempt will be made to 127.0.0.1, which of course will not work.
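To make point 3 concrete, here’s a toy helper (purely illustrative, not anything from OTP) showing what a peer actually dials, given a long node name:

```shell
# Hypothetical helper: the host part after the @ in a long node name is taken
# literally by any peer that tries to connect.
node_host() { printf '%s\n' "${1#*@}"; }

node_host foo@127.0.0.1      # -> 127.0.0.1 (loopback: unreachable from other hosts)
node_host foo@10.0.0.5       # -> 10.0.0.5 (works if this IP is routable from the peer)
node_host foo@db1.internal   # -> db1.internal (works if the peer can resolve this name)
```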

I might push up some examples later on to illustrate some of these concepts and snags you can run into.

Glad you got it resolved :slight_smile:

2 Likes

Wow! Great thread!

Seems that we have to pay for JIT with a non-instant boot time?

Seems that we have to pay for NOTHING with a non-instant boot time.

1 Like

I’ve tried to recompile 28.0 using kerl with --disable-jit but I ended up with JIT still enabled :roll_eyes:

Do it right:

KERL_CONFIGURE_OPTIONS="--disable-jit" kerl build 28.0
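For what it’s worth, giving the JIT-less build its own name keeps it separate from a stock 28.0 install, and you can verify the flavor afterwards (a sketch; the kerl build name and install path are illustrative):

```shell
# Illustrative kerl invocation (commented out so only the check below runs):
#   KERL_CONFIGURE_OPTIONS="--disable-jit" kerl build 28.0 28.0-no-jit
#   kerl install 28.0-no-jit /opt/erlang/28.0-no-jit
#   . /opt/erlang/28.0-no-jit/activate

# Confirm which emulator flavor is actually running: 'emu' means the JIT is
# off, 'jit' means it is on. Prints a fallback if erl is not on PATH.
flavor="$(
  if command -v erl >/dev/null 2>&1; then
    erl -noshell -eval 'io:format("~p~n", [erlang:system_info(emu_flavor)])' -s init stop
  else
    echo "no-erl"
  fi
)"
echo "$flavor"
```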
2 Likes

This is what I did. Gonna try one more time

@Led Thanks for the guidance! Successfully got it running without JIT on a fresh VM.

$ erl
Erlang/OTP 28 [erts-16.0] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1]
Eshell V16.0 (press Ctrl+G to abort, type help(). for help)

Boot Performance without JIT

{apply,{application,start_boot,[kernel,permanent]}}
{done_in_microseconds,3060058}
...

As you can see, the boot time remains comparable to the JIT-enabled configuration, but memory consumption dropped by ~20% with the JIT disabled.

This confirms that for our use case, JIT overhead isn’t justified by the performance gains - we’re getting significant memory savings without sacrificing startup performance.
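In case anyone wants to reproduce the memory side of this, one hedged way (assuming erl is on PATH) is to compare erlang:memory(total) right after boot on each build:

```shell
# Total bytes the emulator has allocated just after boot; run once on the JIT
# build and once on the no-JIT build, then compare. Prints 0 if erl is not on PATH.
mem="$(
  if command -v erl >/dev/null 2>&1; then
    erl -noshell -eval 'io:format("~p~n", [erlang:memory(total)])' -s init stop
  else
    echo "0"
  fi
)"
echo "$mem"
```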

Interactive mode?
This could be more noticeable in embedded mode and in large projects with many modules.

1 Like

Yes, interactive mode.

Agreed. This project is relatively small, so the 20% memory reduction may be more significant in larger codebases. I’ll coordinate with a colleague working on a larger project to run similar benchmarks and report back with those findings.

The current results suggest JIT overhead may not be worthwhile for smaller applications, but larger projects could show different trade-offs.

@Led appreciate your continued guidance on this investigation.

You can also use relx’s {debug_info, strip} parameter.
And strip the built OTP like this:

erl -noshell -eval 'beam_lib:strip_release("/usr/local/lib/erlang", ["Attr"])' -s init stop
1 Like

Got it - so {debug_info, strip} only strips debugging info from my app and its dependencies, not OTP itself.

Would this be a correct approach instead:

$ rebar3 as prod release
$ erl -noshell -eval 'beam_lib:strip_release("_build/prod/rel/<release_name>", ["Attr"])' -s init stop

This should strip the entire release directory including the bundled OTP libraries, right?
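One way to sanity-check the result (a sketch; the path is illustrative and erl is assumed on PATH) is that a stripped .beam answers no_abstract_code when asked for its abstract_code chunk:

```shell
# Hypothetical check: beam_lib:chunks/2 reports no_abstract_code for a module
# whose debug info was stripped. Point the path at a real module in your release.
out="$(
  beam="_build/prod/rel/<release_name>/lib/kernel-10.3/ebin/auth.beam"
  if command -v erl >/dev/null 2>&1 && [ -f "$beam" ]; then
    erl -noshell -eval \
      "io:format(\"~p~n\", [beam_lib:chunks(\"$beam\", [abstract_code])])" \
      -s init stop
  else
    echo "skipped: no erl on PATH, or no beam at $beam"
  fi
)"
echo "$out"
```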

I’m not sure. I never use bundled OTP.

I think it worked

$ ls -la /opt/erlang/28.0-no-jit/lib/kernel-10.3/ebin
-rw-r--r-- 1 zab zab  27760 May 30 18:13 application.beam
-rw-r--r-- 1 zab zab  91892 May 30 18:13 application_controller.beam
-rw-r--r-- 1 zab zab  15904 May 30 18:13 application_master.beam
-rw-r--r-- 1 zab zab   3620 May 30 18:13 application_starter.beam
-rw-r--r-- 1 zab zab  20748 May 30 18:13 auth.beam
-rw-r--r-- 1 zab zab  63676 May 30 18:13 code.beam
-rw-r--r-- 1 zab zab  61588 May 30 18:13 code_server.beam
-rw-r--r-- 1 zab zab  77336 May 30 18:13 disk_log_1.beam
...

$ ls -la _build/prod/rel/mirakle/lib/kernel-10.3/ebin/
-rw-r--r-- 1 zab zab  3962 May 30 20:25 application.beam
-rw-r--r-- 1 zab zab 20826 May 30 20:25 application_controller.beam
-rw-r--r-- 1 zab zab  4169 May 30 20:25 application_master.beam
-rw-r--r-- 1 zab zab   947 May 30 20:25 application_starter.beam
-rw-r--r-- 1 zab zab  4732 May 30 20:25 auth.beam
-rw-r--r-- 1 zab zab 11333 May 30 20:25 code.beam
-rw-r--r-- 1 zab zab 14484 May 30 20:25 code_server.beam
-rw-r--r-- 1 zab zab 15714 May 30 20:25 disk_log_1.beam
...

Really, why?

Erlang Boot Time with “tmpfs”

I tested loading the Erlang lib directory (/app/mirakle/lib) as tmpfs within Docker to eliminate disk I/O as a potential bottleneck:

docker run --tmpfs /tmp/erl_libdir:size=256M,exec \
--mount type=bind,source=/tmp/erl_libdir,target=/app/mirakle/lib mirakle

Results

Boot time remained unchanged at ~3 seconds:

{apply,{application,start_boot,[kernel,permanent]}}
{done_in_microseconds,3053565}

Conclusion

This confirms that disk I/O isn’t the limiting factor for boot performance. The bottleneck appears to be VM initialization and application loading rather than file access.

I believe way up you said:

You should try switching back :smiley:

1 Like

Which parts of the Erlang/OTP codebase handle the embedded vs interactive boot modes?

I’d like to understand how these modes differ in the code loading process. Specifically, I’m looking for:

  • Where the mode decision affects module loading behavior
  • Code paths that handle preloading vs lazy loading
  • Any performance-critical differences in the boot sequence

This is documented here. I’ll try to answer in a little more detail, though I feel like some of this might need to be addressed by an OTP team member.

In embedded mode, all modules required and/or specified (as in a release) are loaded during init. Specifically, if the mode is embedded, prim load is set to true, which results in all modules (required and specified) being loaded. While there are no strict rules here, I would say interactive mode is for testing and development. Besides possibly having a faster boot time, you also get some guarantees with embedded (i.e., you know before your application starts that everything is ready and nothing shall be implicitly loaded after).
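A quick way to see which mode a node booted in (a sketch, assuming erl is on PATH; code:get_mode/0 reports the code server’s mode):

```shell
# Boot an emulator in embedded mode and ask the code server which mode it is
# in; defaults to interactive without the -mode flag. Fallback if erl is absent.
mode="$(
  if command -v erl >/dev/null 2>&1; then
    erl -mode embedded -noshell -eval 'io:format("~p~n", [code:get_mode()])' -s init stop
  else
    echo "no-erl"
  fi
)"
echo "$mode"
```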

If you follow the path in the load_modules function, you’ll see a decision is made whether to load modules one at a time to reduce peak memory usage or just load in parallel.

You should have also noted that in the case of interactive mode, nothing is done. This of course means modules will be lazily loaded as functions on modules are called, apply/3 is used, etc. Thus, we continue to boot.

As applications start that contain modules that were not loaded ahead of time and are referenced via a function call, apply/3, and so on, and the module can not be found through several twists and turns, we will end up in the error handler for the process. Every process has an error handler, and by default it is the error_handler module. As an example, if you call foo:bar(bla), and the function bar with arity 1 on module foo can not be found, we should end up in error_handler:undefined_function/3. In turn this will call the ensure_loaded function defined in this module. This will look for the code_server, and fall back to init if it can not be found. Unless something has gone horribly wrong, code:ensure_loaded/1 shall be called. Now we can see there that a check is done to get the mode the system is running in (embedded or interactive), and code is only loaded if it is interactive. The code is loaded, then back in the error handler, a check is done to see if the function is exported, and if so, apply/3 is called.
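That lazy path is easy to observe from the shell (a sketch, assuming erl is on PATH; ordsets is just an arbitrary stdlib module that is not preloaded):

```shell
# In interactive mode (the default) ordsets is not loaded at boot; the first
# call to it goes through the error handler and code:ensure_loaded/1, after
# which code:is_loaded/1 no longer returns false.
lazy="$(
  if command -v erl >/dev/null 2>&1; then
    erl -noshell -eval \
      'io:format("~p", [code:is_loaded(ordsets)]),
       ordsets:new(),
       io:format(" -> ~p~n", [code:is_loaded(ordsets) =/= false])' \
      -s init stop
  else
    echo "erl not on PATH"
  fi
)"
echo "$lazy"   # e.g. false -> true
```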

There’s a lot that has to happen in order for a module to be loaded this way, right? :slight_smile: Given that every function call has to be scheduled, the error handler called, the code loaded, etc., and the process might have to be scheduled out and back in a few times over during all of it, that’s a good bit of overhead. Ergo, the boot time should be slower, as it should take longer for applications to start up per the above.

Additionally, there’s no guarantee that all of your modules have been loaded either. Maybe some of your application code isn’t referenced until a certain event happens. As such you may have some initial latency issues on paths that trigger a module load for the first time some time in the future after the system has been started.

It’s also quite nice that the loading of modules in embedded mode must be explicit too :slight_smile:

I think that answers perhaps some of your questions and gives you some starting points if you’re interested in tracing further.

There is one more thing to note: you will have a slower boot with the JIT enabled, since the translation to native machine code must be performed. Yet the boost in runtime performance should be worth it.

2 Likes

@starbelly Thank you for the detailed explanation and code pointers! This really clarifies the difference between embedded and interactive modes.

I’ll trace through the load_modules function you mentioned to better understand the parallel vs sequential loading decision.

Much appreciated for the comprehensive response!

1 Like