Investigating Slow Boot Time in Small Erlang Release - Seeking Optimization Strategies

It is not intentionally complicated. It is the way it is because no-one has contributed that functionality yet.

If you want only the plain (non-JIT) emulator you can pass --disable-jit.

1 Like

I’m sorry, maybe I was wrong.

I know, but I would like both:)

You’re welcome :smiley: A few things to note however :

  1. You can absolutely use short names, there’s no problem with that. Depending on your use case, short names may be preferable. For example, if you use foo@127.0.0.1 as the node name in Tristan’s epmdlessless example project, you will not be able to connect from one node to another. However, if you don’t plan on connecting nodes like that, then using 127.0.0.1 is fine. Otherwise, you need short names, long names whose IP addresses are distinct and reachable from the nodes that will connect to others, or a long name with a resolvable host part.

  2. The issue that I ran into, and I believe you were running into, was simply that the host you were attempting to connect from did not have an entry in /etc/hosts (or similar). When bin/app eval ... is called, the extended_bin script will try to determine the host name to use if a long name is not used or provided as an argument to the script. As such, it may end up with a non-FQDN as the name to use for erl_call (such as foo or foo.local), not the name of the node you’re connecting to. Given that such a host entry does not exist in /etc/hosts, resolution will fail.

  3. When using a long name such as foo@127.0.0.1, no binding happens, it’s literally just a name. However, when trying to connect to a node with this name from, say, another node, the connection attempt will be made to 127.0.0.1, which of course will not work.
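To make point 3 concrete, here’s a toy helper (purely illustrative, not anything from OTP) showing what a peer actually dials, given a long node name:

```shell
# Hypothetical helper: the host part after the @ in a long node name is taken
# literally by any peer that tries to connect.
node_host() { printf '%s\n' "${1#*@}"; }

node_host foo@127.0.0.1      # -> 127.0.0.1 (loopback: unreachable from other hosts)
node_host foo@10.0.0.5       # -> 10.0.0.5 (works if this IP is routable from the peer)
node_host foo@db1.internal   # -> db1.internal (works if the peer can resolve this name)
```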

I might push up some examples later on to illustrate some of these concepts and snags you can run into.

Glad you got it resolved :slight_smile:

2 Likes

Wow! Great thread!

Seems that we have to pay for JIT with a non-instant boot time?

Seems that we have to pay for NOTHING with a non-instant boot time.

1 Like

I’ve tried to recompile 28.0 using kerl with --disable-jit but I ended up with JIT still enabled :roll_eyes:

Do it right:

KERL_CONFIGURE_OPTIONS="--disable-jit" kerl build 28.0
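For what it’s worth, giving the JIT-less build its own name keeps it separate from a stock 28.0 install, and you can verify the flavor afterwards (a sketch; the kerl build name and install path are illustrative):

```shell
# Illustrative kerl invocation (commented out so only the check below runs):
#   KERL_CONFIGURE_OPTIONS="--disable-jit" kerl build 28.0 28.0-no-jit
#   kerl install 28.0-no-jit /opt/erlang/28.0-no-jit
#   . /opt/erlang/28.0-no-jit/activate

# Confirm which emulator flavor is actually running: 'emu' means the JIT is
# off, 'jit' means it is on. Prints a fallback if erl is not on PATH.
flavor="$(
  if command -v erl >/dev/null 2>&1; then
    erl -noshell -eval 'io:format("~p~n", [erlang:system_info(emu_flavor)])' -s init stop
  else
    echo "no-erl"
  fi
)"
echo "$flavor"
```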
2 Likes

This is what I did. Gonna try one more time

@Led Thanks for the guidance! Successfully got it running without JIT on a fresh VM.

$ erl
Erlang/OTP 28 [erts-16.0] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1]
Eshell V16.0 (press Ctrl+G to abort, type help(). for help)

Boot Performance without JIT

{apply,{application,start_boot,[kernel,permanent]}}
{done_in_microseconds,3060058}
...

As you can see, the boot time remains comparable to the JIT-enabled configuration, but memory consumption dropped by ~20% with the JIT disabled.

This confirms that for our use case, JIT overhead isn’t justified by the performance gains - we’re getting significant memory savings without sacrificing startup performance.
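In case anyone wants to reproduce the memory side of this, one hedged way (assuming erl is on PATH) is to compare erlang:memory(total) right after boot on each build:

```shell
# Total bytes the emulator has allocated just after boot; run once on the JIT
# build and once on the no-JIT build, then compare. Prints 0 if erl is not on PATH.
mem="$(
  if command -v erl >/dev/null 2>&1; then
    erl -noshell -eval 'io:format("~p~n", [erlang:memory(total)])' -s init stop
  else
    echo "0"
  fi
)"
echo "$mem"
```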

Interactive mode?
This could be more noticeable in embedded mode and in large projects with many modules.

1 Like

Yes, interactive mode.

Agreed. This project is relatively small, so the 20% memory reduction may be more significant in larger codebases. I’ll coordinate with a colleague working on a larger project to run similar benchmarks and report back with those findings.

The current results suggest JIT overhead may not be worthwhile for smaller applications, but larger projects could show different trade-offs.

@Led appreciate your continued guidance on this investigation.

You can also use relx’s {debug_info, strip} parameter.
And strip the built OTP like this:

erl -noshell -eval 'beam_lib:strip_release("/usr/local/lib/erlang", ["Attr"])' -s init stop
1 Like

Got it - so {debug_info, strip} only strips debugging info from my app and its dependencies, not OTP itself.

Would this be a correct approach instead:

$ rebar3 as prod release
$ erl -noshell -eval 'beam_lib:strip_release("_build/prod/rel/<release_name>", ["Attr"])' -s init stop

This should strip the entire release directory including the bundled OTP libraries, right?
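One way to sanity-check the result (a sketch; the path is illustrative and erl is assumed on PATH) is that a stripped .beam answers no_abstract_code when asked for its abstract_code chunk:

```shell
# Hypothetical check: beam_lib:chunks/2 reports no_abstract_code for a module
# whose debug info was stripped. Point the path at a real module in your release.
out="$(
  beam="_build/prod/rel/<release_name>/lib/kernel-10.3/ebin/auth.beam"
  if command -v erl >/dev/null 2>&1 && [ -f "$beam" ]; then
    erl -noshell -eval \
      "io:format(\"~p~n\", [beam_lib:chunks(\"$beam\", [abstract_code])])" \
      -s init stop
  else
    echo "skipped: no erl on PATH, or no beam at $beam"
  fi
)"
echo "$out"
```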

I’m not sure. I never use bundled OTP.

I think it worked

$ ls -la /opt/erlang/28.0-no-jit/lib/kernel-10.3/ebin
-rw-r--r-- 1 zab zab  27760 May 30 18:13 application.beam
-rw-r--r-- 1 zab zab  91892 May 30 18:13 application_controller.beam
-rw-r--r-- 1 zab zab  15904 May 30 18:13 application_master.beam
-rw-r--r-- 1 zab zab   3620 May 30 18:13 application_starter.beam
-rw-r--r-- 1 zab zab  20748 May 30 18:13 auth.beam
-rw-r--r-- 1 zab zab  63676 May 30 18:13 code.beam
-rw-r--r-- 1 zab zab  61588 May 30 18:13 code_server.beam
-rw-r--r-- 1 zab zab  77336 May 30 18:13 disk_log_1.beam
...

$ ls -la _build/prod/rel/mirakle/lib/kernel-10.3/ebin/
-rw-r--r-- 1 zab zab  3962 May 30 20:25 application.beam
-rw-r--r-- 1 zab zab 20826 May 30 20:25 application_controller.beam
-rw-r--r-- 1 zab zab  4169 May 30 20:25 application_master.beam
-rw-r--r-- 1 zab zab   947 May 30 20:25 application_starter.beam
-rw-r--r-- 1 zab zab  4732 May 30 20:25 auth.beam
-rw-r--r-- 1 zab zab 11333 May 30 20:25 code.beam
-rw-r--r-- 1 zab zab 14484 May 30 20:25 code_server.beam
-rw-r--r-- 1 zab zab 15714 May 30 20:25 disk_log_1.beam
...

Really, why?

Erlang Boot Time with “tmpfs”

I tested loading the Erlang lib directory (/app/mirakle/lib) as tmpfs within Docker to eliminate disk I/O as a potential bottleneck:

docker run --tmpfs /tmp/erl_libdir:size=256M,exec \
--mount type=bind,source=/tmp/erl_libdir,target=/app/mirakle/lib mirakle

Results

Boot time remained unchanged at ~3 seconds:

{apply,{application,start_boot,[kernel,permanent]}}
{done_in_microseconds,3053565}

Conclusion

This confirms that disk I/O isn’t the limiting factor for boot performance. The bottleneck appears to be VM initialization and application loading rather than file access.

I believe way up you said:

You should try switching back :smiley:

1 Like

Which parts of the Erlang/OTP codebase handle the embedded vs interactive boot modes?

I’d like to understand how these modes differ in the code loading process. Specifically, I’m looking for:

  • Where the mode decision affects module loading behavior
  • Code paths that handle preloading vs lazy loading
  • Any performance-critical differences in the boot sequence

This is documented here. I’ll try to answer in a little more detail, though I feel like some of this might need to be addressed by an OTP team member.

In embedded mode, all modules required and/or specified (as in a release) are loaded during init. Specifically, if the mode is embedded, prim load is set to true, which results in all modules (required and specified) being loaded. While there are no strict rules here, I would say interactive mode is for testing and development. Besides possibly having a faster boot time, you also get some guarantees with embedded (i.e., you know before your application starts that everything is ready and nothing shall be implicitly loaded after).
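A quick way to see which mode a node booted in (a sketch, assuming erl is on PATH; code:get_mode/0 reports the code server’s mode):

```shell
# Boot an emulator in embedded mode and ask the code server which mode it is
# in; defaults to interactive without the -mode flag. Fallback if erl is absent.
mode="$(
  if command -v erl >/dev/null 2>&1; then
    erl -mode embedded -noshell -eval 'io:format("~p~n", [code:get_mode()])' -s init stop
  else
    echo "no-erl"
  fi
)"
echo "$mode"
```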

If you follow the path in the load_modules function, you’ll see a decision is made whether to load modules one at a time to reduce peak memory usage or just load in parallel.

You should have also noted that in the case of interactive mode, nothing is done. This of course means modules will be lazily loaded as functions on modules are called, apply/3 is used, etc. Thus, we continue to boot.

As applications start that contain modules that were not loaded ahead of time and are referenced via a function call, apply/3, and so on, and the module can not be found through several twists and turns, we will end up in the error handler for the process. Every process has an error handler, and by default it is the error_handler module. As an example, if you call foo:bar(bla), and the function bar with arity 1 on module foo can not be found, we should end up in error_handler:undefined_function/3. In turn this will call the ensure_loaded function defined in this module. This will look for the code_server, and fall back to init if it can not be found. Unless something has gone horribly wrong, code:ensure_loaded/1 shall be called. Now we can see there that a check is done to get the mode the system is running in (embedded or interactive), and code is only loaded if it is interactive. The code is loaded, then back in the error handler, a check is done to see if the function is exported, and if so, apply/3 is called.
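That lazy path is easy to observe from the shell (a sketch, assuming erl is on PATH; ordsets is just an arbitrary stdlib module that is not preloaded):

```shell
# In interactive mode (the default) ordsets is not loaded at boot; the first
# call to it goes through the error handler and code:ensure_loaded/1, after
# which code:is_loaded/1 no longer returns false.
lazy="$(
  if command -v erl >/dev/null 2>&1; then
    erl -noshell -eval \
      'io:format("~p", [code:is_loaded(ordsets)]),
       ordsets:new(),
       io:format(" -> ~p~n", [code:is_loaded(ordsets) =/= false])' \
      -s init stop
  else
    echo "erl not on PATH"
  fi
)"
echo "$lazy"   # e.g. false -> true
```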

There’s a lot that has to happen in order for a module to be loaded this way, right? :slight_smile: Given that every function call has to be scheduled, the error handler called, the code loaded, etc., and the process might have to be scheduled out and back in a few times over during all of it, that’s a good bit of overhead. Ergo, the boot time should be slower, as it should take longer for applications to start up per the above.

Additionally, there’s no guarantee that all of your modules have been loaded either. Maybe some of your application code isn’t referenced until a certain event happens. As such you may have some initial latency issues on paths that trigger a module load for the first time some time in the future after the system has been started.

It’s also quite nice that the loading of modules in embedded mode must be explicit too :slight_smile:

I think that answers perhaps some of your questions and gives you some starting points if you’re interested in tracing further.

There is one more thing to note: you will have a slower boot with the JIT enabled, since the translation to native machine code must be performed. Yet the boost in runtime performance should be worth it.

2 Likes

@starbelly Thank you for the detailed explanation and code pointers! This really clarifies the difference between embedded and interactive modes.

I’ll trace through the load_modules function you mentioned to better understand the parallel vs sequential loading decision.

Much appreciated for the comprehensive response!

1 Like