I’m compiling Erlang/OTP 28+ from source on Linux using these GCC flags: -O2 -march=native -fomit-frame-pointer -funroll-loops
Looking for community guidance on:
Which additional optimization flags are safe to use for maximum performance? I’ve avoided -O3 due to concerns about potential instability, but I’m wondering if that’s overly cautious for Erlang.
Are any of my current flags considered problematic for production Erlang systems?
For Docker deployments where portability matters, what’s the recommended approach instead of -march=native while still maintaining good performance?
Would appreciate any insights from your production experiences with optimized Erlang builds.
Even the Linux kernel, one of the most performance-critical codebases in existence, uses -O2 as its default optimization level rather than -O3. You can see this directly in the top-level kernel Makefile, where CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE sets KBUILD_CFLAGS += -O2.
Linus Torvalds has explicitly rejected attempts to use -O3 in the kernel, citing concerns about compiler bugs and a lack of measurable performance benefit. As the kernel’s own Kconfig help text for that option puts it: “This is the default optimization level for the kernel, building with the -O2 compiler flag for best performance and most helpful compile-time warnings.”
Note that the Erlang/OTP build itself enables -O3 for select files, in particular the BEAM emulator loop in beam_emu.c. (I haven’t checked whether that has changed with the JIT.) -O3 has historically been problematic because it enables auto-vectorization, which has had a number of bugs. However, some auto-vectorization is enabled even at -O2 these days (since GCC 12, with a “very cheap” cost model).
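To see what -O3 actually adds over -O2 with the compiler you have installed, GCC can print its optimizer settings for a given level with gcc -Q -O2 --help=optimizers. The sketch below diffs the -O2 and -O3 listings. It assumes gcc is on PATH and that each option line ends with its state ([enabled], [disabled] or a value); the helper names optimizer_settings, changed_between and gcc_optimizers are hypothetical. Only optimizations that have their own -f flag show up in this output, so treat it as a guide rather than a complete picture.

```python
import shutil
import subprocess


def optimizer_settings(text):
    """Map each -f option in `gcc -Q --help=optimizers` output to its state.

    Assumption: each relevant line starts with the option and ends with its
    state, e.g. "[enabled]", "[disabled]" or a value such as "very-cheap".
    """
    settings = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip headers and blank lines; keep only "-f..." option lines.
        if len(parts) >= 2 and parts[0].startswith("-f"):
            settings[parts[0]] = parts[-1]
    return settings


def changed_between(low, high):
    """Options whose state differs between two levels: {opt: (low, high)}."""
    return {
        opt: (low.get(opt), state)
        for opt, state in high.items()
        if low.get(opt) != state
    }


def gcc_optimizers(level, gcc="gcc"):
    """Run `gcc -Q <level> --help=optimizers` and parse the listing."""
    result = subprocess.run(
        [gcc, "-Q", level, "--help=optimizers"],
        capture_output=True, text=True, check=True,
    )
    return optimizer_settings(result.stdout)


if __name__ == "__main__":
    if shutil.which("gcc") is None:
        print("gcc not found on PATH")
    else:
        diff = changed_between(gcc_optimizers("-O2"), gcc_optimizers("-O3"))
        for opt, (o2, o3) in sorted(diff.items()):
            print(f"{opt}: -O2 {o2} -> -O3 {o3}")
```

Running it on the build host (or inside the build container) lists exactly which extra passes -O3 would turn on for that GCC version, which makes it easier to decide whether any of them are worth enabling individually.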
-march=native is problematic if the build and run hosts don’t have exactly the same CPU features: the binary may use instructions the run host lacks and crash with SIGILL. You may even run into problems on big.LITTLE systems unless the two sets of cores have identical feature sets.
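For the Docker case, one common alternative to -march=native is to target one of the x86-64 microarchitecture levels (x86-64-v2, x86-64-v3, x86-64-v4), which GCC accepts as -march values since GCC 11, picking the highest level that the oldest CPU you deploy to supports. The sketch below, assuming Linux on x86-64, reads the flags line of /proc/cpuinfo and reports the highest level the host satisfies. The flag-to-level mapping follows the commonly used one (for example, LZCNT is reported as abm and SSE3 as pni); the names LEVELS, cpu_flags and highest_level are hypothetical.

```python
# Flag names as they appear on the "flags" line of Linux /proc/cpuinfo.
# Assumed mapping of the x86-64 psABI levels; each level also requires
# every feature of the levels before it.
LEVELS = [
    ("x86-64", {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse2"}),
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def cpu_flags(cpuinfo_text):
    """Return the flag set from the first "flags" line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # e.g. aarch64, which reports "Features" instead


def highest_level(flags):
    """Highest level whose features, and all lower levels' features, are present."""
    best = None
    for name, required in LEVELS:
        if not required <= flags:
            break
        best = name
    return best


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(highest_level(cpu_flags(f.read())))
    except OSError:
        print("could not read /proc/cpuinfo (not Linux?)")
```

Running it on each host type in the fleet and building the image with the lowest result (for example -O2 -march=x86-64-v2) keeps the image portable across those hosts while still using more than the baseline x86-64 instruction set. On non-x86 hosts the sketch prints None.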
We just use rpmbuild’s default flags, currently on AL2023 (Amazon Linux 2023) and previously on AL2 and several generations of CentOS.