GCC optimization flags for Erlang/OTP compilation - performance vs safety

Hi Folks,

I’m compiling Erlang/OTP (28+) from source on Linux using these GCC flags: -O2 -march=native -fomit-frame-pointer -funroll-loops
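
A minimal sketch of how flags like these are usually passed to an autoconf-style build such as OTP’s, assuming it is run from the root of an unpacked source tree. OTP_CFLAGS and build_otp are illustrative names, not part of OTP, and this does not confirm that the build applies the flags uniformly to every file.

```shell
#!/bin/sh
# Flags from the post above; OTP_CFLAGS is a hypothetical variable name.
OTP_CFLAGS="-O2 -march=native -fomit-frame-pointer -funroll-loops"

# Hypothetical helper: configure and build from the root of an OTP source tree.
# Passing CFLAGS=... on the configure command line is the usual autoconf convention.
build_otp() {
    ./configure CFLAGS="$OTP_CFLAGS" && make
}

# Not invoked automatically; call build_otp from inside the source tree.
echo "CFLAGS to be used: $OTP_CFLAGS"
```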

Looking for community guidance on:

  1. Which additional optimization flags are safe to use for maximum performance? I’ve avoided -O3 due to concerns about potential instability, but wondering if that’s overly cautious for Erlang.
  2. Are any of my current flags considered problematic for production Erlang systems?
  3. For Docker deployments where portability matters, what’s the recommended approach instead of -march=native while still maintaining good performance?

Would appreciate any insights from your production experiences with optimized Erlang builds.

Thanks,

I’ve never heard of -O3 causing instability before. Could you please attach your source for this?

@Benjamin-Philip The -O3 flag often delivers less of a speedup than people
expect, and it tends to generate larger object files.

This issue is not new and is well-documented across the software development
community: Ubuntu Provides More Insight Into Their Decision Not To “-O3” Optimize All Packages

Even the Linux kernel, one of the most performance-critical codebases in
existence, explicitly uses -O2 as its default optimization level rather than
-O3. This can be seen directly in the official Linux kernel Makefile
where CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE sets KBUILD_CFLAGS += -O2.

Linus Torvalds has explicitly rejected attempts to use -O3 in the kernel,
citing concerns about compiler bugs and lack of performance benefits. As one
kernel documentation source notes: “This is the default optimization level for
the kernel, building with the -O2 compiler flag for best performance and
most helpful compile-time warnings.”

Hence my original question.

  1. Note that the Erlang/OTP build itself enables -O3 for select files, in particular the BEAM emulator beam_emu.c. (I haven’t checked if that’s changed with JIT.) -O3 has historically been problematic because it enables auto-vectorization, which has had a number of bugs. However, some of that is enabled even at -O2 these days.
  2. -march=native is problematic if build and run hosts aren’t exactly the same. You may even run into problems on big.LITTLE systems unless the two sets of cores have identical feature sets.
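
Both points can be checked on a given toolchain. A minimal sketch, assuming GCC on x86-64: the function names are hypothetical helpers, and the x86-64-v2 default is only one possible baseline (GCC 11 and later accept the x86-64-v2/v3/v4 levels; on aarch64 the analogue would be something like -march=armv8-a). Output depends on the GCC version, so none is shown.

```shell
#!/bin/sh
# Sketch only: these helper names are illustrative, not part of GCC or OTP.

# Show optimizer flags whose state differs between -O2 and -O3 on this GCC.
diff_o2_o3() {
    tmp_o2=$(mktemp) && tmp_o3=$(mktemp) || return 1
    gcc -O2 -Q --help=optimizers > "$tmp_o2" 2>/dev/null
    gcc -O3 -Q --help=optimizers > "$tmp_o3" 2>/dev/null
    diff "$tmp_o2" "$tmp_o3"
    rm -f "$tmp_o2" "$tmp_o3"
}

# Show what -march=native resolves to on the build host.
native_march() {
    gcc -march=native -Q --help=target 2>/dev/null | grep -- '-march='
}

# Build a portable flag set: a fixed baseline instead of -march=native.
# $1 is the baseline to target; x86-64-v2 is an assumed default.
portable_cflags() {
    march="${1:-x86-64-v2}"
    printf '%s\n' "-O2 -march=${march} -mtune=generic"
}

# Only query the compiler if one is installed.
if command -v gcc >/dev/null 2>&1; then
    diff_o2_o3 | grep enabled || true
    native_march || true
fi

portable_cflags
```

For a Docker image, the baseline passed to portable_cflags should match the oldest CPU the image is expected to run on, not the build host.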

We use rpmbuild’s default flags, currently on AL2023 and previously on AL2 and several generations of CentOS.
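
To see what those defaults are on a given distribution, the optflags macro can be expanded directly. A small sketch, assuming an RPM-based system; rpm_default_cflags is a hypothetical helper name, and the flags printed vary by distribution and release.

```shell
#!/bin/sh
# Print the compiler flags rpmbuild uses by default on this system.
# rpm_default_cflags is an illustrative name, not an rpm command.
rpm_default_cflags() {
    if command -v rpm >/dev/null 2>&1; then
        rpm --eval '%{optflags}'
    else
        echo "rpm not installed"
    fi
}

rpm_default_cflags
```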
