GCC optimization flags for Erlang/OTP compilation - performance vs safety

Hi Folks,

I’m compiling Erlang/OTP (28+) from source on Linux using these GCC flags: -O2 -march=native -fomit-frame-pointer -funroll-loops

Looking for community guidance on:

  1. Which additional optimization flags are safe to use for maximum performance? I’ve avoided -O3 due to concerns about potential instability, but I’m wondering if that’s overly cautious for Erlang.
  2. Are any of my current flags considered problematic for production Erlang systems?
  3. For Docker deployments where portability matters, what’s the recommended approach instead of -march=native while still maintaining good performance?

Would appreciate any insights from your production experiences with optimized Erlang builds.

Thanks,

I’ve never heard of -O3 causing instability before. Could you please attach your source for this?

@Benjamin-Philip The -O3 flag frequently delivers less of a speedup than people expect, and it also tends to generate larger object files.

This issue is not new and is well documented across the software development community, for example: Ubuntu Provides More Insight Into Their Decision Not To “-O3” Optimize All Packages

Even the Linux kernel, one of the most performance-critical codebases in existence, uses -O2 as its default optimization level rather than -O3. You can see this directly in the top-level kernel Makefile, where CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE sets KBUILD_CFLAGS += -O2.

Linus Torvalds has explicitly rejected attempts to use -O3 in the kernel, citing concerns about compiler bugs and a lack of performance benefits. As the Kconfig help text for CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE puts it: “This is the default optimization level for the kernel, building with the -O2 compiler flag for best performance and most helpful compile-time warnings.”

Hence my original question.

  1. Note that the Erlang/OTP build itself enables -O3 for select files, in particular the BEAM emulator, beam_emu.c. (I haven’t checked whether that has changed with the JIT.) -O3 has historically been problematic because it enabled auto-vectorization, which has had a number of bugs. However, some auto-vectorization is enabled even at -O2 these days (GCC 12 and later).
  2. -march=native is problematic if build and run hosts aren’t exactly the same. You may even run into problems on big.LITTLE systems unless the two sets of cores have identical feature sets.
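
To check whether the cores on a given machine really do report identical feature sets (the big.LITTLE caveat above), or whether a build host has features that a target host lacks, one option is to compare the feature lists in /proc/cpuinfo. This is a rough Python sketch only, assuming the usual Linux /proc/cpuinfo layout (a "flags" line per logical CPU on x86, a "Features" line on arm64). The function names are hypothetical, and the sketch does not model which core GCC actually probes for -march=native, so the build-vs-target comparison is deliberately conservative.

```python
"""Compare CPU feature sets reported in Linux /proc/cpuinfo text.

Assumes the usual layout: one block per logical CPU, with features on a
"flags" line (x86) or a "Features" line (arm64).
"""
from pathlib import Path


def feature_sets(cpuinfo_text: str) -> list[frozenset[str]]:
    """One feature set per logical CPU found in the text."""
    sets = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Exact key match, so e.g. the newer x86 "vmx flags" line is skipped.
        if sep and key.strip() in ("flags", "Features"):
            sets.append(frozenset(value.split()))
    return sets


def cores_are_uniform(cpuinfo_text: str) -> bool:
    """True if every logical CPU reports an identical feature set."""
    return len(set(feature_sets(cpuinfo_text))) <= 1


def missing_on_target(build_text: str, target_text: str) -> set[str]:
    """Features seen on any build-host core but not on every target core.

    Conservative on purpose: union on the build side, intersection on the
    target side. A non-empty result suggests a -march=native build from the
    build host may use instructions some target core cannot execute.
    """
    build = feature_sets(build_text)
    target = feature_sets(target_text)
    if not build or not target:
        raise ValueError("no feature lines found")
    return set(frozenset.union(*build) - frozenset.intersection(*target))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("all cores identical:", cores_are_uniform(cpuinfo.read_text()))
```

Comparing whole feature lists is blunt (it also flags features the compiler would never use), but an empty result from missing_on_target is a reasonable sanity check before shipping a -march=native binary to a second machine.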

We use rpmbuild’s default flags, currently on AL2023 (Amazon Linux 2023), and previously on AL2 and several generations of CentOS.

Some other flags to consider:

  • If -march=native is too restrictive, you can usually find a combination of -march and -mtune that targets processors newer than a 1990s baseline, so you still take advantage of newer instruction-set features
  • -fdata-sections and -ffunction-sections combined with the linker flag -Wl,--gc-sections can shrink the binary quite a bit by letting the linker discard unused functions and data
  • Finally, there are various flavours of LTO (link-time optimization), PGO (profile-guided optimization), and BOLT (a post-link binary layout optimizer) that, combined, can give really nice wins, but they usually involve a more complex setup
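
For Docker images on x86-64, one portable middle ground between the baseline and -march=native is the set of psABI microarchitecture levels, which GCC 11 and later accept as -march=x86-64-v2, -march=x86-64-v3 and -march=x86-64-v4. A minimal Python sketch, assuming you can collect the /proc/cpuinfo "flags" line from each host you deploy to, that picks the highest level every host supports. The per-level feature lists were transcribed from the psABI into Linux cpuinfo spelling by hand (for example SSE3 appears as "pni" and LZCNT as "abm"), so treat that mapping, and the hypothetical function names, as assumptions to verify rather than an authoritative table.

```python
"""Pick the highest x86-64 microarchitecture level shared by a set of hosts.

Level names match GCC's -march=x86-64-v2/-v3/-v4 (GCC 11+). Feature names
use Linux /proc/cpuinfo spelling, transcribed by hand from the psABI:
SSE3 -> "pni", LZCNT -> "abm", LAHF/SAHF -> "lahf_lm", SYSCALL -> "syscall".
cpuinfo does not show OSXSAVE, so "xsave" stands in for it, and "lm"
(long mode) is added to the baseline. Verify before relying on this.
"""
from pathlib import Path
from typing import Dict, Optional, Set

# Cumulative: each level also needs everything from the levels before it.
LEVELS = [
    ("x86-64", {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2"}),
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def flags_from_cpuinfo(text: str) -> Set[str]:
    """Feature set from the first x86 "flags" line of /proc/cpuinfo text."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def highest_level(flags: Set[str]) -> Optional[str]:
    """Highest level whose cumulative requirements are all in `flags`."""
    best = None
    required: Set[str] = set()
    for name, extra in LEVELS:
        required |= extra
        if not required <= flags:
            break
        best = name
    return best


def common_march(hosts: Dict[str, Set[str]]) -> Optional[str]:
    """Highest level supported by every host, i.e. by their shared features."""
    if not hosts:
        return None
    return highest_level(set.intersection(*hosts.values()))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print("this host:", highest_level(flags_from_cpuinfo(cpuinfo.read_text())))
```

The resulting level would then go into CFLAGS as -march=... when configuring OTP, optionally paired with an -mtune value for the hardware you run on most. The trade-off is that a binary built for, say, x86-64-v3 can crash with SIGILL on any host below that level, so the host list needs to cover every machine the image may land on. This only applies to x86-64; Arm targets need a different -march/-mcpu choice.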