Compiler performance regression OTP 23.1+

I recently upgraded a couple of Elixir applications from OTP 23.0 to 25.1 and noticed a significant performance hit while compiling. For example, for an application with ~50 dependencies, compiling just the top-level project (dependencies already compiled) went from 20 seconds to 80 seconds. I tried a few different OTP versions and found the behavior consistent with anything 23.1+.

strace shows a major change: .beam file lookup in the code path switched from using the stat() syscall to open(). These lookups happen a lot while compiling (100-150k seems normal for these applications). If I’m understanding correctly, the count seems to be on the order of Enum.sum(1..num_dependencies). In my environment, the open() calls take significantly longer and cumulatively add up to roughly the total time difference.
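
To make the Enum.sum(1..num_dependencies) estimate concrete, here is a rough Python model. It assumes the code path is searched linearly, one directory per dependency, and that one module is loaded from each directory; the function names and that load pattern are my own assumptions for illustration, not something taken from the OTP source. The second helper turns a lookup count and per-call costs into added wall time.

```python
def probes_to_load_one_module_per_dir(num_dirs: int) -> int:
    """Count file probes if one module is loaded from each code path directory.

    Assumes a linear search: a module in the i-th directory (1-based)
    costs i probes, because the i-1 directories before it are misses.
    """
    return sum(range(1, num_dirs + 1))  # same as Enum.sum(1..num_dirs)


def added_seconds(num_probes: int, open_us: float, stat_us: float) -> float:
    """Extra wall time when each probe costs open_us instead of stat_us."""
    return num_probes * (open_us - stat_us) / 1_000_000


if __name__ == "__main__":
    # 50 dependencies, as in the application described above.
    print(probes_to_load_one_module_per_dir(50))
    # Hypothetical inputs: 150k lookups, 200 us per open(), 2 us per stat().
    print(added_seconds(150_000, 200.0, 2.0))
```

The inputs are rough figures, so the result is only an order-of-magnitude estimate.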

I believe the system call change occurred with this commit.

I put together a little performance test for open vs stat. Sure enough, in my environment open() takes about 200 µs vs 2 µs for stat(). It’s fair to say that is absurd (because it is), but I assume this is the consequence of some corporate on-access virus scanner (or something else outside of my control).
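
A comparison along these lines can be sketched in Python. This is not the original test, just a minimal stand-in that assumes a POSIX-like system; the helper names are made up for illustration, and the numbers include interpreter overhead, so only the ratio between the two probes is meaningful.

```python
import os
import tempfile
import time


def time_call_us(fn, path: str, iterations: int = 10_000) -> float:
    """Return the mean wall-clock time of fn(path) in microseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn(path)
    return (time.perf_counter() - start) / iterations * 1_000_000


def probe_with_stat(path: str) -> None:
    """Check the file the old way: a single stat() call."""
    os.stat(path)


def probe_with_open(path: str) -> None:
    """Check the file the new way: open() followed by close()."""
    fd = os.open(path, os.O_RDONLY)
    os.close(fd)


if __name__ == "__main__":
    # The temporary file stands in for a .beam file on the code path.
    with tempfile.NamedTemporaryFile(suffix=".beam") as f:
        print(f"stat: {time_call_us(probe_with_stat, f.name):.1f} us")
        print(f"open: {time_call_us(probe_with_open, f.name):.1f} us")
```

Running it on a directory that the scanner watches and on one that it does not would show whether on-access scanning is really the cause.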

Are there any clever ideas for how to remediate this problem?


Just to give you another data point: on my non-corporate-infested laptop, both open and stat take about 2 µs (with open sometimes peaking at 5 while stat stays consistent). You could try to just revert the patch (and live with the possible raciness) or find the exact location of the check (which is probably actually somewhere in mix).


Thanks for the additional data point. I agree that on many modern systems the performance difference between the two calls seems small. My hunch is that there are other systems with a more noticeable difference between the two, but I don’t have many hard numbers to back it up (though here’s one report of a 3x difference).

I should point out that the change from stat() to open() applies to all calls to code:load_file/1, whether they come from the compiler or from a normal module load. Since these calls are made tens of thousands of times (depending on the number of dependencies / .beam files to load), even a small change in their performance can be noticeable in things like the startup speed of an application.

Note that there is at least one proposal to improve code loading speed, so these problems are seen beyond my bloated corporate environment.

I have reverted 2e16d7 in my environment, which solves my immediate problem. I do suspect the broader OTP community would be better served by searching the code path with stat() instead of open().

Have you identified the actual part of code:load_file that runs into this? The patch as such strikes me as very reasonable, as the unpatched form has race conditions. If these can be handled in the code server directly, it could probably just be adjusted there.

For what it’s worth, dynamic linking has many benefits,
BUT it means that you never know in advance what you’re
going to get when you call a system function like stat()
or open(). On a good day with the wind behind it, my old
laptop running Bionic Beaver gets 6.9 usec for stat() and
9.1 usec for close(open()), BUT run it under strace and
watch those times explode. If you don’t know what “corporate”
are up to, you have no particular reason to trust the results
of stat(), or anything else, really.

Possibly the simplest way of avoiding overheads would be
to pack library files into an analogue of .zip / .ar files,
open the archive once and read the catalogue into a hash
table, and do stat() by looking in the hash table.
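
As a rough illustration of that idea (not of how OTP does or
should do it), here is a Python sketch that uses a .zip file
as the archive. The class and method names are invented for
the example. The archive is opened once, its catalogue is read
into a dict, and existence checks never touch the file system.

```python
import io
import zipfile


class ArchiveCatalogue:
    """Opens an archive once and answers existence checks from memory."""

    def __init__(self, archive) -> None:
        # `archive` may be a path or a file-like object.
        self._zip = zipfile.ZipFile(archive)
        # Hash table: member name -> ZipInfo (size, offset, ...).
        self._index = {info.filename: info for info in self._zip.infolist()}

    def exists(self, name: str) -> bool:
        """The stat() replacement: a dict lookup, no system call."""
        return name in self._index

    def size(self, name: str) -> int:
        return self._index[name].file_size

    def read(self, name: str) -> bytes:
        return self._zip.read(self._index[name])


if __name__ == "__main__":
    # Build a tiny in-memory archive with one hypothetical module file.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("ebin/demo.beam", b"not a real beam file")

    catalogue = ArchiveCatalogue(buf)
    print(catalogue.exists("ebin/demo.beam"))
    print(catalogue.exists("ebin/missing.beam"))
```

Misses are as cheap as hits here, which matters because most
probes during a code path search are misses.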

This is an old technique: Smalltalk-80 kept all sources in
a .sources file and a .changes file. Way back in the 1990s
I suggested this as an idea for Erlang, through SERC, noting
that spaceless word-coded source files offered good compression.
(A nice marriage of Erlang and Information Retrieval…)

I think this commit would change the behavior of code:load_file to use stat() instead of open().

Some simple performance tests (e.g. time iex -S mix eval ":erlang.halt()" in an Elixir project) seem to run around 50 ms faster in a non-corporate-bloated environment, but I haven’t measured it rigorously.