What determines whether new instructions are supported by the JIT?

TD5 · August 21, 2023, 12:20pm

I can see OTP uses asmjit for emitting native instructions from the BEAM’s JIT. asmjit itself seems to regularly add new instructions as they become available on new microarchitectures.

How does the OTP team decide whether to add support for emitting new instructions?

I can see, for example, there’s some AVX and AVX-512 support inside OTP (very nice!), but how are new instructions evaluated? Also, is the OTP team open to pull requests which make use of new instructions? My understanding has been that the JIT is purposely kept fairly lightweight for maintainability reasons, so perhaps contributions with that spirit of simplicity are more welcome?

bjorng · August 22, 2023, 6:24am

We use new instructions if there is some clear benefit, for example making the emitted code faster and/or smaller.

Note that the JIT-enabled runtime system must work on any x86_64/AArch64 CPU, so there must always be a fallback that does not use the new instructions. That means that we never use new instructions to simplify the code generators in the JIT.
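As a rough illustration of the fallback requirement, here is a minimal C++ sketch. It does not use asmjit or any OTP code: `CpuFeatures`, `emit_bit_count`, the `generic_bit_count` helper, and the `ARG1`/`RET` register names are all hypothetical, and the "emitter" records mnemonics as strings instead of producing machine code. It only shows that a code generator using a newer instruction still needs a second path for CPUs without it, so the generator gets longer, not simpler.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical feature flags; a real JIT would fill these in once at
// startup by querying the CPU.
struct CpuFeatures {
    bool has_popcnt = false;
};

// Toy "code buffer" that records mnemonics instead of machine code.
using Code = std::vector<std::string>;

// Emits code that counts the set bits of `src` into `dst`.
void emit_bit_count(Code &code, const CpuFeatures &cpu,
                    const std::string &dst, const std::string &src) {
    assert(!dst.empty() && !src.empty());
    if (cpu.has_popcnt) {
        // Fast path: a single instruction, used only when the CPU
        // reports support for it.
        code.push_back("popcnt " + dst + ", " + src);
    } else {
        // Fallback path: must always exist so that the emitted code
        // runs on every baseline CPU. Here it calls a (hypothetical)
        // helper function in the runtime.
        code.push_back("mov ARG1, " + src);
        code.push_back("call generic_bit_count");
        code.push_back("mov " + dst + ", RET");
    }
}
```

Calling `emit_bit_count` with `has_popcnt` set records one instruction; calling it with the default `CpuFeatures` records the three-instruction fallback.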

Yes, we are open to pull requests that make use of new instructions.

Yes, our current team must be able to maintain the JIT.

As the JIT translates all code that is loaded, the code generator must be fairly fast. So in that sense the JIT must be lightweight.

We appreciate simplicity, but an absolute requirement is correctness. We avoid approaches to code generation that cannot be verified (usually by testing) to be correct in all circumstances. For example, with very few exceptions, we don’t allow dependencies between BEAM instructions. That is, the code generation for one BEAM instruction is generally not allowed to assume that the code for the previous BEAM instruction has left certain values in CPU registers. The reason is that it would be hard to verify correctness for all combinations of BEAM instructions, and a small change to the code generation for one BEAM instruction could introduce a bug.

Leveraging dependencies between instructions is allowed if it can be done automatically, in a way that a small change to code generation cannot break. For example, in OTP 26 we introduced an optimization that avoids reading a BEAM register from memory if the value is already available in a CPU register.
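One way to make such tracking automatic, sketched below in C++, is to tie the validity of a cached value to the emitter's code offset: the cache entry is trusted only if nothing at all has been emitted since it was recorded. This is an illustrative toy under stated assumptions, not a description of OTP's actual implementation. The `Emitter` class, `load_x`, and the string-based "instructions" are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy emitter that records mnemonics. offset() is the number of
// instructions emitted so far.
class Emitter {
    std::vector<std::string> code_;

    // Cache entry: BEAM register x[cached_x_] is held in CPU register
    // cached_cpu_. It is trusted only while offset() == cached_at_.
    int cached_x_ = -1;
    std::string cached_cpu_;
    std::size_t cached_at_ = 0;

public:
    std::size_t offset() const { return code_.size(); }

    // Every code generator emits through here. Emitting anything moves
    // the offset, which invalidates the cache without the generator
    // having to know that the cache exists.
    void emit(const std::string &insn) { code_.push_back(insn); }

    // Makes x[n] available in a CPU register and returns that register.
    // Skips the memory read if the value is still cached.
    std::string load_x(int n, const std::string &cpu) {
        assert(n >= 0);
        if (cached_x_ == n && cached_at_ == offset()) {
            return cached_cpu_;  // nothing emitted since the last load
        }
        emit("mov " + cpu + ", [x" + std::to_string(n) + "]");
        cached_x_ = n;
        cached_cpu_ = cpu;
        cached_at_ = offset();
        return cpu;
    }
};
```

With this design, two back-to-back loads of the same BEAM register emit a single `mov`, while any instruction emitted in between forces a fresh read. A change to some other code generator cannot silently leave a stale entry, because the invalidation does not depend on that generator cooperating.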