JIT bug(?) on OTP 26: Assertion failed: !err && "Could not resolve all links"

I run into the error below on macOS Sequoia 15.0 on arm64, using both OTP 26.2.5 and 26.2.5.3. Unfortunately I don’t have a minimal example that triggers the issue, only this error message:
beam/jit/beam_jit_common.cpp:89:codegen() Assertion failed: !err && "Could not resolve all links"

/Users/nar/otp/26.2.5.3/.cache/rebar3/bin/rebar3: line 4: 54536 Abort trap: 6 erl -pz /Users/nar/otp/26.2.5.3/.cache/rebar3/vsns/${REBAR3_VSN}/lib/*/ebin +sbtu +A1 -noshell -boot start_clean -s rebar3 main $REBAR3_ERL_ARGS -extra "$@"

I compiled both OTP versions with kerl. I don’t get this error on x86 Linux with the very same software (using OTP 26.2.5.2). I found a similar problem (see issue #8815 in erlang/otp on GitHub, "JIT raises with "Could not resolve all links" on large binaries in 27.0.1"), but it’s only fixed for OTP 27. Is it possible that the same problem exists in OTP 26?

Can you show us a test case? And if you can, please open a ticket on GH.

I can reproduce #8815 on Linux arm64 (M1) with 27.0.1 but not with 26.2.5.2.

Unfortunately not (or at least not yet). It’s a codebase of 3000 lines (without dependencies). What’s more worrying is that it worked yesterday, and this morning it doesn’t. I can’t rule out hardware problems :frowning:

I managed to get a stack trace from the lldb debugger (from the running process - I can’t find the core file):

* thread #9, name = 'erts_sched_5', stop reason = signal SIGABRT
  * frame #0: 0x000000019f97e600 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x000000019f9b6f70 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x000000019f8c3908 libsystem_c.dylib`abort + 128
    frame #3: 0x00000001044361c4 beam.smp`erl_assert_error(expr="!err && \"Could not resolve all links\"", func="codegen", file="beam/jit/beam_jit_common.cpp", line=89) at sys.c:959:5 [opt]
    frame #4: 0x00000001042784d0 beam.smp`BeamAssemblerCommon::codegen(this=<unavailable>, allocator=0x00006000005200f0, executable_ptr=0x0000000158c5ec88, writable_ptr=0x0000000158c5ec90) at beam_jit_common.cpp:89:5 [opt]
    frame #5: 0x0000000104278944 beam.smp`BeamModuleAssembler::codegen(this=0x0000000154808200, allocator=<unavailable>, executable_ptr=0x0000000158c5ec88, writable_ptr=0x0000000158c5ec90, in_hdr=0x00000001522851e8, out_exec_hdr=0x000000016c262c38, out_rw_hdr=0x000000016c262c30) at beam_jit_common.cpp:216:20 [opt]
    frame #6: 0x00000001042662a8 beam.smp`beam_load_finish_emit(stp=0x0000000158c5ebf8) at asm_load.c:769:5 [opt]
    frame #7: 0x00000001042e591c beam.smp`load_code(stp=0x0000000158c5ebf8) at beam_load.c:621:12 [opt]
    frame #8: 0x00000001042e5110 beam.smp`erts_prepare_loading(magic=0x0000000158c5ebc8, c_p=<unavailable>, group_leader=<unavailable>, modp=0x000000016c262e40, code=<unavailable>, unloaded_size=<unavailable>) at beam_load.c:193:10 [opt]
    frame #9: 0x00000001042dc7f8 beam.smp`erts_internal_prepare_loading_2(A__p=0x0000000150b92ea8, BIF__ARGS=0x000000016c262e40, A__I=<unavailable>) at beam_bif_load.c:267:14 [opt]
    frame #10: 0x0000000144fe0934
    frame #11: 0x000000014504aab8

Unfortunately I’m not familiar with the OTP code at all, but if you have any idea what to look for in the process state, I can try to look it up. To my uninitiated eye the code variable in frame 5 looks suspicious:

(asmjit::CodeHolder) {
  _environment = {
    _arch = kAArch64
    _subArch = kUnknown
    _vendor = kUnknown
    _platform = kOSX
    _platformABI = kUnknown
    _objectFormat = kUnknown
    _reserved = ""
  }
  _cpuFeatures = {
    _data = {
      _bits = {
        _data = ([0] = 0, [1] = 0, [2] = 0, [3] = 0)
      }
    }
  }
  _baseAddress = 18446744073709551615
  _logger = nullptr
  _errorHandler = 0x0000000154808200
  _zone = {
    _ptr = 0x0000000154852b60 ""
    _end = 0x00000001548569f8 "p\xb6"
    _block = 0x0000000154852a00
     = {
       = (_blockSize = 16328, _isTemporary = 0, _blockAlignmentShift = 0)
      _packedData = 16328
    }
  }
  _allocator = {
    _zone = 0x0000000154808250
    _slots = {
      [0] = 0x0000000154808d80
      [1] = 0x0000000154851c20
      [2] = nullptr
      [3] = 0x0000000154808fe0
      [4] = nullptr
      [5] = nullptr
      [6] = nullptr
      [7] = nullptr
      [8] = nullptr
      [9] = 0x0000000154809220
    }
    _dynamicBlocks = 0x0000000154846800
  }
  _emitters = {
    asmjit::_abi_1_10::ZoneVectorBase = (_data = 0x0000000154808d60, _size = 1, _capacity = 4)
  }
  _sections = {
    asmjit::_abi_1_10::ZoneVectorBase = (_data = 0x0000000154808c20, _size = 2, _capacity = 4)
  }
  _sectionsByOrder = {
    asmjit::_abi_1_10::ZoneVectorBase = (_data = 0x0000000154808c40, _size = 2, _capacity = 4)
  }
  _labelEntries = {
    asmjit::_abi_1_10::ZoneVectorBase = (_data = 0x0000000154846820, _size = 1361, _capacity = 2048)
  }
  _relocations = {
    asmjit::_abi_1_10::ZoneVectorBase = (_data = 0x0000000154816d60, _size = 7, _capacity = 8)
  }
  _namedLabels = {
    asmjit::_abi_1_10::ZoneHashBase = {
      _data = 0x0000000154852a60
      _size = 5
      _bucketsCount = 29
      _bucketsGrow = 26
      _rcpValue = 2369637129
      _rcpShift = '$'
      _primeIndex = '\x02'
      _embedded = {
        [0] = 0x00000001548522a0
      }
    }
  }
  _unresolvedLinkCount = 7
  _addressTableSection = nullptr
  _addressTableEntries = {
    _root = nullptr
  }
}

I’d guess that _unresolvedLinkCount = 7 is the one causing the crash, but it’s really just a guess.
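
To make that guess concrete, here is a toy model of what an "unresolved link" could be. It is a sketch under assumptions, not asmjit's or OTP's actual code: `ToyCodeHolder`, `Link`, `add_link` and `resolve_links` are hypothetical names. It also assumes one particular cause, a branch whose target is further away than the instruction can encode. The ±1 MiB range used is the real limit of an AArch64 conditional branch (`B.cond`), but nothing in the dump confirms that this is what happened in this crash.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: these types are hypothetical and do not mirror asmjit.

// AArch64 B.cond encodes a signed 19-bit word offset, i.e. +/- 1 MiB.
static const int64_t kCondBranchRange = 1 << 20;

struct Link {
    int64_t from;      // byte offset of the branch instruction
    int64_t target;    // byte offset of the label it refers to
    int64_t max_disp;  // displacement range the instruction can encode
    bool resolved;     // set once the displacement has been patched in
};

struct ToyCodeHolder {
    std::vector<Link> links;

    void add_link(int64_t from, int64_t target, int64_t max_disp) {
        Link l = {from, target, max_disp, false};
        links.push_back(l);
    }

    // Patch every link whose displacement fits into its instruction and
    // return how many links are still unresolved afterwards.
    std::size_t resolve_links() {
        std::size_t unresolved = 0;
        for (std::size_t i = 0; i < links.size(); ++i) {
            Link &l = links[i];
            int64_t disp = l.target - l.from;
            if (disp >= -l.max_disp && disp < l.max_disp) {
                l.resolved = true;
            } else {
                ++unresolved;  // cannot be encoded, stays pending
            }
        }
        return unresolved;
    }
};
```

In this model, a non-zero return value from `resolve_links()` plays the role of `_unresolvedLinkCount = 7` in the dump above: a loader that asserts "no error" after this step would abort in the same way the trace shows.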

I’d like to add that the same error happens on one of my colleagues’ laptops, so I’d guess it’s not a hardware error :frowning: