That’s the approach I’m also taking (rebar3 new release in my case, so as to reproduce the directory structure), but as yet I’ve had no luck in reproducing the error either.
Hmm, that is both good and bad, I guess. The first argument to --/2 is very strange as it looks like the exit reason for a port. How it ended up there is a mystery, since in the code it is the result of getting a value from a proplist. I’ll spend some additional time on it, but right now I have nothing to go on. If you manage to reproduce it on your side, I’ll be happy to take any details you have.
Actually, looking a bit deeper and discussing with @jhogberg, we found some lax code that could explain this behaviour as well as making it rather difficult to reproduce. I’ll tighten the code, and then we hope we do not see it again.
Progress (of a sort).
Going back to the original, failing code and with liberal use of a blunt hatchet, I have managed to get to the point where the problem is evident even when using the minimal files:
-module(misbehaviour).
-callback bar() -> ok.
and
-module(myimpl).
-behaviour(misbehaviour).
-export([bar/0]).
bar() -> ok.
However, if I chop too much of the surrounding application code out, it mysteriously disappears back into the magic hat.
Modulo names, that is identical to what I tested with. The difficulty of reproduction most likely comes down to a combination of timing and sending/receiving messages. In the current code, an untagged message is received, which might grab the wrong message. Adding a tag should fix that potential problem.
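To illustrate the difference between an untagged and a tagged receive, here is a minimal sketch. It is not the code from compile.erl; the module name, the function names, the fake_port atom and the [some_feature] payload are all made up for the example.

```erlang
-module(tagged_receive_sketch).
-export([untagged/0, tagged/0]).

%% Hypothetical illustration only; this is NOT the code in compile.erl.

%% Untagged: the receive pattern matches any message, so whatever
%% happens to be first in the mailbox is returned.
untagged() ->
    Self = self(),
    %% Simulate a stray message that arrives before the expected reply,
    %% shaped like the port exit message seen in the crash report.
    Self ! {'EXIT', fake_port, normal},
    spawn(fun() -> Self ! [some_feature] end),
    receive
        Msg -> Msg
    end.

%% Tagged: the reply carries a unique reference, so the selective
%% receive skips unrelated messages and waits for the right one.
tagged() ->
    Self = self(),
    Tag = make_ref(),
    Self ! {'EXIT', fake_port, normal},
    spawn(fun() -> Self ! {Tag, [some_feature]} end),
    receive
        {Tag, Features} -> Features
    end.
```

In this sketch untagged/0 hands back the stray {'EXIT', fake_port, normal} tuple instead of the list, which is the kind of value that would then blow up in --/2, while tagged/0 returns [some_feature] and leaves the stray message in the mailbox.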
@cons You might be interested to know that I am encountering the same error with the repo linked from Ram - an in-memory distributed KV store - #9 by dischoen using 25.0-rc3.
rebar3 compile
===> Verifying dependencies...
===> Fetching ram v0.5.0
===> Fetching ra v2.0.3
===> Fetching aten v0.5.7
===> Fetching gen_batch_server v0.8.6
===> Analyzing applications...
===> Compiling gen_batch_server
===> Compiling _build/default/lib/gen_batch_server/src/gen_batch_server.erl failed
_build/default/lib/gen_batch_server/src/gen_batch_server.erl:none: internal error in pass parse_module:
exception error: bad argument
in operator --/2
called as {'EXIT',#Port<0.8>,normal} -- []
in call from compile:metadata_add_features/2 (compile.erl, line 1068)
in call from compile:do_parse_module/2 (compile.erl, line 1037)
in call from compile:parse_module/2 (compile.erl, line 994)
in call from compile:fold_comp/4 (compile.erl, line 405)
in call from compile:internal_comp/5 (compile.erl, line 389)
in call from compile:'-internal_fun/2-anonymous-0-'/2 (compile.erl, line 227)
in call from rebar_compiler_erl:compile_and_track/4 (/home/runner/work/rebar3/rebar3/src/rebar_compiler_erl.erl, line 157)
The error occurs in the dependency gen_batch_server.erl, which utilises -callback declarations - I assume that is the common ground where the issue is triggered.
Thanks. Using this I can reproduce the behaviour and also verify that the fix I have works. The common ground, however, is not the use of -callback, but rather using rebar3. We have some 768 instances of -callback in the OTP code base and those compile without problems with RC3. Just to be clear, there is no problem with rebar3; the fault was fully in the new code implementing the features support.
Hi,
may I ask which configuration you are using?
I also tried my rampeer repo with rc3 and it is working over here.
My config is:
- Ubuntu 20.04LTS
- RC3 via asdf
- rebar3 directly installed and via asdf, both working.
The problem has more to do with timing than configuration. The code in RC3 does a receive on an untagged message, which is fine as long as no other process manages to send a message before the expected one arrives. While I have not debugged the issue in depth, with the repo above I managed to reproduce it reliably on my machine with an unpatched RC3 and see no problems with the fix in place.
I might take some time later today and really dive into the issue for the sake of curiosity.
Great news that you have been able to verify your fix.
Since rc2, I’m getting segfaults when trying to build on Aarch64 (in qemu) under Alpine 3.15.
For a sample see: Images · travelping/docker-erlang-otp@a57e64e · GitHub
Thanks, it’s a bit difficult to see what’s happening though. Can you try the following in an interactive session?
# Build the system without JIT to dodge the error
$ ./otp_build setup -a --disable-jit
$ make TYPE=debug
# Build the JIT alone
$ ./configure --enable-jit
$ make emulator TYPE=debug
# Start Erlang under GDB, then run `r` and hope it crashes. You
# might need to compile something to provoke the crash.
#
# Once it crashes, run `bt`, `x/8i $pc-16`, and post the result here.
$ cerl -debug -rgdb
The crash only occurs when using qemu user mode emulation (by just running ./bin/erl -emu_flavor jit). In that mode debugging like you suggested does not work at all because ptrace is not supported. I tried using the qemu gdbstub, but that also just segfaults.
In qemu system mode the jit version that was compiled within that VM works. When I try to run the binaries produced in the VM in qemu user mode, it goes to 100% CPU load and just does nothing (no CLI prompt, no other output).
The binaries compiled with the user mode emulation work fine when moving them into the VM image and running them under system emulation.
The root images for both versions are the same. So the tooling for both builds was identical.
The host CPU is an Intel Core 7, so nothing spectacular.
QEMU is 6.2 (Ubuntu Jammy host) and Linux 5.17.5-051705-generic.
Reproducing this should be simple. Enable QEMU user mode emulation for aarch64, grab the Alpine 3.15.4 AArch64 root tar, extract it, chroot to it and build otp in it.
I’m going to try this with a Debian AArch64 root to see if there is a difference between musl and glibc based systems.
I think I see why it breaks under user-mode but not system emulation. Can you check if it works with this branch?
That branch successfully compiles and I get a running Erlang shell from bin/erl.
Would you be inclined to accept a PR that adds an AArch64 build based on QEMU user mode to the OTP GitHub build actions?
Thanks, I’ve found a bug in QEMU that explains this. The gist of it is:
Instead of interpreting guest code, QEMU dynamically translates it to the host architecture. When the guest overwrites code for one reason or another, the translation is invalidated and redone if needed.
Our JIT:ed code is mapped in two regions to work in the face of W^X restrictions: one executable but not writable, and one writable but not executable. Both of these regions point to the same physical memory and writes to the writable region are “magically” reflected in the executable one.
I would’ve expected QEMU to honor the IC IVAU / ISB instructions we use to tell the processor that we’ve altered code at a particular address, but for some reason QEMU just ignores them and relies entirely on trapping writes to previously translated code.
In system mode QEMU emulates the MMU and sees that these two regions point at the same memory, and has no problem invalidating the executable region after writing to the writable region.
In user mode it instead calls mprotect(..., PROT_READ) on all code regions it has translated, and invalidates translations in the signal handler. The problem is that we never write to the executable region, just the writable one, so the code doesn’t get invalidated.
Sure, if it runs quickly enough I’m open to adding it once they fix this bug.
I want to try the maybe_expr feature introduced in OTP 25.0. I installed 25.0-rc3 locally using kerl and when I run r3 compile or r3 shell I get no errors at all. However, when I try to run (in rebar3 shell, OTP 25, ERTS v13.0) exported functions from a module using a maybe expression I get the following error:
Loading of /home/marko/<project_root>/_build/default/lib/<project_name>/ebin/pfun.beam failed: not_allowed
** exception error: undefined function pfun:<fun_i'm_trying_to_run>
In the pfun module I’ve got the following attributes:
-feature(maybe_expr, enable).
-compile({feature, maybe_expr, enable}).
Also, erl_opts in rebar.config contains {feature, maybe_expr, enable}.
Any idea of what is wrong here?
I ran into the same issue and worked around it by:
- adding -enable-feature maybe_expr in config/vm.args and running env ERL_FLAGS="-args_file config/vm.args" rebar3 shell
You can also issue erl_features:enable_feature(maybe_expr). in rebar3 shell before loading the pfun module.
The option -enable-feature maybe_expr has to be given to the runtime to allow loading modules that have been compiled with the feature enabled. This makes allowing experimental features an active decision.
While erl_features:enable_feature/1 is available in RC3, it will not be available in the final release of OTP 25. Enabling features in the runtime is only allowed during startup, and changing (enabling/disabling) them is not allowed thereafter.
Full documentation for the features support will be included with OTP25.
Note that you only need one of these, and using the -feature directive is the preferred way. To be honest, while a -compile attribute is allowed anywhere in a file, the enabling and disabling of features is only allowed in a prefix of the file, so using -compile to enable/disable a feature anywhere might have unexpected results, i.e., I have to test and fix this.
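As a rough sketch of the preferred form, a module that enables the feature with the -feature directive in the file prefix could look like the following. The module and function names are made up for the example. As noted above, the runtime still has to be started with -enable-feature maybe_expr for such a module to load on OTP 25.

```erlang
-module(maybe_sketch).
%% The -feature directive sits in the prefix of the file, before
%% -export and before any function definitions.
-feature(maybe_expr, enable).
-export([safe_div/2]).

%% Hypothetical example function: each ?= step must match, otherwise
%% the non-matching value is handed to the else clauses.
safe_div(A, B) ->
    maybe
        ok ?= check_number(A),
        ok ?= check_number(B),
        ok ?= check_nonzero(B),
        {ok, A / B}
    else
        {error, _} = Error -> Error
    end.

check_number(X) when is_number(X) -> ok;
check_number(_) -> {error, badarg}.

check_nonzero(X) when X == 0 -> {error, division_by_zero};
check_nonzero(_) -> ok.
```

With the feature enabled in both the compiler and the runtime, maybe_sketch:safe_div(1, 0) should return {error, division_by_zero} rather than crashing with badarith.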