AtomVM: 2 Questions regarding Erlang distribution functionality and future version - both for esp32

  1. can anybody tell which and/or how much of erlang distribution functionality is implemented in atomvm for esp32? i would like to use gen_server:call({Name, Node}, Request) where Node =/= localhost on both raspberry (linux) and esp32 (atomvm)
  2. i can see 3 branches [main, release-0.6, feature/bigint] where main is the most up to date one and release-0.6 seems to be outdated soon. can anybody tell me how main and feature/bigint are related? i feel that either main or feature/bigint will lead to the future version. which one is the most complete version? which one should i go for?

Thank you for your interest in our project! Indeed you can call a gen_server on a remote node. You can also do rpc calls from an OTP node, it doesn’t have to be another AtomVM node. This is quite handy for testing or debugging an AtomVM device from an erl cli.
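
As a rough illustration of such a remote call, here is a minimal sketch of a locally registered gen_server. The module name echo_server and the {echo, Term} request are hypothetical, chosen only for this example; it assumes the gen_server in AtomVM's estdlib on main accepts the same start_link/4 form as OTP.

```erlang
-module(echo_server).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Register the server locally under the module name, so that other
%% nodes can address it as {echo_server, Node}.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    {ok, undefined}.

%% Reply with the request itself, so the caller can see the round trip.
handle_call({echo, Term}, _From, State) ->
    {reply, {echo, Term}, State};
handle_call(_Other, _From, State) ->
    {reply, {error, unknown_request}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.
```

With the server started on the device and both nodes sharing a cookie, a call from an erl shell would look like gen_server:call({echo_server, 'atomvm@10.28.57.99'}, {echo, hello}), where the node name is whatever the device prints at startup.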

The bigint branch is a work in progress to support bigger integers. It is unstable and should not be used except for testing bigint functionality. It will be merged to main once it is ready.

The main branch is where work for the future 0.7 release cycle happens. The release-0.6 branch is where bug fixes for 0.6.x are merged for the next update (i.e. the upcoming 0.6.7 release).

Main does have a lot more nice features, but is not considered stable for production use. There are (or may be) breaking changes from the 0.6 release cycle.

I hope this clears things up, and don’t hesitate to post any further updates, suggestions or feedback.

— Winford


many thanks winford,

your answer has cleared up some ambiguities.

i decided to go for the main branch because i am interested in distributed applications (besides integrating sensors and actuators).

unfortunately i was not able to run the test program (https://github.com/atomvm/AtomVM/blob/main/examples/erlang/esp32/epmd_disterl.erl#L51) on an esp32-s3 (sender on linux) successfully.

both on self-built main and on self-built feature/bigint it terminated with

CRASH 
======
pid: <0.13.0>

Stacktrace:
[{socket_dist_controller,process_recv_buffer,3,[{file,"/home/xxxx/avm/$branch/AtomVM/libs/estdlib/src/socket_dist_controller.erl"},{line,192}]},{socket_dist_controller,recv_data_loop,1,[{file,"/home/xxxx/avm/feature-bigint/AtomVM/libs/estdlib/src/socket_dist_controller.erl"},{line,175}]},{socket_dist_controller,handle_cast,2,[{file,"/home/xxxx/avm/$branch/AtomVM/libs/estdlib/src/socket_dist_controller.erl"},{line,151}]},{gen_server,loop,3,[{file,"/home/xxxx/avm/$branch/AtomVM/libs/estdlib/src/gen_server.erl"},{line,539}]}]

cp: #CP<module: 19, label: 55, offset: 40>

x[0]: error
x[1]: badarg
x[2]: {4,4,150,2,[{6,3226},{19,929},{19,1514},{19,1659}],error}

Stack 
-----

{<<112,131,104,4,97,6,88,119,13,116,64,115,112,120,46,115,112,54,101,46,100,101,0,0,0,95,0,0,0,0,104,129,253,168,119,0,119,7,100,105,115,116,101,114,108,131,119,4,113,117,105,116>>,<<"">>}
0
<<"">>
#CP<module: 19, label: 55, offset: 40>
[]
{state,{<<"">>,#Ref<0.0.28>},<<"">>,undefined,<<"">>,0,0}
#CP<module: 19, label: 39, offset: 69>
#CP<module: 6, label: 112, offset: 24>
[]
[]
[]
{state,{<<"">>,#Ref<0.0.28>},undefined,undefined,<<"">>,0,0}
[]
{state,undefined,socket_dist_controller,{state,{<<"">>,#Ref<0.0.28>},undefined,undefined,<<"">>,0,0}}
<0.12.0>
#CP<module: 6, label: 140, offset: 0>


Mailbox
-------


Monitors
--------
link to <0.14.0>
monitored by resource 0x3fcc65dc ref=44


**End Of Crash Report**

after i sent a message. epmd is not included in release-0.6.

of course I had

Creds = [
    {ssid, "****"},
    {psk, "********"}
],

adjusted according to my network.

what could I be doing wrong?

addendum1: i use Erlang/OTP 27 [erts-15.2.2]

addendum2: i found https://github.com/atomvm/AtomVM/issues/1671 which describes my issue and suggests that the bug was fixed. did i clone a wrong branch? (i cloned using git clone -b main https://github.com/atomvm/AtomVM.git)

Sorry for the delay getting back to you. My day job has been keeping me extra busy lately.

Just to make sure I understand correctly, the error you posted happened after a message was sent from an OTP node?

Can you post the message you sent? There does seem to be a bug here: if the message contained a badarg, you should get that back on the sender node. Knowing what message was sent might help to track down the bug.

The timing is a little unfortunate: @pguyot (the contributor of disterl support in AtomVM) is starting a well-earned two-week vacation and will be unable to push any fixes. In the meantime we will see if we can get to the bottom of this.

I would suggest opening an issue on GitHub. If you can include the crash log, the exact steps you used to start and connect from the OTP node, and the message that was sent, it should help get to the bottom of this a little quicker.

Since you are using distribution, you should definitely stick to the main branch for now. There is no support in the release-0.6 branch (or any 0.6.x release), and the bigint branch is still a work in progress.

We use feature branches when authors are contributing a large set of changes with many commits. You can think of them as large multipart pull requests. This allows for easier review, and lets the authors submit PRs for smaller sets of changes as they are ready. Once all of the necessary changes are merged to the feature branch and we have done some testing of the final work, they get merged into the appropriate branch (almost always main).

For esp32, make sure you build first in AtomVM/build, then in platforms/esp32, and then flash, especially when you change branches.

Tip: You can use Chrome to flash the latest build from main (or release-0.6) with the AtomVM Flasher, just as a bug/sanity check.
(The binaries are scraped from CI, e.g. at the bottom of "Merge pull request #1772 from pguyot/w29/optimize-term-compare · atomvm/AtomVM@8effe63" on GitHub.)

sorry, i did not want to push you.

yes, you are right, the error happened after i had sent the message from another otp-27 node. i followed the instructions from the doc and the output of the program i had started on the esp exactly: i started the node (erl -name xxx@domain.tld), set the cookie using erlang:set_cookie('atomvm@10.28.57.99', 'AtomVM'). and then entered {disterl, 'atomvm@10.28.57.99'} ! quit. at the erlang prompt.

to be sure, i just repeated this procedure and saw the same result as before.

as i’m not in a hurry, i think it might be best to wait until @pguyot has finished his vacation.

thanks for your assistance, g4v

@outlog: i had built in AtomVM/build, and then in platforms/esp32, and had flashed AtomVM/libs/esp32boot/esp32boot.avm using build/flash.sh -l -p /dev/ttyACM2 ../../../build/libs/esp32boot/esp32boot.avm in the directory platforms/esp32 and executed idf.py flash -p /dev/ttyACM2 in the same directory.

i even did this after i had erased flash (using esptool.py --chip esp32s3 --baud 921600 -p /dev/ttyACM2 erase_flash).


This may be part of your problem. If you review the Build Instructions for ESP32, you will see that after building the generic unix port in AtomVM/build and running idf.py build in the src/platforms/esp32 directory, you should assemble a complete image using ./build/mkimage.sh (still in the esp32 directory) and then flash it with ./build/flashimage.sh. This will give you a complete deployment including the VM and core beam libraries.

Using the flash.sh script to flash the libraries is almost never needed. It is mainly for development work on the core beam libraries when the changes are pure Erlang (or Elixir) and do not involve any changes to nif or port functions. Also note that if you are using flash.sh to flash libraries and need to specify a port (i.e. the esp32 isn’t using /dev/ttyUSB0), you should place it before or after the -l /path/to/esp32boot.avm pair; you put the -p /dev/port in the middle. The library path needs to immediately follow the -l (library) directive. It should look like:

./build/flash.sh -p /dev/ttyACM0 -l ../../../build/libs/esp32boot/esp32boot.avm

But in your case, you should just be using the mkimage.sh and flashimage.sh scripts.

I do have a PR open that will simplify flashing a complete deployment using idf.py flash, so that using mkimage.sh and flashimage.sh becomes optional (useful if you want to save the image to flash to other devices with the same chipset).

For the time being you should keep using mkimage.sh and flashimage.sh to assemble and flash a complete image that includes the AtomVM standard libraries.

Thank you for the report. There is an empty atom in the message that OTP sends in your dump (confirmed with OTP28).

{<<112,131,104,4,97,6,88,119,13,116,64,115,112,120,46,115,112,54,101,46,100,101,0,0,0,95,0,0,0,0,104,129,253,168,119,0,119,7,100,105,115,116,101,114,108,131,119,4,113,117,105,116>>,<<"">>}

112 is for distribution, then there is an external term that decodes to:

1> binary_to_term(<<131,104,4,97,6,88,119,13,116,64,115,112,120,46,115,112,54,101,46,100,101,0,0,0,95,0,0,0,0,104,129,253,168,119,0,119,7,100,105,115,116,101,114,108,131,119,4,113,117,105,116>>, [used]).
{{6,<10736.95.0>,'',disterl},44}

First tuple element, 6, is the command (OPERATION_REG_SEND)
Second element is the sender pid. OTP doesn’t show it, but the pid is from your node, called “t@spx.sp6e.de”.
Third item, the empty atom here (<<119,0>>), is ignored for this command (OPERATION_REG_SEND), but nevertheless we should be able to decode it.
Fourth item is the name of the process, here disterl.
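
The decoding above can be reproduced on any OTP node with a small sketch like the following. The module name dist_payload and its functions are hypothetical helpers written for this thread; they only assume what is described above, namely that the byte 112 is followed by two external terms (the control message, then the payload).

```erlang
-module(dist_payload).
-export([split/1, example/0]).

%% 112 marks a distribution message. It is followed by two external
%% terms: the control message and then the message payload.
split(<<112, Rest/binary>>) ->
    %% The 'used' option also returns how many bytes the first term took.
    {Control, Used} = binary_to_term(Rest, [used]),
    <<_:Used/binary, PayloadBin/binary>> = Rest,
    {Control, binary_to_term(PayloadBin)}.

%% Decode the binary from the crash dump in this thread.
example() ->
    Dump = <<112,131,104,4,97,6,88,119,13,116,64,115,112,120,46,115,112,
             54,101,46,100,101,0,0,0,95,0,0,0,0,104,129,253,168,119,0,
             119,7,100,105,115,116,101,114,108,131,119,4,113,117,105,116>>,
    %% Command 6, sender pid, the empty atom, target name; payload is quit.
    {{6, _SenderPid, '', disterl}, quit} = split(Dump),
    ok.
```

The empty atom on its own is just the two bytes 119,0 after the version byte, so binary_to_term(<<131,119,0>>) returns '' on OTP.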

I’m not sure if the fact that OTP sends empty atoms for this parameter is new or if we introduced a regression in AtomVM and we no longer are able to decode empty atoms on platforms where malloc(0) is NULL such as ESP-IDF.

Anyway, this should be fixed with this PR.


@pguyot: your fix resolved my issue (after self-patching src/libAtomVM/atom_table.c). thank you very much.

@pguyot: gen_server:call({Modul, Node}, Term) seems to work for many terms.

sad to say that gen_server:call({Modul, Node}, fun() -> .. end) crashes AtomVM


Core  1 register dump:
PC      : 0x4205dc91  PS      : 0x00060a30  A0      : 0x8205d4fc  A1      : 0x3fcb08a0
0x4205dc91: memory_estimate_usage at /home/xxxx/ATOM/dist/AtomVM/src/libAtomVM/memory.c:479 (discriminator 1)

A2      : 0x00000002  A3      : 0x00000394  A4      : 0x3fcb0890  A5      : 0x00000001
A6      : 0x00000036  A7      : 0x00000000  A8      : 0xfffffffe  A9      : 0x00000000
A10     : 0x00000000  A11     : 0x00000000  A12     : 0x3fcb0850  A13     : 0x00000000
A14     : 0x0000002a  A15     : 0x3fcb08a0  SAR     : 0x00000008  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0x00000000


Backtrace: 0x4205dc8e:0x3fcb08a0 0x4205d4f9:0x3fcb08f0 0x4205d571:0x3fcb0910 0x4205c2c2:0x3fcb0930 0x42058b96:0x3fcb0950 0x42029e9c:0x3fcb09a0 0x42011a07:0x3fcb0a50 0x4200969c:0x3fcb0a70 0x403804f5:0x3fcb0a90
0x4205dc8e: term_get_list_ptr at /home/xxxx/ATOM/dist/AtomVM/src/libAtomVM/term.h:1770
 (inlined by) term_get_list_tail at /home/xxxx/ATOM/dist/AtomVM/src/libAtomVM/term.h:1806
 (inlined by) memory_estimate_usage at /home/xxxx/ATOM/dist/AtomVM/src/libAtomVM/memory.c:479

0x4205d4f9: mailbox_message_create_from_term at /home/xxxx/ATOM/dist/AtomVM/src/libAtomVM/mailbox.c:237

0x4205d571: mailbox_send at /home/xxxx/ATOM/dist/AtomVM/src/libAtomVM/mailbox.c:269

0x4205c2c2: globalcontext_send_message at /home/xxxx/ATOM/dist/AtomVM/src/libAtomVM/globalcontext.c:357

0x42058b96: nif_erlang_dist_ctrl_put_data at /home/xxxx/ATOM/dist/AtomVM/src/libAtomVM/dist_nifs.c:508 (discriminator 1)

0x42029e9c: scheduler_entry_point at /home/xxxx/ATOM/dist/AtomVM/src/libAtomVM/opcodesswitch.h:1962

0x42011a07: scheduler_thread_entry_point at /home/xxxx/ATOM/dist/AtomVM/src/platforms/esp32/components/avm_sys/smp.c:68

0x4200969c: pthread_task_func at /home/xxxx/esp/esp-idf/components/pthread/pthread.c:241

0x403804f5: vPortTaskWrapper at /home/xxxx/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:139

This is this known bug:

Easier to fix I would say.

@winford thanks for clarification

@g4v thank you again for testing and bringing this to our attention!

For anyone following this topic the fix has been merged, along with a test to make sure this doesn’t come up again.