Issue with distributed Erlang within Kubernetes

I’m trying to connect to a remote cluster running on Kubernetes and am facing a hard-to-debug issue. I’ve used kubectl to forward both port 4369 (epmd) and the remote node’s distribution port, which happens to be 38903. Running epmd -names I get:

epmd -names
epmd: up and running on port 4369 with data:
name ag at port 38903
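For reference, the forwarding described above could be set up roughly as follows. This is a minimal sketch: the pod name `ag-0` and namespace `default` are hypothetical placeholders, and the command is printed rather than executed so it can be reviewed first.

```shell
# Hypothetical pod and namespace names; substitute your own.
POD="ag-0"
NAMESPACE="default"
NODE_PORT=38903   # distribution port reported by `epmd -names`

# Build the kubectl command that forwards epmd (4369) and the node's
# distribution port to the same port numbers on localhost.
port_forward_cmd() {
  printf 'kubectl port-forward -n %s pod/%s 4369:4369 %s:%s\n' "$1" "$2" "$3" "$3"
}

port_forward_cmd "$NAMESPACE" "$POD" "$NODE_PORT"
```

Keeping the local and remote port numbers identical matters here, because epmd reports the remote port (38903) and the connecting node will dial that same number on 127.0.0.1.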

Running erl directly I get:

EI_TRACELEVEL=6 erl -name observer -remsh ag@127.0.0.1
Could not connect to "ag@127.0.0.1"

Unfortunately EI_TRACELEVEL doesn’t seem to work here, although it does in the next example. Sniffing the traffic with Wireshark, I see the SEND_STATUS, SEND_STATUS_OK and SEND_CHALLENGE packets. So it seems that the local node is not replying with SEND_CHALLENGE_REPLY (which is why EI_TRACELEVEL would be super helpful!). If instead I run:

EI_TRACELEVEL=6 erl_call -n ag@127.0.0.1 -a "erlang length [[1,2,3]]" -h observer

it seems I’m able to connect (note that EI_TRACELEVEL works here):

ei_epmd_r4_port: Wed Oct 16 16:02:12 2024: -> PORT2_REQ alive=ag ip=127.0.0.1
ei_epmd_r4_port: Wed Oct 16 16:02:13 2024: <- PORT2_RESP result=0 (ok)
ei_epmd_r4_port: Wed Oct 16 16:02:13 2024:    port=38903 ntype=77 proto=0 dist-high=6 dist-low=5
ei_xconnect: Wed Oct 16 16:02:13 2024: -> CONNECT attempt to connect to ag
ei_xconnect: Wed Oct 16 16:02:13 2024: -> CONNECT connected to remote
recv_status: Wed Oct 16 16:02:13 2024: <- RECV_STATUS (sok)
recv_challenge: Wed Oct 16 16:02:13 2024: <- RECV_CHALLENGE (ok) node = ag@10.2.106.115, flags = 132087741, challenge = 1591521612
send_challenge_reply: Wed Oct 16 16:02:13 2024: -> SEND_CHALLENGE_REPLY (ok) challenge = 1738621192, digest = 0acc9c705f470162fa9bdf8067da8dd5
recv_challenge_ack: Wed Oct 16 16:02:13 2024: <- RECV_CHALLENGE_ACK (ok) digest = 216e7c9646b8dcc1bfbafffc37a36b9c
ei_xconnect: Wed Oct 16 16:02:13 2024: -> CONNECT (ok) remote = ag
-> REG_SEND From: #Pid<observer@ip-192-168-68-58.ec2.internal.0.0.3509> To: rex
   {#Pid<observer@ip-192-168-68-58.ec2.internal.0.0.3509>, {call, erlang, length, [[1, 2, 3]], user}}
<- SEND To: #Pid<observer@ip-192-168-68-58.ec2.internal.0.0.3509>
   {rex, 3}

In Wireshark I can see two extra packets: SEND_CHALLENGE_REPLY and SEND_CHALLENGE_ACK.

I’ve diffed the SEND_CHALLENGE messages from the two commands and they are exactly the same except for the challenge value (which makes sense).
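One detail the challenge does carry is the name the remote node uses for itself. In the erl_call trace that name is ag@10.2.106.115, while both commands target ag@127.0.0.1. Whether erl aborts the handshake on that difference is an assumption here, not something confirmed in this thread, but it is cheap to check. A small sketch (the function name `challenge_node` is made up for illustration) that pulls the advertised name out of a trace line and compares it with the target:

```shell
# Extract the node name from an EI_TRACELEVEL RECV_CHALLENGE line on stdin.
challenge_node() {
  sed -n 's/.*RECV_CHALLENGE (ok) node = \([^,]*\),.*/\1/p'
}

# Trace line copied from the erl_call output.
trace_line='recv_challenge: Wed Oct 16 16:02:13 2024: <- RECV_CHALLENGE (ok) node = ag@10.2.106.115, flags = 132087741, challenge = 1591521612'
target='ag@127.0.0.1'

advertised=$(printf '%s\n' "$trace_line" | challenge_node)

# Report when the name we dial differs from the name the remote announces.
if [ "$advertised" != "$target" ]; then
  echo "name mismatch: connecting to $target, remote says it is $advertised"
fi
```

With the values from the trace this reports a mismatch, so it may be worth trying a setup where the name passed to -remsh is exactly the name the remote node was started with.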

I’ve checked the cookies and they are the same too (otherwise I don’t think erl_call would work anyway).
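The cookie can also be cross-checked against the trace itself. In the distribution handshake, the digest is the MD5 of the cookie followed by the peer's challenge written as a decimal string. A minimal sketch, assuming `md5sum` from GNU coreutils is available and using a made-up function name `dist_digest`:

```shell
# Compute a handshake digest: MD5 over the cookie immediately followed by
# the challenge in decimal text form. Assumes md5sum (GNU coreutils).
dist_digest() {
  printf '%s%s' "$1" "$2" | md5sum | cut -d' ' -f1
}

# Usage (the cookie path is the usual default; adjust if -setcookie is used):
#   dist_digest "$(cat ~/.erlang.cookie)" 1591521612
# 1591521612 is the challenge from the RECV_CHALLENGE line. With the right
# cookie, the result should equal the digest on the SEND_CHALLENGE_REPLY
# line of the trace: 0acc9c705f470162fa9bdf8067da8dd5
```

If the computed value matches, the cookie used locally is the one the remote node expects, which would rule out the cookie as the reason erl fails.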

I’ve read the C source of the recv_challenge function here but couldn’t figure out why it isn’t working.

Can someone help me here? Thanks!

So, it seems that my assumption about the recv_challenge C function I linked was incorrect. It is not called during a regular node connection, only by other tools such as erl_call, which is why EI_TRACELEVEL doesn’t affect the output of erl.

It seems the related code is here.

But still no luck making this work =/

While not an exact answer to your problem, this may help:
https://github.com/danielpilon/kubernetes_observer/blob/main/kubernetes_observer

I’ve tried this script and it basically does what I’m currently doing. The only difference is that it creates an inetrc file with some host mappings for when you’re using short names, which isn’t my case.

If it had never worked at all, I’d try to debug the cluster and/or the connections. But since it works with erl_call, I’m lost. Currently I’m stepping through the related Erlang code to see if I can at least figure out what is going on, because unfortunately trying to connect to the node simply returns false.