On a couple of servers running OTP 28.1.1 we noticed several SSL connections stuck in a closing state. The caller invoked ssl:close and is stuck in gen:call. The ssl_gen_statem process called gen_tcp:recv/3 with a timeout, but it never received a response message from the port. The port is still alive. Calling gen_tcp:recv/3 on the same socket from the shell returns {error,enotconn}.
Backtrace of the calling process
> bt(<0.31332.0>).
Program counter: 0x0000f45fd675b210 (gen:do_call/4 + 536)
y(0) []
y(1) []
y(2) []
y(3) #Ref<0.3750388521.1181220866.236756>
0x0000f45f755d5850 Return addr 0x0000f45fd6908ca8 (gen_statem:call/3 + 656)
y(0) {close,5000}
y(1) <0.31328.0>
y(2) Catch 0x0000f45fd6908cf4 (gen_statem:call/3 + 732)
0x0000f45f755d5870 Return addr 0x0000f45fd76bcff0 (ssl_gen_statem:call/2 + 104)
y(0) Catch 0x0000f45fd76bd010 (ssl_gen_statem:call/2 + 136)
0x0000f45f755d5880 Return addr 0x0000f45fd76b4228 (ssl_gen_statem:close/2 + 72)
Stacktrace of the ssl_gen_statem process
> recon:info(<0.31328.0>).
[{meta,[{registered_name,[]},
{dictionary,[{'$initial_call',{ssl_gen_statem,init,1}},
{'$ancestors',[<0.31326.0>,tls_connection_sup,tls_sup,
ssl_connection_sup,ssl_sup,<0.91.0>]},
{tls_role,server},
{'$process_label',{tls,server,
...
{status,waiting}]},
{signals,[{links,[<0.31326.0>,#Port<0.255>]},
{monitors,[{process,<0.31332.0>}]},
{monitored_by,[<0.31332.0>]},
{trap_exit,true}]},
{location,[{initial_call,{proc_lib,init_p,5}},
{current_stacktrace,[{prim_inet,recv0,3,[]},
{tls_gen_connection,close,4,
[{file,"tls_gen_connection.erl"},{line,572}]},
{ssl_gen_statem,handle_call,4,
[{file,"ssl_gen_statem.erl"},{line,764}]},
{gen_statem,loop_state_callback,11,
[{file,"gen_statem.erl"},{line,3748}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,333}]}]}]},
{memory_used,[{memory,34664},
{message_queue_len,0},
Backtrace of the same ssl_gen_statem process
> bt(<0.31328.0>).
Program counter: 0x0000f45fd660ad44 (prim_inet:recv0/3 + 196)
y(0) 0
y(1) #Port<0.255>
0x0000f45f729616f8 Return addr 0x0000f45fd76a5554 (tls_gen_connection:close/4 + 260)
0x0000f45f72961700 Return addr 0x0000f45fd76b814c (ssl_gen_statem:handle_call/4 + 3916)
y(0) {state,#Ref<0.3750388521.1165361153.253007>,
The port still exists
> recon:port_info(#Port<0.255>).
[{meta,[{id,2040},{name,"tcp_inet"},{os_pid,undefined}]},
{signals,[{connected,<0.31328.0>},
{links,[<0.31328.0>]},
{monitors,[]}]},
{io,[{input,0},{output,608216479}]},
{memory_used,[{memory,48},{queue_size,0}]},
{type,[{statistics,[{recv_oct,30602051},
{recv_cnt,658470},
{recv_max,314},
{recv_avg,46},
{recv_dvi,0},
{send_oct,587038959},
{send_cnt,756340},
{send_max,9943},
{send_avg,776},
{send_pend,0}]},
{options,[{active,false},
{buffer,128},
{delay_send,false},
{exit_on_close,true},
{header,0},
{high_watermark,8192},
{low_watermark,4096},
{mode,binary},
{packet,0},
{packet_size,0},
{send_timeout,30000}]}]}]
> gen_tcp:recv(#Port<0.255>, 0, 5000).
{error,enotconn}
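The individual checks above can also be collected in one pass. The following is only a minimal sketch: the module name stuck_sock and the function snapshot/1 are hypothetical and not part of OTP. It uses only documented calls (erlang:port_info, inet:peername, inet:sockname, inet:getstat, erlang:process_info) and assumes the owner process is local to the node.

```erlang
%% Hypothetical helper (not part of OTP): gathers what the runtime
%% reports about a TCP port and the process connected to it.
-module(stuck_sock).
-export([snapshot/1]).

snapshot(Port) when is_port(Port) ->
    %% port_info/2 returns 'undefined' if the port is no longer open.
    Owner = case erlang:port_info(Port, connected) of
                {connected, Pid} -> Pid;
                undefined -> undefined
            end,
    #{port_info  => erlang:port_info(Port),
      peername   => inet:peername(Port),   % {ok, _} | {error, Posix}
      sockname   => inet:sockname(Port),   % {ok, _} | {error, Posix}
      stats      => inet:getstat(Port),    % {ok, _} | {error, Posix}
      owner      => Owner,
      owner_info => owner_info(Owner)}.

owner_info(undefined) ->
    undefined;
owner_info(Pid) when is_pid(Pid) ->
    %% Assumes Pid is a local process; returns 'undefined' if it has exited.
    erlang:process_info(Pid, [status, current_stacktrace,
                              message_queue_len, messages]).
```

From the shell this could be called as stuck_sock:snapshot(list_to_port("#Port<0.255>")), since a port cannot be written as a literal. list_to_port/1 is intended for debugging only.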
Unfortunately, beam.smp is stripped of debug symbols. Is this a known issue? Why did the port not respond? Is there any way to debug this without debug symbols? Would an erl_crash.dump be of any use in this case?
(The node is still running in this state. It runs on Ubuntu 24.04 with kernel 6.14.0-1017-azure, aarch64.)