Thanks @starbelly and @max-au!
I was thinking that tcpdump-ing the traffic would be my last resort, but that’s a great idea. Before going that far, while digging through the Erlang docs, I found out about :inet.i(). After seeing that, I found a better way: get all the ports from the :net_kernel state, loop over the open ports, and trace each one with :erlang.trace. So I’m sitting there running flush() in the console.
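For comparison, here is a minimal sketch of the same idea that skips digging into the :net_kernel state. It assumes a distributed node and uses :erlang.system_info(:dist_ctrl), which returns one {node, controlling_entity} tuple per connected node (a port for the default TCP carrier, possibly a pid for other carriers). The module name DistTrace and the flag choices are my own assumptions, not exactly what I ran. Trace messages go to the calling process, so flush() in the same shell shows them.

```elixir
defmodule DistTrace do
  # Hypothetical helper: trace every distribution connection entity
  # (port or pid) so its events are delivered to the calling process.
  def trace_all do
    for {node, entity} <- :erlang.system_info(:dist_ctrl) do
      # Ports get :ports (open/closed/link events);
      # processes get :procs (exit/link events) instead.
      flags =
        if is_port(entity),
          do: [:send, :receive, :ports, :timestamp],
          else: [:send, :receive, :procs, :timestamp]

      # The tracer defaults to the caller (e.g. the shell).
      :erlang.trace(entity, true, flags)
      {node, entity}
    end
  end
end
```

Usage: DistTrace.trace_all() once, then flush() whenever a node drops. On a node with no connections it simply returns [].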
I get some info I already knew, though:
Shell got {trace_ts, #PID<0.1349.0>, receive, {tcp_closed, #Port<0.24>}, 1,
{1695, 869472, 819890}}
Shell got {trace_ts, #PID<0.1349.0>, getting_unlinked, #Port<0.24>, 1,
{1695, 869472, 819893}}
Shell got {trace_ts, #PID<0.1349.0>, exit, tcp_closed, 1, {1695, 869472, 819897}}
Shell got {trace_ts, #PID<0.1349.0>, out_exited, 0, 1, {1695, 869472, 819900}}
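The last element of each trace_ts tuple is a timestamp in the {MegaSecs, Secs, MicroSecs} form (the same form as erlang:now/0, so treat it as approximate wall-clock time). Converting these makes it easier to line up events from different messages. A small sketch, where TraceTime is a hypothetical helper name:

```elixir
defmodule TraceTime do
  # Hypothetical helper for {MegaSecs, Secs, MicroSecs} trace timestamps.

  # Total microseconds since the Unix epoch.
  def to_us({mega, sec, micro}), do: (mega * 1_000_000 + sec) * 1_000_000 + micro

  # Approximate UTC time of the event.
  def to_datetime(ts), do: DateTime.from_unix!(to_us(ts), :microsecond)

  # Microseconds elapsed from timestamp a to timestamp b.
  def diff_us(a, b), do: to_us(b) - to_us(a)
end
```

For example, the tcp_closed receive above, {1695, 869472, 819890}, converts to 2023-09-28 02:51:12.819890Z, and TraceTime.diff_us({1695, 869451, 524642}, {1695, 869472, 819890}) gives 21_295_248, so the gc_major_start message further down was stamped about 21.3 s before that tcp_closed.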
To test this, I connected just two nodes and closed the console on one of them, and I got something different:
Shell got {trace_ts,#Port<0.9>,send,
{tcp_closed,#Port<0.9>},
<0.97.0>,4,
{1695,985570,733677}}
Shell got {trace_ts,#Port<0.9>,closed,normal,4,{1695,985570,733686}}
I did get these garbage collection trace messages, though, which I found odd. I don’t know enough about Erlang to tell whether they could be the problem:
Shell got {trace_ts, #PID<0.1343.0>, gc_major_start,
[
{wordsize, 6},
{old_heap_block_size, 0},
{heap_block_size, 610},
{mbuf_size, 0},
{recent_size, 205},
{stack_size, 21},
{old_heap_size, 0},
{heap_size, 583},
{bin_vheap_size, 53},
{bin_vheap_block_size, 46422},
{bin_old_vheap_size, 0},
{bin_old_vheap_block_size, 46422}
], 1, {1695, 869451, 524642}}
Shell got {trace_ts, #PID<0.1343.0>, gc_major_end,
[
{wordsize, 553},
{old_heap_block_size, 0},
{heap_block_size, 233},
{mbuf_size, 0},
{recent_size, 30},
{stack_size, 21},
{old_heap_size, 0},
{heap_size, 30},
{bin_vheap_size, 53},
{bin_vheap_block_size, 46422},
{bin_old_vheap_size, 0},
{bin_old_vheap_block_size, 46422}
], 1, {1695, 869451, 524649}}
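As I read the erlang:trace documentation for the garbage_collection flag, all sizes in these lists are in words, and in the *_end message wordsize is the amount reclaimed. The two messages above can be checked with a quick sketch; GcInfo is a hypothetical name, and the byte figure assumes a 64-bit VM (8-byte words) unless told otherwise:

```elixir
defmodule GcInfo do
  # Hypothetical helper: compare the Info lists of a gc_major_start /
  # gc_major_end pair. All sizes are in words.

  # Live heap words (young + old heap) before minus after the collection.
  def reclaimed_words(start_info, end_info), do: total(start_info) - total(end_info)

  # Same figure in bytes; defaults to this VM's word size.
  def reclaimed_bytes(start_info, end_info, word_bytes \\ :erlang.system_info(:wordsize)) do
    reclaimed_words(start_info, end_info) * word_bytes
  end

  defp total(info), do: Keyword.fetch!(info, :heap_size) + Keyword.fetch!(info, :old_heap_size)
end
```

With the values above this gives (583 + 0) - (30 + 0) = 553 words, matching wordsize in gc_major_end, or 4424 bytes with 8-byte words: a small collection for a single process.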
Also worth mentioning: the size of the cluster fluctuates widely. Sometimes only 5 nodes are connected, then a few seconds later I can see the number go up each time I check, back to 29, before it drops again. I can’t tell yet whether this coincides with the garbage collection.
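One way to check whether the drops line up with the GC (or anything else) is to log every nodeup/nodedown with a timestamp and the disconnect reason. This sketch uses :net_kernel.monitor_nodes/2 with the nodedown_reason option, so each nodedown message carries a reason such as net_tick_timeout or connection_closed. NodeWatch and its output format are hypothetical, and Node.list/0 only counts visible nodes.

```elixir
defmodule NodeWatch do
  # Hypothetical helper: log each nodeup/nodedown with a UTC timestamp,
  # the disconnect reason, and how many visible nodes are connected.

  def start do
    spawn(fn ->
      # nodedown_reason adds {:nodedown_reason, reason} to nodedown info;
      # {:node_type, :all} also reports hidden nodes.
      :ok = :net_kernel.monitor_nodes(true, [{:node_type, :all}, :nodedown_reason])
      loop()
    end)
  end

  defp loop do
    receive do
      {event, node, info} when event in [:nodeup, :nodedown] ->
        IO.puts(format(event, node, info, DateTime.utc_now(), length(Node.list())))
    end

    loop()
  end

  # Pure formatting, kept separate so it is easy to test.
  def format(event, node, info, %DateTime{} = at, connected) do
    reason = Keyword.get(info, :nodedown_reason, :none)
    "#{DateTime.to_iso8601(at)} #{event} #{node} reason=#{inspect(reason)} connected=#{connected}"
  end
end
```

Running NodeWatch.start() on one node and leaving the console open produces one line per event, which can then be compared against the trace_ts timestamps.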