Hi there
I have an application that broadcasts messages to many remote clients. While performance testing the delivery of large messages, I ran into an OOM issue.
When the packet size is set to 64KB, the memory grows steadily and eventually triggers OOM. By continuously monitoring memory usage in both the Erlang shell with erlang:memory() and the Linux shell using grep RSS /proc//status, we observe that the total memory reported by erlang:memory() stabilizes at around 170GB, while the memory usage in Linux continues to grow.
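For anyone who wants to track the same gap, here is a minimal sketch that samples both figures from inside the VM. The module name mem_compare and its functions are hypothetical. It assumes Linux with grep on the PATH, and it reads the VmRSS line for the VM's own PID (taken from os:getpid/0) through os:cmd/1.

```erlang
-module(mem_compare).
-export([os_rss_bytes/0, gap_bytes/0]).

%% Resident set size of this VM as reported by Linux, in bytes.
%% Assumption: the status file under /proc for this PID has a VmRSS line in kB.
os_rss_bytes() ->
    Out = os:cmd("grep VmRSS /proc/" ++ os:getpid() ++ "/status"),
    {match, [KB]} = re:run(Out, "VmRSS:\\s+(\\d+)\\s+kB",
                           [{capture, all_but_first, list}]),
    list_to_integer(KB) * 1024.

%% Difference between what the OS sees and what erlang:memory(total) reports.
%% A value that keeps growing is the symptom described above.
gap_bytes() ->
    os_rss_bytes() - erlang:memory(total).
```

Calling mem_compare:gap_bytes() at a fixed interval and logging the result should show whether the difference grows steadily or jumps at particular moments.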
If I manually GC all processes periodically whenever the total memory is greater than 50GB, OOM is not triggered.
However, if the total memory is already at a high level (like 300GB), GCing all processes does not make memory usage decrease.
My question is: where does the difference in memory come from? Could it be a bug?
PS: It's easy to reproduce in my test environment, but unfortunately I have failed to reproduce it with simplified test code and steps.
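The workaround can be sketched roughly as follows. This is not the exact code from the test system; the module name gc_guard and the function maybe_gc_all/1 are hypothetical, and the threshold is whatever the caller passes in.

```erlang
-module(gc_guard).
-export([maybe_gc_all/1]).

%% Force a garbage collection of every process, but only when
%% erlang:memory(total) is above ThresholdBytes.
%% Returns {collected, NumberOfProcesses} or skipped.
maybe_gc_all(ThresholdBytes) when is_integer(ThresholdBytes) ->
    case erlang:memory(total) > ThresholdBytes of
        true ->
            Pids = erlang:processes(),
            %% garbage_collect/1 returns false for a process that has
            %% already exited, which is fine to ignore here.
            lists:foreach(fun erlang:garbage_collect/1, Pids),
            {collected, length(Pids)};
        false ->
            skipped
    end.
```

It could be run periodically with something like timer:apply_interval(60000, gc_guard, maybe_gc_all, [50 * 1024 * 1024 * 1024]), where the one-minute interval is only an example value.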
It sounds like the memory in your system is getting fragmented. There is a section about memory fragmentation issues and what you can do in Erlang in Anger that I would recommend reading.
I tried recon_alloc:fragmentation(current), and memory utilization looks good.
And recon_alloc:memory(used)/recon_alloc:memory(allocated) is about 88%.
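A small sketch of how that ratio can be computed, assuming the recon library is on the code path. The module name alloc_ratio and its functions are hypothetical; only recon_alloc:memory/1 with the used and allocated keys is taken from recon.

```erlang
-module(alloc_ratio).
-export([ratio/2, current/0]).

%% Fraction of allocated memory that is actually in use.
ratio(Used, Allocated) when Allocated > 0 ->
    Used / Allocated.

%% Current ratio for the running VM (requires recon to be loaded).
current() ->
    ratio(recon_alloc:memory(used), recon_alloc:memory(allocated)).
```

A value close to 1.0 means little memory is held in carriers without being used, which is why fragmentation inside the Erlang allocators does not look like the obvious cause here.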
This is the memory info of one trial.
erlang reported:
The large binary data may be coming from an external service that is out of my control, and the xmlel tuple/record needs to be converted to a binary by the fxml library before being sent to the network.
I tried splitting the converted binary into a list of several 4K/256B binaries, but it did not improve the memory usage.
I found a comment written many years ago in recon_alloc, listing three sources of memory not counted as 'allocated':
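For reference, splitting of this kind can be sketched as below. The module name bin_chunk and both functions are hypothetical, not the code used in the test. One thing worth knowing: chunks produced by pattern matching are sub-binaries that point into the original binary, so the original stays alive as long as any chunk does. The copied_chunks/2 variant uses binary:copy/1 to make each chunk independent. Whether that distinction matters for the result above is only a guess.

```erlang
-module(bin_chunk).
-export([chunks/2, copied_chunks/2]).

%% Split Bin into pieces of at most Size bytes.
%% The pieces are sub-binaries referencing the original data.
chunks(Bin, Size) when is_binary(Bin), is_integer(Size), Size > 0 ->
    chunks(Bin, Size, []).

chunks(<<>>, _Size, Acc) ->
    lists:reverse(Acc);
chunks(Bin, Size, Acc) when byte_size(Bin) =< Size ->
    lists:reverse([Bin | Acc]);
chunks(Bin, Size, Acc) ->
    <<Head:Size/binary, Rest/binary>> = Bin,
    chunks(Rest, Size, [Head | Acc]).

%% Same split, but every piece is copied so it no longer
%% references the original binary.
copied_chunks(Bin, Size) ->
    [binary:copy(C) || C <- chunks(Bin, Size)].
```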
%% The memory reported by `allocated' should roughly
%% match what the OS reports. If this amount is different by a large margin,
%% it may be the sign that someone is allocating memory in C directly, outside
%% of Erlang's own allocator -- a big warning sign. There are currently
%% three sources of memory alloction that are not counted towards this value:
%% The cached segments in the mseg allocator, any memory allocated as a
%% super carrier, and small pieces of memory allocated during startup
%% before the memory allocators are initialized.
cached segments in the mseg allocator: captured by erlang:system_info({allocator, mseg_alloc}),