We compared Nginx and Erlang in roughly the same role as edge servers: delivering HLS video with caching from a DVR server.
The load was approximately 2 Gbps on a host with 24 CPU cores.
When comparing the results of perf stat, a significant difference in the number of page faults becomes evident: 10/s for Nginx versus 4K/s for Erlang.
This is problematic. The flamegraph shows that a lot of time is spent inside readv and exc_page_fault.
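For anyone who wants to track the fault rate of a single process without running perf, here is a minimal Python sketch that samples the minflt and majflt counters from /proc/<pid>/stat on Linux. The function names are made up for illustration, and the result is only a rough cross-check of the perf stat numbers, not a replacement for them.

```python
import time


def parse_fault_counters(stat_line):
    """Return (minflt, majflt) from the contents of /proc/<pid>/stat."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the last ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); minflt is field 10, majflt is field 12.
    return int(rest[7]), int(rest[9])


def faults_per_second(before, after, interval_s):
    """Combined minor + major fault rate between two counter samples."""
    delta = (after[0] - before[0]) + (after[1] - before[1])
    return delta / interval_s


def sample_fault_rate(pid, interval_s=1.0):
    """Read the counters twice, interval_s apart, and return faults/s."""
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = parse_fault_counters(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_fault_counters(f.read())
    return faults_per_second(before, after, interval_s)
```

Calling sample_fault_rate with the pid of the beam.smp process and then with an Nginx worker pid should give figures comparable to the ones above.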
Running strace -f -e trace=memory revealed an unexpectedly high number of mmap/munmap calls, which seems strange: why free memory only to allocate it again immediately?
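To put a number on "unexpectedly high", the strace output can be tallied per syscall. The Python sketch below assumes the usual strace line layout: an optional "[pid N]" prefix (or a bare pid when the log is written with -o), followed by the syscall name and an opening parenthesis. The function name is hypothetical.

```python
import re
from collections import Counter

# Syscall name at the start of a line, optionally preceded by the
# "[pid N] " prefix that strace -f prints, or by a bare pid as written
# to a log file with -o.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def count_memory_syscalls(lines):
    """Count syscalls by name in an iterable of strace output lines.

    "<... mmap resumed>" continuation lines and "+++ exited +++" lines
    do not match, so an interrupted call is counted once.
    """
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing it the lines of a saved trace and dividing the mmap and munmap counts by the capture duration gives a calls-per-second figure that can be set against the page fault rate.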
The issue can be partially mitigated by enabling the super carrier, but as traffic increases, it may become insufficient, causing page faults to rise again.
Gradually, by adapting the server to different loads. Interestingly, the latest change was aimed at making memory allocation more efficient. Earlier settings were different: +MMmcs 30 +MBas aoffcaobf +MBsbct 8192 +MBsmbcs 64000 +MBlmbcs 128000 +MBmbcgs 3 +MBacul 10 +Mdai 16
I probably need to check this.
However, the result without any tuning was seemingly the same.
I see. You say you noticed a high number of mmap/munmap calls. Have you ruled out NIFs and/or sys allocations? Relatedly, have you looked at +Musac true?