I don’t have appropriate hardware here to try it, but roughly this is what I would do:
- figure out the actual CPU NUMA topology; the AWS docs may help
- on whatever Linux you run, keep the NICs & OS on a separate chiplet from the ones used for BEAM, to reduce contention
- compare the NUMA topology to that reported by erlang:system_info(cpu_topology).
- look up +sct and +sbt
- try with & without +scl false, which may reduce scheduler migration & compaction (at cost of wasted cpu)
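
To make the comparison step concrete, here is a rough, untested sketch of a helper that prints what the emulator detected, what it is actually using, and how the schedulers are bound. The module name topo_check is just something I made up for illustration; the erlang:system_info/1 items it queries are the standard ones.

```erlang
%% topo_check is a hypothetical helper module, not part of OTP.
-module(topo_check).
-export([report/0]).

%% Print the CPU topology as BEAM sees it, plus scheduler binding state,
%% so it can be compared by hand against the OS view (e.g. lscpu output).
report() ->
    %% What the emulator auto-detected; may be 'undefined' if detection failed.
    Detected = erlang:system_info({cpu_topology, detected}),
    %% What the emulator is actually using (differs if +sct overrides it).
    Used = erlang:system_info({cpu_topology, used}),
    io:format("detected topology: ~p~n", [Detected]),
    io:format("used topology:     ~p~n", [Used]),
    %% 'unbound' unless a bind type was requested with +sbt.
    io:format("bind type:         ~p~n",
              [erlang:system_info(scheduler_bind_type)]),
    %% One element per scheduler: a logical CPU id, or 'unbound'.
    io:format("bindings:          ~p~n",
              [erlang:system_info(scheduler_bindings)]),
    io:format("schedulers online: ~p~n",
              [erlang:system_info(schedulers_online)]),
    ok.
```

Run topo_check:report(). once in a node started plainly and once in a node started with something like erl +sbt db +scl false, then compare the output. If the detected topology doesn't match what the OS reports, +sct is the flag for describing it by hand. The db bind type here is only an example value, not a recommendation for this particular hardware.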
By binding BEAM to free CPUs, ensuring the logical processors match the physical layout, and reducing scheduler migration, you should get a lot better performance out of this hardware.
Very interested to hear about experiments in this area; I just don’t have the load to experiment further in this space.