Pathological scheduler behavior on specific AWS instance types

I don’t have suitable hardware here to try it, but roughly this is what I would do:

  • figure out the actual CPU/NUMA topology; the AWS docs may help
  • for whatever Linux you have, ensure the NICs and the OS run on a separate chiplet from the ones used for BEAM, to reduce contention
  • compare that NUMA topology to the one reported by erlang:system_info(cpu_topology)
  • look up +sct (set CPU topology) and +sbt (scheduler bind type)
  • try with and without +scl false, which may reduce scheduler migration and load compaction (at the cost of some wasted CPU)
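
A rough, untested sketch of the first two steps on a Linux guest, assuming the kernel exposes cache topology under /sys/devices/system/cpu/cpuN/cache/indexM/ (the level and shared_cpu_list files). It groups CPUs by shared L3 cache as a stand-in for a chiplet, reserves the first group for the NICs and OS (an arbitrary choice on my part; ideally pick the group the NIC interrupts actually land on), and prints a taskset line for BEAM. The helper names and the reservation policy are my own assumptions, and what a VM reports may not match the physical layout, which is why the cpu_topology comparison still matters.

```python
import glob
import os


def parse_cpulist(text: str) -> list[int]:
    """Parse a Linux cpulist string such as "0-3,8-11" into a sorted list of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def format_cpulist(cpus) -> str:
    """Inverse of parse_cpulist: [0, 1, 2, 3, 8] -> "0-3,8" (the form taskset -c accepts)."""
    ranges = []
    for c in sorted(set(cpus)):
        if ranges and c == ranges[-1][1] + 1:
            ranges[-1][1] = c          # extend the current run
        else:
            ranges.append([c, c])      # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def l3_groups(sysfs: str = "/sys/devices/system/cpu") -> list[tuple[int, ...]]:
    """Group CPUs by the L3 cache they share, as a rough proxy for a chiplet (CCD/CCX)."""
    groups: set[tuple[int, ...]] = set()
    for cpu_dir in glob.glob(os.path.join(sysfs, "cpu[0-9]*")):
        for idx in glob.glob(os.path.join(cpu_dir, "cache", "index[0-9]*")):
            with open(os.path.join(idx, "level")) as f:
                if f.read().strip() != "3":
                    continue
            with open(os.path.join(idx, "shared_cpu_list")) as f:
                groups.add(tuple(parse_cpulist(f.read())))
    return sorted(groups)


def split_for_beam(groups, reserved_groups: int = 1):
    """Hypothetical policy: reserve the first `reserved_groups` L3 groups for NIC IRQs
    and the OS; every remaining CPU goes to BEAM."""
    reserved = [c for g in groups[:reserved_groups] for c in g]
    beam = [c for g in groups[reserved_groups:] for c in g]
    return reserved, beam


if __name__ == "__main__":
    groups = l3_groups()
    for g in groups:
        print("L3 group:", format_cpulist(g))
    if len(groups) < 2:
        print("Fewer than two L3 groups visible; nothing to split.")
    else:
        reserved, beam = split_for_beam(groups)
        print("Leave for NICs/OS:", format_cpulist(reserved))
        print(f"taskset -c {format_cpulist(beam)} erl +sbt db +scl false")
```

After starting the node that way, erlang:system_info(scheduler_bindings) shows which logical CPU each scheduler ended up bound to. It is worth checking that against the printed split, since I haven't verified how the inherited affinity mask and +sbt interact on these instance types.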

By binding BEAM to CPUs that the NICs and OS aren’t using, making sure the logical processor layout matches the physical one, and reducing scheduler migration, you should get much better performance out of this hardware.

I’m very interested to hear about experiments in this area; I just don’t have the load to experiment further in this space myself.