Linux RT Kernel scheduling and the BEAM/OTP - anyone tried it?

ocschwar · December 10, 2024, 3:41pm

Hi, all.

I am helping maintain an elaborate software stack for a self-contained application. It consists of a set of daemons that use a messaging broker to talk to each other, to the user, and to the resources they control.

The daemons are written in house, by my company, in C/C++. The broker is a commercially available product written in Erlang.

The daemons could benefit from being assigned to the Linux FIFO scheduler. But doing that without assigning the broker to the FIFO scheduler and giving it a higher priority just blows things up.

The broker is run from Docker, and giving it RT priority for testing purposes is something I’m working on, but in the meantime, checking online shows that giving a BEAM/OTP application the RT scheduler is something that is Just Not Done™, or at least only done by people with way more understanding of both BEAM and Linux than I have.

Have any of you tried this?

ocschwar · December 19, 2024, 3:22pm

Following up on my own work:

I gave the MQTT broker an assignment to SCHED_FIFO, priority = 2.
I gave our own code base SCHED_FIFO, priority=1.

The world did not come to an end. It’s just real important to make sure any OTP process assigned to FIFO uses the right system call to assign all child threads to FIFO as well.
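For anyone curious what "assign all threads to FIFO" looks like at the system-call level, here is a minimal Python sketch. It assumes Linux with /proc mounted and a caller with root or CAP_SYS_NICE; the helper names `thread_ids` and `set_fifo_all_threads` are made up for illustration and are not part of any broker or OTP tooling.

```python
import os


def thread_ids(pid):
    """Return the kernel thread IDs (TIDs) of a process, read from /proc/<pid>/task."""
    return sorted(int(name) for name in os.listdir(f"/proc/{pid}/task"))


def set_fifo_all_threads(pid, priority):
    """Put every currently existing thread of `pid` under SCHED_FIFO.

    Raises PermissionError when the caller lacks root or CAP_SYS_NICE.
    """
    param = os.sched_param(priority)
    for tid in thread_ids(pid):
        try:
            # On Linux the scheduling policy is per thread, so each TID is set in turn.
            os.sched_setscheduler(tid, os.SCHED_FIFO, param)
        except ProcessLookupError:
            # The thread exited between the listing and this call; skip it.
            continue
```

Something like `set_fifo_all_threads(broker_pid, 2)` would mirror the priority used above, where `broker_pid` is a placeholder for the broker's PID as seen from wherever the script runs. Threads started after the listing are only covered if they are created by a thread that is already under FIFO.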

Performance was a little better with this. Nothing world changing, but nice to have.

vkatsuba · December 19, 2024, 11:53pm

Running a BEAM/OTP application with SCHED_FIFO is indeed a rare and complex setup. The BEAM VM isn’t natively designed for real-time scheduling because it prioritizes fairness and cooperative multitasking over strict timing guarantees. However, your approach of assigning the broker SCHED_FIFO with a higher priority than your daemons is valid as long as you manage it carefully.

It’s good to hear your initial experiments didn’t cause major issues. The key here is understanding that the BEAM VM relies heavily on its internal schedulers, and giving it RT priority might interfere with its ability to function properly under high load or in the presence of long-running NIFs or ports.

A few things to keep in mind as you proceed:
• The BEAM schedulers create multiple threads (one per core by default), so it’s critical to ensure all these threads are assigned the correct priority and policy using system calls like pthread_setschedparam.
• Resource contention can become a problem if the broker is given too much priority over the daemons or other system processes. Monitor CPU usage closely to avoid starvation.
• BEAM’s cooperative scheduler may not benefit significantly from SCHED_FIFO, as it depends on yielding tasks voluntarily, which might conflict with strict FIFO behavior.
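One way to check the first point is a read-only audit of every thread's policy and priority. The Python sketch below assumes Linux with /proc mounted; `describe_policy` and `audit_threads` are hypothetical helper names, and the name table only covers the three most common policies. It needs no special privileges because it only reads scheduling attributes.

```python
import os

# Only the common policies are named here; anything else is reported numerically.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def describe_policy(raw):
    """Split a raw policy value into (policy name, reset-on-fork flag)."""
    # Linux reports SCHED_RESET_ON_FORK OR-ed into the policy value.
    reset_on_fork = bool(raw & os.SCHED_RESET_ON_FORK)
    policy = raw & ~os.SCHED_RESET_ON_FORK
    return POLICY_NAMES.get(policy, f"policy {policy}"), reset_on_fork


def audit_threads(pid):
    """Map each TID of `pid` to (policy name, reset-on-fork flag, priority)."""
    report = {}
    for name in os.listdir(f"/proc/{pid}/task"):
        tid = int(name)
        try:
            raw = os.sched_getscheduler(tid)
            priority = os.sched_getparam(tid).sched_priority
        except ProcessLookupError:
            continue  # thread exited while we were looking
        report[tid] = (*describe_policy(raw), priority)
    return report
```

Running `audit_threads` on the broker's PID and looking for any entry that is not `SCHED_FIFO` at the intended priority would show whether some thread was missed.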

Your testing results align with what might be expected - a modest improvement in latency, but nothing dramatic. Real-time priorities are more impactful in environments with stringent latency requirements or high contention for CPU resources.

To directly answer your question: While giving BEAM/OTP SCHED_FIFO isn’t a common practice, it can be done if you thoroughly understand the trade-offs and ensure proper configuration of both the BEAM VM and the underlying system. Testing and careful observation, like you’ve already done, are essential to prevent unintended side effects.

Let us know if you observe any specific issues or find further insights - this kind of experimentation is rare and valuable!

ocschwar · December 20, 2024, 4:03am

I did take care to assign *all* of the BEAM’s threads to SCHED_FIFO, and without SCHED_RESET_ON_FORK, so that all their child threads would also receive the higher priority. All I had to do was make sure I ran the chrt command with the -a flag and without -R.
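As a rough illustration of that invocation, this Python sketch builds the chrt command line using the long option names. The PID 1234 and the helper name `chrt_fifo_command` are placeholders, not values from my setup; priority 2 is the one mentioned earlier in the thread.

```python
import shlex


def chrt_fifo_command(pid, priority, all_tasks=True):
    """Build a chrt command that sets SCHED_FIFO on an existing process."""
    cmd = ["chrt"]
    if all_tasks:
        cmd.append("--all-tasks")  # long form of -a: apply to every thread of the PID
    # --reset-on-fork (-R) is deliberately left out so new threads inherit the policy.
    cmd += ["--fifo", "--pid", str(priority), str(pid)]
    return cmd


# 1234 is a made-up PID used only to show the resulting command line.
print(shlex.join(chrt_fifo_command(1234, 2)))
```

The list can be handed to `subprocess.run` by a caller with root or CAP_SYS_NICE; building it separately makes it easy to log or review the exact command first.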

The tasks yielding voluntarily explains why giving RT priority to the daemons was such a failure. It resulted in the daemons getting all the CPU time they liked, but without all the messages the broker was trying to deliver. I think giving the broker the higher priority did the trick. But if the kernel interferes with the BEAM’s cooperative scheduler, it might still cause problems as I test it further.