Scheduler use - looking at msacc output from an AWS-deployed system, is this normal or do we need to do some tweaking?

Hi,

So I’ve been looking at msacc output from an AWS-deployed system on a 16-core server, though these are 16 vCPUs, as AWS calls them (c5.4xlarge). The load seems somewhat uneven; is this normal, or do we need to do some tweaking?

Average thread real-time    : 20000691 us
Accumulated system run-time : 48067862 us
Average scheduler run-time  :  2978253 us
        Thread      aux check_io emulator       gc    other     port    sleep
Stats per thread:
     async( 0)    0.00%    0.00%    0.00%    0.00%    0.01%    0.04%   99.95%
       aux( 1)    0.03%    0.00%    0.00%    0.00%    0.00%    0.00%   99.97%
dirty_cpu_( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 5)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 7)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_( 9)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_(11)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_(12)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_(13)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_(14)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_cpu_(15)    0.00%    0.00%    0.00%    1.34%    0.17%    0.00%   98.48%
dirty_cpu_(16)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 5)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 7)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s( 9)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_s(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
      poll( 0)    0.00%    0.48%    0.00%    0.00%    0.00%    0.00%   99.52%
 scheduler( 1)    0.57%    0.31%    8.67%    0.70%   22.82%    1.18%   65.75%
 scheduler( 2)    0.56%    0.31%    9.33%    0.73%   22.53%    1.22%   65.31%
 scheduler( 3)    0.52%    0.29%    8.13%    0.64%   21.57%    1.08%   67.77%
 scheduler( 4)    0.52%    0.28%    7.88%    0.63%   21.59%    1.06%   68.03%
 scheduler( 5)    0.66%    0.30%    8.72%    0.76%   22.30%    1.17%   66.09%
 scheduler( 6)    0.72%    0.27%    7.53%    0.71%   20.89%    1.03%   68.85%
 scheduler( 7)    0.53%    0.27%    7.49%    0.64%   20.88%    1.01%   69.17%
 scheduler( 8)    0.07%    0.02%    0.50%    0.05%    1.72%    0.05%   97.60%
 scheduler( 9)    0.03%    0.01%    0.21%    0.02%    0.70%    0.02%   99.02%
 scheduler(10)    0.03%    0.01%    0.22%    0.02%    1.00%    0.02%   98.70%
 scheduler(11)    0.13%    0.02%    0.71%    0.09%    2.76%    0.07%   96.21%
 scheduler(12)    0.00%    0.00%    0.00%    0.00%    0.11%    0.00%   99.88%
 scheduler(13)    0.00%    0.00%    0.00%    0.00%    0.08%    0.00%   99.92%
 scheduler(14)    0.00%    0.00%    0.00%    0.00%    0.28%    0.00%   99.72%
 scheduler(15)    0.00%    0.00%    0.00%    0.00%    0.27%    0.00%   99.73%
 scheduler(16)    0.00%    0.00%    0.00%    0.00%    0.01%    0.00%   99.99%
Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.01%    0.04%   99.95%
           aux    0.03%    0.00%    0.00%    0.00%    0.00%    0.00%   99.97%
dirty_cpu_sche    0.00%    0.00%    0.00%    0.08%    0.01%    0.00%   99.91%
dirty_io_sched    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
          poll    0.00%    0.48%    0.00%    0.00%    0.00%    0.00%   99.52%
     scheduler    0.27%    0.13%    3.71%    0.31%    9.97%    0.49%   85.11%
:ok
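
For reference, I captured this with something like the following in iex (the 20 s sampling window is a guess on my part, inferred from the ~20000691 us average thread real-time):

    :msacc.start(20_000)   # reset counters, then sample microstate accounting for ~20 s
    :msacc.print()         # prints the table above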

Thanks


If your system isn’t very busy, that’s normal and helps efficiency. See this option:

+sub true|false

Enables or disables scheduler utilization balancing of load. By default scheduler utilization balancing is disabled and instead scheduler compaction of load is enabled, which strives for a load distribution that causes as many scheduler threads as possible to be fully loaded (that is, not run out of work). When scheduler utilization balancing is enabled, the system instead tries to balance scheduler utilization between schedulers. That is, strive for equal scheduler utilization on all schedulers.

https://www.erlang.org/doc/man/erl.html
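
If you want to experiment with it, the flag goes on the emulator command line (erl +sub true) or in your release’s vm.args, e.g.:

    ## vm.args -- enable scheduler utilization balancing
    ## (this disables the default compaction of load)
    +sub true

Whether it helps depends on the workload; the default compaction exists precisely so that lightly loaded systems can keep work on fewer schedulers.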


Given how low the load is (<15%), and that MSACC shows “other” as the primary contributor (everything else is idle), I’d guess your workload runs about 7 processes that execute NIFs (IIRC, without the extra MSACC states, time spent in NIFs is accounted as “other”).
To me this distribution looks fair and uniform.

Try loading your system close to 100% scheduler utilisation, and see whether the picture changes.
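
If you want to check which states your emulator actually reports (IIRC the extra ones, including a dedicated nif state, need a runtime built with ./configure --with-microstate-accounting=extra), something like this from iex should show them:

    :msacc.start(1_000)                 # reset counters and sample for one second
    [thread | _] = :msacc.stats()       # one map per scheduler/aux/dirty/... thread
    thread |> Map.get(:counters) |> Map.keys() |> IO.inspect()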


That makes sense. I have just changed a couple of NIF calls to dirty, as they were bordering on 1 ms. That goes into production tonight, so it will be interesting to run a comparison this evening.
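
In case it helps anyone else: a quick way to sanity-check a NIF against the ~1 ms guideline (the module, function, and argument below are placeholders for your own NIF wrapper, not real names):

    arg = :some_input                          # placeholder argument
    {micros, _res} = :timer.tc(MyNifMod, :my_nif, [arg])
    IO.puts("NIF call took #{micros} us")      # around 1000 us or more => consider a dirty NIF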


Thanks for your answer. I’m now curious, though: in what situations would you want compaction of load versus utilization balancing?


In theory, load compaction may be more power-efficient, as the CPU can put several cores to sleep (and run the remaining cores at a higher frequency).


Interesting. Doesn’t it take time to wake up a core, though?


It does. I don’t have measurements, but I think it is quick enough for a human not to notice.
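
The VM itself also hedges against this: schedulers busy-wait for a while before going to sleep when they run out of work, and that threshold is tunable if wakeup latency ever matters to you:

    ## vm.args -- scheduler busy wait threshold (default: medium);
    ## longer spinning means lower wakeup latency at the cost of idle CPU burn
    +sbwt long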
