Why does it take hundreds of milliseconds for a process to get CPU time in R15 without SMP?

This is about an ancient version of Erlang, R15 and earlier. I’m hoping someone recognises this and can tell me what’s going on.

I use Erlang to rotate logfiles on an embedded single-core system, so no SMP. When rotating the logfiles, I notice that other processes sometimes get starved of CPU for an unreasonably long time. I can reproduce the problem on a bog-standard x86 Linux box, but I can't explain what's going on.

Complete code to reproduce:

In a nutshell, the code runs two processes. One compresses a logfile, using Erlang’s internal gzio, 500k at a time. The other just sleeps 10 ms at a time in a receive. The sleeping process typically gets starved for 700 ms while the compression runs. That’s unexpected.
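
For concreteness, a minimal sketch of that setup might look like the following; the module name, file open options, and exact log format are illustrative guesses rather than the original code:

-module(rotate_repro).
-export([run/2]).

%% Sketch only: spawn a process that sleeps 10 ms at a time, then
%% gzip-compress InFile into OutFile 500 kB per write, logging both.
run(InFile, OutFile) ->
    T0 = now_us(),
    Sleeper = spawn(fun() -> sleeper(T0) end),
    {ok, In} = file:open(InFile, [read, raw, binary]),
    %% 'compressed' makes the efile driver gzip the writes through the
    %% VM's internal gzio code.
    {ok, Out} = file:open(OutFile, [write, raw, binary, compressed]),
    compress(In, Out, T0),
    exit(Sleeper, done).

%% The process that gets starved: wake every 10 ms and log.
sleeper(T0) ->
    receive after 10 -> ok end,
    log("sleeping", T0),
    sleeper(T0).

compress(In, Out, T0) ->
    case file:read(In, 500000) of
        {ok, Bin} ->
            ok = file:write(Out, Bin),
            log("compressing", T0),
            compress(In, Out, T0);
        eof ->
            ok = file:close(Out),
            ok = file:close(In)
    end.

now_us() ->
    {Mega, Sec, Micro} = erlang:now(),
    (Mega * 1000000 + Sec) * 1000000 + Micro.

log(What, T0) ->
    io:format("~b ~s. relative ~b~n",
              [now_us(), What, (now_us() - T0) div 1000]).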

Here's what a run looks like in the logfile; the first column is a timestamp in microseconds:

3317458395 sleeping. relative 22
3317469395 sleeping. relative 33
3317480395 sleeping. relative 44
3317528571 compressing. relative 92 ms
3317568268 compressing. relative 131 ms
3317608473 compressing. relative 172 ms
3317651503 compressing. relative 215 ms
3317697024 compressing. relative 260 ms
3317738523 compressing. relative 302 ms
3317778853 compressing. relative 342 ms
3317819305 compressing. relative 382 ms
3317859849 compressing. relative 423 ms
3317900184 compressing. relative 463 ms
3317940621 compressing. relative 504 ms
3317981149 compressing. relative 544 ms
3318021277 compressing. relative 584 ms
3318061482 compressing. relative 625 ms
3318101664 compressing. relative 665 ms
3318142969 compressing. relative 706 ms
3318143101 sleeping. relative 706

So the 'compression' loop takes about 40 ms per iteration, and the sleeping loop normally wakes every 10 or 11 ms, until it suddenly gets starved for 700 ms.

I instrumented the VM to try to figure out what was going on, but I don't properly understand how scheduling is meant to work. Big thanks, though, to Happi for the BEAM book, which was a good way to get started. I can see that schedule() (in erl_process.c) is called after about 300 ms of starvation, but it still doesn't decide to schedule the starving process. So "not enough reductions being counted for gzio" doesn't seem to explain it all.

Clues?


Short answer to my own question

Roughly 320 ms of the starvation comes from the VM not counting a reasonable number of reductions for gzip I/O.

That 320 ms gets doubled because a process whose timer has expired is added to the end of the run queue, so the gzip process gets a second 320 ms slice before it's scheduled out.

The good thing is that non-ancient versions of Erlang do a much better job of scheduling, especially SMP versions.

Long answer

The core of my problem (full code in the parent post above) is a loop which gzip compresses a logfile:

step(In, Out) ->
    {ok, Bin} = file:read(In, 500000),
    ok = file:write(Out, Bin). %% the write is gzip-compressed

Running step/2 in a loop starves other processes for a long time. I instrumented a VM with calls to erts_fprintf(). Here's what the trace looks like:

time (µs)  description

6920367 sleeping. relative 43
6926367 scheduler old: <0.36.0> new: <0.2.0>
6968199 compressing. relative 91 ms
7009542 compressing. relative 133 ms
7050640 compressing. relative 174 ms
7094414 compressing. relative 218 ms
7134473 compressing. relative 258 ms
7174895 compressing. relative 298 ms
7216469 compressing. relative 340 ms
7216474 call_bif2_e FCALLS=16
7216565 out of reductions, goto context switch
7216573 after bump_timer runqueue[2] <0.2.0> <0.36.0>
7216576 scheduler old: <0.2.0> new: <0.2.0>
7259592 compressing. relative 383 ms
7300711 compressing. relative 424 ms
7342085 compressing. relative 465 ms
7382944 compressing. relative 506 ms
7423787 compressing. relative 547 ms
7471267 compressing. relative 594 ms
7512587 compressing. relative 636 ms
7552426 compressing. relative 676 ms
7594584 compressing. relative 718 ms
7594590 call_bif2_e FCALLS=7
7594690 out of reductions, goto context switch
7594790 scheduler old: <0.2.0> new: <0.36.0>
7594790 sleeping. relative 718

An Erlang process starts off with a budget of 2000 reductions. It turns out that step/2 is charged roughly 300 reductions per iteration, so the (non-SMP) scheduler isn't going to stop it until seven iterations have passed (2000 / 300 ≈ 6.7). You can see those seven iterations above as the 'compressing' lines.
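
That per-iteration cost is easy to sanity-check from the shell by reading the process's own reduction counter around a single call. A sketch, assuming In and Out are file handles opened as in the parent post:

%% Reductions charged to the calling process for one step/2 call.
{reductions, R0} = process_info(self(), reductions),
step(In, Out),
{reductions, R1} = process_info(self(), reductions),
io:format("one step/2 call cost ~b reductions~n", [R1 - R0]).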

At t=7216565 we run out of reductions and enter erl_process.c:schedule(). One of the things schedule() does is look at processes whose timers have expired, in bump_timer(). Those processes are moved into the run queue, i.e. the queue of processes due to be run, but they are put at the end.

At t=7216576, the scheduler decides to run the process at the head of the run queue. It's <0.2.0>, the compression process. We get another nine iterations until we run out of reductions again at t=7594590.

At t=7594790 the scheduler finally runs the starving process.

Things I learned

  • erts_fprintf() is a useful call for poking around the VM. It prints
    Erlang terms for you, e.g.

    erts_fprintf(my_log_file, "%u pid=%T", timestamp, process->id);

  • At least in R14B03, scheduling in Erlang is sort-of implicit. The C code bounces around in OpCase() macros until reductions are exhausted, and only then is schedule() called.

  • If you're running Erlang in non-SMP mode, you're probably going to see some behaviour not many other people see. (A modern way to detect this kind of thing is sketched below.)
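
For completeness: later Erlang releases (long after R14) added a long_schedule option to erlang:system_monitor/2, which reports exactly this kind of scheduler hogging without any VM instrumentation. A minimal example on a modern node:

erlang:system_monitor(self(), [{long_schedule, 100}]),
%% The runtime now sends us a message whenever a process or port keeps
%% a scheduler for more than 100 ms without being preempted.
receive
    {monitor, PidOrPort, long_schedule, Info} ->
        io:format("~p hogged a scheduler: ~p~n", [PidOrPort, Info])
end.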


I didn’t try hacking the VM to alter the timer expiry behaviour. I have a workaround for my actual problem, and I’m a bit wary of screwing around with the scheduler, for fear of creating more problems than I solve.
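
One generic mitigation for under-counted reductions, not necessarily the workaround I used, is to have the compressing process charge itself extra reductions after each chunk, so the scheduler preempts it on time; erlang:bump_reductions/1 can be used for this:

%% Hypothetical variant of step/2: burn the rest of the reduction
%% budget after each gzip write, so the process is scheduled out after
%% roughly one chunk instead of seven.
step(In, Out) ->
    {ok, Bin} = file:read(In, 500000),
    ok = file:write(Out, Bin),
    erlang:bump_reductions(2000).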


True, but also using R14B03 is going to show you some behaviours not many other people see (any more) :)


The cool thing is that it's routine for us to see our embedded systems, untouched for years, with 5+ year uptimes. So R14B might have been running for 10 years with one restart, and that restart was due to a power failure.


Wow, great post! Thanks for sharing what you learnt. Really cool.
