Handling system messages in generic behaviours

vances · January 10, 2025, 7:50am

I have a process which can take a long time to complete its work. I had expected that using handle_continue/2 to break the work up into smaller tasks would allow the process to respond to system messages between each loop. However, that is not the case.

-module(server).

-export([init/1, handle_cast/2, handle_call/3, handle_continue/2]).

-behaviour(gen_server).

init(_Args) ->
	{ok, #{}}.

handle_cast(N, State) ->
	erlang:display({?MODULE, ?FUNCTION_NAME, N}),
	{noreply, State, {continue, N}}.

handle_call(_, _From, State) ->
	{noreply, State}.

handle_continue(0, State) ->
	erlang:display({?MODULE, ?FUNCTION_NAME, 0}),
	{noreply, State};
handle_continue(N, State) ->
	{noreply, State, {continue, N - 1}}.

This will loop, calling handle_continue/2, N times. Choosing a large value of N keeps it busy enough that the sys:get_status/1 call times out.

1> {ok, S} = gen_server:start(server, [], []).
{ok,<0.86.0>}
2> gen_server:cast(S, 1000000000), sys:get_status(S).
{server,handle_cast,1000000000}
** exception exit: {timeout,{sys,get_status,[<0.86.0>]}}
     in function  sys:send_system_msg/2 (sys.erl, line 754)
3> {server,handle_continue,0}

I am assuming that sys:get_status/1 is representative of system messages, am I wrong?

Should we make it work?

Maria-12648430 · January 10, 2025, 9:01am

handle_continue is not a message-based event (the handle_ part is misleading I think, as it puts it close to the other handle_* callbacks, which are message-based). When a callback returns {..., {continue, ...}}, handle_continue is called immediately afterwards, i.e. without receiving and handling any messages. The documentation says as much.

vances · January 10, 2025, 9:39am

What the documentation says is:

The call is invoked immediately after the previous callback, which makes it useful for performing work after initialization or, for splitting the work in a callback into multiple steps, updating the process state along the way.

I think it SHOULD handle system messages before handle_continue/2.

Would this not be a sensible improvement?

I haven’t begun to look at how to implement it …

Maria-12648430 · January 10, 2025, 10:18am

Well, it does that. The important point is “… immediately after the previous callback, …”; it doesn’t say anything about receiving/handling any messages (but maybe it should explicitly say that it doesn’t).

Anyway, what the {continue, ...} feature does (and was AFAIK designed for) is to execute some code after a callback has executed. For example, when init returns {ok, ..., {continue, ...}}, proc_lib:init_ack is called (to signal back to the starting process that the process has been started, and unblock it), and then the respective handle_continue clause is executed immediately. When handle_call returns {reply, ..., {continue, ...}}, the reply is sent to the process that made the gen_server:call, and then the respective handle_continue clause is executed immediately. In a way, the feature really concerns a process that is communicating with and waiting for the gen_server process in question, i.e. it exists to unblock that process early.
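To make the "no receive in between" point concrete, here is a toy model of how a callback result with a continue could be processed. It is only a sketch and not the actual gen_server source; the module name continue_model, the function run/2 and the back_to_receive tag are made up for illustration.

```erlang
-module(continue_model).
-export([run/2]).

%% Toy model only, not the OTP implementation.
%% Handler is a fun(Continue, State) that returns either
%% {noreply, State} or {noreply, State, {continue, Term}}.
run({noreply, State, {continue, Continue}}, Handler) ->
    %% No receive here: the mailbox is never inspected before the
    %% continue handler runs, so system messages keep waiting.
    run(Handler(Continue, State), Handler);
run({noreply, State}, _Handler) ->
    %% Only at this point would a real loop go back to receive.
    {back_to_receive, State}.
```

Calling run/2 with a handler that counts down, as handle_continue/2 does in the first post, only returns once the count reaches zero, which mirrors why sys:get_status/1 times out.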

Maybe. Maybe not. Why only system messages then? What about 'EXIT' messages, like, from the parent, or sibling processes, or any process the specific gen_server implementation thinks important?

Well, be my guest, impress and amaze me.

Seriously though, the only way (that I can see) for this to go about is via (notoriously expensive) selective receives. I tried something similar in erlang/otp pull request #8371, “Add ability to prioritize termination to gen_* behaviors”, and using selective receives was a major show stopper there.

max-au · January 10, 2025, 7:18pm

Here you go! https://erlangforums.com/t/eep-76-priority-messages/

@rickard that’s exactly the case I started with for “priority” messages. Of course it won’t be possible to distinguish between gen_* priority messages and user-generated priority messages, but I think it’s OK.

vances · January 11, 2025, 4:31am

I guess in my mind I was thinking that system messages were being handled with selective receive. Although I’ve been using Erlang as my primary language since last century, I rarely manage processes outside of OTP and never, ever, write receive. I see now that it is gen_server which contains the loop and calls sys:handle_system_msg/6 after receiving. As an OTP developer, what I would want is for system messages to take priority, so for me, system messages would be the primary use case for EEP 76.

It occurs to me that the semantics of what I actually want is provided by timeout:

-module(server).

-export([init/1, handle_cast/2, handle_call/3, handle_info/2]).

-behaviour(gen_server).

init(_Args) ->
	{ok, 0}.

handle_cast(N, _State) ->
	erlang:display({?MODULE, ?FUNCTION_NAME, N}),
	{noreply, N, 0}.

handle_call(_, _From, State) ->
	{noreply, State}.

handle_info(timeout, 0 = State) ->
	erlang:display({?MODULE, ?FUNCTION_NAME, State}),
	{noreply, State};
handle_info(timeout, State) ->
	{noreply, State - 1, 0}.

I want the opportunity to service incoming messages after each chunk of work, even if I do not expect regular messages. This works:

1> {ok, S} = gen_server:start(server, [], []).
{ok,<0.86.0>}
2> gen_server:cast(S, 100000000), sys:get_status(S).
{server,handle_cast,100000000}
{status,<0.86.0>,
        {module,gen_server},
        [[{'$ancestors',[<0.84.0>,<0.72.0>,<0.70.0>,user_drv,
                         <0.69.0>,<0.64.0>,kernel_sup,<0.47.0>]},
          {'$initial_call',{server,init,1}}],
         running,<0.86.0>,[],
         [{header,"Status for generic server <0.86.0>"},
          {data,[{"Status",running},
                 {"Parent",<0.86.0>},
                 {"Logged events",[]}]},
          {data,[{"State",99987107}]}]]}
3> {server,handle_info,0}

vances · February 20, 2025, 12:26am

The gen_statem behaviour does not handle system messages when following the same pattern as I used successfully for gen_server:

-module(statem).

-behaviour(gen_statem).

-export([init/1, callback_mode/0, handle_event/4]).

callback_mode() ->
   [handle_event_function].

init(_Args) ->
   {ok, undefined, 0}.

handle_event(cast, N, _, _) ->
   erlang:display({?MODULE, ?FUNCTION_NAME, N}),
   {keep_state, N, 0};
handle_event(timeout, _, _, 0 = Data) ->
   erlang:display({?MODULE, ?FUNCTION_NAME, Data}),
   keep_state_and_data;
handle_event(timeout, _, _, N) ->
   {keep_state, N - 1, 0}.

Whereas the gen_server process handled a system message for sys:get_status/1 during a timeout, the gen_statem does not:

1> {ok, S} = gen_statem:start(statem, [], []).
{ok,<0.86.0>}
2> gen_statem:cast(S, 100000000), sys:get_status(S).
{statem,handle_event,100000000}
** exception exit: {timeout,{sys,get_status,[<0.86.0>]}}
     in function  sys:send_system_msg/2 (sys.erl, line 754)
3> {statem,handle_event,0}

I tried each form of timeout_action(): event_timeout(), state_timeout() and generic_timeout(), with the same results.

raimo · February 20, 2025, 2:20pm

This is a special dark corner of gen_server time-outs that gen_statem doesn’t cover.

The gen_server (event) time-out is implemented essentially as receive Msg -> decode(Msg, ...) after TimeOut -> decode(timeout, ...) end. In decode(Msg, ...), system messages are recognized and passed to sys:handle_system_msg/7, which loops back to gen_server’s receive statement above, and that restarts the time-out.

The time-out is restarted by a system message, so it may become longer (multiplied), but if it is 0 we instead get an infinite loop over receive ... after 0 ... end that polls system messages. Should a different message be received, the loop terminates and a handle_* function is called, which may of course start a new loop.

The governing principles here are that gen_server cannot do a selective receive (for efficiency), that a system message should not end a time-out prematurely, and that the event time-out is implemented with receive ... after TimeOut ... end (for efficiency and simplicity).

gen_server is the result of those principles.
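As a rough illustration of that loop, here is a toy process built around receive ... after. It is a sketch only, not the gen_server source: the module name timeout_model, the stop message and the {system, From, get_count} request are stand-ins invented for this example (real system messages carry a {Pid, Ref} in From and go through sys).

```erlang
-module(timeout_model).
-export([start/1, loop/2]).

%% Toy model of a receive ... after loop; not the OTP implementation.
start(Timeout) ->
    spawn(?MODULE, loop, [Timeout, 0]).

loop(Timeout, Count) ->
    receive
        {system, From, get_count} ->
            %% Stand-in for a system message: answer it, then re-enter
            %% the receive, which restarts the time-out from scratch.
            From ! {count, Count},
            loop(Timeout, Count);
        stop ->
            ok
    after Timeout ->
        %% Stand-in for handle_info(timeout, State) doing one step of
        %% work and returning the same time-out again.
        loop(Timeout, Count + 1)
    end.
```

With a Timeout of 0 the process checks its mailbox before every step, which is the polling behaviour described above; with a larger Timeout, each system message postpones the next step.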

For gen_statem it was decided that time-outs should be implemented by timers, so they become oblivious to system messages. But a relative time-out of 0 got special treatment: to become predictable it becomes an event that arrives before any external event so that external events will not interfere with the order of generated internal events. receive is not invoked while there are events in the internal queue, such as a time-out 0 event, since selective receive (for system messages) cannot be used.

gen_statem is the result of those principles.

In general, both a gen_server and a gen_statem should always be able to respond to messages in a timely manner, whether they are calls, system messages or others. Busy work can be done in a worker process, linked or monitored.
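A minimal sketch of the worker approach, assuming the same countdown workload as in the earlier posts. The module name worker_server, the result call and the {done, Value} exit reason are hypothetical choices for this example, and a second cast while a worker is running simply replaces the tracked worker.

```erlang
-module(worker_server).
-behaviour(gen_server).

-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

init(_Args) ->
    {ok, #{worker => undefined, result => undefined}}.

handle_cast(N, State) when is_integer(N), N >= 0 ->
    %% Run the busy loop in a separate, monitored process so that this
    %% server goes straight back to its receive loop.
    {Pid, Ref} = spawn_monitor(fun() -> exit({done, countdown(N)}) end),
    {noreply, State#{worker := {Pid, Ref}}}.

handle_call(result, _From, #{result := Result} = State) ->
    {reply, Result, State}.

%% The worker reports its result through its exit reason.
handle_info({'DOWN', Ref, process, Pid, Reason},
            #{worker := {Pid, Ref}} = State) ->
    Result = case Reason of
                 {done, Value} -> {ok, Value};
                 Other -> {error, Other}
             end,
    {noreply, State#{worker := undefined, result := Result}};
handle_info(_Other, State) ->
    %% Ignore 'DOWN' messages from replaced workers and anything else.
    {noreply, State}.

countdown(0) -> 0;
countdown(N) -> countdown(N - 1).
```

Because the server process itself never runs the loop, sys:get_status/1 and ordinary calls should be answered promptly while the worker is still counting.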

Another dirty hack that can be used is to send yourself ordinary process messages to step the work forward.
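A sketch of that hack applied to the gen_statem example from earlier in the thread. The module name statem_step and the step message are made up for illustration; the idea is only that a self-sent message lands at the end of the mailbox, behind any system message that is already queued.

```erlang
-module(statem_step).
-behaviour(gen_statem).

-export([init/1, callback_mode/0, handle_event/4]).

callback_mode() ->
    handle_event_function.

init(_Args) ->
    {ok, undefined, 0}.

handle_event(cast, N, _State, _Data) when is_integer(N), N >= 0 ->
    %% Kick off the work with an ordinary message to ourselves.
    self() ! step,
    {keep_state, N};
handle_event(info, step, _State, 0) ->
    erlang:display({?MODULE, ?FUNCTION_NAME, 0}),
    keep_state_and_data;
handle_event(info, step, _State, N) ->
    %% One chunk of work per message; the next step queues up behind
    %% whatever is already in the mailbox, system messages included.
    self() ! step,
    {keep_state, N - 1}.
```

The trade-off is that a long mailbox also delays every step, and any event type not matched above (a call, for instance) would crash this sketch with function_clause.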