Use of {continue, _} in gen_server's init

saleyn · May 12, 2022, 2:23am

The documentation of gen_server’s callbacks has options to return a continuation in the form: {reply,Reply,NewState,{continue,Continue}} or {noreply,NewState,{continue,Continue}}, as the result of this PR. The document states:

handle_continue/2 is invoked immediately after the previous callback, which makes it useful for performing work after initialization or for splitting the work in a callback in multiple steps, updating the process state along the way.

To the reader, the word “immediately” implies that handle_continue/2 is called synchronously upon return from handle_call/3 or init/1, BEFORE the response is sent to the caller. However, the implementation first returns the result to the caller and only then calls the continuation.

The current implementation is inconvenient when, at the end of init/1, you want to perform some continuation synchronously before returning control to the caller; without that, additional synchronization may be required. I’d like to know if the current implementation is consistent with the original intent behind the return of {continue, _}, and whether it would be a good idea to introduce {sync_continue, _} to be able to call the continuation before returning the result to the caller.

Regards,

Serge

max-au · May 12, 2022, 3:02am

The whole point of continue is to return control (and allow the process that runs start_link to continue).

This feature is incredibly useful for asynchronous initialisation that has to be done but should not make the supervisor wait before starting all other children.

If you need something that’s executed before sending a response to the caller, then you should have that code in init/1 or handle_call/3 itself.

dominic · May 12, 2022, 4:21am

Is there an example?
I want to learn about continue too.

SirWerto · May 12, 2022, 8:36am

The link in the second answer is extremely interesting

max-au · May 13, 2022, 3:55am

One example would be to start multiple children concurrently. Imagine this supervision tree:
myapp_sup → [child1, child2, child3]. All of them are independent (so, myapp_sup is one_for_one). They are all slow to init (for example, read large files - translations, IP to ASN mappings, phone number mapping etc.).

So we have children loading files in handle_continue. All 3 children are initialising concurrently, and total startup time is equal to the longest of the 3 - not the sum of these 3.

Pretty much any asynchronous work that shouldn’t fail the supervised startup of a gen_server can be done there. Say, if it’s a cache (I should probably open source the gen_cache behaviour we’ve been using internally for quite a long time), then cache warm-up can happen in handle_continue (because a cold cache can already be used, via ETS, by other processes).
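
A minimal sketch of the idea described above, assuming a hypothetical slow_loader module in which timer:sleep/1 stands in for reading a large file. The module name, await/1 and demo/0 are made up for illustration. demo/0 starts three loaders one after another, the way a supervisor would, and measures how long it takes until all of them are ready.

```erlang
-module(slow_loader).
-behaviour(gen_server).

-export([start_link/1, await/1, demo/0]).
-export([init/1, handle_continue/2, handle_call/3, handle_cast/2]).

start_link(DelayMs) ->
    gen_server:start_link(?MODULE, DelayMs, []).

%% Blocks until the loader has finished its handle_continue/2 work,
%% because the call is only handled after the continuation returns.
await(Pid) ->
    gen_server:call(Pid, await).

%% Start three loaders sequentially and return the elapsed time in
%% milliseconds until all three report that they are loaded.
demo() ->
    T0 = erlang:monotonic_time(millisecond),
    Pids = [begin {ok, P} = start_link(D), P end || D <- [300, 300, 300]],
    [loaded = await(P) || P <- Pids],
    erlang:monotonic_time(millisecond) - T0.

init(DelayMs) ->
    %% start_link/1 returns as soon as init/1 returns; the slow part
    %% runs afterwards in handle_continue/2.
    {ok, loading, {continue, {load, DelayMs}}}.

handle_continue({load, DelayMs}, loading) ->
    timer:sleep(DelayMs),   %% stand-in for the slow file load
    {noreply, loaded}.

handle_call(await, _From, State) ->
    {reply, State, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Under these assumptions slow_loader:demo() should take roughly as long as the slowest loader rather than the sum of all three, since each start_link/1 returns before the sleep begins.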

dominic · May 13, 2022, 7:15am

Thanks for the example.

I have a question about startup:
how can you know that the system has finished starting up?
In this case, maybe knowing that is not important,
because a call will block anyway (though it may hit the 5 s timeout).
Or, if we do care about this, we can just make init synchronous.

dominic · May 13, 2022, 8:12am

After reading @SirWerto’s reply:
can continue work in this situation?
There is a function aaa deep inside handle_xxx:

aaa(State) ->
    %% When working in a team,
    %% someone may add their trigger directly in this function
    %% instead of using an event (info) such as self() ! trigger_xxx
    State1 = trigger_bbb(State),
    State2 = trigger_ccc(State1),
    State2.

This is a normal situation; if you agree so far, read on.
When trigger_ccc raises an error, it is obvious that I can’t use continue to rescue trigger_bbb,
because aaa is deep inside handle_xxx, so trigger_bbb can never use continue!
I think continue is weak as a checkpoint mechanism,
but maybe I understand it the wrong way.

This part is not about gen_server;
if it should not be written here, I will delete it.
I have an implementation of checkpoints too:
I wrote a behaviour on top of gen_server.
My strategy is:

Save everything in the state.
Save all async messages in the state (gen_server:cast/2, send/3, etc.).
Provide flush and rollback functions: flush saves the current state in the process dictionary, rollback gets the old state back from the process dictionary.
The user should flush after a sync operation (ets:insert/2, gen_server:call/3, etc.).
When I catch any error in a try around handle_xxx, roll back automatically and log the error.
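
The flush/rollback part of this strategy could be sketched roughly as follows. This is not the poster's behaviour, only a guess at its core under stated assumptions: the module name checkpoint_sketch, the dictionary key and guarded/2 are all hypothetical, and the buffering of async messages in the state is left out.

```erlang
-module(checkpoint_sketch).

-export([flush/1, rollback/0, guarded/2]).

%% Hypothetical process-dictionary key for the last known-good state.
-define(KEY, '$checkpoint_state').

%% Record State as the checkpoint and return it unchanged.
flush(State) ->
    put(?KEY, State),
    State.

%% Return the last flushed state (undefined if flush/1 was never called).
rollback() ->
    get(?KEY).

%% Run Handler(State). On any exception, log it and fall back to the
%% last flushed state instead of letting the process crash.
guarded(Handler, State) ->
    try
        Handler(State)
    catch
        Class:Reason ->
            logger:error("handler failed: ~p:~p", [Class, Reason]),
            rollback()
    end.
```

A handle_xxx callback would then wrap its body in guarded/2 and call flush/1 after each synchronous side effect, so that a later failure rolls back only to the point where the outside world and the state still agree.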

LostKobrakai · May 13, 2022, 8:20am

You could have a fourth process started after the 3 others, which does a call to each of the 3 previous processes and returns only after a response has been received from each. Then the setup happens in parallel, but the whole supervisor is only considered started after setup is completed. I’ve used this recently in nerves_time with a configurable wait time, so it tries to complete the setup synchronously up to a limit and otherwise just continues asynchronously.
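
A rough sketch of such a fourth process, in its strict form only (it fails startup on timeout rather than continuing asynchronously). The module name startup_barrier is hypothetical, and so is the protocol: it assumes the earlier workers are registered under known names and answer a `ready` call with `ready`.

```erlang
-module(startup_barrier).
-behaviour(gen_server).

-export([start_link/2]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Names:   registered names of the workers started before this process.
%% Timeout: how long to wait for each worker, in milliseconds.
start_link(Names, Timeout) ->
    gen_server:start_link(?MODULE, {Names, Timeout}, []).

init({Names, Timeout}) ->
    %% Each call is queued behind the worker's handle_continue/2, so it
    %% is answered only once that worker has finished its setup. A
    %% timeout or a dead worker makes the call exit, which fails this
    %% init/1 and with it the start of the supervisor.
    [ready = gen_server:call(Name, ready, Timeout) || Name <- Names],
    {ok, Names}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Placed last in the supervisor's child list, this keeps the workers' setup parallel while the supervisor's own start still blocks until every worker has answered.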

LeonardB · May 13, 2022, 12:22pm

Interesting @LostKobrakai

This has me thinking about a possible setting for supervisor on how to start children.

Would it make sense to have a {children_start_type, sync | async} option for supervisor, with ‘sync’ being the default?

With async the supervisor could start all children in parallel, whereas sync would be the existing ordered/synchronous behavior.

This behavior was/is achievable using proc_lib, and now with using handle_continue in all the children, but it seems it may be convenient to specify it at the supervisor level and write ‘slightly’ less code in the children.

srijan · May 13, 2022, 2:04pm

I think the handle_continue method is more powerful because you might want some part of the initialization to be in sync and some in async.

Doing essential initialization in sync allows failing early in case something goes wrong.

LeonardB · May 13, 2022, 3:01pm

I don’t disagree with you at all. handle_continue is great when you have a ‘mix’ of different servers under the supervisor and you want to explicitly control the return/startup behavior of a child.

An example case would be when there is no explicit ordering requirement for starting any of the children under a supervisor and those children have variable start/init times. That would allow for implicit parallel starting of all the children while maintaining the failure semantics of the normal supervisor.

When using handle_continue in the init, the supervisor immediately sees a successful start of the child process, whereas if the supervisor supported async starting of children, the supervisor would still wait for all children to return before continuing.

max-au · May 14, 2022, 4:33am

Exactly.
It does have a weak spot, however: with a one_for_one supervisor, it may happen that only a single child crashes and gets restarted.

juhlig · May 14, 2022, 10:59am

Hm, interesting, but how does the 4th know which processes to call out to? It can’t ask the supervisor via which_children, since the supervisor is still waiting for the 4th process’s own init to return and won’t respond until it has. So unless the 3 processes starting async are all registered ones, I don’t see how this can be done.

LostKobrakai · May 14, 2022, 11:20am

In the case I mentioned it was a named process anyways.

max-au · May 16, 2022, 3:50am

It could be the other way around: these 3 processes are sending a message to the fourth (which is registered).

juhlig · May 16, 2022, 6:11am

Not really… That fourth process will be started last, so the previous 3 don’t know if it is started yet when they try to contact it to say that they’re operational. Unless they go check if it is there, of course, but that again is cumbersome IMO.

saleyn · May 18, 2022, 2:35am

The point I was making is that the documentation says “invoked immediately after the previous callback”, which creates the false impression that, when {continue, _} is returned from init/1, the caller (i.e. the supervisor) is given a response only at the end of the continuation. That is inconsistent with the current implementation. This at least needs to be documented, but IMHO both behaviours are useful: one where handle_continue/2 is invoked after init/1 returns but before the response is delivered to the caller (for performing some synchronous task in handle_continue, which could also be used as the continuation by other handlers), and the current one, where the response is delivered to the supervisor first, allowing the gen_server to perform post-initialization work asynchronously.

max-au · May 18, 2022, 4:24am

The way I read is quite opposite, - what’d be the point of having such a continuation callback, if not for replying to the supervisor?
Although if you can think of a better wording, I’d suggest opening a PR (pull request) on GitHub to improve the documentation. It’s probably one of the easiest and at the same time most impactful ways to contribute!

saleyn · May 19, 2022, 2:03am

“The way I read is quite opposite, - what’d be the point of having such a continuation callback, if not for replying to the supervisor?”

The point would be that that continuation callback could be used by other handlers. E.g. if it performs some work that is done repetitively, immediately by init (which would have to be done before yielding control to the supervisor), and also on a timer, in which case the handle_info/2 would just call the same continuation.

In this case, if the gen_server’s handling of the init/1 result detected {ok, State, {continue, Continuation}}, it could first invoke the Continuation and only then call proc_lib:init_ack/2 to let the supervisor resume its work. To maintain backward compatibility, this type of behaviour could be requested by returning {ok, State, {sync_continue, Continuation}} instead.

You validly commented that “If you need something that’s executed before sending a response to the caller, then you should have that code in init/1 or handle_call/3 itself.” However, the same argument could be made that in the absence of the {continue, Continuation} return, the same is achievable by a workaround of dispatching a message to self(), yet, it was a convenience to have the continuation API added to gen_server.

srijan · May 19, 2022, 4:30am

The point would be that that continuation callback could be used by other handlers.

Achieving this does not need any special gen_server feature - the common part of the init that can be called by other handlers can just be extracted out into a function.
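
As a sketch of that approach, assuming a hypothetical refresher module whose shared work is a placeholder that only counts how often it has run: refresh/1 is called directly in init/1, so it completes before start_link/1 returns, and again from handle_info/2 whenever the timer fires.

```erlang
-module(refresher).
-behaviour(gen_server).

-export([start_link/1, count/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link(IntervalMs) ->
    gen_server:start_link(?MODULE, IntervalMs, []).

%% Number of times the shared work has run so far.
count(Pid) ->
    gen_server:call(Pid, count).

init(IntervalMs) ->
    %% Runs synchronously: start_link/1 returns only after the first
    %% refresh has completed.
    State = refresh(#{interval => IntervalMs, count => 0}),
    {ok, schedule(State)}.

handle_info(refresh, State) ->
    %% The same function, now triggered by the timer.
    {noreply, schedule(refresh(State))};
handle_info(_Other, State) ->
    {noreply, State}.

handle_call(count, _From, State = #{count := N}) ->
    {reply, N, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Shared work (placeholder): only counts its own invocations.
refresh(State = #{count := N}) ->
    State#{count := N + 1}.

%% Arm the timer for the next periodic refresh.
schedule(State = #{interval := IntervalMs}) ->
    erlang:send_after(IntervalMs, self(), refresh),
    State.
```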

However, the same argument could be made that in the absence of the {continue, Continuation} return, the same is achievable by a workaround of dispatching a message to self(), yet, it was a convenience to have the continuation API added to gen_server.

No. Sending a message to self() does not guarantee that that message gets processed before any other message in its queue. The handle_continue feature is required to make it possible to do some initialization work asynchronously without some other message being processed before the async part.
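
The ordering difference can be made visible with a small sketch. Everything here is hypothetical (the module name continue_vs_send, the two modes, the log). To make the race deterministic, init/1 puts an early_client_message into its own mailbox first, simulating a client that managed to send to the server before init/1 finished, which can happen for a registered process, for example.

```erlang
-module(continue_vs_send).
-behaviour(gen_server).

-export([start_link/1, log/1]).
-export([init/1, handle_continue/2, handle_call/3, handle_cast/2,
         handle_info/2]).

%% Mode is the atom continue or the atom message.
start_link(Mode) ->
    gen_server:start_link(?MODULE, Mode, []).

%% Events in the order in which the server handled them.
log(Pid) ->
    gen_server:call(Pid, log).

init(Mode) ->
    %% Simulated early client message, already queued before the
    %% second half of the initialisation is requested.
    self() ! early_client_message,
    case Mode of
        continue ->
            {ok, [], {continue, finish_init}};
        message ->
            self() ! finish_init,
            {ok, []}
    end.

%% Runs before any queued message is taken from the mailbox.
handle_continue(finish_init, Log) ->
    {noreply, [finish_init | Log]}.

%% Handles both early_client_message and, in message mode, finish_init.
handle_info(Msg, Log) ->
    {noreply, [Msg | Log]}.

handle_call(log, _From, Log) ->
    {reply, lists:reverse(Log), Log}.

handle_cast(_Msg, Log) ->
    {noreply, Log}.
```

In continue mode the log starts with finish_init; in message mode the simulated client message is handled first, which is exactly the case the self() ! approach cannot rule out.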