I was rearranging my supervisors and had a thought. How difficult would it be to provide an anonymous function or module / {M, F} as an optional configuration to the supervisor?
This optional module would be called with the crash information and could override the default supervisor restart strategies, or provide other actions. Think gen_event. For example:
% My Erlang is a bit rusty .. too much Elixir ;-)
handle_info({'EXIT', Pid, Reason}, #state{crash_mfa = {Mod, Func}} = State) ->
    SupPid = self(),
    % Spawn so it won't block, but it has to complete within a short time period.
    FunPid =
        spawn(fun() ->
                      SupPid ! {actions, self(), erlang:apply(Mod, Func, [Pid, Reason, State])}
              end),
    CustomActions =
        receive
            {actions, FunPid, Actions} ->
                Actions
        after 2000 ->
            default
        end,
    case restart_child(Pid, Reason, CustomActions, State) of
        {ok, State1} ->
            {noreply, State1};
        {shutdown, State1} ->
            {stop, shutdown, State1}
    end;
The current supervisor code would still handle the grunt-work, and may provide an API so the writer of the optional module can get the default config and prior restart interval data.
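For concreteness, the hook could be supplied alongside the normal supervisor flags. The crash_mfa key below is purely hypothetical and does not exist in today's OTP supervisor; it is only meant to illustrate the proposal:

% Hypothetical configuration; the crash_mfa flag is invented here to
% illustrate the proposal and is not part of OTP's supervisor today.
init([]) ->
    SupFlags = #{strategy  => one_for_one,
                 intensity => 5,
                 period    => 10,
                 crash_mfa => {my_crash_policy, on_exit}},
    ChildSpecs = [#{id    => my_worker,
                    start => {my_worker, start_link, []}}],
    {ok, {SupFlags, ChildSpecs}}.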
Just a thought. But I ran into a case recently where this would have been helpful.
Heh… I have been thinking about something like custom supervisors for a few years now. The problem is that, well, it is not as easy as you make it sound, the more you think about it. It has to fit a lot of use cases, everything is kinda intertwined, and so one thing leads to another and another and yet another… until you arrive at something just short of a complete custom, from-the-ground-up, tailored-to-fit implementation of a supervisor.
That’s unfortunate; I think it would be a nice feature. In my case we have a gen_server with a let-it-crash event A where we can get away with a one_for_one strategy and keep trying many times over several minutes. For the other let-it-crash events we want a one_for_all strategy and limited restarts.
Maybe I’ll have a go at hacking something together.
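As an illustration of what such a hook could express for this case, here is a rough sketch assuming the {Mod, Func} shape from the first post. The module name and the {restart, Strategy, Limits} action terms it returns are made up, since no such contract exists yet:

% Hypothetical policy module for the crash hook sketched above; the
% {restart, Strategy, Limits} action terms are invented for illustration.
-module(my_crash_policy).
-export([on_exit/3]).

% Event A is cheap to retry in isolation: keep one_for_one behaviour
% and allow many restarts spread over several minutes.
on_exit(_Pid, {shutdown, event_a}, _SupState) ->
    {restart, one_for_one, #{intensity => 60, period => 300}};
% Anything else: restart all siblings, but give up quickly.
on_exit(_Pid, _Reason, _SupState) ->
    {restart, one_for_all, #{intensity => 3, period => 5}}.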
I would be careful with those, i.e. not treat them as alternatives in the sense of “the better supervisors”.
supervisor3 is Klarna's version of supervisor2, which is RabbitMQ's version of supervisor:
AFAIK/IIRC RabbitMQ forked supervisor, modified it to their needs and named it supervisor2, and Klarna did likewise with their supervisor2 → supervisor3.
This basically means that supervisor2 and supervisor3 may be lagging behind and/or deviating from the standard OTP supervisor, and even from each other, for a while or forever; it depends. Also, they are tailored to their authors' needs, which may or may not coincide with yours, now or later on, so no guarantees there.
Parent, on the other hand, is Elixir, which shouldn't bother you much:
`% My Erlang is a bit rusty .. too much Elixir ;-)`
I have been thinking about your proposal for a while, and I don’t dislike it. I would do it differently, though.
As I see it, this amounts to a custom child restart strategy on top of the existing permanent, transient and temporary. I would rather implement this as an optional callback than as “just any function”, as it is likely closely tied to the supervisor implementation.
This callback would have to be plugged in between the detection of a child exit and the restart. I would not call it in a separate process; that seems unnecessary and more complexity than it is worth (timeouts, exit detection, …).
As return values, instructions like “ignore”, “ignore and remove” and “restart” are the obvious ones. Also, overriding the supervisor's restart strategy may be possible, like “if this child dies, restart only it” (one_for_one) or “if this other child dies, restart all” (one_for_all). Even something like “if this yet other child dies, shut down the supervisor”.
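A rough sketch of the shape such a callback contract might take follows; the behaviour name, the callback name and the instruction terms are all made up for illustration and are not an existing OTP API:

% Hypothetical behaviour for the optional restart callback; nothing here
% exists in OTP today, the names are invented for illustration only.
-module(supervisor_restart_policy).

-callback handle_child_exit(ChildId :: term(),
                            Reason :: term(),
                            Children :: [term()]) ->
      restart            % restart per the child's restart type (the default)
    | ignore             % do not restart, keep the child spec
    | {ignore, delete}   % do not restart and remove the child spec
    | restart_all        % treat this exit as one_for_all
    | shutdown.          % shut the whole supervisor down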
There are certainly a number of edge cases to think of. Like, what do we do if the callback function crashes? What about static children (given in the return from init/1) vs dynamic ones (added later via start_child/2)? What about simple_one_for_one supervisors, which are always a kind of special case? Etc, etc.
But in general, I think I’m about to buy into the idea.