Customizable Supervisors - how difficult would it be to provide an anonymous function or module / {M, F} as an optional configuration?

I posted this on the Elixir forums as well.

I was rearranging my supervisors and had a thought: how difficult would it be to provide an anonymous function or an {M, F} (module and function) pair as an optional configuration to the supervisor?

This optional module would be called with the crash information and could override the default supervisor restart strategies, or provide other actions. Think gen_event. For example:

% My Erlang is a bit rusty .. too much Elixir ;-)
handle_info({'EXIT', Pid, Reason}, #state{crash_mfa = {Mod, Func}} = State) ->
    SupPid = self(),
    % Spawn so the callback can't block the supervisor, but it has to answer
    % within a short time period.
    FunPid =
        spawn(fun() ->
                  SupPid ! {actions, self(), erlang:apply(Mod, Func, [Pid, Reason, State])}
              end),
    CustomActions =
        receive
            {actions, FunPid, Actions} ->
                Actions
        after 2000 ->
            default
        end,
    case restart_child(Pid, Reason, CustomActions, State) of
        {ok, State1} ->
            {noreply, State1};
        {shutdown, State1} ->
            {stop, shutdown, State1}
    end;

The current supervisor code would still handle the grunt work, and it might provide an API so the writer of the optional module can get the default configuration and the prior restart history (the intensity/period bookkeeping).
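To make it a bit more concrete, the configuration I have in mind could look roughly like this (the crash_mfa flag and the my_crash_policy module are invented names; nothing like them exists in OTP today):

% Hypothetical sketch only: 'crash_mfa' is an invented supervisor flag,
% naming the module/function the supervisor consults when a child exits.
-module(my_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    SupFlags = #{strategy  => one_for_one,
                 intensity => 5,
                 period    => 10,
                 crash_mfa => {my_crash_policy, on_crash}},  % the proposed optional extra
    Children = [#{id    => worker_a,
                  start => {worker_a, start_link, []}}],
    {ok, {SupFlags, Children}}.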

Just a thought. But I ran into a case recently where this would have been helpful.

Heh… I have been thinking about something like custom supervisors for a few years now :sweat_smile: The problem is that, well, it is not that easy (as you make it sound :grin:), the more you think about it. It has to fit a lot of use cases, everything is kinda interweaved, and so one thing leads to another and another and yet another… until you arrive at something just short of a complete custom from-the-ground-up tailored-to-fit implementation of a supervisor :woman_shrugging:

That’s unfortunate, I think it would be a nice feature. In my case we have a gen_server with one let-it-crash event (call it event A) where we can get away with a one_for_one strategy and keep retrying many times over several minutes. For the other let-it-crash events we want a one_for_all strategy with limited restarts.
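Assuming event A shows up as a recognisable exit reason, the hypothetical my_crash_policy module from my sketch above could do something like this (the {override, Flags} return shape is made up purely to illustrate):

% Purely illustrative: map the exit reason of the crashed child to an
% override of the restart behaviour, or fall back to the defaults.
-module(my_crash_policy).
-export([on_crash/3]).

on_crash(_Pid, {event_a, _Details}, _SupState) ->
    % the benign crash: restart only this child, keep retrying for minutes
    {override, #{strategy => one_for_one, intensity => 100, period => 300}};
on_crash(_Pid, _OtherReason, _SupState) ->
    % anything else: restart all the siblings, but give up quickly
    {override, #{strategy => one_for_all, intensity => 3, period => 5}}.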

Maybe I’ll have a go at hacking something together :slight_smile:

There are also some existing supervisor alternatives, like GitHub - klarna/supervisor3: Library to abstract supervisor2.erl from rabbitmq/rabbitmq-common or GitHub - sasa1977/parent: Custom parenting of processes in Elixir. Maybe they could help with your use case.


I would be careful with those, i.e. not understand them as alternatives in the sense of “the better supervisors” :slight_smile:

supervisor3 is Klarna's version of supervisor2, which is RabbitMQ's version of supervisor:
AFAIK/IIRC RabbitMQ forked supervisor, modified it to their needs and named it supervisor2, and Klarna did likewise with their supervisor2 → supervisor3.
This basically means that supervisor2 and supervisor3 may be lagging behind and/or deviating from the standard OTP supervisor and even each other, for a while or forever, depends. Also, they are tailored to their needs, which may or may not coincide with yours, now or later on, no guarantees there :woman_shrugging:

Parent, on the other hand, is Elixir. Which shouldn't bother you much :wink::

`% My Erlang is a bit rusty .. too much Elixir ;-)`

However, it also states that it is ~3x slower and consumes ~2x more memory than DynamicSupervisor, and that the API is prone to significant changes.


(Disclaimer: this is just my $0.02 :stuck_out_tongue_winking_eye:)

I have been thinking about your proposal for a while, and I don’t dislike it :stuck_out_tongue_winking_eye: I would do it differently, though.

As I see it, this amounts to a custom child restart strategy on top of the existing restart types permanent, transient and temporary. I would rather implement this as an optional callback than as “just any function”, as it is likely closely tied to the supervisor implementation.
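To sketch what I mean (the callback name handle_child_exit/3, its Context argument and the return values are all invented, not an actual OTP proposal), it could be declared as an optional callback of the supervisor behaviour roughly like this:

% Nothing like this exists today; a sketch of what could be added to the
% supervisor behaviour as an optional callback.
-callback handle_child_exit(ChildId :: term(),
                            Reason  :: term(),
                            Context :: map()) ->
        default                    % fall back to the child's configured restart type
      | ignore                     % do not restart, keep the child spec
      | {ignore, delete}           % do not restart, drop the child spec
      | restart                    % restart this child only
      | restart_all                % restart all children
      | {stop, term()}.            % shut the supervisor down

-optional_callbacks([handle_child_exit/3]).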

This callback would have to be plugged in between the detection of a child exit and the restart. I would not call it in a separate process; that seems unnecessary and more complexity than it is worth (timeouts, exit detection, …).

As return values, instructions like “ignore”, “ignore and remove” and “restart” are the obvious ones. Also, overriding the supervisor's restart strategy may be possible, like “if this child dies, restart it” (one_for_one) or “if this other child dies, restart all” (one_for_all). Even something like “if this yet other child dies, shut down the supervisor”.
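On the supervisor side, acting on those return values could look roughly like this; callback_module/1, context/1 and the do_* / remove_child helpers are placeholders for whatever the real supervisor internals would provide:

% Sketch: dispatch on the callback's answer inside the supervisor's exit
% handling, called directly rather than in a separate process.
apply_exit_policy(ChildId, Reason, State) ->
    Mod = callback_module(State),
    case Mod:handle_child_exit(ChildId, Reason, context(State)) of
        default          -> do_default_restart(ChildId, Reason, State);
        ignore           -> {ok, State};
        {ignore, delete} -> {ok, remove_child(ChildId, State)};
        restart          -> do_restart_one(ChildId, State);
        restart_all      -> do_restart_all(State);
        {stop, Why}      -> {shutdown, Why, State}
    end.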

There are certainly a number of edge cases to think of. Like, what do we do if the callback function crashes? What about static children (given in the return from init/1) vs dynamic ones (added later via start_child/2)? What about simple_one_for_one supervisors, which are always a kind of special case? Etc, etc.
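For the “callback crashes” case, I would simply catch everything and fall back to the default handling, something like:

% Sketch: never let a buggy policy callback take the supervisor down with it;
% log the crash and fall back to the default restart handling instead.
safe_policy(Mod, ChildId, Reason, Context) ->
    try
        Mod:handle_child_exit(ChildId, Reason, Context)
    catch
        Class:Error:Stack ->
            logger:error("restart policy ~p crashed: ~p:~p~n~p",
                         [Mod, Class, Error, Stack]),
            default
    end.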

But in general, I think I’m about to buy into the idea :wink:

Yes, I understand where you're coming from; I'm somewhat unsure about that too. My thought was that spawning a separate process would protect the supervisor from someone doing something blocking.

Of course the developer can make the decision to spawn a separate process inside the callback if they feel like it.
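If the callback really needs to do something potentially slow (ask some external service what to do, say), it can do the spawn-and-timeout dance itself and fall back to the default behaviour. A rough sketch, with decide_remotely/2 standing in for the imaginary slow work:

% Sketch: a policy callback that offloads a possibly slow decision to a
% separate process and returns 'default' if no answer arrives in time.
handle_child_exit(ChildId, Reason, _Context) ->
    Caller = self(),
    Ref = make_ref(),
    {_Pid, MRef} = spawn_monitor(fun() ->
                                     Caller ! {Ref, decide_remotely(ChildId, Reason)}
                                 end),
    receive
        {Ref, Decision} ->
            erlang:demonitor(MRef, [flush]),
            Decision;
        {'DOWN', MRef, process, _, _} ->
            % the helper crashed: fall back to the default handling
            default
    after 1000 ->
        erlang:demonitor(MRef, [flush]),
        % a late reply could still land in the mailbox; a real version
        % would have to flush or ignore it
        default
    end.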