ErlangRED & Supervisor node - any other configuration that might be necessary for a supervisor?

Hi There,

I’ve just created the supervisor node for ErlangRED and it looks a little like this:

The point is that it’s just another node that supervises the two function nodes beneath it.

The configuration options of the node are these:

Besides the usual configuration (intensity, period, shutdown and restart strategy), the type is either static or dynamic - meaning the children are either supplied at creation or added after the supervisor has been started.

The final configuration is which nodes (aka processes - as each node is a process) should be supervised. Here it is possible to select all in the flow (i.e. the flow tab), all in a group (i.e. when nodes are grouped together) or - as shown here - selectively, one by one.
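For anyone who prefers the plain OTP view of it: these settings map roughly onto the standard supervisor flags and child specs. A minimal sketch, with made-up module and child names (demo_sup, demo_worker) rather than the actual Erlang-Red code:

```erlang
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    SupFlags = #{strategy  => one_for_one,  %% restart strategy from the node config
                 intensity => 5,            %% allowed restarts ...
                 period    => 10},          %% ... within this many seconds
    %% One child spec per supervised node (each node is a process).
    ChildSpecs = [#{id       => function_1,
                    start    => {demo_worker, start_link, []},
                    restart  => permanent,  %% the "restart" setting
                    shutdown => 5000,       %% the "shutdown" setting, in ms
                    type     => worker}],
    {ok, {SupFlags, ChildSpecs}}.
```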

Question: is there any other configuration that might be necessary for a supervisor?

Most of the options aren’t yet implemented; basically, only a selective, dynamic, one-for-one, never-shutting-down supervisor is possible! But the intensity and period can be altered.
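The dynamic case corresponds, in OTP terms, to adding and removing child specs on a running supervisor. Again a sketch with the assumed demo_sup/demo_worker names, not the real modules:

```erlang
%% Sketch: adding and later removing a child on a running supervisor.
add_exit_call_child() ->
    ChildSpec = #{id      => exit_call,
                  start   => {demo_worker, start_link, []},
                  restart => permanent,
                  type    => worker},
    {ok, _Pid} = supervisor:start_child(demo_sup, ChildSpec).

remove_exit_call_child() ->
    ok = supervisor:terminate_child(demo_sup, exit_call),
    ok = supervisor:delete_child(demo_sup, exit_call).
```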

The implementation works, but I wonder whether the complexity has to do with the domain (i.e. visual flow-based programming) or whether I should do some refactoring. There is a lot of interchange between the node behaviour, the supervisor node and the booting-up of a flow - it seems to be necessary given the domain, but can it be done better?

Cheers!


So the supervisor node now works well enough that I can visually demonstrate the differences between the restart strategies:


What is shown is a supervisor that is supervising three function nodes and two debug nodes. The debug nodes are just counting messages as they come in. The bottom and top function nodes just pass through the messages they receive and the middle function node does an exit(self()) - it’s the process/node that is failing.

The supervisor has an intensity of 500 over 30 seconds. Not shown is a message generator, an infinite loop that pauses for 5 milliseconds between messages. The debug node at the very bottom is counting those messages.
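In OTP terms those two numbers are just the intensity and period supervisor flags; with a message arriving every ~5 ms, the crashing node exceeds 500 restarts well inside the 30-second window:

```erlang
%% The demo's supervisor flags in plain OTP form (the strategy varies per test):
SupFlags = #{strategy  => one_for_one,
             intensity => 500,   %% at most 500 restarts ...
             period    => 30}.   %% ... within any 30-second window
```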

The order of the processes - as given to the supervisor node - is defined by the ‘y’ coordinate of the nodes[1]. So the function 1 node is first, then the debug 1 node, then the exit call function node (the one in the middle), then the function 3 node and the debug 6 node. Order is of course important for the rest-for-one option.

So what happens? When the first message comes in (all function nodes receive the same messages), the exit call function fails and the supervisor restarts it.

For one-for-all, all nodes are restarted. The debug nodes have their counts reset to zero and hence are showing 1. Each message is a new one, but the counters never get beyond one because the exit-call function keeps being restarted by the supervisor - and so are they.

The one-for-one strategy shows the debug nodes hitting a message count of 501 because after 500 messages, the supervisor dies. What is not shown is that the supervisor is automagically restarted after two seconds (shown by the status of the supervisor node). When the supervisor is restarted, the processes that represent the function and debug nodes are restarted and begin receiving messages from existing nodes/processes[2] again.

Finally, the rest-for-one strategy shows that the top debug node continues to receive messages (i.e. isn’t restarted) when the exit function is restarted, but the debug 6 counter stays at 1 - because it is being restarted.
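To summarise the observed behaviour in code form - a sketch, using the node names from the demo as atoms, of which children each OTP strategy restarts when the exit-call child crashes:

```erlang
%% Which children get restarted when the exit-call child crashes, given the
%% child order: function 1, debug 1, exit call, function 3, debug 6.
restarted_children(one_for_one)  -> [exit_call];
restarted_children(one_for_all)  -> [function_1, debug_1, exit_call,
                                     function_3, debug_6];
restarted_children(rest_for_one) -> [exit_call, function_3, debug_6].  %% only those after it
```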

The supervisor node sends out status messages; started, restarted and dead are the three states it has. When a status of dead is received, a different flow sends the supervisor a message to restart itself and its children. This doesn’t have to happen; the supervisor can be left for dead, if so desired.
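Conceptually that restart-on-dead flow boils down to a handler like the following - a sketch only; the map keys (status, action) mirror how I’ve described the messages above, not necessarily the exact Erlang-Red message format:

```erlang
%% Sketch of a watcher that turns a "dead" status into a restart request.
%% Message shapes are assumed from the description above, not the real API.
handle_status(#{status := dead}, SupervisorNode) ->
    SupervisorNode ! #{action => restart};   %% ask it to restart itself + children
handle_status(#{status := started}, _SupervisorNode) ->
    ok;
handle_status(#{status := restarted}, _SupervisorNode) ->
    ok.
```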

That’s the current implementation state of the supervisor node. I haven’t updated the codebase with the latest version - after all, it’s Sunday! :slight_smile:

[1] = NOTE: this is for testing only; never, ever will I use visual location to influence code logic - a very dark pattern and impossible to debug. How best to define the process order for the supervisor is still an open question.

[2] = Luckily I built ErlangRED such that when these processes are restarted, they have the same name as before, so existing processes sending messages to a named process simply continue to send their messages to the new processes, not even noticing that the process died in between.
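In other words, the senders address a registered name rather than a pid. A minimal sketch of that idea (not the actual Erlang-Red process-naming code):

```erlang
%% Minimal sketch: senders use the registered name, so a restarted process
%% that re-registers the same name keeps receiving messages seamlessly.
start_link(Name) ->
    Pid = spawn_link(fun() -> loop(Name) end),
    register(Name, Pid),     %% same name as before the crash
    {ok, Pid}.

send(Name, Msg) ->
    Name ! Msg.              %% resolves to whichever pid currently holds the name

loop(Name) ->
    receive
        Msg ->
            io:format("~p got ~p~n", [Name, Msg]),
            loop(Name)
    end.
```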


Just as an update, the supervisor node is now basically complete and I’ve created some test flows - but first:

Ordering of nodes (i.e. processes) is now done with a sortable list inside the configuration of the supervisor node:

No visual sorting of nodes within the flow.

I created some test flows to experiment with the supervisor pattern:

  • supervisor of supervisor of supervisor of … - it seems rather unrealistic to have five levels of supervisors, but it’s possible. Erlang-Red makes no fuss about it; I can create as many levels as I like - it’s supervisors all the way down.

  • supervisor tree which is a more realistic structure with a root supervisor and two sub-supervisors monitoring consumers of a data stream. As I describe in the test flow, this setup is probably very classic: a data stream producing messages that are handled by various consumers, which in turn are supervised. In the case of this test flow, these consumers are just placeholder nodes represented by function nodes, but the supervisor node can supervise any node type, so replacing them with something like an MQTT out node isn’t a problem (a rough OTP sketch of this shape follows after this list).

  • standalone supervisor being restarted by a separate flow. This use case was created before it was possible to have supervisors of supervisors, so the question was: how do you restart a supervisor node? The answer: by sending it a message with the action restart. That is what this flow demonstrates. It allows supervisor nodes to be restarted either by another supervisor node, by sending a message, or even both. Just that little bit more flexible.
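For the supervisor-tree case, the plain OTP equivalent is simply a supervisor whose children are themselves supervisors - a sketch with invented module names (root_sup, consumer_sup_a/b), not Erlang-Red’s code:

```erlang
%% Sketch of the supervisor-tree shape: a root supervisor whose children
%% are themselves supervisors. Module names here are made up.
-module(root_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 30},
    ChildSpecs = [#{id    => consumer_sup_a,
                    start => {consumer_sup_a, start_link, []},
                    type  => supervisor},   %% a child can itself be a supervisor
                  #{id    => consumer_sup_b,
                    start => {consumer_sup_b, start_link, []},
                    type  => supervisor}],
    {ok, {SupFlags, ChildSpecs}}.
```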

I’m just cleaning it up so that there is better error reporting for misconfigurations, e.g., supervisor loops or two supervisors wanting to supervise the same node - both are possible in Erlang-Red.

I’ve also written a write-up of the project as a whole for those interested. I discuss the architecture of the Erlang code a little bit. It’s probably as stable as it’s going to get, unless there are good suggestions otherwise.
