Peer discovery in supervisor

maxlapshin · September 19, 2022, 6:38am

We discussed this topic about a year ago on Google Groups, so I want to continue it here.

Supervisor children can be grouped to process some data together. This collaborative data processing requires knowledge of sibling pids.

Process A has done something and now wants to send the result to process B. How can it know the pid of B?

We have global process registration. It is the analog of a singleton in other languages. Somebody may mention gproc or my gen_tracker (which has worked for us for more than 10 years).

All these methods are really bad when we speak about decomposition, isolation, testing and sharing code.

Which library would you choose: one that pollutes the global namespace and does not allow creating more than one instance of a database connection, or one that runs in an isolated environment and has clear, strictly defined inputs and outputs without unforeseen side effects?

I like the idea of peer discovery in Kubernetes: any program in this operating system (yes, k8s is a cluster operating system, just like Linux is a computer operating system or Erlang is an in-program OS) can find out IP addresses (they act like pids in Linux or Erlang) from a central DNS resolver that will always respond.

Erlang has something better than a central resolver: every process has its supervisor implicitly available in $ancestors, and it may be a good idea to ask it: give me the pid of your child with id=packet_filter.

The questions here are:

Should the supervisor push the pids of its children to newly spawned processes? If yes, then what to do about restarts? Maybe some protocol for updating peers?
Or maybe each process should query the supervisor itself? That doesn’t require any additional code in the supervisor, but it brings new complexity to each process.
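
For illustration, the second option (a process querying its own supervisor) can be sketched with plain OTP calls. This is only a minimal sketch under some assumptions: the module name `peer_lookup`, the child ids `packet_filter` and `packet_writer`, and the demo supervisor and worker are all hypothetical; the parent is taken from the `'$ancestors'` process dictionary entry, which is present only for processes started through proc_lib (gen_server, gen_statem, and so on).

```erlang
-module(peer_lookup).
-behaviour(supervisor).

-export([sibling/1, sibling/2, demo/0]).
-export([init/1, start_worker/0]).

%% Ask the parent supervisor (first entry of '$ancestors') for a sibling.
%% Caution: do not call this from a child's init/1. The supervisor is
%% blocked waiting for that child to start, so the call would deadlock.
sibling(ChildId) ->
    case get('$ancestors') of
        [Parent | _] -> sibling(Parent, ChildId);
        _ -> {error, no_parent}
    end.

%% Look up ChildId among the children of supervisor Sup.
sibling(Sup, ChildId) ->
    case lists:keyfind(ChildId, 1, supervisor:which_children(Sup)) of
        {ChildId, Pid, _Type, _Mods} when is_pid(Pid) -> {ok, Pid};
        {ChildId, _RestartingOrUndefined, _Type, _Mods} -> {error, not_running};
        false -> {error, not_found}
    end.

%% --- hypothetical demo supervisor with two workers ---
init([]) ->
    Child = fun(Id) -> #{id => Id, start => {?MODULE, start_worker, []}} end,
    {ok, {#{strategy => one_for_one},
          [Child(packet_filter), Child(packet_writer)]}}.

start_worker() ->
    {ok, proc_lib:spawn_link(fun worker_loop/0)}.

worker_loop() ->
    receive
        {whereis, From, Id} ->
            %% The worker resolves its sibling through its own supervisor.
            From ! {peer, Id, sibling(Id)},
            worker_loop();
        stop ->
            ok
    end.

%% Returns {WhatTheWorkerFound, WhatTheSupervisorReports}.
demo() ->
    {ok, Sup} = supervisor:start_link(?MODULE, []),
    {ok, Filter} = sibling(Sup, packet_filter),
    {ok, Writer} = sibling(Sup, packet_writer),
    Filter ! {whereis, self(), packet_writer},
    receive {peer, packet_writer, Reply} -> {Reply, {ok, Writer}} end.
```

Calling `peer_lookup:demo()` should return a pair whose two elements are the same `{ok, Pid}` tuple. Note that `supervisor:which_children/1` walks the whole child list on every call, so a process would probably cache the result rather than query on every message.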

What do you think?

Perhaps I’m trying to design some kind of dependency injection mechanism, but I have zero experience in Java, so I haven’t met it in the wild.

LostKobrakai · September 19, 2022, 6:52am

I don’t think it’s a good idea to push process registration knowledge to supervisors. If you really want to “key” a group of processes by their parent supervisor I’d use the $ancestors pid when registering in some process registry, which allows custom terms to be used as keys. When the parent pid changes all children restart, so that should be fine, while individual children restarting will re-register them.
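
A rough sketch of keying by the parent, with one substitution stated plainly: instead of a third-party registry it uses OTP's `pg` process groups (OTP 23+), which accept arbitrary terms as group names and drop dead members automatically, but do not enforce uniqueness the way a real registry would. The module name `peer_pg` and its functions are hypothetical, and the default `pg` scope is assumed to be running already.

```erlang
-module(peer_pg).
-export([register_as/1, lookup/1, lookup/2]).

%% Assumptions: pg:start_link/0 has been called on this node, and the
%% caller was started via proc_lib so that '$ancestors' is set.

%% Join the group keyed by {Parent, Name}. pg removes the pid again
%% when the process exits, so a restarted child simply re-joins.
register_as(Name) ->
    pg:join({parent(), Name}, self()).

%% Find a sibling registered under the same parent.
lookup(Name) ->
    lookup(parent(), Name).

lookup(Parent, Name) ->
    case pg:get_local_members({Parent, Name}) of
        [Pid | _] -> {ok, Pid};
        [] -> {error, not_found}
    end.

%% First ancestor: a pid, or the registered name if the parent has one.
parent() ->
    hd(get('$ancestors')).
```

A child would call `peer_pg:register_as(packet_filter)` once after starting, and a sibling would call `peer_pg:lookup(packet_filter)` whenever it needs the current pid.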

At least so far I’ve usually found some more natural key to use for registering multiple processes that are meant to cooperate.

maxlapshin · September 19, 2022, 7:09am

The supervisor already has process registration: it is the child id.

The main idea is that the parent supervisor can provide a good namespace. Any global registration like gproc becomes a nightmare when a name clash happens.

maxlapshin · September 19, 2022, 9:48am

Some more ideas:

We already have a system, written by the OTP team, that gives us very strict guarantees about having one and only one process with a specified name in the current namespace. This namespace is not global: it is the supervisor pid. Each supervisor has its own child id namespace, and that is very cool, much better than any global namespace.

This guarantee is extremely useful because we want only one process to open a file for writing, or to do any other thing that must happen only once.

The question is how to update pids in this namespace. When everything starts, it is not hard to ask the supervisor for its children and save their pids in local state.

It is not hard to monitor each of them (4 lines of code per sibling).

Updating will be hard. How can we know that the supervisor has restarted a child? The restart may be delayed, and we will end up polling the supervisor.
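
To make the monitor-and-poll pattern concrete, here is a minimal sketch and not a recommendation. The module name `peer_watch`, its API, and the 50 ms retry interval are hypothetical. It resolves the peer after `init/1` has returned (via `handle_continue`, OTP 21+) to avoid calling the supervisor while it is still blocked starting this child, monitors the peer, and falls back to polling after a 'DOWN' message until the supervisor reports a new pid.

```erlang
-module(peer_watch).
-behaviour(gen_server).

-export([start_link/2, start_link_fun/1, peer/1]).
-export([init/1, handle_continue/2, handle_call/3, handle_cast/2, handle_info/2]).

-define(RETRY_MS, 50).  % arbitrary polling interval while the peer is down

%% Watch child ChildId of supervisor Sup.
start_link(Sup, ChildId) ->
    start_link_fun(fun() -> lookup(Sup, ChildId) end).

%% Same, with any lookup fun returning {ok, Pid} | {error, Reason}.
start_link_fun(Lookup) when is_function(Lookup, 0) ->
    gen_server:start_link(?MODULE, Lookup, []).

%% Last known pid of the peer, or undefined while it is down.
peer(Server) ->
    gen_server:call(Server, peer).

init(Lookup) ->
    {ok, #{lookup => Lookup, pid => undefined}, {continue, resolve}}.

handle_continue(resolve, State) ->
    {noreply, resolve(State)}.

handle_call(peer, _From, #{pid := Pid} = State) ->
    {reply, Pid, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% The monitored peer died: forget it and try to resolve it again.
handle_info({'DOWN', _Ref, process, Pid, _Reason}, #{pid := Pid} = State) ->
    {noreply, resolve(State#{pid := undefined})};
handle_info(retry, #{pid := undefined} = State) ->
    {noreply, resolve(State)};
handle_info(_Other, State) ->
    {noreply, State}.

%% Monitor the peer if it is running, otherwise schedule another poll.
resolve(#{lookup := Lookup} = State) ->
    case Lookup() of
        {ok, Pid} ->
            erlang:monitor(process, Pid),
            State#{pid := Pid};
        {error, _} ->
            erlang:send_after(?RETRY_MS, self(), retry),
            State
    end.

lookup(Sup, ChildId) ->
    case lists:keyfind(ChildId, 1, supervisor:which_children(Sup)) of
        {ChildId, Pid, _Type, _Mods} when is_pid(Pid) -> {ok, Pid};
        _ -> {error, not_running}
    end.
```

If the supervisor still reports the old pid right after the crash, monitoring that dead pid produces an immediate 'DOWN' and the lookup simply runs again. In a real system this logic would more likely live inside each worker than in a separate watcher process, and the polling is exactly the cost described above.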

LostKobrakai · September 20, 2022, 8:36am

That’s exactly why I don’t think the supervisor should act as a registry. Registries already handle those things. Via-tuples allow you to interact with processes without needing to deal with low-level pids on both ends: when starting and registering children as well as when referencing them from elsewhere. I only have experience with Elixir’s Registry, but you can start as many of them as you like, or just key things by a tuple of {my_ancestor_pid, term} using a single (node-local) registry. I’m totally with you that the OTP-provided registries have downsides (they either allow only atom keys or are cluster-wide), but that doesn’t mean other registries cannot make other tradeoffs.

nzok · September 23, 2022, 2:14am

First question: is the idea to revise an existing behaviour or to create a new one?

Second question: is a single-level scheme wanted, or a recursive one? Just where are the encapsulation boundaries to be drawn in supervision trees?

Third question: numbers. How many registry updates per second are envisaged, to how many clients?

More questions after those are answered.

Abdelghani · September 26, 2022, 2:20pm

@maxlapshin Yes, discovery can be a problem, but are the processes on the same node, or can they be remote?
In the first case I think a manager server would solve the problem by registering all pids on the node, but it can become a bottleneck in terms of scalability.
In the second case, I have watched a WhatsApp conference video on YouTube, but I didn’t understand the idea… I think when you have a cluster and you want to find a pid based on a username or something similar, you can hash the username before connecting in order to pick a node. When someone wants the pid of this user, they hash the username as well to pick the same node, and then use the local pid resolver (the manager server) there.

maxlapshin · September 26, 2022, 2:57pm

Great questions, thanks!

is the idea to revise an existing behaviour

I’m not sure. Right now I’m just discussing it. For now I’m going to implement what I want with a set of gen_servers. I need to find out more precisely what exactly I want.

It is just like the idea of a reconfigurable child in a supervisor, which really should be implemented but is not there yet.

is a single-level scheme wanted, or a recursive one?

Of course it would be great to have a scheme encapsulated inside the current supervisor. A single-level scheme will require passing the current context to each gen_server.

Third question: numbers

Thank you very much for these questions!

I have about 5-15 children per supervisor. The practical limit is about 2000 such blocks per host.
We are talking about several updates per second on start and maybe several queries per second at runtime.