Tagged processes in pg module

paulozulato · July 30, 2022, 12:47am

Hi all,

I’m using the pg module in a project to make use of its amazing distributed named process group feature, but I’m missing something: tagged processes ({Tag :: any(), Process :: pid()}). This would be very useful to identify some particular processes within a given group.

For example, suppose a resource is controlled by two workers and one leader, and this set of processes is added to a pg group. This makes it easy to get the processes associated with a resource, but hard to identify the leader among the returned pids:

% Set up group
Scope = some_scope,
GroupId = 42,
Workers = get_pid(workers), % returns [pid(), pid()]
Leader = get_pid(leader), % returns pid()
pg:join(Scope, GroupId, [Leader | Workers]).

% Getting members
Members = pg:get_members(Scope, GroupId).
% [Pid1, Pid2, Pid3] -> which one is the group leader?

As a workaround, I’m creating another group containing only the leader process, so I end up with two groups: one for the workers and one for the leader (or one for all members and one for the leader). Using scopes to separate processes by tag/role would have the same effect: we would end up with single-element groups to accommodate these particular processes.
This approach also requires a new group for each label that is created, which makes me think I’m missing something.
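The two-group workaround described above could look roughly like the following. This is only a sketch: the module name, the scope `my_scope`, the group names `{42, all}` and `{42, leader}`, and the stand-in processes are all assumptions made for illustration, not part of the original setup.

```erlang
-module(pg_leader_workaround).
-export([demo/0]).

%% Hypothetical demo of the "extra single-element group" workaround.
demo() ->
    Scope = my_scope,                       % assumed scope name
    ok = ensure_scope(Scope),
    %% Stand-in processes; real code would use supervised workers.
    Spawn = fun() -> spawn(fun() -> receive stop -> ok end end) end,
    Leader = Spawn(),
    Workers = [Spawn(), Spawn()],
    %% One group holds every member, a second group holds only the leader.
    ok = pg:join(Scope, {42, all}, [Leader | Workers]),
    ok = pg:join(Scope, {42, leader}, Leader),
    All = pg:get_members(Scope, {42, all}),
    [Leader] = pg:get_members(Scope, {42, leader}),
    {Leader, All}.

%% Start the scope if it is not running yet. The scope is linked to the
%% caller here; a real system would start it under a supervisor.
ensure_scope(Scope) ->
    case pg:start_link(Scope) of
        {ok, _Pid} -> ok;
        {error, {already_started, _Pid}} -> ok
    end.
```

The cost is visible in the two `pg:join/3` calls: every tagged process has to be registered twice, once per group.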

One way to solve this problem is to allow a tag to be associated with a process when it joins the group. With tagged processes, these particular processes would be easy to identify:

      pg:join(Scope, GroupId, [{leader, Leader} | Workers]).
    
      Members = pg:get_members(Scope, GroupId).
      % something like [Pid1, {leader, Pid2}, Pid3]

      % it's easy to get the leader process from the group
      {leader, LeaderPid} = lists:keyfind(leader, 1, Members).

This approach seems to solve the problem without breaking backward compatibility or changing the module too much.

What do you think about this feature? Am I missing some way to use pg without tagged processes while still being able to easily pick particular processes from a group?

Finally, I’m working on a PR to add this feature to the pg module if this idea is accepted. Suggestions are welcome.

–
Best regards,
Paulo Zulato

MononcQc · July 30, 2022, 3:05pm

From the documentation:

Process Groups implement strong eventual consistency. Process Groups membership view may temporarily diverge. For example, when processes on node1 and node2 join concurrently, node3 and node4 may receive updates in a different order.

Membership view is not transitive. If node1 is not directly connected to node2, they will not see each other groups. But if both are connected to node3, node3 will have the full view.

Strong eventual consistency means that you can still have temporary inconsistencies or divergences. If you have to elect a leader, then I would say you need strong consistency, not strong eventual consistency.

Using pg to tag a leader process would therefore counteract the election or selection of a leader in the first place, and you’ll need a repair process (to redirect messages from a non-leader to the leader).

Tagged processes may be useful in different contexts, but it would be harder to make a case for them given processes can be in multiple groups. The group leader tagging is probably not a good example though.

paulozulato · July 30, 2022, 5:24pm

Yes, I was unhappy with this example. I tried to simplify and oversimplified it. Sorry for that.

In fact, my use case is static, with no election involved. I have some processes (local or remote) handling a resource using well-defined roles and, if a process crashes, it’s restarted by the supervisor in the same role. Therefore, there is no election to define the roles, and a given group will always have the same number of processes in it (one process per role), although the process filling a role can be replaced if it crashes.

And pg fits this use case like a glove, except when I need to send a message to the process filling a specific role within the group. In this use case, tagged processes would fulfill my needs:

- a given resource instance is mapped to a pg group;
- I can get all processes associated with a resource by querying pg, so I can broadcast generic messages to all participants handling that resource instance;
- if a process crashes, pg itself removes it from the group and the new process joins the group on init, so the pg group is updated automatically and I don’t need to handle this situation manually;
- I can identify processes by role within the group, so I can send specific messages to the process that handles a given aspect of the resource (each role manages one aspect of the resource).

To achieve these points without tagged processes, I’ve created a general scope plus one scope per role, and each process joins the same group id in both the general scope and its role-specific scope. I end up with several pg groups per resource instance: a general group with all processes related to that resource, and several single-element groups, one per role. With tagged processes, I could have only one group per resource instance.
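For comparison, the multi-group bookkeeping can be hidden behind a small wrapper. The sketch below is a variation on the setup described above, not the same thing: it uses a single scope with composite group names instead of one scope per role. The module name `pg_roles` and the group naming `{Resource, all}` / `{Resource, {role, Role}}` are hypothetical choices.

```erlang
-module(pg_roles).
-export([join/4, members/2, whereis_role/3]).

%% Join Pid to the resource-wide group and to its role-specific group.
%% Pid must be local to this node, as pg:join/3 requires.
join(Scope, Resource, Role, Pid) when is_pid(Pid) ->
    ok = pg:join(Scope, {Resource, all}, Pid),
    ok = pg:join(Scope, {Resource, {role, Role}}, Pid).

%% All processes handling the resource, for broadcasts.
members(Scope, Resource) ->
    pg:get_members(Scope, {Resource, all}).

%% The process currently filling Role, if this node knows of one.
whereis_role(Scope, Resource, Role) ->
    case pg:get_members(Scope, {Resource, {role, Role}}) of
        [Pid | _] -> {ok, Pid};
        [] -> {error, not_found}
    end.
```

With this in place, each process would call something like `pg_roles:join(Scope, 42, middle, self())` in its `init`, and because pg removes dead processes from every group they joined, a restarted process only needs to join again. The lookup is still subject to pg’s eventual consistency, so `whereis_role/3` may briefly return `{error, not_found}` or a stale pid on a remote node.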

Now I think the use case is better explained:

%%%%
%% The resource instance 42 is controlled by three processes (local or remote).
%% Each process manages one aspect of that resource.
%% Let's name their roles top, middle and bottom.
%%%%
%% Set up group
Scope = some_scope,
GroupId = 42,
Top = new_pid(top, GroupId), % returns pid()
Middle = new_pid(middle, GroupId), % returns pid()
Bottom = new_pid(bottom, GroupId), % returns pid()
Processes = [{top, Top}, {middle, Middle}, {bottom, Bottom}].
pg:join(Scope, GroupId, Processes).

%% Getting members
Members = pg:get_members(Scope, GroupId).
% something like [{top, Pid1}, {middle, Pid2}, {bottom, Pid3}]

%% it's easy to get the process which controls the aspect "middle" in this group
{middle, MiddlePid} = lists:keyfind(middle, 1, Members).

domi · August 1, 2022, 1:29am

What is the problem with the “multiple groups” approach?

paulozulato · August 1, 2022, 1:35am

There is no problem using it, but I have to manually manage several different pg groups which are related to only one group/resource.

max-au · August 1, 2022, 2:39am

That is, indeed, the intended design. We are using pg to implement a kind of “sticky routing” by creating groups named {serviceA_primary, 12}, where serviceA is the name of the service, _primary means “send to a process from the primary group if there is one”, and 12 is the partition. In addition to primary, we have secondary, which accepts traffic when there are no processes in the primary group. It is essentially a dumbed-down leader election mechanism with no guarantees, but it has worked well for us.
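A minimal sketch of that primary/secondary lookup might look like this. The module name and the tuple-based group naming `{Service, primary | secondary, Partition}` are assumptions made for readability; the groups described above use names like `{serviceA_primary, 12}` instead.

```erlang
-module(sticky_route).
-export([pick/3]).

%% Prefer a member of the primary group and fall back to the secondary
%% group when the primary one is empty. No consistency guarantees: two
%% nodes may briefly disagree on which group has members.
pick(Scope, Service, Partition) ->
    case pg:get_members(Scope, {Service, primary, Partition}) of
        [Pid | _] ->
            {ok, primary, Pid};
        [] ->
            case pg:get_members(Scope, {Service, secondary, Partition}) of
                [Pid | _] -> {ok, secondary, Pid};
                [] -> {error, no_members}
            end
    end.
```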

An alternative approach would be to have a scope named serviceA and, within that scope, primary and secondary groups (or {primary, N} for partitioned services).

With the new pg monitoring functionality (already merged to maint, so probably coming in 25.1) it will be possible to have more complicated schemes, even some sort of “leader election” mechanism on top of pg. I still owe a blog post on that…
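As a rough illustration of what the monitoring functionality allows, the sketch below keeps a local view of one group up to date from notifications. It assumes `pg:monitor/2` as documented for OTP 25.1 and later, which returns `{Ref, Members}` and then delivers `{Ref, join, Group, Pids}` and `{Ref, leave, Group, Pids}` messages to the caller. The module name and the `{get, From}` query message are hypothetical.

```erlang
-module(pg_watch).
-export([watch/2, apply_event/2]).

%% Subscribe the calling process to Group and track its membership.
%% Requires an OTP release that provides pg:monitor/2.
watch(Scope, Group) ->
    {Ref, Members} = pg:monitor(Scope, Group),
    loop(Ref, Members).

loop(Ref, Members) ->
    receive
        {Ref, _Action, _Group, _Pids} = Event ->
            loop(Ref, apply_event(Event, Members));
        {get, From} ->
            %% Hypothetical query protocol for other processes.
            From ! {members, Members},
            loop(Ref, Members)
    end.

%% Pure update step: fold one notification into the member list.
apply_event({_Ref, join, _Group, Pids}, Members) ->
    Pids ++ Members;
apply_event({_Ref, leave, _Group, Pids}, Members) ->
    Members -- Pids.
```

A process running `watch/2` learns about joins and leaves as they arrive instead of polling `pg:get_members/2`, which is the building block a leader-selection scheme on top of pg would need. It does not change pg’s consistency model.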