As per the Erlang documentation, the pool module is used to run a set of Erlang nodes as a pool of computational processors. I have the following queries. Please clarify.
Please share any example code that would help me understand the pool and slave modules. With an example, I could work out the answers to many of the questions listed below myself.
The set of nodes is specified in the file .hosts.erlang. Please confirm.
How do I designate one of the nodes as the master node?
A node can be designated as a slave node using the “slave” module. Please confirm.
How do I link the master node with the set of slave nodes?
The documentation says that pspawn spawns a process on the pool node that is expected to have the lowest future load. Should that be the CURRENT load?
Does pspawn consider only the slave nodes when spawning a process, or does it consider both the slave and master nodes?
I’d honestly recommend avoiding these two. One (slave) is already deprecated and has been replaced with peer, which has a much more robust API. The other one (pool) is not exactly functional either (hence all the questions you have). I’d hope to deprecate it too and replace it with some better clustering solution.
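For reference, here is a minimal sketch of the peer replacement for slave (OTP 25+): start a linked peer node and evaluate a call on it. This assumes you run it from a node that is already distributed (started with -sname or -name).

```erlang
%% Minimal sketch (OTP 25+), run from an already distributed node:
%% start a linked peer node with a random name and run code on it.
{ok, Peer, Node} = peer:start_link(#{name => peer:random_name()}),
Node = peer:call(Peer, erlang, node, []),  % evaluate erlang:node() on the peer
ok = peer:stop(Peer).
```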
Thanks Max. When I checked the Erlang stdlib documentation, I did not find that these functions are deprecated. Where can I check a function’s status (deprecated or not)?
The entire slave module (all of its functions) is deprecated.
The design you’re looking for usually goes like this:
Start multiple nodes and connect them in a cluster (you can use .hosts.erlang, or implement your own cluster discovery mechanism, or try something like libcluster). This will create a physical cluster (so nodes are aware of each other and can exchange messages).
On the “worker” nodes, you start your application, which registers “worker” processes in some pg group (Erlang’s pg library is a logical-level service discovery mechanism).
On the “master” node, or any other node, schedule your jobs via a normal gen_server:call(select_worker(), {do, Work}), where select_worker() is something like:
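A minimal sketch, assuming the worker processes have already joined a pg group (the group name workers is just an example) and the default pg scope is running:

```erlang
%% Pick a worker process registered in the `workers` pg group.
%% Assumes pg has been started (e.g. pg:start_link() in a supervision tree)
%% and workers have joined the group with pg:join(workers, self()).
select_worker() ->
    case pg:get_members(workers) of
        [] ->
            error(no_workers_available);
        Workers ->
            %% Naive selection: pick a random member of the group.
            lists:nth(rand:uniform(length(Workers)), Workers)
    end.
```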
pg is used to implement service discovery. You cannot send a message to a node; you need to send it to a process.
Imagine that you created a cluster of 3 nodes, “one”, “two”, and “three”, and connected them (by populating the .hosts.erlang file). Now you designate “one” as your frontend node and run an HTTP server there. Nodes “two” and “three” are your backend nodes.
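As a sketch of that connection step: .hosts.erlang lists host names (one quoted name per line, each terminated by a period), and net_adm:world() contacts epmd on each of those hosts and pings every node it finds there. The host names below are placeholders.

```erlang
%% Example .hosts.erlang contents (host names are placeholders):
%%   'host1.example.com'.
%%   'host2.example.com'.

%% From any node's shell: read .hosts.erlang and ping all nodes found
%% on those hosts, returning the nodes that answered.
ConnectedNodes = net_adm:world(),
io:format("Connected to: ~p~n", [ConnectedNodes]).
```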
Your HTTP server on “one” wants to make some calls to the backends. But it cannot call a node; it needs to call some process. This is where you need pg: both the “two” and “three” nodes can run some worker process that handles requests from “one”. So both nodes register this process in some pg group. The HTTP server on node “one” picks a worker process from the group (use your favourite algorithm) and makes the call to that worker process. You can scale that by adding more workers to the group, and it is HA by design, because there is a pool of available workers in that group.
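A sketch of what such a worker process on nodes “two” and “three” might look like; the module name backend_worker, the group name workers, and the {do, Work} protocol are just example choices:

```erlang
-module(backend_worker).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    %% Register this worker in the `workers` pg group so the frontend
    %% node can discover it via pg:get_members(workers).
    %% Assumes the default pg scope is started somewhere in the system.
    ok = pg:join(workers, self()),
    {ok, #{}}.

handle_call({do, Work}, _From, State) ->
    %% Handle a request coming from the HTTP server on node "one".
    Result = do_work(Work),
    {reply, Result, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

do_work(Work) ->
    %% Placeholder for the actual backend work.
    {done, Work}.
```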