As per the Erlang documentation, the pool module is used to run a set of Erlang nodes as a pool of computational processors. I have the following queries. Please clarify.
Please share any example code that would help me understand the pool and slave modules. With an example, I could work out the answers to many of the questions listed below myself.
The set of nodes is specified in the file .hosts.erlang. Please confirm.
How do I designate one of the nodes as the master node?
A node can be designated as a slave node using the “slave” module. Please confirm.
How do I link the master node with the set of slave nodes?
The documentation says that pspawn spawns a process on the pool node that is expected to have the lowest future load. Should that be the CURRENT load?
Does pspawn consider only the slave nodes when spawning a process, or does it consider both the slave and master nodes?
I’d honestly recommend avoiding these two. One (slave) is already deprecated and has been replaced with peer, which has a much more robust API. The other one (pool) is not exactly functional either (hence all the questions you have). I’d hope to deprecate it too and replace it with some better clustering solution.
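For reference, here is a minimal sketch of the peer replacement for slave (OTP 25+): start a linked peer node and evaluate a call on it. This assumes you run it from a node that is already distributed (started with -sname or -name).

```erlang
%% Minimal sketch (OTP 25+), run from an already distributed node:
%% start a linked peer node with a random name and run code on it.
{ok, Peer, Node} = peer:start_link(#{name => peer:random_name()}),
Node = peer:call(Peer, erlang, node, []),  % evaluate erlang:node() on the peer
ok = peer:stop(Peer).
```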
Thanks Max. When I checked the Erlang stdlib documentation, I did not find that these functions are deprecated. Where can I check a function’s status (deprecated or not)?
The entire slave module (all of its functions) is deprecated.
The design you’re looking for usually goes like this:
Start multiple nodes and connect them in a cluster (you can use .hosts.erlang, or implement your own cluster discovery mechanism, or try something like libcluster). This will create a physical cluster (so nodes are aware of each other and can exchange messages).
On the “worker” nodes, you start your application, which registers “worker” processes in some pg group (Erlang’s pg library is a logical-level service discovery mechanism).
On the “master” node, or any other node, schedule your jobs via a normal gen_server:call(select_worker(), {do, Work}), where select_worker() is something like:
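A minimal sketch, assuming the worker processes have already joined a pg group (the group name workers is just an example) and the default pg scope is running:

```erlang
%% Pick a worker process registered in the `workers` pg group.
%% Assumes pg has been started (e.g. pg:start_link() in a supervision tree)
%% and workers have joined the group with pg:join(workers, self()).
select_worker() ->
    case pg:get_members(workers) of
        [] ->
            error(no_workers_available);
        Workers ->
            %% Naive selection: pick a random member of the group.
            lists:nth(rand:uniform(length(Workers)), Workers)
    end.
```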
pg is used to implement service discovery. You cannot send a message to a node; you need to send it to a process.
Imagine that you created a cluster of 3 nodes, “one”, “two”, and “three”, and connected them (by populating the .hosts.erlang file). Now you designate “one” as your frontend node and run an HTTP server there. Nodes “two” and “three” are your backend nodes.
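As a sketch of that connection step: .hosts.erlang lists host names (one quoted name per line, each terminated by a period), and net_adm:world() contacts epmd on each of those hosts and pings every node it finds there. The host names below are placeholders.

```erlang
%% Example .hosts.erlang contents (host names are placeholders):
%%   'host1.example.com'.
%%   'host2.example.com'.

%% From any node's shell: read .hosts.erlang and ping all nodes found
%% on those hosts, returning the nodes that answered.
ConnectedNodes = net_adm:world(),
io:format("Connected to: ~p~n", [ConnectedNodes]).
```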
Your HTTP server on “one” wants to make some calls to the backends. But it cannot call a node; it needs to call some process. This is where you need pg: both the “two” and “three” nodes can run some worker process that handles requests from “one”. So both nodes register this process in some pg group. The HTTP server on node “one” picks a worker process from the group (use your favourite algorithm) and makes the call to that worker process. You can scale that by adding more workers to the group, and it is HA by design, because there is a pool of available workers in that group.
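A sketch of what such a worker process on nodes “two” and “three” might look like; the module name backend_worker, the group name workers, and the {do, Work} protocol are just example choices:

```erlang
-module(backend_worker).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    %% Register this worker in the `workers` pg group so the frontend
    %% node can discover it via pg:get_members(workers).
    %% Assumes the default pg scope is started somewhere in the system.
    ok = pg:join(workers, self()),
    {ok, #{}}.

handle_call({do, Work}, _From, State) ->
    %% Handle a request coming from the HTTP server on node "one".
    Result = do_work(Work),
    {reply, Result, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

do_work(Work) ->
    %% Placeholder for the actual backend work.
    {done, Work}.
```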