Do BEAM nodes monitor epmd?

fadushin · June 7, 2023, 12:52pm

If a BEAM node registers itself with epmd and epmd is then killed, does the node re-register itself once epmd is running again?

We have found cases where, on some systemd-based operating systems, epmd is mysteriously killed, likely because it was spawned by a process that systemd doesn't know anything about. As a result, when epmd starts again (e.g., because another distributed BEAM process starts on the same host), it has lost all the node/port information that was registered with it.

aartamonau · June 8, 2023, 2:16am

I believe it does not. In fact, I once stumbled upon this code in RabbitMQ that deals with exactly the issue you're describing.

max-au · June 8, 2023, 3:12am

I actually think it does, at least with the newest release, OTP 26.

I just did a quick test: ran epmd -d, started a few nodes, stopped the daemon with Ctrl+C, and started it again. The nodes got re-registered quickly, within a second or two. I don't remember offhand where the code that does this lives, but it has to be somewhere.
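To watch re-registration during a test like this, epmd -names lists the nodes the local daemon currently knows about. The same list can be fetched programmatically with epmd's NAMES_REQ request. The Python sketch below does that; it assumes epmd is listening on its default port 4369 on localhost, and the helper names are made up for illustration.

```python
import socket
import struct

EPMD_PORT = 4369  # default epmd port (assumed; can be changed via ERL_EPMD_PORT)
NAMES_REQ = 110   # ASCII 'n'


def build_names_req() -> bytes:
    # Every epmd request is a 2-byte big-endian length followed by the payload.
    payload = bytes([NAMES_REQ])
    return struct.pack(">H", len(payload)) + payload


def parse_names_resp(data: bytes) -> tuple[int, dict[str, int]]:
    # Reply: 4-byte epmd port number, then lines "name <node> at port <port>\n".
    epmd_port = struct.unpack(">I", data[:4])[0]
    nodes = {}
    for line in data[4:].decode("utf-8").splitlines():
        parts = line.split()
        if len(parts) == 5 and parts[0] == "name" and parts[2:4] == ["at", "port"]:
            nodes[parts[1]] = int(parts[4])
    return epmd_port, nodes


def list_registered_nodes(host: str = "127.0.0.1", port: int = EPMD_PORT) -> dict[str, int]:
    # Hypothetical helper: ask a running epmd which nodes are registered.
    with socket.create_connection((host, port), timeout=2) as sock:
        sock.sendall(build_names_req())
        chunks = []
        # epmd closes the connection after sending the whole list.
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    _, nodes = parse_names_resp(b"".join(chunks))
    return nodes
```

Polling list_registered_nodes() before and after restarting epmd should show the node names disappear and then come back once the nodes reconnect.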

aartamonau · June 8, 2023, 3:55am

Indeed it does, since OTP 23.3: "Reconnect to epmd" (erlang/otp commit 180d855 on GitHub).

I ran into the old (non-reconnecting) behavior myself, good to know it got fixed.
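At the protocol level, a node registers by sending an ALIVE2_REQ to epmd over a TCP connection and keeping that connection open; epmd unregisters the node as soon as the connection closes, which is why a freshly restarted epmd knows nothing about existing nodes. "Reconnecting" therefore means opening a new connection and sending ALIVE2_REQ again. The Python sketch below only mimics that idea; it is not a port of the OTP commit. The helper names, the 1-second retry delay, and the advertised distribution version range (5 to 6) are assumptions for illustration.

```python
import socket
import struct
import time

EPMD_PORT = 4369
ALIVE2_REQ = 120
ALIVE2_X_RESP = 118  # reply carrying a 4-byte creation (newer epmd)
ALIVE2_RESP = 121    # reply carrying a 2-byte creation (older epmd)


def build_alive2_req(name: str, dist_port: int, hidden: bool = False) -> bytes:
    node = name.encode("utf-8")
    extra = b""
    body = struct.pack(
        ">BHBBHHH",
        ALIVE2_REQ,
        dist_port,
        72 if hidden else 77,  # node type: 72 = hidden, 77 = normal Erlang node
        0,                     # protocol: 0 = TCP/IPv4
        6,                     # highest distribution version (assumed)
        5,                     # lowest distribution version (assumed)
        len(node),
    ) + node + struct.pack(">H", len(extra)) + extra
    return struct.pack(">H", len(body)) + body


def parse_alive2_resp(data: bytes) -> int:
    # Returns the "creation" value epmd assigned to this registration.
    tag, result = data[0], data[1]
    if result != 0:
        raise RuntimeError(f"epmd refused registration (result={result})")
    if tag == ALIVE2_X_RESP:
        return struct.unpack(">I", data[2:6])[0]
    if tag == ALIVE2_RESP:
        return struct.unpack(">H", data[2:4])[0]
    raise ValueError(f"unexpected epmd reply tag {tag}")


def read_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("epmd closed the connection")
        buf += chunk
    return buf


def register_forever(name: str, dist_port: int, host: str = "127.0.0.1", retry_delay: float = 1.0) -> None:
    # Hypothetical loop: (re)register whenever the epmd connection is lost.
    while True:
        try:
            with socket.create_connection((host, EPMD_PORT), timeout=2) as sock:
                sock.sendall(build_alive2_req(name, dist_port))
                head = read_exact(sock, 2)
                size = 4 if head[0] == ALIVE2_X_RESP else 2
                creation = parse_alive2_resp(head + read_exact(sock, size))
                print(f"registered {name} (creation {creation})")
                sock.settimeout(None)
                # The registration lives as long as this connection; recv()
                # returns b"" (or raises) once epmd goes away.
                while sock.recv(1):
                    pass
        except OSError:
            pass  # epmd not reachable right now; try again shortly
        time.sleep(retry_delay)
```

Note that dist_port should be a port your process actually listens on; registering a name with nothing behind it only fools epmd, not other nodes.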

starbelly · June 8, 2023, 11:05pm

@fadushin, I would suggest running epmd as its own service. If you need an example of a service file for epmd, let us know.

fadushin · June 8, 2023, 11:23pm

Yeah, that makes sense. Is there a reason the VM does not also re-spawn epmd if it is not running?

starbelly · June 9, 2023, 12:48am

That’s a good question for which I don’t have a good answer. Someone from the OTP team would have to chime in on that.

Edit:

I forgot to mention: when running epmd as its own service under systemd (or <insert service supervision solution here>), you should have a symlink operation as part of your workflow. Specifically, what we do at work is, on every full deployment, check whether the erts version for our release has changed and, if so, update an erts symlink, so that our service file only has to point to the symlink rather than to a specific version of erts. Not doing so can result in odd behavior that leads to "Why?! Why?! Why?! Oh, that's why…" moments.
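The symlink step described above could look roughly like the Python sketch below. It assumes a release directory containing erts-&lt;version&gt; subdirectories and a link named erts next to them; the function name, the layout, and the example versions are hypothetical, and how you determine the release's erts version is left to your deployment tooling. The new link is created under a temporary name and then renamed over the old one, so on POSIX systems the service never sees a missing link.

```python
import os
import tempfile
from pathlib import Path


def update_erts_symlink(release_root: Path, erts_version: str, link_name: str = "erts") -> bool:
    """Point release_root/link_name at erts-<erts_version> (a relative target).

    Returns True if the link was created or repointed, False if it already
    pointed at the requested version.
    """
    target = f"erts-{erts_version}"
    link = release_root / link_name
    if link.is_symlink() and os.readlink(link) == target:
        return False  # erts version unchanged, nothing to do
    # Build the new link under a temporary name first, then rename it over
    # the old one; rename is atomic on POSIX and replaces the link itself.
    tmp = release_root / f".{link_name}.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    os.symlink(target, tmp)
    os.replace(tmp, link)
    return True


if __name__ == "__main__":
    # Demo in a throwaway directory standing in for a release root.
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "erts-13.2").mkdir()
        (root / "erts-14.0").mkdir()
        print(update_erts_symlink(root, "13.2"))
        print(update_erts_symlink(root, "13.2"))
        print(update_erts_symlink(root, "14.0"))
        print(os.readlink(root / "erts"))
```

The service file can then reference the epmd binary through the stable link (for example, a hypothetical /opt/myapp/erts/bin/epmd) instead of a versioned erts directory.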