`open_port` and zombie processes

I wrote draft PR #9453, which solves this issue on some platforms (those with prctl) by killing every spawned child process, by process group, whenever the VM terminates. In my case the child was a long-running rsync that should only keep running while managed by the BEAM. I started with a one-off port service wrapper in my downstream project, but there are smells, such as the emphatic documentation in Elixir, that point toward a language enhancement.
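To make the "kill by process group" idea concrete, here is a minimal Python sketch of the general technique. It is not code from the PR and does not touch prctl or the BEAM; `spawn_in_own_group` and `kill_group` are hypothetical helper names, and it assumes a POSIX system with a `sleep` binary.

```python
import os
import signal
import subprocess


def spawn_in_own_group(argv):
    """Start argv as the leader of a new session and process group."""
    # start_new_session=True makes the child call setsid() before exec,
    # so its process group ID equals its PID. Anything it forks later
    # inherits that group unless it deliberately moves itself out.
    return subprocess.Popen(argv, start_new_session=True)


def kill_group(proc, sig=signal.SIGKILL):
    """Signal every process in proc's group, then reap the direct child."""
    try:
        os.killpg(proc.pid, sig)
    except ProcessLookupError:
        pass  # the whole group is already gone
    return proc.wait()


if __name__ == "__main__":
    child = spawn_in_own_group(["sleep", "30"])
    status = kill_group(child)
    print("child exit status:", status)
```

A supervisor that tracks one group per port could run something like `kill_group` for each of them on shutdown; catching abnormal termination of the supervisor itself is the part that needs a mechanism such as prctl.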

My naive assumption is that children should be cleaned up in most cases. If that is desirable, the default could be to kill them, and the few exceptions would be easy enough to implement by isolating the child or grandchild in a new process group. I don’t have enough BEAM experience to guess whether this is true, or to reason about how a migration to such a default might be accomplished, i.e. whether existing applications are already broken (undergoing unplanned BEAM destruction) when they hit this edge case.
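As a rough illustration of the "exception" case, the Python sketch below shows a grandchild opting out by moving into its own session before the parent's group is killed. Everything here is an assumption made for the demo (the `CHILD_SCRIPT` stand-in, the helper names, a POSIX system with `sleep`); it says nothing about how the BEAM or erl_child_setup would expose such an option.

```python
import os
import signal
import subprocess
import sys

# Stand-in for a port program: it starts one grandchild in a NEW session
# (so in a new process group), reports the grandchild's PID, then idles.
CHILD_SCRIPT = r"""
import subprocess, time
g = subprocess.Popen(["sleep", "30"], start_new_session=True,
                     stdout=subprocess.DEVNULL)
print(g.pid, flush=True)
time.sleep(30)
"""


def pid_alive(pid):
    """Return True if a process with this PID still exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    return True


def grandchild_survives_group_kill():
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SCRIPT],
        stdout=subprocess.PIPE, text=True, start_new_session=True,
    )
    grandchild_pid = int(child.stdout.readline())
    # Kill the child's whole group; the isolated grandchild is not in it.
    os.killpg(child.pid, signal.SIGKILL)
    child.wait()
    child.stdout.close()
    survived = pid_alive(grandchild_pid)
    # Clean up the survivor so the demo leaves nothing behind.
    os.kill(grandchild_pid, signal.SIGKILL)
    return survived


if __name__ == "__main__":
    print("isolated grandchild survived:", grandchild_survives_group_kill())
```

The same property cuts both ways: any descendant that changes its own process group escapes a kill-by-group default, whether or not that was intended.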

+1 that the intermediate erl_child_setup already provides much of the machinery to make this possible to do in a straightforward way!

It’s not clear to me whether erlexec can forward every case of abnormal VM termination: it seems to trap several specific signals and otherwise mostly waits for pipe failures. The library has a lot of wisdom to offer, though, so perhaps it already shows the most portable approach. Its exec:run supports a kill_group flag similar to wojtekmach’s suggestion, which hints that such an option would be useful.
