Determining processes lingering in old code

Wondersye · August 20, 2022, 9:04am

Hi fellow Erlangers,

Regarding hot code reload, when purging a module (e.g. with code:purge/1), the Erlang code server is able to determine which processes are lingering in the old code for that module (so that it can kill them).

Would there be a (non-OTP, preferably cheap/scalable) way of determining a list of such processes, prior to attempting to purge?

(keeping track of such processes is difficult as any process may call at any time an exported function of a module being reloaded, and scanning all live processes with erlang:check_process_code/{2,3} is hardly an option)

Thanks in advance for any hint,

Olivier.

max-au · August 25, 2022, 8:12pm

any process may call at any time an exported function of a module being reloaded

When the module has been hot-code-loaded, it’s no longer possible to call the old version of the exported function.
The solution you suggested (Lingering = [Pid || Pid <- processes(), check_process_code(Pid, Module)]), run after loading the new code, is exactly what we use.
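For reference, here is a minimal sketch of that scan wrapped in a module. The module name lingering_scan is made up for illustration; the only real API involved is erlang:processes/0 and erlang:check_process_code/2, which only sees processes on the local node.

```erlang
-module(lingering_scan).
-export([lingering/1]).

%% Returns the local processes that still execute old code for Module.
%% Intended to be called after the new version has been loaded and
%% before code:purge/1 is attempted.
-spec lingering(module()) -> [pid()].
lingering(Module) when is_atom(Module) ->
    [Pid || Pid <- erlang:processes(),
            erlang:check_process_code(Pid, Module) =:= true].
```

If the goal is only to avoid killing anything, code:soft_purge/1 may be enough: it purges the old code only when no process is using it, and returns false otherwise.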

Wondersye · August 26, 2022, 7:52pm

Thanks Maxim for your answer; indeed, should an exported function be called just prior to a code change, as two versions of the module are kept, ongoing calls to the old code will peacefully extinguish without involving the killing of the corresponding processes (provided none of them is looping over such a function, and they are just issuing one-time, module-qualified calls). This is a fine mechanism.

I was wondering if code:purge/1 had a smarter way of detecting lingering processes than exhaustive search (it is not that easy to find the NIF ultimately implementing it); as anticipated, I understand this is not the case. However, at least some applications may also be able to store the PIDs / determine these processes by themselves, in a more scalable way, so this is fine as well. Not to mention that any process checking could also be done concurrently. Neat feature.

Thanks again!

max-au · August 26, 2022, 8:27pm

Internally ERTS uses a non-blocking way to call check_process_code, see erts_code_purger.erl in the erlang/otp repository on GitHub (master branch).
As you can see there, it’s an “exhaustive search” approach.

It does pretty much the same as check_process_code/2 but has higher concurrency (important if you have millions of processes, but it does not matter for just a few thousand).
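A rough sketch of the same idea from user code, using the documented {async, RequestId} option of erlang:check_process_code/3. This is not the ERTS implementation (which also limits the number of outstanding requests); the module name, the default timeout and the {Tag, Pid} request id are assumptions made for the example.

```erlang
-module(lingering_async).
-export([lingering/1, lingering/2]).

%% Same result as the sequential scan, but all requests are sent
%% before any reply is awaited.
lingering(Module) ->
    lingering(Module, 5000).

lingering(Module, TimeoutMs) when is_atom(Module) ->
    Tag = make_ref(),
    Pids = erlang:processes(),
    %% Each call returns 'async' immediately; the result arrives later
    %% as a {check_process_code, RequestId, CheckResult} message.
    lists:foreach(
      fun(Pid) ->
              async = erlang:check_process_code(Pid, Module,
                                                [{async, {Tag, Pid}}])
      end, Pids),
    collect(Tag, length(Pids), TimeoutMs, []).

collect(_Tag, 0, _TimeoutMs, Acc) ->
    Acc;
collect(Tag, Pending, TimeoutMs, Acc) ->
    receive
        {check_process_code, {Tag, Pid}, true} ->
            collect(Tag, Pending - 1, TimeoutMs, [Pid | Acc]);
        {check_process_code, {Tag, _Pid}, _FalseOrAborted} ->
            collect(Tag, Pending - 1, TimeoutMs, Acc)
    after TimeoutMs ->
            %% No reply for TimeoutMs: give up and return what we have.
            %% Late replies, if any, stay in the caller's mailbox.
            Acc
    end.
```

The unique Tag keeps these replies apart from any other messages the calling process may receive in the meantime.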

What is your use case, e.g. what do you plan to use this functionality for?

Wondersye · August 26, 2022, 9:54pm

Many thanks for the pointers, I thought everything was done on the C side.

In terms of use case, I was planning a hobbyist application making use of WOOPER that would have to be updated while being in continuous operation (hence the first support that I added recently).

The idea is that (in OOP parlance), in the context of a class upgrade/downgrade:

- calls to static methods would be managed as discussed previously (as exported helpers), hence not killing callers (any process);
- instances of that class (that is: its direct instances and also the ones of all classes inheriting from it, directly or not) would undergo a corresponding (possibly class-specific) state update plus a branching to the main loop provided by the newer version of the module implementing that class.
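To illustrate the second point, here is a minimal sketch of a plain (non-OTP) process that updates its state and then branches to the newest loaded version of its main loop. It is not WOOPER's actual mechanism; the module name, the {upgrade, Extra} message and convert_state/2 are hypothetical.

```erlang
-module(instance_sketch).
-export([start/1, loop/1, convert_state/2]).

start(State) ->
    spawn(?MODULE, loop, [State]).

loop(State) ->
    receive
        {get_state, From} ->
            From ! {state, self(), State},
            %% Local call: stays in the version currently executing.
            loop(State);
        {upgrade, Extra} ->
            %% Fully qualified calls always go to the newest loaded
            %% version of the module, so the process leaves the old code.
            NewState = ?MODULE:convert_state(State, Extra),
            ?MODULE:loop(NewState);
        stop ->
            ok
    end.

%% Placeholder for a (possibly class-specific) state conversion.
convert_state(State, _Extra) ->
    State.
```

Once every instance has handled its upgrade message, none of them should be reported by check_process_code any more, and the old version can be purged without killing anything.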

Yet these instances would have to be identified beforehand - hence my initial question.
I see now that this last need could be covered either by (1) maintaining class-specific sets of instance PIDs through their life cycle (“application-specific solution”), or (2) exhaustive search (then preferably in a concurrent way, like the ERTS code purger that you pointed to).
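For option (1), one possible sketch relies on the pg module (process groups, in kernel since OTP 23), which removes a process from its groups automatically when it exits. The module name, the scope name and the convention of registering an instance under its class and all of its ancestor classes are assumptions for the example; a hand-written registry using monitors would work just as well.

```erlang
-module(instance_registry).
-export([ensure_started/0, register_instance/2, instances/1]).

%% Hypothetical pg scope dedicated to instance tracking.
-define(SCOPE, instance_registry_scope).

ensure_started() ->
    case pg:start(?SCOPE) of
        {ok, _Pid} -> ok;
        {error, {already_started, _Pid}} -> ok
    end.

%% Classes lists the instance's own class and all of its ancestors,
%% so that upgrading a parent class also finds child-class instances.
register_instance(Classes, Pid) when is_list(Classes), is_pid(Pid) ->
    lists:foreach(fun(Class) -> ok = pg:join(?SCOPE, Class, Pid) end,
                  Classes).

%% Local instances currently registered for Class (direct or inherited).
instances(Class) ->
    pg:get_local_members(?SCOPE, Class).
```

With this in place, an upgrade of a given class only needs to message the PIDs returned by instances/1 instead of scanning all processes.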

It is way too early for me to go further, as maybe updates would apply to very specific, single classes (and their child classes) only (in which case preserving sets of instance PIDs would make sense, to be able to target them directly), or to whole sets of inter-related classes/modules, like a mass system update (in which case an exhaustive scan could be meaningful, as a larger number of processes/instances would be impacted anyway).

At least the options are well-identified now - thanks for that!

max-au · August 27, 2022, 5:08am

You might be interested in how OTP itself handles hot code upgrade (using release_handler and upgrade scripts). However we never made it convenient enough for frequent releases (which you seem to be after), so take the advice above with a grain of salt.