How to deal with destructors that can take a while to run and possibly block the scheduler?

Hello!
I have a question concerning long running resource destructors.
AFAIK since OTP-22 resource destructors are always run on normal schedulers. What is more, this line indicates that they are always run on scheduler nr 1: otp/erts/emulator/beam/erl_nif.c at 601a012837ea0a5c8095bf24223132824177124d · erlang/otp · GitHub
How should I deal with destructors that can take long to run and possibly block the scheduler?

Best wishes,
Łukasz Kita

4 Likes

I would view resource destructors as the last-ditch effort to clean up. In other words, when you are done with the operation, your Erlang code should call a NIF function (e.g., done/1) to clean up immediately. In your resource destructor you should check whether said cleanup was already performed; if not, then destroy.
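
In C, a minimal sketch of that pattern might look like this (the my_res struct, close_handle() and done_1 names are hypothetical stand-ins, not from any particular library):

```c
#include <erl_nif.h>
#include <stdbool.h>

/* Hypothetical cleanup routine from the wrapped library. */
extern void close_handle(void *handle);

typedef struct {
    void *handle;  /* whatever the foreign library handed us */
    bool  closed;  /* set by done/1 so the dtor can skip the work */
} my_res;

static ErlNifResourceType *MY_RES_TYPE;

/* Called eagerly from Erlang as done(Res) when the caller is finished. */
static ERL_NIF_TERM done_1(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    my_res *res;
    if (argc != 1 ||
        !enif_get_resource(env, argv[0], MY_RES_TYPE, (void **)&res))
        return enif_make_badarg(env);

    if (!res->closed) {
        close_handle(res->handle);
        res->closed = true;
    }
    return enif_make_atom(env, "ok");
}

/* Destructor: only a last-ditch fallback if done/1 was never called. */
static void my_res_dtor(ErlNifEnv *env, void *obj)
{
    my_res *res = obj;
    if (!res->closed)
        close_handle(res->handle);
}
```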

You can count on resource dtors to run; the question, of course, is when, and that will depend on the behavior and/or life cycle of the process (i.e., when a GC happens, IIRC).

Is that infeasible for your case for some reason?

Edit:

Note: I should have noted that my statement is tangential to your question, yet I could not help but infer a few things from your statements and questions. I do think someone from the OTP team needs to answer your question.

I did try to trace the code a bit myself. This code hasn’t changed in later versions of erts, so it’s not just OTP 22. I don’t believe it will always run on the first scheduler; it does try to grab the current scheduler, but clearly that won’t always happen, as there is a fallback to scheduler 1.

Of course, I myself am curious whether it could be specified at NIF initialization that you want to run a resource dtor on a dirty scheduler vs. a normal scheduler.

I also wonder if it’s possible to yield from a dtor, but it doesn’t look like it at a glance. Even if you’re running on a dirty scheduler, it would not be great to block for a terribly long time. However, in some cases all bets may be off if your resource is actually managed by a foreign library.

While waiting on someone from the OTP team to respond, do you mind telling us how long, on average, you spend tearing down a resource, and what type of resource it is?

2 Likes

If the resource destructor is triggered on a normal scheduler, it will run on that same scheduler. Otherwise, if it is triggered on a dirty scheduler, it will be rescheduled to run on the first scheduler.

Since it’s not possible to yield from a resource destructor, you should generally not do anything that could block the scheduler in there.

The way I solved this problem in the file module was to register a monitor for the owning process, which when triggered sent a message to another process that in turn closed the file descriptor on the now-dead process’ behalf. Note that this was done in the monitor callback and not the destructor, and the resource was kept alive beyond the owning process’ death because of the sent message.
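
For illustration, here is a rough C sketch of that shape (this is not the actual prim_file code; the fd_res struct, the fd_janitor registered name, and close_nif are all made up for the example). It assumes the down callback was registered with enif_open_resource_type_x via ErlNifResourceTypeInit, and that enif_monitor_process was called when the owner was assigned:

```c
#include <erl_nif.h>
#include <unistd.h>

/* Hypothetical resource holding a file descriptor. */
typedef struct {
    int fd;
    int closed;
} fd_res;

static ErlNifResourceType *FD_RES_TYPE;

/* 'down' callback: runs when the monitored owner dies. Instead of closing
 * here, mail the resource to a janitor process. The resource term inside
 * the message keeps the resource alive until the janitor has dealt
 * with it. */
static void fd_res_down(ErlNifEnv *env, void *obj, ErlNifPid *pid,
                        ErlNifMonitor *mon)
{
    ErlNifPid janitor;

    /* Assumes a long-lived process registered as 'fd_janitor'. */
    if (enif_whereis_pid(env, enif_make_atom(env, "fd_janitor"), &janitor)) {
        ErlNifEnv *msg_env = enif_alloc_env();
        ERL_NIF_TERM msg =
            enif_make_tuple2(msg_env,
                             enif_make_atom(msg_env, "close"),
                             enif_make_resource(msg_env, obj));
        enif_send(env, &janitor, msg_env, msg);
        enif_free_env(msg_env);
    }
}

/* Dirty NIF the janitor calls with the resource it received; the
 * potentially slow close happens here, off the normal schedulers. */
static ERL_NIF_TERM close_nif(ErlNifEnv *env, int argc,
                              const ERL_NIF_TERM argv[])
{
    fd_res *res;
    if (argc != 1 ||
        !enif_get_resource(env, argv[0], FD_RES_TYPE, (void **)&res))
        return enif_make_badarg(env);

    if (!res->closed) {
        close(res->fd);
        res->closed = 1;
    }
    return enif_make_atom(env, "ok");
}

static ErlNifFunc nif_funcs[] = {
    {"close_nif", 1, close_nif, ERL_NIF_DIRTY_JOB_IO_BOUND}
};
```

When the janitor eventually drops the message term and the resource is GC’d, the real dtor can check the closed flag and do nothing.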

4 Likes

Thank you very much for the answers!

I did try to trace the code a bit myself. This code hasn’t changed in later versions of erts, so it’s not just OTP 22.

Indeed, I’ve encountered my problem on OTP-26; I just believe that since OTP-22 there has been a new mechanism for scheduling destructors.

I don’t believe it will always run on the first scheduler; it does try to grab the current scheduler, but clearly that won’t always happen, as there is a fallback to scheduler 1.

You are right, but the erts_get_scheduler_id implementation suggests that when it’s invoked on a dirty scheduler, it returns 0, so I think that all destructors triggered on a dirty scheduler are run on normal scheduler nr 1. [EDIT] I just spotted that @jhogberg has explained it more thoroughly, thanks!

do you mind telling us about the length of time on average you spend tearing down a resource and what type of resource?

I use libvips (GitHub - libvips/libvips: A fast image processing library with low memory needs.) wrapped in NIFs, as provided by Vix (GitHub - akash-akya/vix: Elixir extension for libvips.). What causes trouble in my case is unreferencing GObjects (GObject – 2.0).
Depending on the number of worker threads spawned by libvips, it might happen that (due to the synchronization overhead) unreferencing (and potentially destroying) GObjects requires waiting for a long time. I observe high scheduler nr 1 utilization when that happens, which drives me towards the assumption that some busy waiting must be occurring there.

The way I solved this problem in the file module was to register a monitor for the owning process, which when triggered sent a message to another process that in turn closed the file descriptor on the now-dead process’ behalf.

I will definitely try that approach, thanks!

I also wonder what would happen if I waited on a system mutex in my resource destructor, causing it to yield to another OS thread. Does it mean my entire scheduler would be inactive until that mutex is released?

1 Like

Yes, unfortunately.

2 Likes

Hi @jhogberg @starbelly, thanks for the detailed explanation. I am trying to wrap my head around implementing something similar in Vix to solve the issue @varsill mentioned above.

I think in the case of the OTP file module there is an owner process and the resource lifetime is tied to that process, so we can monitor the owner and clean up using a singleton janitor process when the owner dies.

But in my case the resource is not tied to any particular process[0]. I am relying solely on the ERL_TERM lifetime (GC) for the underlying object cleanup, so I cannot use the process monitor approach. Instead I am thinking of doing something like this:

  • Spawn a singleton, supervised janitor process, similar to OTP's. In OTP's case a special process is created (erts_internal:spawn_system_process), but I guess that is not really required in my case.
  • The resource lifecycle would be the same as before, except for the changes in the dtor callback explained below. We still call enif_release_resource during resource construction to pass ownership to the GC.
  • The GC calls the dtor during garbage collection. The dtor callback is modified to not call the time-consuming somelib_unref function. Instead it allocates a new resource with the same underlying object and sends {unref, Term} to the janitor process (see the sketch after this list). Creating the new resource should be inexpensive, as the resource in my case is a very simple struct just holding the underlying object pointer.
  • The janitor process calls the nif_unref NIF, marked as ERL_NIF_DIRTY_JOB_IO_BOUND, to release the resource using the somelib_unref function.
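
If it helps, here is a rough C sketch of the last two steps under those assumptions (thin_res, the vix_janitor registered name, and somelib_unref are placeholders):

```c
#include <erl_nif.h>
#include <stddef.h>

/* Hypothetical slow cleanup from the wrapped library. */
extern void somelib_unref(void *obj);

/* Very thin resource: just the pointer to the foreign object. */
typedef struct {
    void *obj;
} thin_res;

static ErlNifResourceType *THIN_RES_TYPE;

/* dtor: runs on a normal scheduler during GC, so it must stay cheap.
 * Instead of calling somelib_unref here, re-wrap the raw pointer in a
 * fresh resource and mail {unref, Term} to the janitor process. */
static void thin_res_dtor(ErlNifEnv *env, void *ptr)
{
    thin_res *old = ptr;
    ErlNifPid janitor;

    if (old->obj == NULL)
        return;  /* already unreferenced by nif_unref below */

    /* Assumes a supervised process registered as 'vix_janitor'. */
    if (enif_whereis_pid(env, enif_make_atom(env, "vix_janitor"), &janitor)) {
        ErlNifEnv *msg_env = enif_alloc_env();
        thin_res *fresh = enif_alloc_resource(THIN_RES_TYPE, sizeof(thin_res));

        fresh->obj = old->obj;
        old->obj = NULL;  /* ownership moves to the fresh resource */

        ERL_NIF_TERM msg =
            enif_make_tuple2(msg_env,
                             enif_make_atom(msg_env, "unref"),
                             enif_make_resource(msg_env, fresh));
        enif_release_resource(fresh);  /* the message term holds the ref now */
        enif_send(env, &janitor, msg_env, msg);
        enif_free_env(msg_env);
    } else {
        somelib_unref(old->obj);  /* no janitor: fall back to inline unref */
        old->obj = NULL;
    }
}

/* Dirty NIF the janitor calls with the term from {unref, Term}. */
static ERL_NIF_TERM nif_unref(ErlNifEnv *env, int argc,
                              const ERL_NIF_TERM argv[])
{
    thin_res *res;
    if (argc != 1 ||
        !enif_get_resource(env, argv[0], THIN_RES_TYPE, (void **)&res))
        return enif_make_badarg(env);

    if (res->obj != NULL) {
        somelib_unref(res->obj);
        res->obj = NULL;  /* the fresh resource's own dtor then no-ops */
    }
    return enif_make_atom(env, "ok");
}

static ErlNifFunc nif_funcs[] = {
    {"nif_unref", 1, nif_unref, ERL_NIF_DIRTY_JOB_IO_BOUND}
};
```

On the Erlang side the janitor loop would then just receive each {unref, Term} and call nif_unref(Term); since nif_unref clears obj, the fresh resource's own dtor becomes a no-op and there is no handoff loop.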

Does this make sense? I just want to make sure this approach is sound at a high level before going into implementation details :sweat_smile:


[0] I cannot change this to wrap the resource inside a process, for various reasons. I essentially want the resource to behave like an opaque term which can be passed around.

5 Likes

That ought to work. :slight_smile:

4 Likes

We have such code in our Flussonic (media streaming server); it is related to hardware like the Nvidia Jetson that has async destructors.

What have we done?

  1. We have created an Erlang process that holds this destructor. It traps exits, is not killed, etc., so it MUST NOT be killed. It monitors its owner and will run as an Erlang-level destructor.
  2. When the owner dies, this special process stops any processing and runs a separate function that works as a destructor. The code runs in a separate thread, and the holding Erlang process patiently waits for a message.
  3. After this, the resource at the C level is deallocated, and you do not need any NIF destructor at all.

It has worked for many years without any problems.

4 Likes