Timing guarantees between `enif_send` and `receive`

jstimps · January 11, 2025, 9:08pm

From Erlang Reference Manual | Processes,

The amount of time that passes between the time a signal is sent and the arrival of the signal at the destination is unspecified but positive.

Does the same property hold true for messages that are sent from a NIF with enif_send?

I’m trying to debug some crashes I’ve seen on a production system, and I’ve identified a path through my code that would be vulnerable to a race condition if the above is true. However, I haven’t been able to reproduce the race in a dev environment.

In other words, to prove I have a race condition bug in my code, I want to induce a state where this assertion fails. (In the interest of brevity, this is all pseudo-code.)

test() ->
    Ref = make_nif_request(),

    timer:sleep(1),

    % The NIF background thread calls `enif_send` sometime here

    true = flush(Ref).

flush(Ref) ->
    receive
        {Ref, ready} ->
            % Normal path
            true
    after 0 ->
        % Assuming enif_send fired during the sleep, the flush would only
        % fail if there is positive time in between the `enif_send` signal and
        % the receive on the message queue
        false
    end.

Where the NIF is something like this, using enif_thread_create.

static ERL_NIF_TERM myapp_setup(...) {
    enif_thread_create(...)
}

static void thread_fn(...) {
    enif_send(...)
}

// ...

I’ve tried putting a heavy workload on a single-scheduler VM to slow down the copying of signals from the signal queue to the message queue, hoping to make the flush return false, but I haven’t managed it yet. Any tips would be appreciated!

Thanks!

rickard · January 11, 2025, 9:43pm

Yes, it is true for all signaling. The time is, however, usually very small within the same runtime system.
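
A minimal sketch of the practical consequence, not taken from the code discussed in this thread: a `receive ... after 0` only succeeds if the message has already been delivered, while a bounded timeout tolerates an unspecified delivery delay. The module name `flush_demo`, the functions `flush_now/1` and `flush_within/2`, and the 50 ms and 1000 ms values are hypothetical. A spawned process stands in for the NIF thread so that the sketch runs without native code, which is an assumption and not a model of how `enif_send` behaves internally.

```erlang
-module(flush_demo).
-export([flush_now/1, flush_within/2, demo/0]).

%% Non-blocking check: returns true only if {Ref, ready} has already
%% been delivered when the receive is evaluated.
flush_now(Ref) ->
    receive
        {Ref, ready} -> true
    after 0 ->
        false
    end.

%% Bounded wait: tolerates a delivery delay of up to TimeoutMs
%% milliseconds before giving up.
flush_within(Ref, TimeoutMs) ->
    receive
        {Ref, ready} -> true
    after TimeoutMs ->
        false
    end.

%% Assumption: a spawned process plays the role of the NIF background
%% thread. It deliberately sends late (after 50 ms) so the difference
%% between the two checks is visible.
demo() ->
    Ref = make_ref(),
    Parent = self(),
    spawn(fun() ->
                  timer:sleep(50),
                  Parent ! {Ref, ready}
          end),
    Early = flush_now(Ref),          %% checked before the sender has sent
    Late = flush_within(Ref, 1000),  %% waits for the message to arrive
    {Early, Late}.
```

Calling `flush_demo:demo()` from a shell shows the two results side by side. If code depends on the message being present, waiting with a timeout chosen for the application is the more defensive option than `after 0`.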

jstimps · January 12, 2025, 3:21pm

Thanks. It turns out, perhaps unsurprisingly, that the cause of my bug was elsewhere, unrelated to enif_send. This is still valuable info though.

For any future readers, I suggest making sure you exhaust all other possible avenues before blaming the signaling timing. I spent a day trying to force a signal delay with no luck; BEAM is solid as always.