Gen_statem: looping with {next_event, internal, _} blocks calls

ingela · May 8, 2024, 11:50am

Yes, and I guess you know that a synchronous client call, that is gen_statem:call or gen_server:call, has never required the server side to be synchronous. You can use gen_statem:reply or gen_server:reply to answer later, possibly from a different state or at another stage.
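A minimal sketch of such a deferred reply, assuming a hypothetical module deferred_reply in which a cast ({complete, Result}) stands in for whatever external response eventually arrives. The module, function, and state names are illustrative only.

```erlang
-module(deferred_reply).
-behaviour(gen_statem).

-export([start_link/0, request/1, complete/2]).
-export([init/1, callback_mode/0, handle_event/4]).

start_link() ->
    gen_statem:start_link(?MODULE, [], []).

%% Synchronous from the caller's point of view.
request(Pid) ->
    gen_statem:call(Pid, request, 5000).

%% Hypothetical stand-in for "the external response has arrived".
complete(Pid, Result) ->
    gen_statem:cast(Pid, {complete, Result}).

callback_mode() ->
    handle_event_function.

init([]) ->
    {ok, idle, undefined}.

%% Do not reply yet: remember the caller and move to another state.
handle_event({call, From}, request, idle, _Data) ->
    {next_state, waiting, From};
%% Reply later, from a different state than the one that took the call.
handle_event(cast, {complete, Result}, waiting, From) ->
    gen_statem:reply(From, {ok, Result}),
    {next_state, idle, undefined};
%% Everything else is ignored in this sketch.
handle_event(_Type, _Content, _State, _Data) ->
    keep_state_and_data.
```

The caller blocks in request/1 until some other process calls complete/2; in between, the server is free to handle other events.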

raimo · May 8, 2024, 12:30pm

Are those Kafka functions? Is there some way to supervise the request, something like monitoring it, so you can know if it will never be answered?

It sounds like the root problem is, as said, that the Kafka call is synchronous. Unfortunately it tends to be a bit harder to cover all cases when doing it asynchronously…

I’d say we are looking at two different but related possible features here:

1) The possibility to generate events that arrive now, but not immediately; a general time-out cannot be used because it fires either immediately (0) or after 1 ms or more.
2) The possibility to “poll” for external events like with receive ... after 0.

Case 1) can already be done with self() ! Msg and the info event type. The same can be said about general timers. There is a timer module that you can use. The {timeout, Name} event type is a convenience feature to simplify handling timer references. We could add something similar for general events to arrive “now”.
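The self() ! Msg approach for case 1) can be sketched as follows, assuming a hypothetical module self_loop whose "work" is just incrementing a counter. Each iteration queues the next one behind whatever is already in the mailbox, so calls and stop requests are interleaved with the loop. Note that this sketch never idles, so it spins the CPU.

```erlang
-module(self_loop).
-behaviour(gen_statem).

-export([start_link/0, count/1, stop/1]).
-export([init/1, callback_mode/0, handle_event/4]).

start_link() -> gen_statem:start_link(?MODULE, [], []).
count(Pid)   -> gen_statem:call(Pid, count).
stop(Pid)    -> gen_statem:stop(Pid).

callback_mode() ->
    handle_event_function.

init([]) ->
    %% Kick off the loop with an ordinary message to ourselves.
    self() ! continue,
    {ok, running, 0}.

%% One unit of (placeholder) work, then queue the next iteration
%% at the back of the mailbox.
handle_event(info, continue, running, N) ->
    self() ! continue,
    {keep_state, N + 1};
%% External calls are served between iterations.
handle_event({call, From}, count, running, N) ->
    {keep_state_and_data, [{reply, From, N}]};
handle_event(_Type, _Content, _State, _Data) ->
    keep_state_and_data.
```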

Case 2) can be done with gen_server, but not with gen_statem.
The question is how useful/essential that feature really is… And depending on exactly how case 1) is solved, it might also solve case 2).

For case 1) we need to decide how to queue the events. The most straightforward way would be to send them as messages so they are handled and prioritized just like any other message. If we have an internal queue we need to implement a priority order, and I guess that however we implement that it will be surprising, at least when you mix in sending messages to yourself.

If gen_statem would receive ... after 0 until there are no more external messages, the external messages may starve out our appended events.

If gen_statem would alternate between the appended messages and the process mailbox we might get event ordering surprises.

I think the least surprising possibility would be to handle appended events roughly like timers, i.e. let the gen_statem engine handle references and send reference-tagged messages with self() ! {Ref, Event, Content}. Message ordering becomes no surprise, and it would feel like next_event, but prioritized like external events.
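No such feature exists in the gen_statem engine today, but the idea can be approximated in user code. The following is only a sketch under that assumption: the module tagged_events, its API, and the appended tag are all hypothetical. The callback module keeps its own reference and drops any tagged message that does not carry it.

```erlang
-module(tagged_events).
-behaviour(gen_statem).

-export([start_link/0, append/2, seen/1]).
-export([init/1, callback_mode/0, handle_event/4]).

start_link()         -> gen_statem:start_link(?MODULE, [], []).
append(Pid, Content) -> gen_statem:call(Pid, {append, Content}).
seen(Pid)            -> gen_statem:call(Pid, seen).

callback_mode() ->
    handle_event_function.

init([]) ->
    {ok, ready, #{ref => make_ref(), seen => []}}.

%% "Append" an event: it goes to the back of the mailbox and is
%% therefore ordered like any external message.
handle_event({call, From}, {append, Content}, ready, #{ref := Ref}) ->
    self() ! {Ref, appended, Content},
    {keep_state_and_data, [{reply, From, ok}]};
handle_event({call, From}, seen, ready, #{seen := Seen}) ->
    {keep_state_and_data, [{reply, From, lists:reverse(Seen)}]};
%% Only messages tagged with our own reference count as appended events.
handle_event(info, {Ref, appended, Content}, ready,
             #{ref := Ref, seen := Seen} = Data) ->
    {keep_state, Data#{seen := [Content | Seen]}};
%% Same shape but a foreign reference: drop it.
handle_event(info, {_OtherRef, appended, _Content}, ready, _Data) ->
    keep_state_and_data;
handle_event(_Type, _Content, _State, _Data) ->
    keep_state_and_data.
```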

This doesn’t solve how to “poll” for one external event, though…

rlipscombe · May 8, 2024, 1:17pm

I’ll drill more into my specific use case, addressing your questions and one of Ingela’s points:

There’s an interaction between two processes: a “connection” and a “consumer”.

The actual network call to Kafka is asynchronous – send a request, use {active, true} to wait for a response. This is done by a “connection” process that wraps the socket. It’s implemented as a gen_statem and it exposes a synchronous interface: the caller does a gen_statem:call and the connection process uses (as Ingela suggests) gen_statem:reply when the response comes back from the Kafka broker.

From the point of view of the caller, it’s synchronous.

This makes it extremely useful (and readable, in simple scripts) for dealing with low-level Kafka API messages.

Naively, you can (and I did) simply loop, doing something like FetchResponse = kafka_conn:call(C, FetchRequest) repeatedly. This is the “consumer” process – it’s polling for messages on a given Kafka topic. Important here is that Fetch requests have a timeout – the broker won’t respond until either some messages are ready or the timeout expires (or there’s an error).

I originally did this with a gen_statem and the continuous {next_event, internal, fetch} mechanism, discussed at the top of the thread.

This had the problems discussed in (#1):

Infinite loop: because the next_event, er, events are higher-priority than normal messages, it didn’t respond to gen_statem:stop() or gen_statem:call() or 'EXIT' signals (from the linked connection process).
Because it blocked in the kafka_conn:call, it would have been laggy when responding anyway.
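The first problem can be reproduced with a small sketch, assuming a hypothetical module busy_loop that does nothing but re-insert an internal event. Because inserted events are processed before the engine returns to receive, the ping call below is never handled and the caller times out.

```erlang
-module(busy_loop).
-behaviour(gen_statem).

-export([start/0, ping/1]).
-export([init/1, callback_mode/0, handle_event/4]).

%% Not linked, so the caller can kill the process afterwards.
start() -> gen_statem:start(?MODULE, [], []).

%% Short timeout so the failure shows up quickly.
ping(Pid) -> gen_statem:call(Pid, ping, 200).

callback_mode() ->
    handle_event_function.

init([]) ->
    {ok, looping, 0, [{next_event, internal, fetch}]}.

%% Every iteration inserts the next one ahead of the mailbox.
handle_event(internal, fetch, looping, N) ->
    {keep_state, N + 1, [{next_event, internal, fetch}]};
%% Never reached while the loop above keeps feeding itself.
handle_event({call, From}, ping, looping, _N) ->
    {keep_state_and_data, [{reply, From, pong}]}.
```

Since the process also ignores gen_statem:stop, the only way to get rid of it is exit(Pid, kill).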

So, knowing that kafka_conn is a gen_statem, I made the synchronous gen_statem:call asynchronous by using gen_statem:send_request/4 and gen_statem:check_response/3, allowing the “consumer” process to handle stop, call, EXIT, etc. (#3)
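A rough sketch of that asynchronous pattern, not the actual kafka_conn code. For brevity it uses the simpler gen_statem:send_request/2 and gen_statem:check_response/2 rather than the /4 and /3 arities, and a single hypothetical module async_fetch plays both roles: a fake "connection" that answers each fetch after 100 ms, and the "consumer" that keeps one request outstanding.

```erlang
-module(async_fetch).
-behaviour(gen_statem).

-export([start_conn/0, start_consumer/1, fetched/1]).
-export([init/1, callback_mode/0, handle_event/4]).

start_conn()         -> gen_statem:start_link(?MODULE, conn, []).
start_consumer(Conn) -> gen_statem:start_link(?MODULE, {consumer, Conn}, []).
fetched(Consumer)    -> gen_statem:call(Consumer, fetched).

callback_mode() ->
    handle_event_function.

init(conn) ->
    {ok, conn, undefined};
init({consumer, Conn}) ->
    %% Issue the first request without blocking.
    ReqId = gen_statem:send_request(Conn, fetch),
    {ok, fetching, #{conn => Conn, req => ReqId, count => 0}}.

%% Fake connection: defer the reply with a named time-out per caller.
handle_event({call, From}, fetch, conn, _Data) ->
    {keep_state_and_data, [{{timeout, From}, 100, reply}]};
handle_event({timeout, From}, reply, conn, _Data) ->
    {keep_state_and_data, [{reply, From, {ok, []}}]};

%% Consumer: still answers calls while a fetch is outstanding.
handle_event({call, From}, fetched, fetching, #{count := N}) ->
    {keep_state_and_data, [{reply, From, N}]};
handle_event(info, Msg, fetching,
             #{conn := Conn, req := ReqId, count := N} = Data) ->
    case gen_statem:check_response(Msg, ReqId) of
        {reply, {ok, _Messages}} ->
            %% Got a response: immediately issue the next request.
            NewReqId = gen_statem:send_request(Conn, fetch),
            {keep_state, Data#{req := NewReqId, count := N + 1}};
        {error, {Reason, _ServerRef}} ->
            %% The connection died while the request was outstanding.
            {stop, {conn_failed, Reason}};
        no_reply ->
            %% Some unrelated message; ignored in this sketch.
            keep_state_and_data
    end.
```

Between responses the consumer sits in receive, so stop, call, and exit signals are handled as usual.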

But I’m having problems with that: when I get the response, I do the next_event thing, which means I miss when the connection process dies, and the consumer process crashes with noproc when it makes the next call to the connection.

However, thinking about it, that’s a separate problem – if I were to immediately issue the next call, I’d probably see the same – I mean: the process died asynchronously, so I’ve gotta deal with it anyway, right?

The reason for the long thread is that there just doesn’t seem to me to be a comfortable way to implement a continuous loop with a gen_statem which allows for external messages to be processed – without one of the following: introducing some async calls; sending messages to oneself; or (non-zero) timeouts.

And maybe that’s fine. Maybe this is a scenario where I shouldn’t be using a gen_statem.

raimo · May 8, 2024, 2:21pm

Maybe.

On the other hand, had you used a gen_server and its timeout mechanism, I suppose you wouldn’t have noticed any problem since its timeout 0 does a receive ... after 0.
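For comparison, a minimal sketch of that gen_server polling loop, assuming a hypothetical module poll_server whose "work" is incrementing a counter. Every callback returns a timeout of 0, so handle_info(timeout, ...) runs only when the mailbox is empty. Like any loop without a pause, it keeps one scheduler busy.

```erlang
-module(poll_server).
-behaviour(gen_server).

-export([start_link/0, iterations/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link()    -> gen_server:start_link(?MODULE, [], []).
iterations(Pid) -> gen_server:call(Pid, iterations).

init([]) ->
    %% State is the iteration count; timeout 0 starts the loop.
    {ok, 0, 0}.

%% Pending requests are served first, then the loop resumes.
handle_call(iterations, _From, N) ->
    {reply, N, N, 0}.

handle_cast(_Msg, N) ->
    {noreply, N, 0}.

%% The mailbox was empty: do one unit of (placeholder) work.
handle_info(timeout, N) ->
    {noreply, N + 1, 0};
handle_info(_Other, N) ->
    {noreply, N, 0}.
```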

So there is something that cannot be done with gen_statem, but with gen_server (and gen_fsm, presumably), which bugs me…, and that is polling external events.

This is a corner case, since the basic idea of a state machine, and for a gen_* behaviour, maybe even more than for a generic process, is about reacting to external events when they happen, and sitting waiting in a receive statement in between… It is bad style to block for 500 ms not responding to system messages. So I don’t know if this problem deserves a solution.

rlipscombe · May 8, 2024, 2:55pm

The fact that my example blocks for 500ms is not good, agreed. But it’s a red herring: it could have blocked for only 50ms. The problem is that looping over something that doesn’t use messages is hard/ugly (delete as appropriate) to do correctly in a gen_statem, and I initially reached for the wrong “solution” – using next_event.

But: we’ve looked at a bunch of different options in the thread, so there’s something useful for everyone here.