{message_queue_data, off_heap} use case

I have 1 gen_server receiving messages from 100 processes and broadcasting every message to 1,000 processes. Each of the 100 processes sends 100 messages per second to the gen_server.
Will using off_heap on the gen_server help with receiving from the 100 processes but slow down sending to the 1,000 processes?
Should I set `message_queue_data = off_heap` on the gen_server?
Should I set it on the 100 sender processes and 1,000 receiver processes?

3 Likes

off_heap message_queue_data on a process won’t affect the sending of messages from that process, only the reception of messages by that process.

When on_heap message_queue_data is used, the data in the message queue will be part of any garbage collections made of that process. With a huge message queue this can be very costly, since the whole message queue needs to be inspected. When off_heap message_queue_data is used, the message queue can be completely ignored by a garbage collection of the process, which will be a huge benefit if you have a huge message queue when a garbage collection is triggered. With off_heap message_queue_data enabled, the process will also be able to receive signals in parallel from multiple senders as of OTP 25.
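For reference, a minimal sketch (the module name mqd_example is made up) of the ways the setting can be applied: per process at spawn time with spawn_opt/2, at runtime with process_flag/2, or as the emulator-wide default with the +hmqd flag:

```erlang
-module(mqd_example).
-export([start/0]).

%% Per process at spawn time. The setting can also be changed at runtime
%% from within the process with process_flag(message_queue_data, off_heap),
%% or made the default for all processes with: erl +hmqd off_heap
start() ->
    spawn_opt(fun loop/0, [{message_queue_data, off_heap}]).

loop() ->
    receive
        stop -> ok;
        _Msg -> loop()
    end.
```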

Yes, it will help reception, but no, it won’t slow down the sending.

Probably yes

Not on the 100 senders, at least not due to this scenario. Probably on the 1000 receivers.
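For the gen_server in the question, a minimal sketch (broadcast_server is just a placeholder name) of passing the setting via the spawn_opt start option when the server is started:

```erlang
-module(broadcast_server).
-behaviour(gen_server).
-export([start_link/0, init/1, handle_call/3, handle_cast/2]).

%% The spawn_opt start option forwards spawn options to the server process,
%% so its message queue is kept off heap from the start.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [],
                          [{spawn_opt, [{message_queue_data, off_heap}]}]).

init([]) -> {ok, #{}}.
handle_call(_Request, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.
```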

5 Likes

Thank you for the answer.
What is the trade-off when enabling off_heap?
off_heap skips garbage collection of the message queue and enables receiving signals in parallel from multiple senders, but at what cost? Why shouldn’t I enable it for all the processes in my app?

2 Likes

The whole operation of passing a message becomes a bit more expensive when off_heap message_queue_data is enabled. I don’t remember the exact figure, but when measuring this at the time the off_heap message_queue_data feature was introduced, it was a few percent slower. That is, for processes that never get large message queues there will only be a cost associated with off_heap message_queue_data. This is also the reason why it became an option and not just the new way of passing messages between all processes.

I’d say that if there is a risk that a process may get a very large message queue, you want to use off_heap message_queue_data on it. This since the performance loss may be drastic if a garbage collection is triggered at an unfortunate moment. If the process is not at risk of getting a large message queue, you don’t want it.
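If you want a rough feel for that overhead in your own setup, a sketch along these lines could be used (mqd_bench is just an illustrative name); it only times the send side, and the numbers will vary with message size and GC timing:

```erlang
-module(mqd_bench).
-export([run/1]).

%% Time sending N messages to a receiver spawned with on_heap vs off_heap
%% message_queue_data. Only indicative: real cost depends on message size,
%% queue length and when garbage collections happen to trigger.
run(N) ->
    [{Mode, time_send(Mode, N)} || Mode <- [on_heap, off_heap]].

time_send(Mode, N) ->
    Self = self(),
    Pid = spawn_opt(fun() -> drain(Self, N) end,
                    [{message_queue_data, Mode}]),
    {Micros, ok} = timer:tc(fun() -> send_loop(Pid, N) end),
    receive {done, Pid} -> ok end,
    Micros.

send_loop(_Pid, 0) -> ok;
send_loop(Pid, N) ->
    Pid ! {msg, N},
    send_loop(Pid, N - 1).

drain(Parent, 0) -> Parent ! {done, self()};
drain(Parent, N) ->
    receive {msg, _} -> drain(Parent, N - 1) end.
```

Run it a few times with a large N and compare the two figures relative to each other rather than in absolute terms.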

5 Likes

I’ll try measuring the message queue size with erlang:process_info(Pid, message_queue_len) in production and use off_heap when needed.
What is considered a large message queue? 100? 1,000?
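As a side note, such a production check could look roughly like this sketch (mq_monitor and long_queues are made-up names); it lists processes whose queue exceeds a given limit and is meant for occasional inspection from a shell rather than tight polling:

```erlang
-module(mq_monitor).
-export([long_queues/1]).

%% List {Pid, QueueLen} for processes whose message queue is longer than
%% Limit. process_info/2 returns undefined for processes that have exited;
%% the generator pattern simply skips those.
long_queues(Limit) ->
    [{Pid, Len} || Pid <- erlang:processes(),
                   {message_queue_len, Len} <-
                       [erlang:process_info(Pid, message_queue_len)],
                   Len > Limit].
```

For example, mq_monitor:long_queues(1000) from a remote shell.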

2 Likes

It is very hard to give a limit, so I don’t think I will do that. The cost of the garbage collection will increase as the size of the message queue grows. This also depends on the data size of the messages. As the cost of the garbage collection grows, the risk of getting an even larger message queue also increases. This since the process cannot remove messages from the message queue while it is garbage collecting. Personally, if I feel that there is more or less any risk of a buildup of messages, I will go with off_heap message_queue_data. I don’t want to balance on a knife’s edge at the risk of ending up with a huge cost for a garbage collection, even if such a scenario would happen very seldom. Where I answered “probably” in my first answer, I would have gone with off_heap message_queue_data without thinking twice about it.

3 Likes

Thanks.

3 Likes

We utilise a simpler heuristic (using a custom patch that I still haven’t submitted upstream).

Essentially, any process that accumulates enough messages to enter a “death spiral” (when GC becomes so expensive that message receive/processing performance falls below the rate at which messages are enqueued) is switched to off_heap mode.

We detect this by monitoring the enqueue/dequeue rate (provided by the aforementioned patch). BTW @rickard, what do you think of it, is it something the OTP team may accept? (I can think of a simplified form with just two counters, enqueued and dequeued, providing no signal order guarantee like message_queue_len does.)
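On stock OTP, without such counters, a very rough approximation is to sample message_queue_len twice and check whether the queue keeps growing; a sketch (all names made up), with the caveat that it cannot tell sustained overload from a short burst:

```erlang
-module(death_spiral_check).
-export([growing/2]).

%% Returns true if Pid's message queue is longer after IntervalMs than it
%% was before; a coarse stand-in for comparing enqueue and dequeue rates.
growing(Pid, IntervalMs) ->
    Len1 = queue_len(Pid),
    timer:sleep(IntervalMs),
    Len2 = queue_len(Pid),
    is_integer(Len1) andalso is_integer(Len2) andalso Len2 > Len1.

queue_len(Pid) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} -> Len;
        undefined -> undefined   %% process has exited
    end.
```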

3 Likes

I have been thinking about a setting for message_queue_data which automatically switches over to off_heap once the message queue passes some predetermined length. I don’t think the condition needs to be more complicated than that, since all that is needed to risk entering the “death spiral” is that a long message queue appears.

There is however one improvement that needs to be done prior to that, namely yielding when converting an on_heap message queue to an off_heap message queue. Currently the whole message queue is handled at once, which may hog the scheduler for a long time. I have begun implementing this yielding, but the work has stalled since other higher priority work popped up. The initial plan was to introduce the yielding conversion in OTP 25.0, which unfortunately won’t happen, but it may perhaps be ready for OTP 25.1. Until the yielding conversion has been implemented, I recommend not changing the message_queue_data of an executing process.

We can introduce new process_info() options if we find them useful. However, when it comes to this specific problem I think an automatic switch over to off_heap as described above is preferred since it will be much cheaper than having to poll the state of the process.

BTW, process_info(_, message_queue_len) will enforce the signal order guarantee, but the implementation might read the information directly from the other process if it does not break the signal order, which I guess is what you referred to.

4 Likes

So the “message_queue_data” setting could be extended to on_heap | off_heap | {LowWatermark :: pos_integer(), HighWatermark :: pos_integer()}. Or even without the LowWatermark, meaning there is no automatic revert to messages on the heap, and the user has to manually return to on_heap.

We can introduce new process_info() options if we find them useful.

There are two options that I am interested in. The first is exposing the parent process (the information is already there, but not available via process_info, leading to bolt-on solutions like saving the parent process PID in the child’s process dictionary).
The second is the total number of messages ever sent to the process. This is a usual part of our monitoring routine - we often detect unexpected traffic this way. The reason we use a VM patch is to have this feature enabled for OTP processes as well. One example is the ERTS literal area collector process: if it starts receiving a large number of messages, we understand that some code purge or persistent_term usage is going too wild.
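For reference, that bolt-on workaround usually looks something like this sketch (names are made up): the spawner stashes its own pid in the child’s process dictionary, where it can later be read with process_info(ChildPid, dictionary). proc_lib-started processes do something similar with the '$ancestors' entry.

```erlang
-module(parent_dict).
-export([start_child/0]).

%% Sketch of the bolt-on workaround: record the parent pid in the child's
%% process dictionary at spawn time; it can later be read with
%% process_info(ChildPid, dictionary).
start_child() ->
    Parent = self(),
    spawn_link(fun() ->
                       put(parent, Parent),
                       child_loop()
               end).

child_loop() ->
    receive stop -> ok end.
```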

The patch I am talking about also counts how many signals were dequeued by the process. Technically this is a duplicate of “enqueued_messages - message_queue_len”, and I think I’d be fine without it.

To summarise, I’m looking to make a PR for:

  • process_info(_, parent) returning the parent PID
  • process_info(_, messages_enqueued) returning the number of messages ever enqueued (here I’m not sure whether I want to count signals or not; likely not, as that’s a VM-level thing).

Would that be accepted?

4 Likes

Automatically going back to on_heap based on message queue length might cause frequent oscillation between off_heap and on_heap, which you really want to avoid since the amount of work to do when going from on_heap to off_heap is quite large (proportional to the amount of data in the queue). An alternative could perhaps be to go back to on_heap if the message queue never reaches above a certain limit during a certain time period. We would still risk oscillation, but it would at least be possible to limit how frequently that could occur.

If it does not introduce any extra costs when not used, it is not that problematic to introduce new options. A straightforward implementation of the messages_enqueued option will introduce at least an extra 64-bit word on each process structure in the system, regardless of whether the user is interested in this or not. That is, is it of enough general interest to justify the cost? We need to discuss it internally here at OTP before saying yes or no. Make a github PR or issue and we’ll take a look at it.

4 Likes

Automatically going back to on_heap based on message queue length might cause frequent oscillation

That’s what I also recognised, and therefore thought of “low” and “high” watermarks: the process goes off_heap when there are HighWatermark messages in the queue, and returns to on_heap only when the number of messages falls below LowWatermark. Setting the low watermark to 0 means “never go back to on_heap”, setting it to 1 means “only when there are no messages at all in the queue”, etc.
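Hypothetically, the hysteresis could be written down as a small decision function (next_mode is a made-up name), just to illustrate the proposed semantics:

```erlang
-module(mqd_watermark).
-export([next_mode/4]).

%% Given the current message_queue_data mode and the queue length, decide
%% which mode the process should be in. With Low = 0 the second clause can
%% never fire, so the process never goes back to on_heap.
next_mode(on_heap,  Len, _Low, High) when Len >= High -> off_heap;
next_mode(off_heap, Len,  Low, _High) when Len < Low  -> on_heap;
next_mode(Mode, _Len, _Low, _High) -> Mode.
```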

Make a github PR or issue and we’ll take a look at it.

Will do. The patch does slightly more, as it has 4 counters + timestamp per process, but it demonstrates the idea. I will make a simplified version of it.

I’m not sure whether I want this “messages_enqueued” counter to honour signal order. The implementation would be quite complex, yet the result won’t make a significant difference.

3 Likes

Make a github PR or issue and we’ll take a look at it.

Here you are: [erts] Expose parent process via process_info(_, parent) by max-au · Pull Request #5768 · erlang/otp · GitHub

3 Likes