Erlang io-uring support

Speaking of fast connections, has anyone looked into the io_uring Linux interfaces (falling back to the old methods on older kernels)? It has been shown to have significant advantages over traditional patterns in many contexts. I’ve been using it (outside Erlang) for a few months now, and it is quite nice when trying to eke out performance once the I/O interface actually becomes the bottleneck (it can need far fewer syscalls!).

We have looked at it and realized that it is a lot of work to get done. I’m sure we’ll get around to it eventually, but not any time soon.

Yeah, that it is. It is a significant change from the older APIs, and not always worth it for every workload (though not worse by any stretch), but it is definitely a lot faster for some things (especially multi-waits).

Even just a low level binding exposing it could be useful though.

We did too, as it closely resembles FreeBSD’s kqueue. But one of the major benefits of io_uring is that it also avoids the extra read/write syscalls, and integrating this into ERTS appears to be challenging.
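To make the "extra read/write syscalls" point concrete, here is a minimal sketch of the readiness model that epoll (and, in the same spirit, kqueue) uses: one system call to learn that a socket is readable, then a second one to copy the data out. The function name recv_one_with_epoll and its syscall counter are made up for illustration; this is not how ERTS structures its poll loop. With io_uring’s completion model, the receive itself is queued as a request in the submission ring and the data is already in the buffer when the completion arrives, so the separate recv() goes away and many such requests can share a single io_uring_enter() call.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/socket.h>

/* Readiness model (epoll): receiving one message costs two system calls,
   not counting the one-off epoll_create1()/epoll_ctl() setup:
   epoll_wait() to learn the socket is readable, then recv() to copy data.
   Hypothetical helper for illustration only. */
static int recv_one_with_epoll(int fd, char *buf, size_t len, int *syscalls)
{
    int ep = epoll_create1(0);
    if (ep < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(ep);
        return -1;
    }

    struct epoll_event ready;
    ssize_t got = -1;
    *syscalls = 0;

    int n = epoll_wait(ep, &ready, 1, 1000); /* syscall 1: wait for readiness */
    (*syscalls)++;
    if (n == 1 && (ready.events & EPOLLIN)) {
        got = recv(fd, buf, len, 0);           /* syscall 2: copy the data out */
        (*syscalls)++;
    }

    close(ep);
    return (int)got;
}
```

Calling it on one end of a socketpair() after sending "ping" from the other end should return 4 bytes with a count of 2 syscalls for that single message.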

I’ll just stay on FreeBSD :wink:

Yes, in theory io_uring could be faster than kqueue. But such a comparison easily drifts into apples-and-oranges territory.

But for everyone who cares about this kind of I/O performance: move to FreeBSD today. kqueue has been supported by Erlang for a long time, is well tested, and has been around for years. I won’t get started on other FreeBSD advantages, to keep this from degrading into a Linux/BSD flame war.

It’s just about being pragmatic: there is a choice available today that will improve your performance, vs. a feature that might come later, if at all (since it doesn’t fit nicely into ERTS).

The only price is that you need to drop Linuxisms from your surrounding infrastructure, but improving portability among Unix-likes is a good thing overall because it keeps you flexible. Unix compatibility was once a major industry effort, until Linux came along, became dominant, and for many now seems to be the only option.

I would rather improve ERTS to support io_uring than try to bring another OS into tight corporate environment. Plus, it benefits the entire community.

Fortunately, my company has always been on BSDs, starting with the commercial BSDI (which led to me working for them until they were bought by Wind River), then FreeBSD. We know its kernel internals pretty well, so we stayed on FreeBSD as our default Unix. Even the TCP (and USB) stack on GRiSP comes from FreeBSD.

I agree that it’s hard to introduce in large corporations (just as it’s hard to introduce anything there), but there are also many people here at smaller shops or just starting out. For them it’s a good choice for better Erlang I/O performance via kqueue, which is available right now.

Tip for small companies: don’t always go with the default tools of large corporations; sometimes they slow you down and limit your disruptive potential.

Once upon a time, there was this startup serving superior chat to half a billion users on a bunch of FreeBSD servers running Erlang. Seemed to have served them well to get there and get bought :wink:

Isn’t WhatsApp using FreeBSD anymore? I must admit that was the primary reason I was interested in trying it one day :lol:

If you’re not @max-au did you notice much difference when you changed? As much as I think FreeBSD is awesome, it’s definitely a good thing for Erlang if WhatsApp are still managing to get the same sort of performance by running on Linux :003:

They moved to Linux at some point after being bought by Facebook. The reason, as best I recall, was that Facebook thought it sensible to have homogeneous hardware (they already had their own datacentres with many relatively low-power Linux servers, vs. the relatively few high-power FreeBSD servers that WhatsApp had grown up with). The migration caused a few challenges.

This is all gleaned from a talk from one of the WhatsApp folk and comes with the caveat that I may have misremembered.

Edit: Found a link. Great talk by @max-au !

Nice find Phil - I missed that talk!

That’s even more interesting - WhatsApp runs on servers capped at 32GB of RAM!!

I’d love to see a book from the WhatsApp team - I bet we’d learn so many cool things!

I’ve looked inside io_uring and tried to make a simple driver for Erlang.

Yes, it is a lot of work, and it seems that io_uring would have to replace epoll if we want to achieve a lockless design without a mutex on each step.

It seems more language runtimes are adopting it because of the performance benefits. I am not deep enough into the subject, but it seems this is the future of I/O on Linux.

I believe we would not benefit that much (an 8x throughput gain would probably make the BEAM best in class…), but it at least seems worth investing in.

For now I’ll have to be glad to spectate on the matter :slight_smile:

I’d like to pick up this topic again, as I was encouraged to do so by some members of the Ericsson OTP team (@ingela and @bjorng), whom I had the pleasure of meeting yesterday. I have also kept wondering for a couple of years why beam doesn’t support io_uring, given that basically the entire industry has seen how much it can speed up I/O, and that other environments like Ruby or Node.js have apparently gained support for it.

Some background: I used to be a Linux kernel network developer in a former life. That’s no longer my career, but I still have strong related background knowledge and follow fairly closely what is happening in that domain. And for regular userspace process I/O, io_uring is nothing short of the biggest game changer I’ve seen in my 30 years of following Linux.

These days I’m leading the development of Osmocom (open source mobile communications), where last year we developed a lot of [primarily signaling, but also control] plane cellular communications software, which usually handles lots of sockets with small message sends/receives. A lot of this is legacy C code. When looking at profiles of production deployments, the vast majority of CPU cycles were spent on system call entry and exit around socket-based I/O calls. This is a typical pattern for this type of application. And system call entry/exit is among the most expensive operations a program can perform, as it means a transition from userspace into the kernel and back, and you lose a lot of cache locality, etc.

Last year we ported a lot of our I/O from classic send/recv/sendmsg/recvmsg etc. system calls over to io_uring. We saw a CPU cycle reduction of more than 30% in the above-mentioned production deployments just from the switch to io_uring in libosmocore.

I think beam and many types of applications running on beam would benefit just as significantly from using io_uring on Linux. It helps not only with reducing CPU utilization but also with throughput. I’ve developed a couple of load generators (for RTP and GTP-U) based on io_uring (see osmo-mgw/contrib/rtp-load-gen on the laforge/rtp-load-gen branch of cellular-infrastructure/osmo-mgw in the Osmocom gitea, and https://gitea.osmocom.org/cellular-infrastructure/gtp-load-gen), and they achieve rather amazing throughput for something that’s just a normal application program, not an in-kernel packet generator or a kernel-bypass technology like DPDK or VPP.

Disclaimer: While I do have plenty of C and Erlang development knowledge, I have no background on the beam itself.

I’ve briefly looked at the beam source code.

I don’t know much about beam, but at least erts/emulator/nifs/common/prim_socket_nif.c already seems to support multiple I/O backends, and you can find this comment:

```c
/* =======================================================================
 * Socket specific backend 'synchronicity' functions.
 * This type is used to create 'sync' function table.
 * This table is initiated when the nif is loaded.
 * Initially, its content will be hardcoded to:
 *   * Windows:      async (esaio)
 *   * Other (unix): sync  (essio)
 * When we introduce async I/O for unix (io_uring or something similar)
 * we may make it possible to choose (set a flag when the VM is started;
 * --esock-io=<async|sync>).
 */
```
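For readers who haven’t looked at prim_socket_nif.c, the pattern that comment describes is essentially a table of function pointers chosen once at load time. Below is a minimal, hypothetical sketch of that idea: the type io_backend, the function io_backend_select, and the "sync"/"async" strings are my own placeholders and do not reproduce the real tables or the actual flag handling in ERTS. The async entry is only a stub; a real io_uring backend would queue the request and deliver the result on completion.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical backend table: names are illustrative only and are not
   the ones used in prim_socket_nif.c. */
typedef struct {
    const char *name;
    ssize_t (*do_recv)(int fd, void *buf, size_t len);
} io_backend;

/* Synchronous backend: calls recv() directly (the essio approach). */
static ssize_t sync_recv(int fd, void *buf, size_t len)
{
    return recv(fd, buf, len, 0);
}

/* Asynchronous backend stub: a real io_uring backend would queue a
   receive request here and report the result later on completion. */
static ssize_t async_recv(int fd, void *buf, size_t len)
{
    (void)fd;
    (void)buf;
    (void)len;
    errno = ENOSYS;
    return -1;
}

static const io_backend sync_backend  = { "sync",  sync_recv };
static const io_backend async_backend = { "async", async_recv };

/* Pick the table once (e.g. when the NIF is loaded); default is sync,
   matching the Unix default described in the comment. */
static const io_backend *io_backend_select(const char *esock_io)
{
    if (esock_io != NULL && strcmp(esock_io, "async") == 0)
        return &async_backend;
    return &sync_backend;
}
```

The rest of the code only ever calls through the selected table, so another backend can be added without touching the callers.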

And indeed, there’s
erts/emulator/nifs/unix/unix_socket_syncio.c vs erts/emulator/nifs/win32/win_socket_asyncio.c

So apparently async I/O is already used on Windows, hence:

  1. erts/beam already seems to understand the notion of asynchronous I/O, but doesn’t use it on Linux (traditionally that would have meant POSIX or Linux AIO, which didn’t support sockets, but io_uring, available since kernel 5.1, has changed that).

  2. supporting io_uring might make Linux actually look closer to Windows from a beam point of view!

You’d still have to keep the existing unix_socket_syncio.c for compatibility with other POSIX/Unix flavours, and for Linux systems that either run a kernel that is too old or have io_uring support disabled (admittedly, there were a number of security issues, at least in the early development of io_uring, which made some people disable it).
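On keeping the existing backend as a fallback: a runtime check could try to create a tiny ring and classify any failure. Below is a minimal Linux-only sketch, assuming the kernel UAPI header <linux/io_uring.h> is installed; the enum and probe_io_uring are hypothetical names. As far as I know, ENOSYS means the kernel lacks io_uring, while EPERM typically means it has been switched off by policy (for example via the kernel.io_uring_disabled sysctl on newer kernels, or by a seccomp filter in a container).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() */
#endif
#include <errno.h>
#include <string.h>
#include <unistd.h>

#if defined(__linux__) && defined(__has_include)
#  if __has_include(<linux/io_uring.h>)
#    include <sys/syscall.h>
#    include <linux/io_uring.h>
#    define HAVE_IO_URING_UAPI 1
#  endif
#endif

/* Hypothetical classification of io_uring availability. */
typedef enum {
    IO_URING_AVAILABLE,     /* ring created: an io_uring backend could be used */
    IO_URING_NOT_SUPPORTED, /* kernel or headers too old: stay on sync I/O */
    IO_URING_DISABLED,      /* denied by policy (sysctl, seccomp, LSM) */
    IO_URING_OTHER_ERROR    /* e.g. out of memory: fall back as well */
} io_uring_status;

static io_uring_status probe_io_uring(void)
{
#if defined(HAVE_IO_URING_UAPI) && defined(__NR_io_uring_setup)
    struct io_uring_params params;
    memset(&params, 0, sizeof params);

    /* Ask for a one-entry ring; we only want to know whether it works. */
    long fd = syscall(__NR_io_uring_setup, 1, &params);
    if (fd >= 0) {
        close((int)fd);
        return IO_URING_AVAILABLE;
    }
    if (errno == ENOSYS)
        return IO_URING_NOT_SUPPORTED;
    if (errno == EPERM || errno == EACCES)
        return IO_URING_DISABLED;
    return IO_URING_OTHER_ERROR;
#else
    return IO_URING_NOT_SUPPORTED;
#endif
}
```

A VM could run such a probe once at startup and only select an io_uring backend when it returns IO_URING_AVAILABLE, keeping unix_socket_syncio.c as the default otherwise.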

In any case, I do strongly believe there is an excellent opportunity to significantly accelerate [not just socket] I/O for any software running on beam.
There is plenty of excellent io_uring documentation out there, including Lord of the io_uring, the original paper, and the Awesome io_uring link collection.

If anyone ever decides to follow up on this and bring io_uring to beam: Feel free to reach out with any related questions.

One of my co-workers tells me that one of the main reasons we have not prioritized working on io_uring is the security concerns mentioned here: io_uring - Wikipedia

I would argue that pretty much any significant new feature runs the risk of introducing new bugs, including security-related ones. Admittedly, io_uring has had a rather bumpy start. However, all the known issues were fixed quickly by the maintainers.

Also, it is ultimately up to the entity operating a given system whether they want to use io_uring: it’s a policy decision…

I’m not suggesting beam should mandate the use of io_uring, but that it offer io_uring as an optional I/O backend on systems where it is enabled/available. It is certainly available on Debian and Ubuntu, Red Hat recommends it, and I’ve seen reports that it’s also available on major cloud offerings. And Meta (the employer of the main io_uring developer) is using it in production.

See also How to handle people dismissing io_uring as insecure? · axboe/liburing · Discussion #1047 · GitHub, which also gives some background on the Google-specific reasons behind Google’s decision to disable it on their [internal] servers and Android.

As all systems mature over time, inevitably more [security] bugs get fixed. I don’t think that’s an argument for holding back beam support. It can be developed and tested now, and while some people currently disable io_uring as a matter of policy, it’s extremely likely that further down the road Google or others will reconsider.

It was some time ago that we last evaluated it, so I am not saying things cannot have changed, only conveying what was considered back then. We will see what happens in the future, but should we reconsider, this would be a longer-term goal; short-term plans are already set. That said, anyone who feels they have relevant information on the subject is welcome to contribute input.

Also, as noted earlier in the thread, it might be a lot of work, and that of course will also weigh into the equation.

As far as I understand the io_uring documentation, it may require a full migration from epoll to io_uring, which means maintaining two implementations.

About security: it may be OK if we are talking about a 30% optimization =)

Yes, just like having separate Windows and POSIX implementations; the architecture for multiple I/O backends already exists.

As stated, the high rate of discovered security issues was in earlier versions of the code. Also, the 30% CPU cycle reduction was for an entire (GSM Base Station Controller) application. If you ran a low-level benchmark that does nothing but I/O, the speed-up would of course be much higher.
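A back-of-the-envelope way to see why a whole-application figure understates the I/O-level gain: let p be the share of the application’s CPU cycles spent in the socket I/O path (system call entry/exit plus the calls themselves) and r the fraction of those cycles that io_uring removes. These symbols are introduced here purely for illustration; only the overall reduction R was reported, not p or r for the BSC.

```latex
R = p \, r, \qquad 0 < p \le 1
\quad\Longrightarrow\quad
r = \frac{R}{p} \;\ge\; R
```

With the measured R = 0.3 and any p < 1, r is strictly above 0.3, and a benchmark that does nothing but I/O (p close to 1) exposes r directly. This simple model assumes the non-I/O part of the application is unaffected by the switch.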

A couple of years ago, we (Erlang/OTP) had an implementation
for Unix, but we also needed socket support on Windows. The best
bet there was to base it on I/O Completion Ports.
We also discussed, for the future, a similar solution for Unix,
based on io_uring.

But the Security section in the Wikipedia page for
io_uring (io_uring - Wikipedia)
is not encouraging.

We could of course implement it anyway, but let the
use (of it) be optional (for instance, via a command
line option; --esock-io-backend=syncio|asyncio).

So, I suppose it is a question of priorities; implement it
despite the security concerns (in preparation for a future
when these concerns have been resolved) or put our
resources to something else.

If the OTP team is willing to guide @LaF0rge through the intricacies of the Erlang source code, the company I work for is ready to sponsor the implementation of io_uring.
I’ve negotiated a $10K sponsorship budget available immediately.
