Erlang io-uring support

LaF0rge · October 22, 2024, 7:56am

Thanks for your generous offer. Sadly, I’m very busy with many projects and I don’t think I could be personally available for the implementation work, but at my company I have a team of C-language developers who also happen to have some Erlang background. We also have limited availability in general, but this is a reasonably small [and in my opinion important] project, so I think we could work on it. I’m not sure USD 10k is sufficient, though, given the size of the task.

What would be most useful is to know what kind of test suite the Erlang/OTP/BEAM folks are using to test functionality + performance of the BEAM socket I/O.

peerst · October 24, 2024, 6:35pm

I re-checked the Wikipedia page, and this is what they write now (quoted below). In the current state of affairs, should we really go down a path that very much looks like buying performance at the cost of security? Or, if someone really needs this kind of performance, maybe reconsider FreeBSD, where you have kqueue, which is very well integrated with the kernel’s very good security measures.

Security

io_uring has been noted for exposing a significant attack surface and structural difficulties integrating it with the Linux security subsystem.[10]

In June 2023, Google’s security team reported that 60% of the exploits submitted to their bug bounty program in 2022 were exploits of the Linux kernel’s io_uring vulnerabilities. As a result, io_uring was disabled for apps in Android, and disabled entirely in ChromeOS as well as Google servers.[11] Docker also consequently disabled io_uring from their default seccomp profile.[12]

scherrey · October 25, 2024, 5:30am

Our team was building a multi-way live performance streaming service a few years ago and built it on top of io-uring (C++/Linux) with excellent results. The security issues generally seem to arise if you’re running in a shared environment where foreign code could be running. In my experience few BEAM deployments fit that description, and Erlang/BEAM already makes plenty of assumptions that presume a secure environment, this being the default case in telecom.

So I think a build option with io-uring makes a lot of sense for BEAM.

Maria-12648430 · October 28, 2024, 11:13am

Disclaimer: I’m mostly impartial to this. On the one hand, the use cases I work with don’t include a lot of IO. On the other hand, I won’t complain about performance improvements should I need them. tl;dr: this is just my 2ct.

I find this discussion, “How to handle people dismissing io_uring as insecure?” (axboe/liburing, Discussion #1047 on GitHub), objectionable.

For one, you should not want to handle people but their concerns. I’ll exaggerate a little, but this question sounds like asking for ad hominem arguments.

For another, one of the answering posts complains that Google paid people for finding bugs, framing it as unfair. But the bugs were there, no? I.e., Google did not pay for making the bugs. And personally, I prefer it if the (arguably) good guys find the bugs, paid or not, rather than the bad guys.

Regarding the argument that most/all of the found bugs have been fixed: this doesn’t mean that all have been found, or that it is secure now; it only means that it is now less insecure than it was before. The bugs that were found and fixed are probably the relatively easy-to-find ones, and there are probably still many left that are just harder to find. This says nothing about their severity, though.

But so far, I’m not against implementing it, as long as its usage is optional, in the sense of an opt-in (vs opt-out) option.

Note, however, that having another IO backend means that someone has to maintain it for the foreseeable future (on top of the existing ones), not just implement it now and be done with it.

As for FreeBSD: generally, yes, personally I’d much prefer that. However, I’d argue that not everybody has the option to take their pick of the OS.

peerst · October 28, 2024, 12:11pm

Well yes, but if people approached their choice rationally, the performance of kqueue and the excellent security of FreeBSD, which doesn’t have to be compromised for performance, should be an important point. Otherwise I would say maybe this performance is not that important.

This is also important for this community, because the same non-rational decision making is hurting our own adoption. We are in the same kind of boat as e.g. FreeBSD: the superior solution which is niche.

Maria-12648430 · October 28, 2024, 12:16pm

Note that you don’t have to convince me. I like FreeBSD very much; I just virtually never get a chance to use (or decide to use) it.

peerst · October 28, 2024, 1:17pm

I was aware of that from what you wrote before. But ultimately the stakeholders who ask for better performance are often also the ones who say that certain technologies are a given. So those of us who need to make that happen can confront them with data showing the performance on FreeBSD and on Linux (with a comparable security profile, which excludes io_uring in its current state).

juhlig · October 29, 2024, 8:47am

In my experience, companies rarely start out with FreeBSD, or have it on their agenda when picking the underlying infrastructure.

As you said, FreeBSD, while often being the superior solution, is also a niche solution. People who can skillfully handle such a niche solution are, by and large, also niche, which basically means “hard to find and hire”.

So the usual argument that leads to a preference for Linux over FreeBSD goes: “If we use FreeBSD, we run a higher risk: if the one (expensive) guy who knows FreeBSD leaves, we probably won’t be able to replace him immediately; people who know Linux, OTOH, come by the dozen for a dollar, so if one of them leaves, who cares?” And this is true, and therefore hard to overcome. I guess the same applies to Erlang vs, say, Go.

peerst · October 29, 2024, 9:34am

Agree. There is a lot of commonality between FreeBSD and Erlang in this regard.

One more thing that many ignore in the hiring question: the superior niche solution has a smaller pool of people, but usually of higher quality. So it is actually easier to hire good people.

eproxus · October 31, 2024, 4:27pm

I’m not at all knowledgeable in this area but perhaps there are some interesting nuggets in here (as Erlang too deals with asynchronous IO): https://tonbo.io/blog/async-rust-is-not-safe-with-io-uring

And some further discussion here: “Async Rust is not safe with io_uring” on Hacker News.

maxlapshin · November 7, 2024, 7:15pm

Same here. Our streaming server is deployed as the only application on the server, with nothing else running, so all security is handled inside our own code.

tsloughter · November 9, 2024, 12:18pm

This is awesome to see nonetheless, even if it’s not happening immediately. Maybe more people can chip in with @zabrane to get a sum that matches the time and resources needed for the task.

sneako · November 10, 2024, 10:08am

Maybe @zabrane can try to put together a budget proposal so that we have a rough idea of what amount would be required and more of us could pitch the idea of contributing funds to our employers and maybe even the EEF as well.

zabrane · November 10, 2024, 10:45am

Hi All

Initially, I got this budget to extend the excellent SQLite NIF driver to support Raft, but the author @mmzeeman was busy.

The budget is USD 10K. I don’t know whether the company I work for has to be part of the EEF or not.
Just let me know how to proceed.

starbelly · November 10, 2024, 4:06pm

Yes, this makes sense as an EEF stipend. I think it’s contingent upon the OTP team’s willingness to accept such work in conjunction with their current timeline (maintenance is a thing too).

starbelly · November 10, 2024, 5:22pm

Being part of the EEF is the thing to do, period. That said, I think some of us who are currently part of the EEF can discuss and contact you.

zabrane · November 11, 2024, 11:20am

@starbelly Thanks. Please send me an email in private.

sneako · November 11, 2024, 2:22pm

Ah sorry, I tagged the wrong person, I meant, it would be great if @LaF0rge could estimate how much his company would need in order to take this on!

@zabrane I really appreciate your initiative and hope to get my employer to contribute as well.

KayEss · November 20, 2024, 4:43am

My company is also interested in this, and as I’ve done a load of io_uring work I thought I’d look into it.

The big problem I see is just the sheer amount of work it would take.

Making some simplifying assumptions just to get things started:

Only look at the networking IO side of io_uring. It’s unclear if the file IO APIs in OTP already support an async mode, but in any case, let’s start somewhere.
Don’t try to use the shared buffer support in io_uring, as this is a much more complex use of those APIs if OTP doesn’t already support the concepts (and even if it does, let’s start by keeping things simple). To start with, we should get good results just from supporting the normal application buffers for networking IO. This should still help with lots of small IO requests, especially for things like UDP.

The Windows asyncio implementation is 11.5 KLOC. Just a single API, accept, is ~500 LOC. If the io_uring implementation could be only 5 KLOC (probably not realistic?), and it were possible to write 100 LOC per day (also not realistic, at least not for me, who would still have to learn the internal BEAM APIs needed), that’s still the best part of 3 months’ work, and it doesn’t seem remotely realistic that the work required would be as small as that.
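
A minimal sketch of that back-of-the-envelope arithmetic. The figure of roughly 21 working days per month is an assumption added here (it is not from the post), and the function name is purely illustrative:

```python
def estimate_effort(total_loc, loc_per_day, working_days_per_month=21):
    """Return (working_days, months) for a given code size and daily output.

    working_days_per_month=21 is an assumed average, not a measured value.
    """
    working_days = total_loc / loc_per_day
    months = working_days / working_days_per_month
    return working_days, months


# Optimistic case from the post: 5 KLOC at 100 LOC per day.
days, months = estimate_effort(5_000, 100)
print(f"{days:.0f} working days, about {months:.1f} months")

# Same rate applied to the size of the Windows asyncio backend (11.5 KLOC).
days_win, months_win = estimate_effort(11_500, 100)
print(f"{days_win:.0f} working days, about {months_win:.1f} months")
```

Even the optimistic case comes to 50 working days, a little under two and a half calendar months of uninterrupted work, before any time spent learning the BEAM internals.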

Idea of a plan:

Fork the Windows asyncio backend and:

Give it a new name and integrate into build system
Comment out all Windows APIs leaving in only error returns
See that this can load into OTP

Try to run tests against the new backend, filling in the implementation as the tests call for it.
Go through and add new tests if/when there are gaps
Build some benchmarks (in a separate repo) that exercise a few scenarios to try to get an idea of any performance gains. Add in some performance optimisation based on the findings.
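
The "stub everything, then let the tests pull in the implementation" part of this plan can be illustrated independently of OTP. The following is a toy sketch only: `StubBackend`, its operation names and the `enotsup` error value are all hypothetical and do not correspond to the real OTP backend interface.

```python
class StubBackend:
    """Toy model of a freshly forked backend: every operation starts as an error stub."""

    # Hypothetical operation names, chosen for illustration only.
    OPERATIONS = ("open", "bind", "listen", "accept", "connect", "send", "recv", "close")

    def __init__(self):
        self._implemented = {}

    def implement(self, name, func):
        """Replace the error stub for one operation with an implementation."""
        if name not in self.OPERATIONS:
            raise ValueError(f"unknown operation: {name}")
        self._implemented[name] = func

    def call(self, name, *args):
        """Dispatch an operation; unimplemented ones return an error tuple."""
        if name not in self.OPERATIONS:
            raise ValueError(f"unknown operation: {name}")
        func = self._implemented.get(name)
        if func is None:
            return ("error", "enotsup")
        return ("ok", func(*args))

    def missing(self):
        """Operations still served by the error stub, i.e. the remaining work list."""
        return [op for op in self.OPERATIONS if op not in self._implemented]


backend = StubBackend()
first_result = backend.call("accept", 3)  # still a stub, so an error comes back

# A failing test points at "accept", so it gets filled in (placeholder logic here).
backend.implement("accept", lambda listen_fd: listen_fd + 1)
second_result = backend.call("accept", 3)
remaining = backend.missing()
```

The point of the shape is that the backend loads and runs from day one, and `missing()` doubles as a progress report while the test suite drives the order in which operations get written.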

Is there any way to greatly simplify this so that it could be done in a manageable amount of time? I do note that the epoll/kqueue implementation appears to be a more reasonable 2.6KLOC (erts/emulator/sys/common/erl_poll.c), but I believe this would mean choosing io_uring at compile time and I assume there’d be zero chance of landing that in OTP.

Appendix/Musings:

It looks like async completion of IO happens by sending a message to the Erlang process that initiated the request (is this correct?). Normally I like to use one ring per OS thread (using a thread local), which means that the completion for the IO would naturally come back to the thread that initiated the request, but I expect this wouldn’t really matter.
It appears that there is already support for associating a completion data structure with an IOP on the Erlang side. These structures really need to live in some sort of memory pool, but that’s also something that could be done in a future refinement after the basics are working.
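
The "one ring per OS thread, held in a thread local" pattern mentioned above can be sketched as follows. This is not real io_uring code: `FakeRing` is a hypothetical stand-in for a ring, and a real implementation would create the ring through liburing or the raw syscalls in C or C++. The sketch only shows the lazy per-thread lookup.

```python
import threading


class FakeRing:
    """Stand-in for one io_uring instance; not a real binding."""

    def __init__(self):
        self.submitted = []

    def submit(self, request):
        # A real ring would queue a submission entry; here we only record it.
        self.submitted.append(request)


_thread_state = threading.local()


def ring_for_current_thread():
    """Return this thread's ring, creating it on first use."""
    ring = getattr(_thread_state, "ring", None)
    if ring is None:
        ring = FakeRing()
        _thread_state.ring = ring
    return ring


def worker(name, results):
    ring = ring_for_current_thread()
    ring.submit(f"recv from {name}")
    # A second lookup on the same thread returns the same ring object.
    results[name] = (ring, ring is ring_for_current_thread())


results = {}
threads = [threading.Thread(target=worker, args=(f"t{i}", results)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each thread only ever touches its own ring, submissions need no locking, and completions are naturally reaped by the thread that issued the request.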

KayEss · November 27, 2024, 6:20am

Just wanted to follow up on the code sizes with a more radical idea.

I have a C++ io_uring library https://github.com/Felspar/io, which relies on a coro library to do the sorts of async operations that Erlang does https://github.com/Felspar/coro.

The total code size for both repositories is about 4KLOC (which includes all the async machinery, poll/WSAPoll and io_uring implementations). It doesn’t post cancellations in the io_uring for abandoned IOPs (as the Windows async code in OTP appears to do), but I would expect to be able to solve that in maybe a dozen or two dozen lines of code. I think the big difference is simply that it’s much easier to abstract in C++ than in C, which makes the Windows async code much more verbose than it would need to be. The Windows async code for accept alone is about the size of the entire io_uring implementation.

From what I see of the Windows code, it is a separate Erlang module. Would it be possible to implement a proof-of-concept io_uring module in a separate repo and have it take over all of the BEAM networking, with a view to integrating it into OTP later on? If so, I think this might be manageable.

I know that there are already some C++ modules in OTP. Maybe this could live there as another one?