Roadrunner - Pure-Erlang HTTP and WebSocket server

williamthome · May 6, 2026, 11:32pm

Hey folks,

Wanted to share a project I’ve been working on for a while: Roadrunner, a pure-Erlang HTTP/1.1 + HTTP/2 + WebSocket server, built from scratch with TDD as the HTTP layer for Arizona.

Why another HTTP server? Mostly because I wanted something faster that fits Arizona better, with:

A small, easy-to-predict API (a handler behaviour, request/response accessors, listener controls, and a few opt-in helpers like cookies, qs, multipart, SSE, WebSocket)
Parsing that follows the RFCs: RFC 9110/9112 for HTTP/1.1, RFC 9113 + RFC 7541 (HPACK) for HTTP/2, RFC 6455 for WebSocket, RFC 7692 permessage-deflate. h2spec strict 100 % and Autobahn fuzzingclient strict 100 %, no skipped tests.
Modern OTP style: sigils, maybe, body recursion, binary keys for wire data, -doc/-moduledoc markdown, dialyzer-clean
Built-in graceful shutdown, telemetry events, per-request request_id in logger:set_process_metadata/1, and proc_lib:set_label/1 per-listener / per-acceptor / per-conn so observer trees are easy to read
Real numbers. There’s a full bench grid in the repo against cowboy and elli. Roadrunner is usually 30 to 80 % faster than cowboy. Versus elli, it’s about even or a bit faster on simple GETs and clearly wins when you need something elli doesn’t ship (router, gzip, h2, WebSocket, pipelining, etc.)

Quick taste of the numbers (req/s, median of 3 runs, 50 clients, loopback, 12th-gen i9):

scenario	roadrunner	cowboy	elli
`hello`	**298 k**	179 k	278 k
`headers_heavy`	**235 k**	118 k	211 k
`cookies_heavy`	**247 k**	154 k	n/a
`pipelined_h1`	**501 k**	329 k	4.9 k
`gzip_response`	**127 k**	100 k	n/a
`websocket_msg_throughput`	**199 k**	155 k	n/a

Bold marks the row winner. n/a means elli’s test fixture doesn’t support that workload (no router, no gzip, no WebSocket). Full grid with p50/p99 plus h2 and memory shape lives in docs/bench_results.md.

A few things to know up front:

It’s 0.x. The core works and is fully tested, but the API may change between minor versions
Needs OTP 29 (currently RC). Why? Performance and modern Erlang
Not on Hex yet. rebar3 hex publish needs runtime deps from Hex, and telemetry is still a git dep locally because an OTP 29 RC3 + Fastly TLS bug blocks rebar3 update. The fix is landing in OTP 29 RC4 (I guess), and v0.1.0 will go up on Hex right after

% deps (git for now, hex once OTP 29 RC4 lands):
{deps, [
    {roadrunner, {git, "https://github.com/arizona-framework/roadrunner.git", {branch, "main"}}}
]}.

% boot a listener:
roadrunner:start_listener(my_listener, #{
    port => 8080,
    routes => [{~"/", my_handler, #{greeting => ~"hello"}}]
}).

Feedback is very welcome: bug reports, doc gaps, perf checks on your hardware, anything. The README has the full conformance, perf, and hardening notes, plus a docs/comparison.md with the honest take.

Beep beep.

zabrane · May 7, 2026, 9:10am

Hi @williamthome,

Thank you very much for Roadrunner. This is exactly the alternative the Erlang ecosystem needed. Really appreciated.

Two things from our side:

Could you explain a bit more how the load generator works? I would like to understand the internals: how connections are driven, how requests are scheduled across the 50 clients, how throughput is computed, how latency samples are aggregated for the p50/p99 columns, and whether the loader itself can become the bottleneck on the high-throughput cells.
We have not switched to OTP 29 RC yet, so we cannot run the matrix locally for now. Could you add a wrk2-based scenario (giltene/wrk2 on GitHub, a constant-throughput, correct-latency-recording variant of wrk; specifically wrk2, not wrk)? wrk2 runs at a constant target rate and accounts for Coordinated Omission, which means the percentile distribution actually reflects what a real client would observe under load instead of hiding the tail every time the server stalls. With the corrected latency you get from HdrHistogram, the 99th, 99.9th, and 99.99th percentiles tell a much more honest story than what wrk, ab, vegeta … report.

Localhost is fine, no need for HTTPS. The goal is to measure what Roadrunner can really sustain end-to-end. A matrix at, say, 50%, 75%, 90%, 95% of the saturation rate per scenario, with --latency (and ideally --u_latency for a corrected vs uncorrected comparison), would be very telling.

Thanks again, and looking forward to new features.

Best,

williamthome · May 7, 2026, 3:07pm

Hey, @zabrane o/

Thanks a lot for the feedback! Both items are on a local branch, merging soon.

1. Load generator internals

How the closed-loop driver works:

50 worker processes, one keep-alive TCP connection each
Send, wait for the response, send the next
Throughput is total successful requests over wall-clock time
Each worker keeps per-request nanosecond timings; the driver merges every worker’s timings, sorts, picks p50/p95/p99 by position
Near peak, the bench driver and the server end up competing for the same cores, so the driver itself can become the bottleneck
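As a rough illustration of the merge, sort, and pick-by-position step (a minimal sketch, not the actual bench driver; the module and function names are made up, and nearest-rank is just one assumed way to pick "by position"):

```erlang
-module(latency_stats).
-export([merge/1, percentile/2, summary/1, throughput/2]).

%% Merge every worker's list of per-request timings (ns) into one sorted list.
merge(PerWorker) ->
    lists:sort(lists:append(PerWorker)).

%% Nearest-rank percentile on an already sorted, non-empty list.
percentile(Sorted, P) when P > 0, P =< 100 ->
    N = length(Sorted),
    Rank = max(1, ceil(P * N / 100)),
    lists:nth(Rank, Sorted).

%% p50/p95/p99 picked by position from the merged samples.
summary(PerWorker) ->
    Sorted = merge(PerWorker),
    #{p50 => percentile(Sorted, 50),
      p95 => percentile(Sorted, 95),
      p99 => percentile(Sorted, 99)}.

%% Throughput in req/s: successful requests over wall-clock time (ns).
throughput(Successful, ElapsedNs) when ElapsedNs > 0 ->
    Successful / (ElapsedNs / 1.0e9).
```

Because each worker waits for a response before sending the next request, these samples only cover requests that were actually sent, which is why a closed-loop driver can understate the tail.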

Full writeup is in a doc file on the branch.

2. wrk2 scenario

A script and a CI smoke job for this are on the branch.

For each (scenario, server), wrk2 sweeps four rates: 50%, 75%, 90%, 95% of the closed-loop peak. Each measurement uses both --latency and --u_latency, so every scenario gets two tables and you can see the Coordinated Omission (CO) gap directly. If the server can’t keep up with the target rate, the row is flagged as saturated. Localhost only, no HTTPS.
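The rate sweep itself is simple arithmetic. A hedged sketch (hypothetical module; the saturation tolerance is an assumption, since the exact rule the script uses is not stated here):

```erlang
-module(wrk2_sweep_sketch).
-export([target_rates/1, saturated/3]).

%% Target rates (req/s) at 50/75/90/95 % of the closed-loop peak.
target_rates(PeakRps) when is_integer(PeakRps), PeakRps > 0 ->
    [{Pct, PeakRps * Pct div 100} || Pct <- [50, 75, 90, 95]].

%% Assumed saturation check: flag the row when the achieved rate falls
%% more than TolerancePct percent short of the target rate.
saturated(TargetRps, AchievedRps, TolerancePct) ->
    AchievedRps * 100 < TargetRps * (100 - TolerancePct).
```

For example, `wrk2_sweep_sketch:target_rates(254000)` gives 127000 req/s for the 50% row, assuming a closed-loop peak of about 254k.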

CO gap on roadrunner, hello scenario at 127k req/s (50% of measured peak):

percentile	corrected	uncorrected	ratio
p50	0.97 ms	28 µs	~35x
p99	2.01 ms	160 µs	~13x
p99.9	2.24 ms	260 µs	~9x
p99.99	2.66 ms	558 µs	~5x

That’s what the closed-loop bench was hiding.

And the three servers next to each other, hello scenario at each one’s 75% rate (corrected percentiles):

server	rate	p50	p99	p99.9	p99.99
roadrunner	190 k	1.03 ms	2.15 ms	2.37 ms	2.59 ms
cowboy	136 k	1.03 ms	2.14 ms	2.65 ms	5.45 ms
elli	204 k	1.04 ms	2.17 ms	2.39 ms	2.72 ms

p50 to p99 are tied across all three. The interesting bit is p99.99: roadrunner and elli stay tight (2.6 to 2.7 ms), cowboy widens to 5.45 ms, about 2x the others. That’s where each server’s design choice for handling connections (one process per conn vs other models) starts to show.

zabrane · May 8, 2026, 7:38am

Hey @williamthome

This is great, thank you so much. The CO gap table comparison is exactly the honest data I was hoping for, and the turnaround was impressive.

Once OTP 29 reaches a stable release, I will start benching Roadrunner internally on our own workloads. I will share our numbers back once we have run them on our hardware.

lpil · May 10, 2026, 7:52pm

Looks really cool! Nice work.

What would you say is the source of the performance improvements?

How does it compare to Elixir’s Bandit and Gleam’s Ewe? I believe they are supposed to be the fastest BEAM HTTP servers currently.

williamthome · May 10, 2026, 9:28pm

Thanks Louis!

I’d say two layers: architecture and BEAM-idiomatic patterns.

Architecture:

One process per connection running a tail-recursive proc_lib loop, no gen_server / gen_statem boundaries
Passive gen_tcp:recv on the hot path
Acceptor pool inline on gen_tcp / ssl, no Ranch
Pure-incremental-binary-matcher parser (erlang:decode_packet measured 2x slower)
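To make the shape concrete, here is a minimal sketch of that pattern: one proc_lib process per connection, a tail-recursive loop, and passive gen_tcp:recv. It is not Roadrunner's code. The module name, the fixed response, and the "split on the blank line" stand-in for a real parser are all assumptions for illustration:

```erlang
-module(conn_loop_sketch).
-export([start/0, accept_loop/1]).

-define(RESPONSE, <<"HTTP/1.1 200 OK\r\ncontent-length: 2\r\n\r\nok">>).

%% Listen on an ephemeral loopback port with a passive ({active, false}) socket.
start() ->
    Opts = [binary, {active, false}, {ip, {127, 0, 0, 1}},
            {reuseaddr, true}, {nodelay, true}],
    {ok, LSock} = gen_tcp:listen(0, Opts),
    {ok, Port} = inet:port(LSock),
    _Acceptor = proc_lib:spawn(?MODULE, accept_loop, [LSock]),
    {ok, LSock, Port}.

%% Inline acceptor: one new process per accepted connection.
accept_loop(LSock) ->
    case gen_tcp:accept(LSock) of
        {ok, Sock} ->
            Pid = proc_lib:spawn(fun() -> conn_init(Sock) end),
            ok = gen_tcp:controlling_process(Sock, Pid),
            Pid ! go,
            accept_loop(LSock);
        {error, _} ->
            ok
    end.

%% Wait until socket ownership has been handed over, then enter the loop.
conn_init(Sock) ->
    receive go -> conn_loop(Sock, <<>>) end.

%% Tail-recursive connection loop; no gen_server / gen_statem in between.
%% Answers every complete header block in the buffer (so pipelining works),
%% then blocks in a passive recv for more bytes.
conn_loop(Sock, Buf) ->
    case binary:split(Buf, <<"\r\n\r\n">>) of
        [_Head, Rest] ->
            case gen_tcp:send(Sock, ?RESPONSE) of
                ok -> conn_loop(Sock, Rest);
                {error, _} -> gen_tcp:close(Sock)
            end;
        [_Incomplete] ->
            case gen_tcp:recv(Sock, 0) of
                {ok, Data} -> conn_loop(Sock, <<Buf/binary, Data/binary>>);
                {error, _} -> gen_tcp:close(Sock)
            end
    end.
```

`conn_loop_sketch:start()` returns `{ok, ListenSocket, Port}`; any HTTP/1.1 client pointed at that loopback port should get the fixed `ok` body back.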

BEAM-idiomatic patterns (small wins that compound):

Body recursion, no lists:reverse(Acc) accumulator pattern
Compiled binary:cp() patterns stashed in persistent_term via on_load, threaded as individual args (zero allocation on hot reads)
ASCII-only fast paths (e.g. ascii_lowercase) over Unicode-aware BIFs (string:lowercase) where the wire data is ASCII-bounded
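A small sketch of what these three patterns can look like together. This is illustrative only, not code from Roadrunner: the module, the persistent_term key, and the helper names are hypothetical.

```erlang
-module(hot_path_sketch).
-export([split_header/1, split_header/2, ascii_lowercase/1, lower_all/1]).
-on_load(init_patterns/0).

-define(COLON_KEY, {?MODULE, colon}).

%% Compile the pattern once at module load and stash it in persistent_term.
init_patterns() ->
    persistent_term:put(?COLON_KEY, binary:compile_pattern(<<":">>)).

%% Fetch the compiled pattern once, then thread it through as a plain argument.
split_header(Line) ->
    split_header(Line, persistent_term:get(?COLON_KEY)).

split_header(Line, Colon) ->
    case binary:split(Line, Colon) of
        [Name, Value] -> {ascii_lowercase(Name), trim_leading_ows(Value)};
        [_NoColon] -> error
    end.

%% ASCII-only lowercase: bytes outside A..Z pass through untouched.
ascii_lowercase(Bin) ->
    << <<(lower(C))>> || <<C>> <= Bin >>.

lower(C) when C >= $A, C =< $Z -> C + 32;
lower(C) -> C.

%% Strip leading spaces and tabs only.
trim_leading_ows(<<C, Rest/binary>>) when C =:= $\s; C =:= $\t ->
    trim_leading_ows(Rest);
trim_leading_ows(Bin) ->
    Bin.

%% Body recursion: build the result on the way back, no Acc + lists:reverse/1.
lower_all([H | T]) -> [ascii_lowercase(H) | lower_all(T)];
lower_all([]) -> [].
```

For instance, `hot_path_sketch:split_header(<<"Content-Type: text/html">>)` returns `{<<"content-type">>, <<"text/html">>}`.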

I haven’t cross-benched directly so I can’t make a “fastest” claim. Bandit and Roadrunner are conceptually similar (pure Elixir / pure Erlang, h1+h2+WS on one listener, both pass h2spec strict and Autobahn 100%). The main architectural difference is that Bandit uses ThousandIsland + Plug, while Roadrunner inlines gen_tcp/ssl and uses explicit return shapes. I’d expect them in similar territory on raw throughput, with workload-shape differences.

For Ewe I’m less familiar with the internals. Would love to see Ewe vs cowboy numbers from your side if you have them; I’m happy to run roadrunner on the same scenario for a direct comparison.

Schultzer · May 11, 2026, 9:28pm

Body recursion is a big one for high-performance Erlang!

Awesome job!

williamthome · May 17, 2026, 1:58pm

Hey folks,

roadrunner 0.1.0 is on Hex! OTP 29 is out, the TLS blocker is gone, so the hex package could finally land.

{deps, [
    {roadrunner, "~> 0.1"}
]}.

@zabrane the wrk2 work landed in docs/wrk2_results.md. Thanks for the nudge.

0.x caveat still applies: API may change between minor versions, so pin an exact version {roadrunner, "0.1.0"} if you need stability across upgrades.

Feedback and bug reports are very welcome! o/

lpil · May 18, 2026, 10:55am

Congratulations! Perhaps I should look into making a Gleam package for using Roadrunner

What’s left before v1.0.0?

Do you have plans for HTTP3 in future?

williamthome · May 18, 2026, 11:26am

Thanks! A Gleam package would be very welcome!

What’s left for v1.0 is tracked in the roadmap. I have more items in mind for it, and maybe not all of them will be done, but HTTP/3 is on it.

williamthome · May 26, 2026, 3:02am

roadrunner 0.2.1 is on Hex, and the big addition is HTTP/3!

{deps, [
    {roadrunner, "~> 0.2"}
]}.

What’s new since 0.1.0:

HTTP/3 over QUIC (RFC 9114), experimental for now. Enable it with protocols => [http3] on a TLS listener; it runs alongside HTTP/1.1 and HTTP/2 on the same port and advertises Alt-Svc so browsers upgrade. It is built on the pure-Erlang quic library. Thanks @benoitc!
Denial-of-service hardening: bounds on WebSocket inbound memory and on the HTTP/2 CONTINUATION header block.
Faster static file serving, HTTP/2 flow-control fixes, and graceful drain through WebSocket sessions.

etnt · May 26, 2026, 10:55am

If possible, it would be interesting to see some comparative figures with the OTP built-in httpd server

williamthome · May 26, 2026, 1:51pm

Hey @etnt!

I ran roadrunner against the built-in httpd, using the same load test we use for cowboy and elli. Two httpd setups: plain httpd (a small hand-written do/1, no extra deps) and httpd + httpd_router.

Median of 3 runs, 50 clients, HTTP/1.1, loopback, OTP 29 (req/s, higher is better):

scenario	roadrunner	httpd	httpd + httpd_router
hello	292k	183k	178k
json	266k	164k	156k
echo	253k	149k	138k
routing, 100 paths	244k	150k	64k

A few friendly notes:

The built-in server is fast, and httpd_router adds almost no cost when there is just one route. Nice work.
The slow part is matching many routes: httpd_router reads and scans its full route list on every request (ets:tab2list/1), so it slows down with 100 routes. Keeping the routes ready instead of rebuilding them each time would help a lot here.
One tip if you test httpd yourself: by default it groups small TCP writes together (Nagle’s algorithm), which adds about 40 ms to each keep-alive request on loopback. Turn it off with {socket_type, {ip_comm, [{nodelay, true}]}} and the hello scenario goes from ~1k to ~180k req/s.
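For anyone who wants to try the nodelay tip, a minimal sketch of a plain httpd setup with it enabled. The module name, the /tmp paths, and the tiny do/1 are placeholders, not the fixture used for the numbers above:

```erlang
-module(httpd_nodelay_sketch).
-export([start/1, do/1]).

%% Start the built-in httpd on Port (0 picks a free port) with TCP_NODELAY on.
start(Port) ->
    {ok, _} = application:ensure_all_started(inets),
    inets:start(httpd, [
        {port, Port},
        {server_name, "localhost"},
        {server_root, "/tmp"},       %% placeholder, must be an existing dir
        {document_root, "/tmp"},     %% placeholder, must be an existing dir
        {modules, [?MODULE]},
        %% Disable Nagle's algorithm on accepted sockets.
        {socket_type, {ip_comm, [{nodelay, true}]}}
    ]).

%% Smallest possible handler callback: always answer 200 with "hello".
do(_ModData) ->
    {proceed, [{response, {200, "hello"}}]}.
```

`httpd_nodelay_sketch:start(8081)` returns `{ok, Pid}`; dropping the socket_type line gives the default (Nagle on) behaviour for a before/after comparison.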

On features, the built-in httpd is HTTP/1.1 only, which is the trade-off for staying dependency-free:

feature	roadrunner	httpd	httpd + httpd_router
HTTP/2	yes	no	no
HTTP/3	yes	no	no
WebSocket	yes	no	no
routing + path params	yes	manual	yes
extra deps	telemetry	none	httpd_router

So if you want to stay dependency-free and only need HTTP/1.1, httpd is a solid pick, and httpd_router adds clean routing on top with almost no cost.

The numbers move around a bit (~15% each run), my machine was a little busy, and loopback (same machine) hides the real network cost, so take them as rough. I’m gonna add a full comparison in a PR shortly.

Cheers!

etnt · May 26, 2026, 1:54pm

Excellent, thanks!

etnt · May 26, 2026, 5:01pm

I have made some performance improvements according to your suggestions, so if you have the time, please re-run that httpd comparison.

williamthome · May 26, 2026, 11:35pm

Re-ran with your latest change:

httpd_router	before	after
routing, 100 paths	64k	118k
url_with_qs	114k	137k

The 100-route test nearly doubled, and on single-route paths httpd_router is now about even with plain httpd! Nice work o/

zabrane · June 1, 2026, 7:55pm

Hi @williamthome

Small request: would it be possible for Roadrunner to support OTP 28 as well? The rebar.config currently pins {minimum_otp_vsn, "29"}.

And out of curiosity, what drove the choice of 29 as the minimum? Knowing which 29-specific features Roadrunner relies on would help us understand whether a 28 backport is realistic.

The reason I ask: migrating our full service fleet to OTP 29 is taking longer than we planned, so we are stuck on 28 for now on the services where we would most like to try Roadrunner. Even informal 28 support would let us start much sooner.

Thanks, and great work on the project.

williamthome · June 1, 2026, 9:45pm

Hi @zabrane! Great question, and good news o/

The OTP 29 minimum was mostly “aspirational”. When I set it, I planned to use two new-in-29 things (native records and the is_integer/3 guard), but roadrunner only ended up using one of them, and there was no real reason to keep it. I took it out, so the code does not need 29 anymore.

CI now runs the tests, h2spec, and Autobahn on OTP 27, 28, and 29, all green, so the floor is now OTP 27, even lower than you asked.

It’s already on Hex, in 0.4.0, so you can:

{deps, [{roadrunner, "~> 0.4"}]}.

It’s been a few releases since the last update here (0.2.1, HTTP/3), so this one rolls up everything since. A lot of the limits that used to be fixed in the code are now listener options you can set:

HTTP/1 request and header limits under {http1, #{...}} in protocols (request line, header line, header block, header count).
HTTP/2 and HTTP/3 caps: max_concurrent_streams and the header block size for h2, the header block size for h3.
socket_backlog for the TCP accept queue, and handler_spawn to tune the handler process (for example fullsweep_after).
Better over-limit replies: a too-long request line now answers 414 URI Too Long, too-big or too-many headers answer 431 Request Header Fields Too Large, instead of a plain 400.
max_clients rejections are now visible: when a connection is dropped at the cap, a telemetry event fires, so it’s not silent anymore.
max_concurrent_requests, a new cap on how many requests run at once across the whole listener. Until now you could cap connections (max_clients) and streams per connection (max_concurrent_streams), but not the total number of requests in flight, so on h2/h3 a single connection could open many streams at once and memory could grow more than you expect. This caps that total directly. Over the cap, the stream is refused with a standard “try again” signal (safe to retry, nothing was processed), and a [roadrunner, request, throttled] telemetry event fires. Off by default (infinity).
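To show the idea behind a listener-wide in-flight cap, here is one generic way to do it with a shared atomic counter. This is only a sketch of the technique, not how roadrunner implements max_concurrent_requests; the module and function names are hypothetical:

```erlang
-module(inflight_cap_sketch).
-export([new/1, try_acquire/1, release/1]).

%% One shared counter per listener; infinity means the cap is off.
new(infinity) ->
    infinity;
new(Max) when is_integer(Max), Max > 0 ->
    {atomics:new(1, [{signed, true}]), Max}.

%% Called before a request (or h2/h3 stream) starts running.
%% Returns ok when admitted, throttled when the cap is already reached.
try_acquire(infinity) ->
    ok;
try_acquire({Ref, Max}) ->
    case atomics:add_get(Ref, 1, 1) of
        N when N =< Max ->
            ok;
        _OverCap ->
            %% Undo the increment; nothing was processed, so a retry is safe.
            atomics:sub(Ref, 1, 1),
            throttled
    end.

%% Called when an admitted request finishes, whatever the outcome.
release(infinity) ->
    ok;
release({Ref, _Max}) ->
    atomics:sub(Ref, 1, 1).
```

With `Cap = inflight_cap_sketch:new(2)`, the third concurrent `try_acquire(Cap)` returns `throttled` until one of the first two calls `release(Cap)`.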

All the knobs are listed in docs/resource_limits.md.

Thanks @zabrane for the nudge, and for the kind words. Feedback and bug reports are always welcome!

zabrane · June 2, 2026, 12:55pm

Hi @williamthome

Perfect, thank you. OTP 27 is even lower than we asked, that unblocks us right away. Pulling in ~> 0.4 now and will share benchmark numbers from our hardware once we have run them.

Thanks for the quick turnaround o/

williamthome · June 2, 2026, 10:25pm

0.5.0 is out on Hex o/

0.4 made a lot of fixed limits into options you can set. 0.5 makes the server actually check those limits, and fixes small spec and safety bugs across HTTP/2, HTTP/3, WebSocket, and SSE.

{deps, [{roadrunner, "~> 0.5"}]}.

What’s new since 0.4:

HTTP/2 flow control: now follows the spec more closely, so bad or oversized data is handled the right way
Header size limits: respected in both directions, what the client asks for and what the server allows
HTTP/2 streams: cleaner handling of closed and reset streams, with no more hangs if the connection drops mid-stream
WebSocket frames: broken and odd frames are handled more safely
Request bodies: the max_content_length limit now also applies to chunked uploads, plus small safety fixes in SSE and header parsing
Extras: a new max_streams_bidi option to limit HTTP/3 streams, and streaming connections now answer OTP system messages

h2spec and Autobahn are still green on OTP 27, 28, and 29.

Feedback and bug reports are always welcome o/