Roadrunner - Pure-Erlang HTTP and WebSocket server

Hey folks,

Wanted to share a project I’ve been working on for a while: Roadrunner, a pure-Erlang HTTP/1.1 + HTTP/2 + WebSocket server, built from scratch with TDD as the HTTP layer for Arizona.

Why another HTTP server? Mostly because I wanted something that fits Arizona better, runs faster, and comes with:

  • A small, easy-to-predict API (a handler behaviour, request/response accessors, listener controls, and a few opt-in helpers like cookies, qs, multipart, SSE, WebSocket)
  • Parsing that follows the RFCs: RFC 9110/9112 for HTTP/1.1, RFC 9113 + RFC 7541 (HPACK) for HTTP/2, RFC 6455 for WebSocket, RFC 7692 permessage-deflate. h2spec strict 100 % and Autobahn fuzzingclient strict 100 %, no skipped tests.
  • Modern OTP style: sigils, maybe, body recursion, binary keys for wire data, -doc/-moduledoc markdown, dialyzer-clean
  • Built-in graceful shutdown, telemetry events, per-request request_id in logger:set_process_metadata/1, and proc_lib:set_label/1 per-listener / per-acceptor / per-conn so observer trees are easy to read
  • Real numbers. There’s a full bench grid in the repo against cowboy and elli. Roadrunner is usually 30 to 80 % faster than cowboy. Versus elli, it’s about even or a bit faster on simple GETs and clearly wins when you need something elli doesn’t ship (router, gzip, h2, WebSocket, pipelining, etc.)

Quick taste of the numbers (req/s, median of 3 runs, 50 clients, loopback, 12th-gen i9):

scenario                   roadrunner   cowboy   elli
hello                      298 k        179 k    278 k
headers_heavy              235 k        118 k    211 k
cookies_heavy              247 k        154 k    n/a
pipelined_h1               501 k        329 k    4.9 k
gzip_response              127 k        100 k    n/a
websocket_msg_throughput   199 k        155 k    n/a

Roadrunner is the row winner in every row shown here. n/a means elli’s test fixture doesn’t support that workload (no router, no gzip, no WebSocket). The full grid with p50/p99 plus h2 and memory shape lives in docs/bench_results.md.

A few things to know up front:

  • It’s 0.x. The core works and is fully tested, but the API may change between minor versions
  • Needs OTP 29 (currently RC). Why? Performance and modern Erlang
  • Not on Hex yet. rebar3 hex publish needs runtime deps from Hex, and telemetry is still a git dep locally because an OTP 29 RC3 + Fastly TLS bug blocks rebar3 update. The fix is landing in OTP 29 RC4 (I guess), and v0.1.0 will go up on Hex right after

% deps (git for now, hex once OTP 29 RC4 lands):
{deps, [
    {roadrunner, {git, "https://github.com/arizona-framework/roadrunner.git", {branch, "main"}}}
]}.

% boot a listener:
roadrunner:start_listener(my_listener, #{
    port => 8080,
    routes => [{~"/", my_handler, #{greeting => ~"hello"}}]
}).

Feedback is very welcome: bug reports, doc gaps, perf checks on your hardware, anything. The README has the full conformance, perf, and hardening notes, plus a docs/comparison.md with the honest take.

Beep beep.


Hi @williamthome,

Thank you very much for Roadrunner. This is exactly the alternative the Erlang ecosystem needed. Really appreciated.

Two things from our side:

  1. Could you explain a bit more how the load generator works? I would like to understand the internals: how connections are driven, how requests are scheduled across the 50 clients, how throughput is computed, how latency samples are aggregated for the p50/p99 columns, and whether the loader itself can become the bottleneck on the high-throughput cells.

  2. We have not switched to OTP 29 RC yet, so we cannot run the matrix locally for now. Could you add a wrk2-based scenario (giltene/wrk2 on GitHub, a constant-throughput, correct-latency-recording variant of wrk; specifically wrk2, not wrk)? wrk2 runs at a constant target rate and accounts for Coordinated Omission, which means the percentile distribution actually reflects what a real client would observe under load instead of hiding the tail every time the server stalls. With the corrected latency you get from HdrHistogram, the 99th, 99.9th, and 99.99th percentiles tell a much more honest story than what wrk, ab, vegeta … report.

Localhost is fine, no need for HTTPS. The goal is to measure what Roadrunner can really sustain end-to-end. A matrix at, say, 50%, 75%, 90%, 95% of the saturation rate per scenario, with --latency (and ideally --u_latency for a corrected vs uncorrected comparison), would be very telling.

Thanks again, and looking forward to new features.

Best,


Hey, @zabrane o/

Thanks a lot for the feedback! Both items are on a local branch, merging soon.

1. Load generator internals

How the closed-loop driver works:

  • 50 worker processes, one keep-alive TCP connection each
  • Send, wait for the response, send the next
  • Throughput is total successful requests over wall-clock time
  • Each worker keeps per-request nanosecond timings; the driver merges every worker’s timings, sorts, picks p50/p95/p99 by position
  • Near peak, the bench driver and the server end up competing for the same cores, so the driver itself can become the bottleneck

Full writeup is in a doc file on the branch.
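
As a rough illustration of the aggregation step described above, here is a minimal sketch (not the actual bench driver). It assumes each worker hands back a plain list of per-request latencies in nanoseconds; the module name bench_percentiles and the nearest-rank rule for "picks by position" are assumptions made for this example.

```erlang
-module(bench_percentiles).
-export([summarize/2, percentile/2]).

%% WorkerTimings: one list of per-request latencies (ns) per worker.
%% WallNs: wall-clock duration of the whole run in nanoseconds.
%% Both shapes are assumptions for this sketch.
summarize(WorkerTimings, WallNs) when WallNs > 0 ->
    %% Merge every worker's samples, then sort once.
    Sorted = lists:sort(lists:append(WorkerTimings)),
    Count = length(Sorted),
    #{requests => Count,
      %% Throughput: successful requests over wall-clock time.
      req_per_sec => Count * 1_000_000_000 / WallNs,
      p50 => percentile(Sorted, 50),
      p95 => percentile(Sorted, 95),
      p99 => percentile(Sorted, 99)}.

%% Nearest-rank percentile: the sample at 1-based position ceil(P/100 * N).
percentile([], _P) ->
    undefined;
percentile(Sorted, P) ->
    N = length(Sorted),
    Rank = max(1, ceil(P * N / 100)),
    lists:nth(Rank, Sorted).
```

Note that in a closed loop each latency is measured from the moment the request was actually sent, which is why these percentiles cannot show queueing delay behind a stalled server.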

2. wrk2 scenario

A script and a CI smoke job for this are on the branch.

For each (scenario, server), wrk2 sweeps four rates: 50%, 75%, 90%, 95% of the closed-loop peak. Each measurement uses both --latency and --u_latency, so every scenario gets two tables and you can see the Coordinated Omission (CO) gap directly. If the server can’t keep up with the target rate, the row is flagged as saturated. Localhost only, no HTTPS.
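
The sweep arithmetic is simple enough to sketch. This is an illustration only, not the script on the branch: the module name wrk2_sweep and the Tolerance parameter (how far below the target rate counts as "can't keep up") are hypothetical.

```erlang
-module(wrk2_sweep).
-export([rates/1, saturated/3]).

%% Target rates (req/s) at 50/75/90/95 % of the closed-loop peak.
rates(PeakRps) ->
    [{Pct, round(PeakRps * Pct / 100)} || Pct <- [50, 75, 90, 95]].

%% A row counts as saturated when the achieved rate falls below
%% Tolerance * target. Tolerance (e.g. 0.98) is a made-up threshold
%% for this sketch, not a value taken from the real script.
saturated(TargetRps, AchievedRps, Tolerance) ->
    AchievedRps < TargetRps * Tolerance.
```

Each computed rate would then be passed to wrk2 as its -R (target rate) argument, alongside --latency and --u_latency.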

CO gap on roadrunner, hello scenario at 127k req/s (50% of measured peak):

percentile   corrected   uncorrected   ratio
p50          0.97 ms     28 µs         ~35x
p99          2.01 ms     160 µs        ~13x
p99.9        2.24 ms     260 µs        ~9x
p99.99       2.66 ms     558 µs        ~5x

That’s what the closed-loop bench was hiding.
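
To make the corrected vs uncorrected distinction concrete, here is a toy model (a simplification, not how wrk2 is implemented). It assumes a single connection with requests intended at a fixed interval, where a request cannot start until the previous one has completed. The module name co_gap is made up for the example.

```erlang
-module(co_gap).
-export([latencies/2]).

%% IntervalUs: gap between intended send times at the target rate (µs).
%% ServiceUs:  per-request service times (µs), in send order.
%% Returns [{Uncorrected, Corrected}] per request, where
%%   Uncorrected = completion - actual send time
%%   Corrected   = completion - intended send time
latencies(IntervalUs, ServiceUs) ->
    walk(ServiceUs, IntervalUs, 0, 0).

walk([], _Interval, _Intended, _FreeAt) ->
    [];
walk([Service | Rest], Interval, Intended, FreeAt) ->
    %% A request cannot start before the connection is free again.
    Start = max(Intended, FreeAt),
    Done = Start + Service,
    [{Done - Start, Done - Intended}
     | walk(Rest, Interval, Intended + Interval, Done)].
```

With a 100 µs interval and service times [10, 500, 10, 10], the uncorrected column reports 10 µs for the two requests that queued behind the 500 µs stall, while the corrected column reports 410 µs and 320 µs for them.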

And the three servers next to each other, hello scenario at each one’s 75% rate (corrected percentiles):

server       rate    p50       p99       p99.9     p99.99
roadrunner   190 k   1.03 ms   2.15 ms   2.37 ms   2.59 ms
cowboy       136 k   1.03 ms   2.14 ms   2.65 ms   5.45 ms
elli         204 k   1.04 ms   2.17 ms   2.39 ms   2.72 ms

p50 to p99 are tied across all three. The interesting bit is p99.99: roadrunner and elli stay tight (2.6 to 2.7 ms), cowboy widens to 5.4 ms, about 2x the others. That’s where each server’s design choice for handling connections (one process per conn vs other models) starts to show.


Hey @williamthome

This is great, thank you so much. The CO gap table comparison is exactly the honest data I was hoping for, and the turnaround was impressive.

Once OTP 29 reaches a stable release, I will start benching Roadrunner internally on our own workloads. I will share our numbers back once we have run them on our hardware.


Looks really cool! Nice work.

What would you say is the source of the performance improvements?

How does it compare to Elixir’s Bandit and Gleam’s Ewe? I believe they are supposed to be the fastest BEAM HTTP servers currently.


Thanks Louis!

I’d say two layers: architecture and BEAM-idiomatic patterns.

Architecture:

  • One process per connection running a tail-recursive proc_lib loop, no gen_server / gen_statem boundaries
  • Passive gen_tcp:recv on the hot path
  • Acceptor pool inline on gen_tcp / ssl, no Ranch
  • Pure-incremental-binary-matcher parser (erlang:decode_packet measured 2x slower)
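
For readers who have not seen this shape before, here is a minimal sketch of the first three points, assuming nothing about Roadrunner's real modules: an inline acceptor on gen_tcp, one proc_lib process per connection, and a tail-recursive loop around passive gen_tcp:recv. It echoes bytes back instead of parsing HTTP, and the module name conn_loop_sketch is made up.

```erlang
-module(conn_loop_sketch).
-export([start/0]).

%% Listen on an ephemeral loopback port and start one acceptor.
%% The caller owns the listen socket, so it must stay alive.
start() ->
    {ok, LSock} = gen_tcp:listen(0, [binary, {active, false},
                                     {packet, raw}, {reuseaddr, true},
                                     {ip, {127, 0, 0, 1}}]),
    {ok, Port} = inet:port(LSock),
    _Acceptor = proc_lib:spawn(fun() -> accept_loop(LSock) end),
    {LSock, Port}.

%% Accept directly on gen_tcp and spawn one process per connection.
accept_loop(LSock) ->
    case gen_tcp:accept(LSock) of
        {ok, Sock} ->
            Pid = proc_lib:spawn(fun() ->
                %% Wait until socket ownership has been handed over.
                receive go -> conn_loop(Sock) end
            end),
            ok = gen_tcp:controlling_process(Sock, Pid),
            Pid ! go,
            accept_loop(LSock);
        {error, _Reason} ->
            ok
    end.

%% Tail-recursive loop with passive recv: no gen_server callbacks and
%% no {tcp, Sock, Data} messages landing in the mailbox.
conn_loop(Sock) ->
    case gen_tcp:recv(Sock, 0) of
        {ok, Data} ->
            case gen_tcp:send(Sock, Data) of
                ok -> conn_loop(Sock);
                {error, _} -> gen_tcp:close(Sock)
            end;
        {error, _Reason} ->
            gen_tcp:close(Sock)
    end.
```

A real server would replace the echo with the incremental parser and a handler call, but the control flow stays a plain function that calls itself.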

BEAM-idiomatic patterns (small wins that compound):

  • Body recursion, no lists:reverse(Acc) accumulator pattern
  • Compiled binary:cp() patterns stashed in persistent_term via on_load, threaded as individual args (zero allocation on hot reads)
  • ASCII-only fast paths (e.g. ascii_lowercase) over Unicode-aware BIFs (string:lowercase) where the wire data is ASCII-bounded
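
A small sketch of those three patterns together, under stated assumptions: the module name hot_path_sketch, the persistent_term key, and this particular ascii_lowercase implementation are illustrative and are not taken from Roadrunner's source.

```erlang
-module(hot_path_sketch).
-export([split_lines/1, ascii_lowercase/1]).
-on_load(init_patterns/0).

-define(CRLF_KEY, {?MODULE, crlf}).

%% Compile the search pattern once at module load and stash it in
%% persistent_term, so hot-path reads do not copy or recompile it.
init_patterns() ->
    persistent_term:put(?CRLF_KEY, binary:compile_pattern(<<"\r\n">>)),
    ok.

split_lines(Bin) ->
    %% Fetch once, then thread the pattern through as a plain argument.
    split_lines(Bin, persistent_term:get(?CRLF_KEY)).

%% Body recursion: the result list is built on the way back up the
%% stack, so no accumulator and no lists:reverse/1 at the end.
split_lines(Bin, CRLF) ->
    case binary:split(Bin, CRLF) of
        [Line, Rest] -> [Line | split_lines(Rest, CRLF)];
        [Last] -> [Last]
    end.

%% ASCII-only lowercase: touches only A-Z and passes every other byte
%% through unchanged, unlike the Unicode-aware string:lowercase/1.
ascii_lowercase(Bin) ->
    << <<(lower(C))>> || <<C>> <= Bin >>.

lower(C) when C >= $A, C =< $Z -> C + 32;
lower(C) -> C.
```

The ASCII fast path is only safe where the input is known to be ASCII-bounded, such as header field names.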

I haven’t cross-benched directly, so I can’t make a “fastest” claim. Bandit and Roadrunner are conceptually similar (pure Elixir / pure Erlang, h1+h2+WS on one listener, both pass h2spec strict and Autobahn 100%). The main architectural difference is that Bandit builds on ThousandIsland + Plug, while Roadrunner inlines gen_tcp/ssl and uses explicit return shapes. I’d expect them to land in similar territory on raw throughput, with differences depending on workload shape.

For Ewe, I’m less familiar with the internals. I would love to see Ewe vs cowboy numbers from your side if you have them, and I’m happy to run roadrunner on the same scenario for a direct comparison.


Body recursion is a big one for high performance erlang!

Awesome job!


Hey folks,

roadrunner 0.1.0 is on Hex! OTP 29 is out, the TLS blocker is gone, so the hex package could finally land.

{deps, [
    {roadrunner, "~> 0.1"}
]}.

@zabrane the wrk2 work landed in docs/wrk2_results.md. Thanks for the nudge.

The 0.x caveat still applies: the API may change between minor versions, so pin an exact version, e.g. {roadrunner, "0.1.0"}, if you need stability across upgrades.

Feedback and bug reports are very welcome! o/
