Oooo, this looks fun, let’s try!
Here are the Erlang results:
❯ erlc ring_benchmark.erl
ring_benchmark.erl:14:8: Warning: erlang:now/0 is deprecated; see the "Time and Time Correction in Erlang" chapter of the ERTS User's Guide for more information
% 14| T1 = now(),
% | ^
ring_benchmark.erl:19:8: Warning: erlang:now/0 is deprecated; see the "Time and Time Correction in Erlang" chapter of the ERTS User's Guide for more information
% 19| T2 = now(),
% | ^
❯ erl
Erlang/OTP 24 [erts-12.1.5] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit]
Eshell V12.1.5 (abort with ^G)
1> ring_benchmark:run(100, 100000).
6180679
And here are the Go results:
❯ go build ring-go.go
❯ ./ring-go -m=100 -n=100000
6.907550383s
So, as expected, Erlang is faster for this (Go’s channels and goroutines have a lot of overhead, which is why the ‘fastest’ Go programs don’t use them).
(Note: My Rust times here are wrong because I left in a bug: there wasn’t actually any data for the threads to consume (it was a dataless operation… ^.^;), so it all got optimized/skipped out; see a later post for accurate times.)
Hmm, let’s try this in Rust with a very generic channel processor using native threads:
use clap::Parser;
use crossbeam::{channel, scope};
use std::time::Instant;

#[derive(Parser)]
struct CliArgs {
    /// Circuits the token makes around the ring (total passes = m * n)
    #[clap(short, long, default_value = "10")]
    m: usize,
    /// Number of threads in the ring
    #[clap(short, long, default_value = "10")]
    n: usize,
}

fn main() {
    let CliArgs { m, n } = CliArgs::parse();
    // One rendezvous (zero-capacity) channel per ring node.
    let chs: Vec<_> = (0..n).map(|_| channel::bounded(0)).collect();
    let done = channel::bounded(n);
    scope(|s| {
        for i in 0..n {
            let channel = &chs[i].1;
            let next = &chs[(i + 1) % n].0;
            let done = &done.1;
            s.spawn(move |_| {
                loop {
                    channel::select! {
                        recv(channel) -> k => {
                            let k = k.unwrap();
                            if k > 0 {
                                // Pass the decremented token to the next node.
                                next.send(k - 1).unwrap();
                            } else {
                                return;
                            }
                        }
                        recv(done) -> _ => return,
                    };
                }
            });
        }
        let start = Instant::now();
        chs[0].0.send(m * n).unwrap();
        // NOTE: `done` has capacity n, so these sends complete immediately and
        // `end` is taken before the ring actually finishes (see the timing
        // caveat above).
        (0..n).for_each(|_| done.0.send(()).unwrap());
        let end = Instant::now();
        println!("{:?}", end - start);
    }).unwrap();
}
Now obviously allocating 100000 native threads is extreme, so swapping m and n to make 100 threads doing the 100000 passes:
❯ cargo run --release --bin ring_rs_basic -- -m100000 -n100
Compiling ring v0.1.0 (/home/overminddl1/tmp/ring)
Finished release [optimized] target(s) in 12.23s
Running `target/release/ring_rs_basic -m100000 -n100`
9.81679ms
Though let’s try it with n=16 so one thread per core at least:
❯ cargo run --release --bin ring_rs_basic -- -m100000 -n16
Finished release [optimized] target(s) in 0.03s
Running `target/release/ring_rs_basic -m100000 -n16`
55.886µs
Too fast! Bumping m to 1 million:
❯ cargo run --release --bin ring_rs_basic -- -m1000000 -n16
Finished release [optimized] target(s) in 0.03s
Running `target/release/ring_rs_basic -m1000000 -n16`
962.916µs
Still fast but whatever, almost 1 millisecond to send 16 million messages around 16 threads.
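As a sanity check on that number (and in light of the timing caveat above), a quick back-of-the-envelope division of the elapsed time by the total message count:

```rust
fn main() {
    // m = 1_000_000, n = 16: the token is passed m * n times in total.
    let messages: f64 = 1_000_000.0 * 16.0;
    let elapsed_ns: f64 = 962.916 * 1_000.0; // 962.916 µs in nanoseconds
    let per_msg_ns = elapsed_ns / messages;
    println!("{:.4} ns per message", per_msg_ns);
    // This works out to well under a nanosecond per message, which is
    // implausible for cross-thread rendezvous sends and lines up with the
    // caveat above that the clock stops before the ring actually finishes.
}
```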
Now let’s try it with Rust’s greenthreads/async, as this is closest to what Erlang and Go are doing. I’m expecting it to be slower than the native threads, of course, since these will run a lot more code instead of simple hardware synchronization. The code:
use clap::Parser;
use std::time::Instant;
use tokio::{select, sync::{mpsc, watch}, task};

#[derive(Parser)]
struct CliArgs {
    #[clap(short, long, default_value = "10")]
    m: usize,
    #[clap(short, long, default_value = "10")]
    n: usize,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let CliArgs { m, n } = CliArgs::parse();
    // One buffered channel per ring node; each receiver is taken by its task.
    let mut chs: Vec<_> = (0..n).map(|_| {
        let (tx, rx) = mpsc::channel::<usize>(1);
        (tx, Some(rx))
    }).collect();
    let (done_tx, done) = watch::channel(false);
    for i in 0..n {
        let mut channel = chs[i].1.take().unwrap();
        let next = chs[(i + 1) % n].0.clone();
        let mut done = done.clone();
        task::spawn(async move {
            loop {
                select! {
                    k = channel.recv() => {
                        if let Some(k) = k {
                            if k > 0 {
                                if let Err(e) = next.send(k - 1).await {
                                    // `next`'s task has closed
                                    eprintln!("Task closed early: {:?}", e);
                                    return;
                                }
                            } else {
                                return;
                            }
                        } else {
                            // channel closed
                            return;
                        }
                    }
                    _ = done.changed() => if *done.borrow() { return; },
                };
            }
        });
    }
    let start = Instant::now();
    chs[0].0.send(m * n).await?;
    // Same timing caveat as before: the tasks are signalled to stop, but the
    // ring may not have finished when `end` is taken.
    done_tx.send(true)?;
    let end = Instant::now();
    println!("{:?}", end - start);
    Ok(())
}
And running it:
❯ cargo run --release --bin ring_rs_greenthreads -- -m100 -n100000
Compiling ring v0.1.0 (/home/overminddl1/tmp/ring)
Finished release [optimized] target(s) in 15.26s
Running `target/release/ring_rs_greenthreads -m100 -n100000`
68.088705ms
Yep, a lot slower, as expected, but still a lot faster than Go and Erlang, which again took (running again, and of course times vary):
❯ erl
Erlang/OTP 24 [erts-12.1.5] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit]
Eshell V12.1.5 (abort with ^G)
1> ring_benchmark:run(100, 100000).
6064000
❯ ./ring-go -m=100 -n=100000
7.098047836s
❯ ./target/release/ring_rs_greenthreads -m100 -n100000
89.482385ms
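For a rough sense of scale (using the single re-run numbers just above, so take the ratios loosely):

```rust
fn main() {
    // Times from the runs above, converted to milliseconds.
    let erlang_ms = 6_064_000.0_f64 / 1_000.0; // 6064000 µs
    let go_ms = 7_098.047_836_f64;             // 7.098047836 s
    let rust_ms = 89.482_385_f64;              // 89.482385 ms
    println!("Erlang vs Rust async: {:.0}x", erlang_ms / rust_ms); // ~68x
    println!("Go vs Rust async:     {:.0}x", go_ms / rust_ms);     // ~79x
}
```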
And yes, I confirmed that Rust is indeed executing all 100*100000 message passes (the times were so fast that something didn’t seem right, and who knows, something about my code might not be right, so please check the above, but it is indeed counting down from 10000000).
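For anyone who wants to double-check, here’s a minimal std-only sketch of the same ring that only stops the clock after every thread has exited, so the measurement covers the full traversal; the `ring` function here is my own illustration, not the benchmark code above:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

// A ring of n threads passing a countdown token m * n times, timed from the
// first send until every thread has joined.
fn ring(m: usize, n: usize) -> Duration {
    let (txs, rxs): (Vec<_>, Vec<_>) =
        (0..n).map(|_| mpsc::channel::<usize>()).unzip();
    let mut handles = Vec::with_capacity(n);
    for (i, rx) in rxs.into_iter().enumerate() {
        let next = txs[(i + 1) % n].clone();
        handles.push(thread::spawn(move || {
            while let Ok(k) = rx.recv() {
                if k > 0 {
                    // Ignore send errors: the ring tears down once the token
                    // hits zero and threads start exiting.
                    let _ = next.send(k - 1);
                } else {
                    return; // token exhausted; dropping `next` unblocks the rest
                }
            }
        }));
    }
    let start = Instant::now();
    txs[0].send(m * n).unwrap();
    drop(txs); // drop our sender clones so recv() errors cascade at shutdown
    for h in handles {
        h.join().unwrap();
    }
    start.elapsed()
}

fn main() {
    println!("{:?}", ring(100, 16));
}
```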
EDIT: Don’t forget that Rust plays VERY nicely with Erlang; it’s easy to make NIFs. ^.^