I’ve been trying to understand aliases and when best to use them, and one thing I’ve been unable to understand is the performance implications of using aliases.
There doesn’t seem to be documentation on their performance, so I took some measurements quickly.
run = fn get ->
  process = get.()
  n = 1000000

  for i <- 0..n do
    :erlang.send(process, i)
  end

  for _ <- 0..n do
    receive do
      i -> i
    after
      0 -> throw "error"
    end
  end
end

Benchee.run(
  %{
    "pid" => fn -> run.(&:erlang.self/0) end,
    "alias" => fn -> run.(&:erlang.alias/0) end
  }
)
Name            ips        average  deviation         median         99th %
pid            8.36      119.56 ms    ±22.31%      113.68 ms      277.74 ms
alias          4.12      242.48 ms    ±21.28%      239.01 ms      343.54 ms

Comparison:
pid            8.36
alias          4.12 - 2.03x slower +122.93 ms
In this micro-benchmark, using aliases was half as fast.
This next benchmark surprised me!
run = fn get ->
  process = get.()
  n = 1000000

  for i <- 0..n do
    :erlang.send(process, i)
  end
end

Benchee.run(
  %{
    "pid" => fn -> run.(&:erlang.self/0) end,
    "alias" => fn -> run.(&:erlang.alias/0) end
  }
)
Name            ips        average  deviation         median         99th %
alias          2.38         0.42 s    ±66.61%         0.29 s         1.11 s
pid            0.76         1.32 s    ±67.38%         1.05 s         2.54 s

Comparison:
alias          2.38
pid            0.76 - 3.14x slower +0.90 s
Without the receive, using aliases was much faster!
What’s going on here?
Is there documentation on alias performance expectations anywhere?
I would expect them to be pretty much the same in real code. In micro-benchmarks you can get all kinds of strange behavior.
I suspect that pid is faster in the benchmark with receive because there are special optimizations for sending to self() that do not apply when sending to an alias of yourself.
In the second example, I would guess it might be because the pid benchmark is run after the alias benchmark, i.e. that there is some sort of interference between the two. I’m not sure about this, though, as it depends a lot on how Benchee runs its benchmarks.
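One possible source of that kind of interference: in the second benchmark nothing ever receives the messages, so they may pile up in the mailbox of whichever process runs the function. Below is a minimal sketch of draining the mailbox between runs. `MailboxDrain.flush/0` is a hypothetical helper, not part of Benchee, and whether consecutive measurements run in the same process is an assumption I have not verified.

```elixir
defmodule MailboxDrain do
  # Hypothetical helper: receive and discard every message currently in the
  # caller's mailbox, returning how many messages were discarded.
  def flush(count \\ 0) do
    receive do
      _ -> flush(count + 1)
    after
      # `after 0` returns immediately once no message is waiting.
      0 -> count
    end
  end
end

# Leave a few messages behind, the way the send-only benchmark does.
for i <- 1..3, do: :erlang.send(self(), i)

flushed = MailboxDrain.flush()
{:message_queue_len, remaining} = Process.info(self(), :message_queue_len)

IO.inspect({flushed, remaining})
```

If the leftover messages do turn out to matter, a helper like this could be called from a Benchee `after_each` hook, assuming the hook runs in the same process as the benchmarked function.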
There are mainly three reasons for why send/receive using aliases instead of pids is slower. Almost all of the overhead is on the receiving side.
1. Ordinary message signals sent using a pid will always unconditionally be moved into the message queue upon reception. Due to this, if there is a large uninterrupted sequence of message signals in the signal queue at reception, all of them can be moved into the message queue at once with just a few pointer assignments. This is not the case for message signals sent using aliases, since the receiver needs to inspect each such signal in order to determine if it should be moved into the message queue or dropped.
2. When the receiver determines whether a message signal sent using an alias should be moved into the message queue or dropped, it needs to check whether the alias is active or not. In order to do that, it needs to look up information about the alias in local data. Currently such information is stored in a red/black search tree together with information about monitors that the process has set up. No such operation is needed for a message signal sent using a pid, since it should always unconditionally be moved into the message queue.
3. More data needs to be passed in a message signal sent using an alias compared to a message signal sent using a pid.
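The active/inactive check that the receiver has to make can be observed from Elixir. This is only a sketch of the visible behaviour, not of the runtime internals; it assumes OTP 24 or later, and the variable names and timeouts are arbitrary choices for illustration.

```elixir
# Create an alias for the current process (requires OTP 24+).
ref = :erlang.alias()

# While the alias is active, a message sent through it reaches the mailbox.
:erlang.send(ref, :first)

first =
  receive do
    msg -> msg
  after
    1_000 -> :nothing
  end

# Deactivate the alias; unalias/1 returns true if the alias was active.
was_active = :erlang.unalias(ref)

# Sending still succeeds, but the receiver now drops the message instead of
# moving it into the message queue.
:erlang.send(ref, :second)

second =
  receive do
    msg -> msg
  after
    100 -> :dropped
  end

IO.inspect({first, was_active, second})
```

A message sent using a pid has no equivalent of the second case, which is why it needs no per-signal lookup.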
The overhead due to 3. is quite small. The overhead due to 1. is not as large as one might think (see below). The overhead of 2. will always be there regardless of how much you optimize it. That is, it will always be the case that send/receive of messages using aliases will be slower than using pids.
Micro-benchmarks will typically give message signals sent using a pid an unrealistic advantage over those sent using an alias, due to 1., since you seldom have huge uninterrupted sequences of message signals sent using pids. For example, when you do a gen_server:call(), three signals will be sent: one monitor signal, followed by a message signal using a pid, followed by a demonitor signal. There can of course be long uninterrupted sequences of message signals sent using pids, but 1000000, like in your benchmark, will be very rare.
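To make the three-signal pattern concrete, here is a hand-rolled synchronous call. It is a simplified sketch and not the actual gen_server implementation; the `CallSketch` module, its message shapes, and the echo server are hypothetical names made up for illustration.

```elixir
defmodule CallSketch do
  # Hypothetical, simplified stand-in for a synchronous call.
  def call(server, request, timeout \\ 5_000) do
    # Signal 1: monitor, so a crash of the server is noticed.
    ref = Process.monitor(server)

    # Signal 2: an ordinary message signal sent using a pid.
    send(server, {:call, {self(), ref}, request})

    receive do
      {^ref, reply} ->
        # Signal 3: demonitor once the reply has arrived.
        Process.demonitor(ref, [:flush])
        {:ok, reply}

      {:DOWN, ^ref, :process, _pid, reason} ->
        {:error, reason}
    after
      timeout ->
        Process.demonitor(ref, [:flush])
        {:error, :timeout}
    end
  end

  # Starts a toy server that replies to every call with {:echo, request}.
  def start_echo do
    spawn(fn -> echo_loop() end)
  end

  defp echo_loop do
    receive do
      {:call, {from, ref}, request} ->
        send(from, {ref, {:echo, request}})
        echo_loop()
    end
  end
end

server = CallSketch.start_echo()
result = CallSketch.call(server, :ping)
IO.inspect(result)
```

From the server's point of view, each request arrives between a monitor signal and a demonitor signal, so the message signals sent using pids do not form one long uninterrupted run.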