Alias vs pid performance

lpil · February 18, 2025, 4:19pm

Hello friends!

I’ve been trying to understand aliases and when best to use them, and one thing I’ve been unable to understand is the performance implications of using aliases.

There doesn’t seem to be documentation on their performance, so I took some measurements quickly.

run = fn get  -> 
  process = get.()
  n = 1000000
  for i <- 0..n do 
    :erlang.send(process, i)
  end
  for _ <- 0..n do 
    receive do
      i -> i
    after 0 -> throw "error"
    end
  end
end

Benchee.run(
  %{
    "pid" => fn -> run.(&:erlang.self/0) end,
    "alias" => fn -> run.(&:erlang.alias/0) end
  }
)

Name            ips        average  deviation         median         99th %
pid            8.36      119.56 ms    ±22.31%      113.68 ms      277.74 ms
alias          4.12      242.48 ms    ±21.28%      239.01 ms      343.54 ms

Comparison: 
pid            8.36
alias          4.12 - 2.03x slower +122.93 ms

In this micro-benchmark, sending to an alias was about half as fast as sending to a pid.

This next benchmark surprised me!

run = fn get  -> 
  process = get.()
  n = 1000000
  for i <- 0..n do 
    :erlang.send(process, i)
  end
end

Benchee.run(
  %{
    "pid" => fn -> run.(&:erlang.self/0) end,
    "alias" => fn -> run.(&:erlang.alias/0) end
  }
)

Name            ips        average  deviation         median         99th %
alias          2.38         0.42 s    ±66.61%         0.29 s         1.11 s
pid            0.76         1.32 s    ±67.38%         1.05 s         2.54 s

Comparison: 
alias          2.38
pid            0.76 - 3.14x slower +0.90 s

Without the receive, using aliases was much faster!

What’s going on here?

Is there documentation on alias performance expectations anywhere?

garazdawi · February 18, 2025, 9:11pm

I would expect them to be pretty much the same in real code. In micro-benchmarks you can get all kinds of strange behavior.

I suspect that pid is faster in the benchmark with receive because there are special optimizations for sending to self() that are not there for an alias to yourself.

In the second example, I would guess that it might be because the pid benchmark is run after the alias benchmark, i.e. that there is some sort of interference between the two. But I'm not sure about this, as it depends a lot on how Benchee runs its benchmarks.
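One way to rule out carry-over between runs is to time each scenario in a fresh process, so that messages left in one run's mailbox cannot affect the next. The following is only a minimal sketch of that idea, not a statement about how Benchee works: the helper name `measure_isolated` and the reduced message count are assumptions made for illustration, and whether isolation changes the numbers above has not been checked here.

```elixir
# Hypothetical helper: time one scenario inside a short-lived process.
# The process dies after the measurement, taking its mailbox with it.
measure_isolated = fn get ->
  task =
    Task.async(fn ->
      {micros, _result} =
        :timer.tc(fn ->
          # `get` is evaluated inside the new process, so self() and
          # alias() both refer to this short-lived process.
          target = get.()
          for i <- 0..100_000, do: :erlang.send(target, i)
          :ok
        end)

      micros
    end)

  Task.await(task, :infinity)
end

pid_us = measure_isolated.(&:erlang.self/0)
alias_us = measure_isolated.(&:erlang.alias/0)
IO.puts("pid: #{pid_us} us, alias: #{alias_us} us")
```

A single timing like this is much noisier than a Benchee run, so it is only useful as a sanity check on ordering effects.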

lpil · February 18, 2025, 11:22pm

Thank you very much! Really useful to know they are expected to be the same.

max-au · February 22, 2025, 3:49am

Hm, I am actually inclined to think that sending a message to alias() is indeed slower.

./erlperf 'run(Pid) -> Pid ! 1.' --init_runner 'self().' 'run(Alias) -> Alias ! 1.' --init_runner 'alias().' -r full
OS : Linux
CPU: AMD Ryzen 9 5950X 16-Core Processor
VM : Erlang/OTP 26 [erts-14.2.3] [source] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit:ns]

Code                         ||   Samples       Avg   StdDev    Median      P99  Iteration    Rel
run(Pid) -> Pid ! 1.          1         3   2153 Ki   17.95%   1996 Ki  2594 Ki     464 ns   100%
run(Alias) -> Alias ! 1.      1         3    779 Ki   21.23%    699 Ki   969 Ki    1284 ns    36%

Master branch (OTP28) is 2x faster overall, but relative performance stays the same:

OS : Linux
CPU: AMD Ryzen 9 5950X 16-Core Processor
VM : Erlang/OTP 28 [RELEASE CANDIDATE 1] [erts-15.2.2] [source-64185e73b0] [64-bit] [smp:16:16] [ds:16:16:10] [async-threads:1] [jit:ns]

Code                         ||   Samples       Avg   StdDev    Median      P99  Iteration    Rel
run(Pid) -> Pid ! 1.          1         3   5780 Ki   27.11%   5242 Ki  7546 Ki     173 ns   100%
run(Alias) -> Alias ! 1.      1         3   2403 Ki   26.20%   2163 Ki  3118 Ki     416 ns    42%

Same results when I use loop mode. I am quite positive that sending to an alias() is slower, as in this case benchmarks are run concurrently.

lpil · February 22, 2025, 8:34pm

Thank you.

Is this considered a bug then? If they’re intended to be the same.

rickard · February 23, 2025, 4:49pm

No.

There are three main reasons why send/receive using aliases instead of pids is slower. Almost all of the overhead is on the receiving side.

1. Ordinary message signals sent using a pid will always unconditionally be moved into the message queue upon reception. Due to this, if there is a large uninterrupted sequence of message signals in the signal queue at reception, all of them can be moved into the message queue at once with just a few pointer assignments. This is not the case for message signals sent using aliases, since the receiver needs to inspect each such signal in order to determine if it should be moved into the message queue or dropped.
2. When the receiver determines whether a message signal sent using an alias should be moved into the message queue or dropped, it needs to check whether the alias is active or not. In order to do that, it needs to look up information about the alias in local data. Currently such information is stored in a red/black search tree together with information about monitors that the process has set up. No such operation is needed for a message signal sent using a pid, since it always unconditionally should be moved into the message queue.
3. More data needs to be passed in the message signal sent using an alias compared to a message signal sent using a pid.

The overhead due to 3. is quite small. The overhead due to 1. is not as large as one might think (see below). The overhead of 2. will always be there regardless of how much you optimize it. That is, it will always be the case that send/receive of messages using aliases will be slower than using pids.
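The per-signal check in 1. and 2. exists because an alias can be deactivated, after which messages sent to it are dropped. A minimal sketch of that observable behaviour, using `:erlang.alias/0` and `:erlang.unalias/1`; the `{:demo, ...}` message shape and the timeouts are assumptions chosen for illustration:

```elixir
# Create an alias for the current process. It stays active until unalias/1.
alias_ref = :erlang.alias()

# Sent while the alias is active, and received before it is deactivated.
send(alias_ref, {:demo, :while_active})

first =
  receive do
    {:demo, tag} -> tag
  after
    1_000 -> :nothing
  end

# Deactivate the alias. unalias/1 returns true if the alias was active.
true = :erlang.unalias(alias_ref)

# Sent after deactivation: the receiver finds no active alias and drops it.
send(alias_ref, {:demo, :after_unalias})

second =
  receive do
    {:demo, tag} -> tag
  after
    100 -> :dropped
  end

IO.inspect({first, second})
```

A message sent using the pid needs no such decision, which is why it can be moved into the message queue unconditionally.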

Micro benchmarks will typically give message signals sent using a pid compared to using an alias an unrealistic advantage, due to 1., since you seldom have huge sequences of uninterrupted message signals sent using pids. For example, when you do a gen_server:call(), three signals will be sent. One monitor signal, followed by a message signal using a pid, followed by a demonitor signal. There can of course be long uninterrupted sequences of message signals sent using pids, but 1000000, like in your benchmark, will be very rare.
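The interleaving described for gen_server:call() can be sketched by hand. This is an illustration of the signal pattern only, not the actual gen_server implementation: the echo server, the `call_like` helper, and the `{:request, ...}` and `{:reply, ...}` message shapes are all hypothetical.

```elixir
# Hypothetical echo server that replies to each request with its payload.
server =
  spawn(fn ->
    loop = fn loop ->
      receive do
        {:request, from, ref, payload} ->
          send(from, {:reply, ref, payload})
          loop.(loop)

        :stop ->
          :ok
      end
    end

    loop.(loop)
  end)

# Hypothetical call helper: each call sends three signals to the server.
call_like = fn pid, payload ->
  # 1. monitor signal
  ref = Process.monitor(pid)
  # 2. message signal sent using a pid
  send(pid, {:request, self(), ref, payload})

  receive do
    {:reply, ^ref, reply} ->
      # 3. demonitor signal
      Process.demonitor(ref, [:flush])
      {:ok, reply}

    {:DOWN, ^ref, :process, _pid, reason} ->
      {:error, reason}
  after
    5_000 ->
      Process.demonitor(ref, [:flush])
      {:error, :timeout}
  end
end

results = for i <- 1..3, do: call_like.(server, i)
send(server, :stop)
IO.inspect(results)
```

From the server's point of view, every message signal here is preceded by a monitor signal and followed by a demonitor signal, so a long uninterrupted run of pid-addressed messages never forms.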