Erlang refc-binaries, why not lists too?

nhpip · January 1, 2024, 11:08pm

So I was reading this article:

Yes, I understand the benefits of share-nothing, but could there be performance gains in taking a similar approach with lists (and maps) as the VM currently does for binaries? Maybe with simpler rules if a list is updated (an entire copy would then be made).

Or maybe create shared heaps between 2 processes?

Just curious…

nzok · January 2, 2024, 3:42am

Lists aren’t updated and Erlang has been built with a shared heap.

garazdawi · January 2, 2024, 7:38am

One of the things that makes binaries very easy to do reference counting on is that they do not contain any references to other Erlang terms. Lists, tuples and maps can all contains any other term, so you would need to place the entire term tree into the global heap, or make the GC aware of references to term inbetween heaps, basically creating a global heap for all processes.

As @nzok mentioned, there are have been attempts to put more terms into a global shared heap before, but they did not prove to give enough of a performance benefit to warrent the complexity. The papers written by Jesper Wilhelson that can be found in the HiPE groups publications explore one such attempt.

rvirding · January 2, 2024, 5:20pm

Way back in the nineties I did some experiments with writing Erlang VMs which had a shared heap for all data. While this definitely did make message passing faster it also made other things much more complicated, especially the garbage collector.

So to keep the property that the VM should never block the GC had to be a real-time interactive collector which ran in the background while the system was running. While this is/was known technology it made it much more complicated and slower as it was working its way through the heap while the heap was being updated. In the end there was a very small increase in efficiency so it was not really worth the effort.

Also these experiments were done on a single thread implementation. Moving that to a multi-thread implementation like we have today would have made it much more complicated and slower as you would need locks and synchronisations to keep control of what is going on. Maybe if you had a very very limited version with one thread for doing all memory handling it would work, but goodbye speed up.

nhpip · January 2, 2024, 6:25pm

Thank you all, I guess considering the extra complexity and risk makes it not worth it, especially if it impacts Erlang’s soft real-time guarantees. I found this interesting:

Time performance. As can be seen in Figure 7(a), in the
synthetic procs benchmark, the shared heap system is much
faster when it comes to sending small-sized messages among
100 Erlang processes. This is partly due to the send opera-
tion being faster and partly because the shared heap system
starts with a bigger heap and hence does not need to do
as much garbage collection. When messages are small, in-
creasing the number of processes to 1000 does not change
the picture much as can be seen in Figure 7(b). On the
other hand, if the size of the message is increased so that
the shared heap system also requires garbage collection of-
ten, then the effect of the bigger root set which increases
garbage collection times becomes visible; see Figures 7(c)
and 7(d). This is expected, since the number of processes
which have been active between garbage collections (i.e., the
root set) is quite high.

From https://user.it.uu.se/~kostis/Papers/heap.pdf

I wonder if limiting the shared-heap to be between 2 processes only would be simpler? Maybe a flag on a spawn?

NelsonVides · January 3, 2024, 7:52am

I’ve wondered about that many times, often there are processes that are only meant to send messages between each other and only one of them sends messages to the outside world.

See for example using SSL over the new socket module: my main process is linked to the SSL process, who in turn is linked to the TCP process, and these other two processes (the SSL and the TCP ones) are pretty isolated from any other process in the entire VM.

Would in that case for example a flag to the spawn_link make the other two processes heaps linked to my socket owner process? And then sending all the {ssl, Socket, Payload} and {tcp, Socket, Payload} messages would be cheaper? Of course if it is at the expense of GC it’s probably not worth it, but it has made me curious many times

mikpe · January 3, 2024, 10:08am

The problem with automatic memory management (aka GC) is that whatever you do, it’s going to be a compromise. Optimize for one thing and you’ll be slowing down other things. The design with per-process heaps has strong benefits we wouldn’t want to lose, even if it means sends must copy.

raimo · January 8, 2024, 2:36pm

Some years back we brainstormed about ways to optimize inter-process communication, and one way that surfaced was to have a new kind of process relation explicitly selected by the programmer.

A work name of such a feature was ‘fibers’. When you spawned a new process it could be spawned as a fiber, i.e. a close sibling that had the same heap as the spawner. For some applications this could be really useful, when they need to co-operate on large heap terms.

Garbage collection would have to be common as in; all fibers stop for GC.

Such a feature would introduce a new granularity of parallelization.

I don’t recall if all fibers should share message box or have their own.

I think the gain was considered to be too small for the added complexity, so we never went ahead to implementation…

TD5 · January 19, 2024, 2:14pm

That would be really handy for convenient local parallelism, like a parallel lists:map/2