Thanks for the explanation @garazdawi, I guessed something like this was the case. I also think that use cases which run the risk of exhausting the atom table are rare enough not to warrant large-scale refactorings and performance penalties in Erlang to accommodate them.
That said, I have no brilliant idea to contribute to the matter, either. The best I can think of (and I’m pretty sure the idea has countless holes in it) is this below:
Reading EEP 20, it sounds like it proposes storing global (permanent, i.e. what we have now) as well as local (reclaimable) atoms in the same (global) lookup table. When a process exits, the local atoms it owned disappear, and pointers to them are replaced with an equivalent representation such that other processes referring to them don’t end up with “broken” atoms. I may be mistaken here, though…
The EEP aims at making the global/local distinction fully opaque, i.e. a user has to do nothing (in fact, can’t do anything) and notices no difference, in semantics at least, at the cost of some performance and memory impact.
But maybe the proposed solution is trying to be too smart in that regard?
What if we had the global table as we do now, and each process had its own local table?
By default, atoms are created globally, the way it is now, meaning any existing code would work (or not) the same as before. But iff a process creates an atom locally by somehow explicitly saying so (I don’t know how that could look in code), the atom is stored in the process’ atom table. Unless that atom already exists globally, that is: in that case, it would make no sense to store it a second time locally. (Maybe we could also think of some pruning mechanism, such that when a global atom is created, local atoms of the same name are invalidated and removed.)
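To make the intended lookup order concrete, here is a toy Python model of the idea (purely illustrative — the class and method names are invented, and this is in no way BEAM internals): interning a “local” atom first consults the global table, and a later global registration prunes local copies of the same name.

```python
class GlobalAtomTable:
    def __init__(self):
        self.atoms = {}          # name -> global atom id
        self.local_tables = []   # per-process tables, registered so we can prune

    def intern(self, name):
        if name not in self.atoms:
            self.atoms[name] = len(self.atoms)
            # pruning idea: a new global atom invalidates local copies of the same name
            for table in self.local_tables:
                table.atoms.pop(name, None)
        return ("global", self.atoms[name])

class LocalAtomTable:
    """Per-process table; it simply dies with the process."""
    def __init__(self, global_table):
        self.global_table = global_table
        self.atoms = {}          # name -> local atom id
        global_table.local_tables.append(self)

    def intern_local(self, name):
        # an atom that already exists globally is not duplicated locally
        if name in self.global_table.atoms:
            return ("global", self.global_table.atoms[name])
        if name not in self.atoms:
            self.atoms[name] = len(self.atoms)
        return ("local", self.atoms[name])
```

For example, after `g.intern("ok")`, a process calling `intern_local("ok")` gets the global atom back; `intern_local("temp")` creates a local entry, and a subsequent `g.intern("temp")` removes that local copy again.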
As long as a local atom is used only within the process itself, there is no real difference between local and global atoms. But if a local atom is sent to another process, it has to be sent as a string, the same way as if it were sent to another node; global atoms, on the other hand, can be sent the same way as they are today, since they are known globally.
A process receiving a local atom from another process checks if it already has such an atom itself, or creates and stores it locally. Receiving a global atom is the same as it is now.
If a process exits, all the local atoms it had just vanish, without affecting anything else. The global atoms stay, as usual.
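The messaging and lifetime rules above could be sketched like this (again a toy Python model with made-up names, not a real implementation): a local atom crossing a process boundary travels by name, much like an atom sent to another node, the receiver re-interns it, and a process exit just drops the local table.

```python
GLOBAL = {}  # name -> id, shared by all "processes" in this toy model

def global_intern(name):
    return GLOBAL.setdefault(name, len(GLOBAL))

class Process:
    def __init__(self):
        self.local = {}  # name -> local id; vanishes when the process exits

    def make_local_atom(self, name):
        if name in GLOBAL:                       # globally known: reuse it
            return ("global", GLOBAL[name])
        return ("local", self.local.setdefault(name, len(self.local)), name)

    def encode(self, atom):
        # local atoms are flattened to their name for transit, like in the
        # external term format; global atoms keep their stable identity
        return atom if atom[0] == "global" else ("by_name", atom[2])

    def receive(self, wire):
        kind, payload = wire
        if kind == "global":
            return ("global", payload)
        # re-intern by name: reuse an existing entry or create a local one
        return self.make_local_atom(payload)
```

So `b.receive(a.encode(t))` gives process `b` its own copy of `a`’s local atom `t`, and when `a` exits, dropping `a.local` reclaims its atoms without affecting `b`.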
(Glossing over details now…)
By separating things that way, a real performance penalty only occurs when a process sends a local atom to another process. This can be documented. The memory increase is controllable, since a user has to take explicit action to create an atom locally, instead of it happening automatically as the EEP suggests.
Put differently, a user can control if and to what degree they want reclaimable atoms at the price of the performance/memory penalty, or the other way round.
What do you think? (And please don’t hurt me)