Per-process atom limit

We recently ran our Elixir codebase through Sobelow, which correctly identified a number of calls to String.to_atom/1 (which calls the BIF erlang:binary_to_atom/2) and recommended using ..to_existing_atom/2 instead.

This got me thinking. In this case the input was sanitized, so the calls to to_atom/2 are safe. But has anyone thought of an optional per-process atom-creation limit, either through the use of a process flag or a new to_atom variant/opt-arg?

The atoms would still end up in the global atom table, and the count of created atoms would only apply to new atoms. This would safeguard against the atom table filling up, since processes that exceed the per-process limit would be terminated.
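Purely as illustration, a hypothetical sketch of what either variant could look like; neither the max_new_atoms process flag nor the opt-arg on binary_to_atom exist today, they are invented here to make the idea concrete:

%% Hypothetical API sketch -- none of this exists in OTP today.

%% Variant 1: a process flag capping how many *new* atoms this
%% process may add to the global atom table.
handle(Input) ->
    process_flag(max_new_atoms, 100),   % hypothetical flag
    binary_to_atom(Input, utf8).        % exceeding the cap kills the process

%% Variant 2: an opt-arg on the conversion call itself.
handle2(Input) ->
    binary_to_atom(Input, utf8, [{new_atom_limit, 100}]).  % hypothetical option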

1 Like

I suppose a question is why do you want / need to create atoms (constants) at runtime?

My take, without knowing, is that we should avoid doing so, even within a scoped space with limits, though I do agree this reduces the risk surface.

That said, I look forward to your answer, as it may change my mind.

1 Like

I don’t think a per-process limit would provide any effective protection. The atoms may be created by temporary processes, or by a long-lived one that would be restarted by a supervisor if it were killed.

There is a mechanism to monitor the number of atoms, so you can poll that and generate alerts if a node enters a danger zone.

Perhaps extend erlang:system_monitor/2?
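For reference, the counters already exist: erlang:system_info(atom_count) and erlang:system_info(atom_limit), both available since OTP 20. A minimal polling sketch, with the 90% threshold and the alert action as placeholders:

-module(atom_watchdog).
-export([start/0]).

%% Poll atom-table usage and raise an alert when the node
%% enters the danger zone.
start() ->
    spawn(fun loop/0).

loop() ->
    Count = erlang:system_info(atom_count),
    Limit = erlang:system_info(atom_limit),
    case Count / Limit of
        Ratio when Ratio > 0.9 ->       % placeholder threshold
            logger:alert("atom table ~.1f% full (~p/~p)",
                         [100 * Ratio, Count, Limit]);
        _ ->
            ok
    end,
    timer:sleep(60000),                 % poll once a minute
    loop().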

2 Likes

^ Excellent point. I assumed the OP meant they would generally live in the same literal area, yet be scoped to a process and persist beyond the life of the process that created them. Of course, that would have a lot of gotchas that need to be dealt with, starting with (per your excellent call-out): how do you scope them? They can’t be scoped to a pid :smile:

Let the per-process limit on new atoms be 1.
Create N processes, each of which creates a new atom and terminates.
Presto Chango! We’ve now created N new atoms, and N can be as big as we want.

How can this solve the problem unless the new atoms created by such a
process go away when the process terminates?
But there are two problems with that.

atom_smasher(0) -> ok;
atom_smasher(N) ->
    Self = self(),
    %% mint a fresh atom named after this process's pid
    _Atom = list_to_atom(pid_to_list(Self)),
    %% spawn the next smasher and wait for the whole chain to finish
    spawn(fun () -> atom_smasher(N - 1), Self ! done end),
    receive done -> ok end.

This isn’t tested and illustrates the first problem rather than perfectly
embodying it. The thing is that all N processes exist at the same time,
so having atoms go away when a process terminates isn’t enough.

The other is that a minor tweak to the code would have process I send atoms 1, …, I to process I+1, so having atoms go away would break the code.
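A rough, untested sketch of that tweak (the names are invented here for illustration): if atoms were reclaimed when their creating process terminated, every message below would carry atoms whose creators are already dead.

start(N) ->
    First = spawn(fun() -> atom_relay(1, N) end),
    First ! {atoms, []}.

%% Process I mints one new atom and forwards atoms 1..I to process I+1.
atom_relay(I, N) when I =< N ->
    Atom = list_to_atom("atom_" ++ integer_to_list(I)),
    Next = spawn(fun() -> atom_relay(I + 1, N) end),
    receive {atoms, Atoms} -> Next ! {atoms, [Atom | Atoms]} end;
atom_relay(_, _) ->
    %% the last process still refers to atoms created by N dead processes
    receive {atoms, Atoms} -> io:format("got ~p atoms~n", [length(Atoms)]) end.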

We KNOW how to completely solve the problem once and for all.
EEP 20.

Historically, what happened was that instead of fixing the global atom table issue, the Erlang community switched over to using binaries instead of atoms.
Given that this works, it’s something that can be done right now.

1 Like

Unfortunately, monitoring is generally not a good-enough solution to this issue, especially if the atoms can be created from potentially hostile external input. By the time you notice the atom table is getting exhausted, there’s generally not much you can do other than take the system down and fix the “leak”. It’s effectively the same as the system just dying by itself.

1 Like

The use case is a management tool used by support and engineering to debug a backend. In this case most of the atoms are fields in Elixir Ecto schemas, so they should be known at compile time (although it seems that’s only the case if a schema is used). We do permit some dynamic atoms as aliases and column names in CSV files, but there are fewer than 10 users and they are smart enough not to do anything dumb.

I originally thought of a per-process atom table, but I wanted to limit the scope of the effort involved.

Are there any benefits to atoms as literals instead of using binaries?

1 Like

If you mean preferring atoms over binaries for, say, keys in a map, then yes, there indeed are. Each binary key costs a small allocation, and in particular, when these are <= 64 bytes (heap binaries), this can lead to long-lived garbage in a process, especially when passing messages. Additionally, lots of little allocations can sometimes lead to underutilized memory carriers.
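A quick way to see the difference in the shell (erts_debug:size/1 reports the heap size of a term in words; it is an internal debugging aid, and exact counts vary with word size and OTP version): an atom is an immediate that occupies no heap of its own, while each short binary key needs a heap-allocated header plus payload that is copied along with every message.

AtomKeys = #{name => 1, rate => 2, base => 3},
BinKeys  = #{<<"name">> => 1, <<"rate">> => 2, <<"base">> => 3},
io:format("atom keys: ~p words, binary keys: ~p words~n",
          [erts_debug:size(AtomKeys), erts_debug:size(BinKeys)]).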

So yes, there is solid reason to prefer atoms here. Yet getting to where you’d like things to be is no small effort and no walk in the park, I believe. I’m of course curious what the OTP team thinks about this, and what’s more, what the OTP team thinks about Richard’s EEP these days.

1 Like

Not holding my breath :smile:



Author:
    Richard A. O'Keefe <ok(at)cs(dot)otago(dot)ac(dot)nz> 
Status:
    Draft 
Type:
    Standards Track
Created:
    05-Aug-2008
Erlang-Version:
    R12B-4

aye, I doubt it would happen, but it’s an interesting conversation! :smile:

1 Like

Too busy coughing (thanks, COVID) to hold my breath either.
However, the Frames proposal apparently went unnoticed for a long time before the OTP team decided to do something almost but not completely different.
(There are things that maps can do easily that frames were deliberately designed NOT to do; on the other hand, frames would have been more efficient for what they were designed to be.)
What matters is that

  • there is a problem
  • there is KNOWN to be a possible solution
  • someone may be sufficiently provoked by the existing proposal to do something
  • even if it IS something else, as long as the Erlang community benefits.

As for converting ‘strings’ (however defined) to atoms (or not) in the context
of JSON, I’m slowly putting together a new EEP, containing this example. (With a
debt of gratitude to Malcolm Bradbury. Coming as it does from a book called
“Rates of Exchange”, the vloska is the perfect fictional currency.)

The following is adapted from data I got via a free (limited-use)
API key from OpenExchange (openexchangerates.org).

{
  "disclaimer": "Fictitious data",
  "date-and-time": "2023-02-29T12:00:00.00Z",
  "currencies": {
    "SAD": "South Atlantean Drachma",
    "VLO": "Slakana Vloska",
    "ZLT": "Erewhonian Zlotnick"
  },
  "conversion": {
    "base": "SAD",
    "rates": {
      "SAD": 1.00,
      "VLO": 488.01,
      "ZLT": 7.78
    }
  }
}

Some of the object keys here are part of the PROTOCOL, and you want those converted to atoms: base, conversion, currencies, date-and-time, disclaimer, rates.

Other object keys are part of the PAYLOAD; they have to be converted the same way in both subobjects, but there is no gain from converting them to atoms: SAD, VLO, ZLT. The code processing these keys probably isn’t going to be matching them against known atoms (I mean, who knew that South Atlantis was going to break away from Atlantis, or that they’d switch from the Dirham to the Drachma?).
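To make the distinction concrete, one way the key handling could look, a sketch only: the protocol keys map to atoms that are guaranteed to exist because they appear as literals in the clause bodies, while payload keys pass through as binaries.

%% Protocol keys: a fixed, compile-time-known set, so converting
%% them can never grow the atom table.
key(<<"disclaimer">>)    -> disclaimer;
key(<<"date-and-time">>) -> 'date-and-time';
key(<<"currencies">>)    -> currencies;
key(<<"conversion">>)    -> conversion;
key(<<"base">>)          -> base;
key(<<"rates">>)         -> rates;
%% Payload keys (currency codes) stay as binaries: no atom-table
%% growth, and unknown codes are handled uniformly.
key(Other) when is_binary(Other) -> Other.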

Will your example include avoiding the blunder of this API representing monetary values with floats?

To be fair, JSON doesn’t have floats – it just has numbers, of arbitrary precision. It’s on the parser to determine how it wants to turn the string back into a native data value.

1 Like

Yes, I know, but most people do not understand the distinction, and most libraries will give you a float rather than an arbitrary-precision value. For the monetary use case this is quite unfortunate.

I am happy the new json module makes this easy for me to deal with in the future.
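For anyone curious, a minimal sketch with OTP 27’s json module, assuming you want to keep the textual form of non-integer numbers: the float decoder callback receives the original token as a binary, which you could also hand to a decimal library.

%% Decode JSON, keeping the original textual representation of any
%% non-integer number instead of converting it to a float.
decode_money(Bin) ->
    {Value, ok, <<>>} =
        json:decode(Bin, ok, #{float => fun(Token) -> {decimal, Token} end}),
    Value.

%% decode_money(<<"{\"rate\": 488.01}">>)
%% => #{<<"rate">> => {decimal, <<"488.01">>}}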

Didn’t we have a very similar topic, like, >2 years ago? :sweat_smile:

Yeah, it’s a bit of an error loop :smile:

1 Like

Ah, but are they floats? One problem with JSON is that there are no integers, floats, rationals, or anything else. Just “numbers”. There is no rule that says 488.01 is not exact (and of course, neither is there any rule that says 488 is exact either).

You are of course correct that it would be better to represent exchange rates as
[numerator,denominator] pairs. In practice, exchange rates are changing all the time.
Even from second to second. If you care about the precision of exchange rates enough
for floating-point to be problematic, you are NOT going to be satisfied with downloading
a table once per day, because the error from doing that is going to be FAR FAR bigger.
Seriously: suppose you download some conversion rates at time T, and determine that there
is an arbitrage opportunity (even taking fees into account), and begin a sequence of
transactions at T+2, T+3, T+4. At the end, you may be astonished to find that you lost
money. Why? Because the exchange rates changed a couple of times.

Take NZD and USD. Currently 1 USD = 1.6000 NZD. Two hours ago, 1 USD = 1.6019 NZD.
If I wanted to do a trade NOW and assumed the rate was still 1.6019, I’d be making a
0.12% error, which is HUGE compared with floating-point roundoff.

I used to tell my students in the BioInformatics paper and the Software Engineering
paper that there were two errors in your calculations: the BIG error and the little
error, and that the little error was in your numerical code or selection of a (feasible)
approximation algorithm instead of an (infeasible) exact optimisation, while the BIG
error snuck in when you chose how to model the world. Exchange rates are a perfect
example.

This whole thing about exchange rates may well be relevant to quite a few people in
the Erlang community. If you are doing business on the Internet, you might be paid
in a variety of currencies, and you might want to offer goods or services to your
customer in their currency, or you might be buying time or storage or whatever in a
currency other than your own. You really really need to have a good long talk with
your bank or payment service to figure out how they determine which rates to use and
make sure that you’re using the same figures. Using the same figures at the same
time for the same transaction is going to be more important than using integers or floats.

Indeed we did, with a great deal of FUD about EEP 20 I had mercifully forgotten.
Ecclesiastes 1:8-9.
8 All things are wearisome, more than one can say. The eye never has enough of seeing, nor the ear its fill of hearing. 9 What has been will be again, what has been done will be done again; there is nothing new under the sun.

1 Like

You’re a Lorite, I see, and so was Kohelet :smile:

Seriously though, if by FUD you mean “Fear, Uncertainty, Doubt” (yeah, I had to google it), I didn’t have that impression in the discussion. Concerns were raised re the feasibility and implications of the approach. A proof of concept would have been helpful, but this not being a small undertaking, everyone being busy, and there being no pressing need (even @Maria-12648430 found a good solution to the problem she had back then), it all petered out.

1 Like