On garbage-collecting atoms

Note that you already have a global storage for the atoms, the atom table! You only need a deterministic algorithm for generating new atoms. No need to store them anywhere. E.g:

my_atom_gen() -> ?LET(N, choose(1, 10_000), list_to_atom(integer_to_list(N))).
3 Likes

I donā€™t think that this will contribute much. It is unlikely that, for example, '1' will trigger something that '2'ā€¦'10000' wonā€™t. In general, it is questionable that generating totally random atoms is of much value, ie likely to trigger anything.

The interesting atoms, as I see it, are:

  • well-known atoms such as ok, error, true, false, undefined, infinity, etc
  • ā€œweirdā€ atoms like '', or containing weird characters like '{}', '\'' or 'šŸ˜±', or very long atom names, etc
  • module and function names
2 Likes

It was just an example. If you need random strings there are deterministic ways of generated a limited set of those too.

3 Likes

I really didnā€™t want to start a fight hereā€¦ :neutral_face:


Anyway, I think what would help (and what it would have to boil down to) is to have a reference implementation which can be tried against some reasonably large system to actually see the impact (or not).

Quoting from EEP 20:

The change is simple in concept, ā€¦"

:grin:

ā€¦ but affects several atoms in the core of the system.

ā€¦ the extent and impact of which would be good to know, too. Becauseā€¦

Theory meets practice.
ā€œHi, Iā€™m Theory :smiling_face_with_three_hearts: How do you do? :hugs:ā€
ā€œIā€™m Practice :smirk: Now go sit over there, and watch :boom:ā€

3 Likes

One class of atoms in Erlang thatā€™s worth remembering is
Internet domain names like snap.stanford.edu .
Erlang lexical structure was specifically bent to allow
such atoms to be used freely without quoting.
Indeed node names like logger@demo.example.org do not need
quoting, precisely because it was expected that such names
would be used often.

2 Likes

FWIW: Add ability to limit the number of generated atoms by Maria-12648430 Ā· Pull Request #307 Ā· proper-testing/proper Ā· GitHub

If anyone has any bright ideas on the topic, feel free to comment here or contribute to PropEr :kissing_smiling_eyes:

3 Likes

IMO you should fork PropEr and work on making the atom generator less dangerous without unduly worrying about ā€œupstreamā€.

There is a trick (due to Roberto Aloi if Iā€™m not mistaken) to retrieve the set of atoms known to the VM. One could limit the atom generator to that set plus perhaps a small set of random ones.

2 Likes

Well, if I wanted this for a private or work-related project, I might do just that. However, what brought the issue up for us was writing property tests for OTP, namely the lists module.

Now, on the one hand, the few existing property tests in OTP will only work with PropEr. They use features that the alternatives, Eqc and Triq, donā€™t have. Consequently, the automated tests are run with PropEr.

On the other hand, while PropEr does have that questionable atom generator that ultimately puts a tight lid on the number of property tests that can be in OTP, I think it unlikely that they will switch to a spur-of-the-moment fork of it, whatever its benefits. Even if they did, I canā€™t say for certain that we would be willing to maintain it for all eternity :sweat_smile:

For those reasons, we would like to see the change being done in PropEr :wink:

Something like this you mean?

(fun F(N) ->
    try binary_to_term(<<131,75,N:24>>)
    of A -> [A | F(N+1)]
    catch error:badarg -> []
    end
end)(0).`
1 Like

I see that PropEr upstream donā€™t seem to want to change their atom generator. You could suggest adding a new ā€œsafe_atomā€ generator (for some definition of ā€œsafeā€). There is a definite use-case for it, so upstream should IMO be willing to consider it (modulo implementation nits). That way users can opt-in on the new behaviour.

Yes that looks like the trick. Perhaps Roberto was the one who mentioned and used it locally (while he was at Klarna) so thatā€™s why I associated his name with it.

2 Likes

Way back in 1980-something, Quintus lifted the
size limit on atoms from 512 characters to 1024
so that any BSD file name could be represented
as a Prolog atom. I find it surprising that
Erlang limits atoms to 255 characters. Even
so, if an atom name is 0 to 255 Latin-1
characters, there are
126733357142396107061626967406548831217427853606727388361295472264802569172030944679204711417736915543877325336948619684290664937845062435667012362299797188708356381832552925850000775491724125909004147601506147326575655759382822573183678801327357531257182924909946294285833630947219865916108048624015413276556526797606542519296803822692410076450416376609122831600305120232670690951797883584902974873977165163631151387131222970029797455499222140801583127895032498945988728175336549842059439614615416954783053108159438031117827440268050975051825588887801715041347459354640116096648307759235106463504140602396312142081
possible atoms. Thatā€™s a bit over 10**614.

So weā€™re never going to be able to sample more than
an infinitesimally tiny fraction of possible atoms.

That suggests to me taking a leaf from simulation
and specifically from Deterministic Monte Carlo.
Iā€™m thinking that a carefully crafted fixed set
of say 200 atoms, ranging from say ā€˜ā€™ to
ā€˜\0(repeated 255 times)ā€™ might do BETTER at finding
bugs than ā€˜any possible atomā€™.

3 Likes

It is worth noting that the high estimate of the number of atoms in the observable universe is 1089, pun intended :laughing:

Just my opinion, only one addition: the universally used atoms (undefined, ok, error, true, false etc) should be contained in that set and emphasized, meaning that they should be generated more often than the other ā€œweirdā€ ones.

6 Likes

Ok, here is our new attempt on the matter of PropEr atom generation: Add generator for existing atoms by Maria-12648430 Ā· Pull Request #310 Ā· proper-testing/proper Ā· GitHub

2 Likes