`persistent_term` batch write performance

TD5 · May 14, 2024, 1:21pm

> Storing or updating a term (using `put/2`) is proportional to the number of already created persistent terms because the hash table holding the keys will be copied. In addition, the term itself will be copied.

Source: Erlang -- persistent_term

In my use case, I have a large set of associations that I want to put into `persistent_term` after some initial setup work is done. Right now, I think I pay a big penalty for calling `persistent_term:put/2` repeatedly, since each call copies the hash table. Would it be possible (and make sense) to have a bulk equivalent that puts multiple keys at once, so the underlying hash table is copied only once? Is there a way to do something similar today? (My assumption is that storing one map is less efficient for reading than storing each key as a separate persistent term entry.)
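A minimal sketch of the per-key pattern being described. The module name `pt_per_key` and the `{?MODULE, Key}` key shape are hypothetical; only `persistent_term:put/2` and `persistent_term:get/1` are the real API. The comment on cost follows the documentation quoted above.

```erlang
-module(pt_per_key).
-export([store_all/1, lookup/1]).

%% Hypothetical helper: stores each {Key, Value} association as its own
%% persistent term. Per the persistent_term docs, every put/2 copies the hash
%% table holding all keys, so inserting N fresh keys this way copies a table
%% that grows with each call (total copying grows roughly quadratically in N).
store_all(Pairs) ->
    lists:foreach(
      fun({K, V}) -> persistent_term:put({?MODULE, K}, V) end,
      Pairs).

%% A read is a single lookup; the value is not copied to the caller's heap.
lookup(K) ->
    persistent_term:get({?MODULE, K}).
```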

josevalim · May 14, 2024, 1:33pm

My understanding is that reading from `persistent_term` effectively gives you a pointer to the data structure. There is no copying whatsoever. So if you need to store three different keys, `foo`, `bar`, and `baz`, and you put them directly in persistent term, the cost of reading one of them is effectively the cost of finding the pointer. If you put them inside a map, you pay the same cost for finding the pointer plus the cost of looking up the given key in the map, which is super cheap. So I would say storing one map is the way to go in your case: you get faster writes, and potentially even faster reads if you need to read several keys at the same time.
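A minimal sketch of the single-map approach, assuming all associations are known up front. The module name `pt_single_map` and the `{?MODULE, table}` key are hypothetical; `persistent_term:put/2`, `persistent_term:get/1`, `maps:from_list/1` and `maps:get/2` are standard OTP functions.

```erlang
-module(pt_single_map).
-export([store_all/1, lookup/1, lookup_many/1]).

%% Hypothetical key under which the whole map is stored.
-define(KEY, {?MODULE, table}).

%% Builds the map once and stores it with a single put/2 call,
%% so the persistent_term key table is copied only once for the whole batch.
store_all(Pairs) ->
    persistent_term:put(?KEY, maps:from_list(Pairs)).

%% Two steps: find the map (no copying), then a cheap map lookup.
lookup(K) ->
    maps:get(K, persistent_term:get(?KEY)).

%% When several keys are needed together, fetch the map once and reuse it.
lookup_many(Ks) ->
    Map = persistent_term:get(?KEY),
    [maps:get(K, Map) || K <- Ks].
```

Calling `store_all/1` a second time would replace the stored map, which is an update of a complex term and therefore triggers the literal area collection discussed later in the thread.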

TD5 · May 14, 2024, 1:37pm

I hoped to avoid needing to do two lookups, and to avoid building the map itself, if possible.

starbelly · May 15, 2024, 12:37am

I wonder how that would interact with the literal area collector, and whether it would help at all there. I suppose that if you have a huge persistent term table, the cost of just copying the table is non-negligible, but I have found the bigger impact is the “global GC” (the literal area collector) that must happen when complex (non-immediate) terms are updated or deleted. This is especially true if you have hundreds of thousands to millions of processes running. My first thought is that it would not help at all. However, if you have multiple persistent terms to update, and it’s possible to do one literal area collection for all of them instead of N, then that could help a lot.

Still, what José was saying makes sense to me as a best practice.

garazdawi · May 15, 2024, 6:59am

I’m unsure whether we would like to add batch inserts to `persistent_term`. Doing the lookup into a map as suggested should be enough for most applications. However, if a prototype showed significant improvements in an application (and not just a microbenchmark), I think we would be open to the idea.

The way the literal GC works today, it checks a single pointer range for terms that need to be collected. So for there to be any benefit, you would have to update the exact same set of keys that you initially inserted together.

I’m unsure how much work it would be to allow multiple literal ranges to be collected at the same time; probably not all that much. Still, I think we want to discourage applications from updating many, many terms, since even with multiple ranges it would be very expensive to update many keys.

TD5 · May 15, 2024, 8:21am

FWIW, I am only considering insertion of fresh keys, since I expect the persistent_term entries to need to live immutably for the life of the VM.

I am aware I am using Erlang/OTP well outside what would be considered idiomatic, so I don’t want to suggest we do anything which is in conflict with OTP’s general design goals or entices people to do things which are likely to cause them more problems.

starbelly · May 15, 2024, 1:25pm

Eh, yeah, the potential for abuse versus maybe a slight performance gain is definitely not worth it. Thank you for satisfying my curiosity, though.

rickard · May 15, 2024, 3:40pm

It is not only the literal area GC performed on each process that would be affected by handling multiple literal areas at once. It also has the potential to degrade message-passing performance in general while the collection is ongoing, since each pointer in the message data must be checked to see whether it points into an area being collected. This applies to every message sent in the system while the collection is ongoing, i.e. until all processes have gone through the literal area GC.