Switching between 2 ETS tables for caching

zabrane · November 24, 2022, 6:47am

Hi there,

I’ve got 2 ETS tables and would like to switch between them atomically. Let me explain:

Tab1: contains latest cached data
Tab2: contains old cached data

After a certain period of time, I’d like to apply this logic:

1. delete Tab2 completely (maybe optional)
2. rename Tab1 as Tab2
3. create an empty Tab1

Between steps 2 and 3, I might receive queries hitting Tab1, which doesn’t exist yet. Hence my problem.

Can I make the above steps look atomic?
If yes, can I generalize the idea to more than 2 tables?

Thanks
/Zab

max-au · November 24, 2022, 7:37am

There are two approaches known to me:

1. Keep both “latest” (usually called “active”) and “old” in the same table, wrapping the original key in a tuple of {TabName, Key}. It’s also often extended to the concept of the “striped table”.
2. Store the active table name in a persistent_term, so fetching data from the cache looks like ets:lookup(persistent_term:get(active_table_name), Key). When you’re about to swap the tables, just call persistent_term:put(active_table_name, NextTableName).
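The persistent_term approach could look roughly like the sketch below. The module name `cache_swap`, the table names `cache_a`/`cache_b`, and the persistent_term key are made up for illustration, and it assumes a single process calls `rotate/0`. Instead of renaming tables, both tables always exist and only the pointer to the active one changes.

```erlang
-module(cache_swap).
-export([init/0, lookup/1, insert/2, rotate/0]).

%% Hypothetical names, not from the thread.
-define(ACTIVE_KEY, {?MODULE, active_table}).
-define(TABLES, [cache_a, cache_b]).

%% Create both named tables up front and mark the first one as active.
%% The calling process becomes the owner of both tables.
init() ->
    [ets:new(T, [named_table, public, set, {read_concurrency, true}])
     || T <- ?TABLES],
    persistent_term:put(?ACTIVE_KEY, hd(?TABLES)).

active() -> persistent_term:get(?ACTIVE_KEY).

%% Look in the active table first, then fall back to the old one.
lookup(Key) ->
    Active = active(),
    case ets:lookup(Active, Key) of
        [{_, Value}] -> {ok, Value};
        [] ->
            case ets:lookup(other(Active), Key) of
                [{_, Value}] -> {ok, Value};
                [] -> miss
            end
    end.

%% New entries always go to whatever table is active right now.
insert(Key, Value) ->
    ets:insert(active(), {Key, Value}).

%% Empty the old table, then publish it as the new active table.
%% Readers see either the previous or the new name, never a missing table.
rotate() ->
    Old = other(active()),
    ets:delete_all_objects(Old),
    persistent_term:put(?ACTIVE_KEY, Old).

other(cache_a) -> cache_b;
other(cache_b) -> cache_a.
```

A reader that fetched the name just before `rotate/0` may still hit the table that is about to become "old", which at worst means a stale hit or an extra miss rather than a crash.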

Of course both approaches are easy to generalise.
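For the first approach (one table, keys wrapped in a tuple), here is a minimal sketch. It uses a generation number 0/1 in place of a table name, and keeps the active generation in a bookkeeping row of the same table; the module name `cache_gen` and the `'$active_gen'` key are assumptions for illustration only.

```erlang
-module(cache_gen).
-export([init/0, lookup/1, insert/2, rotate/0]).

-define(TAB, cache_gen).          % hypothetical table name
-define(ACTIVE, '$active_gen').   % hypothetical bookkeeping key

%% One table holds both generations; generation 0 starts as active.
init() ->
    ets:new(?TAB, [named_table, public, set]),
    ets:insert(?TAB, {?ACTIVE, 0}).

active() -> ets:lookup_element(?TAB, ?ACTIVE, 2).

%% Try the active generation first, then the old one.
lookup(Key) ->
    Gen = active(),
    case ets:lookup(?TAB, {Gen, Key}) of
        [{_, Value}] -> {ok, Value};
        [] ->
            case ets:lookup(?TAB, {1 - Gen, Key}) of
                [{_, Value}] -> {ok, Value};
                [] -> miss
            end
    end.

insert(Key, Value) ->
    ets:insert(?TAB, {{active(), Key}, Value}).

%% Drop every entry of the old generation, then make it the active one.
%% The bookkeeping row has an atom key, so the pattern does not match it.
rotate() ->
    Old = 1 - active(),
    ets:match_delete(?TAB, {{Old, '_'}, '_'}),
    ets:insert(?TAB, {?ACTIVE, Old}).
```

Generalising to more generations would mean replacing `1 - Gen` with arithmetic modulo the number of generations.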

Unfortunately, there is no atomic operation for swapping two named ETS tables. It might be worth adding to OTP, since it appears to be a common ask.

juhlig · November 24, 2022, 9:32am

You must already have a process that owns the tables and handles the turnover, right? So what I would do here is try to query Tab1. If that fails because Tab1 is currently missing, meaning turnover is happening just at that moment, send a message to and wait for a reply from the owning process. The owning process should (receive and) reply to such messages only after it has finished the turnover, meaning Tab1 will be there again, and the querying process can try accessing Tab1 again.

Depending on how the cache is used (query frequency, turnover frequency, …), this waiting may actually defeat the entire purpose of having a cache in the first place, so it could also be possible to just proceed as if the queried value was not present in the cache at all on failure.
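The simpler variant, treating a missing table as a cache miss, might be sketched like this. The module name `cache_try` and the table name `tab1` are placeholders; the only thing relied upon is that ets:lookup/2 raises badarg when the named table does not exist.

```erlang
-module(cache_try).
-export([lookup/1]).

%% tab1 is a hypothetical name for the named table holding the latest data.
lookup(Key) ->
    try ets:lookup(tab1, Key) of
        [{_, Value}] -> {ok, Value};
        [] -> miss
    catch
        %% Raised when tab1 does not exist, e.g. in the window between
        %% renaming the old table and creating the new one.
        error:badarg -> miss
    end.
```

For the waiting variant, the `catch` clause would instead make a synchronous call to the owning process and retry the lookup once the reply arrives.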

NelsonVides · November 24, 2022, 10:24am

That is exactly what this caching library already implements: esl/segmented_cache on GitHub, a modern, performant, and extensible Erlang in-memory cache. Hopefully you can just use it. Disclaimer: I’m the author :)

How to make it look atomic, and generalised to more than one table? Keep an atomic in a persistent term that points to the current table to start from. Then querying processes will read that atomic and cycle through the ETS tables starting at the index the atomic points to. A background process periodically increments the atomic (wrapping around), and cleans up whatever was there in the table that is now considered last after rotation.
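To illustrate the idea (this is not the segmented_cache code, just a sketch under assumptions): the module name `cache_ring` and the persistent_term key are invented, the rotation direction is my own choice, and only one process is assumed to call `rotate/0`. Here the segment that is about to become the newest is emptied just before the index is advanced.

```erlang
-module(cache_ring).
-export([init/1, lookup/1, insert/2, rotate/0]).

-define(KEY, {?MODULE, state}).   % hypothetical persistent_term key

%% N unnamed tables plus one atomics slot holding the 0-based index of the
%% newest table. The persistent term is written once and never updated.
init(N) when N >= 2 ->
    Tabs = list_to_tuple(
             [ets:new(cache_ring_seg, [public, set, {read_concurrency, true}])
              || _ <- lists:seq(1, N)]),
    Ref = atomics:new(1, [{signed, false}]),   % slot starts at 0
    persistent_term:put(?KEY, {Ref, Tabs}).

%% Walk the tables from newest to oldest, starting at the atomic's index.
lookup(Key) ->
    {Ref, Tabs} = persistent_term:get(?KEY),
    N = tuple_size(Tabs),
    lookup(Key, Tabs, atomics:get(Ref, 1), N, N).

lookup(_Key, _Tabs, _Idx, _N, 0) -> miss;
lookup(Key, Tabs, Idx, N, Left) ->
    case ets:lookup(element(Idx + 1, Tabs), Key) of
        [{_, Value}] -> {ok, Value};
        [] -> lookup(Key, Tabs, (Idx + N - 1) rem N, N, Left - 1)
    end.

%% Writes always go to the newest table.
insert(Key, Value) ->
    {Ref, Tabs} = persistent_term:get(?KEY),
    ets:insert(element(atomics:get(Ref, 1) + 1, Tabs), {Key, Value}).

%% The oldest table is emptied and then becomes the newest one.
rotate() ->
    {Ref, Tabs} = persistent_term:get(?KEY),
    Next = (atomics:get(Ref, 1) + 1) rem tuple_size(Tabs),
    ets:delete_all_objects(element(Next + 1, Tabs)),
    atomics:put(Ref, 1, Next).
```

With `cache_ring:init(3)`, an entry survives two rotations and is gone after the third, since that is when its segment is emptied and reused.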

zabrane · November 24, 2022, 7:19pm

Thank you guys. Exactly what I was looking for.

tsloughter · July 23, 2023, 9:40pm

Just happened across this old thread and thought I’d mention the implementation we have in OpenTelemetry: https://github.com/open-telemetry/opentelemetry-erlang/blob/main/apps/opentelemetry/src/otel_batch_processor.erl#L77

It uses a persistent term to store the atom name of the table so it can switch between them (updating the term triggers no global GC because an atom is 1 word).

An operation in ETS would be very useful though.