Cache in process or ets - which is better in your opinion?

dominic · October 12, 2022, 7:17am

example for caching in a process:

1. use mysql/mongodb to store the data
2. use a process to cache the data
3. the data is changed frequently by users
4. sync to mysql every X seconds

possible problems:

1. the data has changed (but is not yet synced to mysql)
2. the process crashes
3. the data is lost

example for ets:

1. use mysql/mongodb to store the data
2. use ets to cache the data
3. read/write from ets
4. (maybe) sync to mysql every X seconds

possible problems:

1. saving data as flat rows (just like rows in mysql) is hard to work with,
   whereas the same data is easy to work with when cached in a process
2. saving data as a map/list/record/etc. makes reads/writes from ets expensive

So: process, ets, or a better way?
What's your opinion?

BTW, my company's (game) project caches data in processes.
If a process really crashes and the data needs fixing, you have to fix it one by one.

mmin · October 12, 2022, 7:45am

For a process cache you shouldn't use periodic sync if you want your data to be consistent with the DB. On each modification, either invalidate the cache entries that contain the modified data and then perform the modification in the DB, or perform the modification in the DB and then put the modified data back in the cache (if necessary). The "source of truth" should be the DB, not the cache; otherwise the DB is just a backup (and not necessarily a correct one). A crash of the cache process could cause a small performance degradation, but should have no other consequences. The cache shouldn't be considered a critical component for retrieving data. Other details depend on your cache policy and what you want to achieve with it. After all, if the cache crashes, the supervisor should restart it and the cache should then fill up again. If you want to have the whole DB as a cache, consider using mnesia with disc_copies.
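The "modify the DB, then put the data back in the cache" variant described above could look roughly like the following. This is only a minimal sketch: the module name `wt_cache` is made up, and `DbRead`/`DbWrite` are hypothetical funs standing in for whatever MySQL/MongoDB client calls you actually use. It also ignores the race between a cache fill and a concurrent write, which a real cache would have to handle.

```erlang
-module(wt_cache).
-export([new/0, read/3, write/4]).

%% Hypothetical write-through cache. DbRead and DbWrite are funs that stand
%% in for real MySQL/MongoDB calls; they are assumptions of this sketch.

%% Create the cache table. The calling process owns it.
new() ->
    ets:new(wt_cache, [set, public, {read_concurrency, true}]).

%% Read: try the cache first, fall back to the DB and fill the cache on a miss.
read(Tab, Key, DbRead) ->
    case ets:lookup(Tab, Key) of
        [{Key, Value}] ->
            {ok, Value};
        [] ->
            case DbRead(Key) of
                {ok, Value} ->
                    ets:insert(Tab, {Key, Value}),
                    {ok, Value};
                not_found ->
                    not_found
            end
    end.

%% Write: the DB is the source of truth, so write there first.
%% Only a successful DB write is reflected in the cache; on failure the
%% entry is invalidated so that the next read goes back to the DB.
write(Tab, Key, Value, DbWrite) ->
    case DbWrite(Key, Value) of
        ok ->
            ets:insert(Tab, {Key, Value}),
            ok;
        {error, _} = Error ->
            ets:delete(Tab, Key),
            Error
    end.
```

If the table (or the process owning it) dies, nothing is lost: a fresh empty table simply produces cache misses until it fills up again from the DB.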

dominic · October 12, 2022, 12:48pm

Agreed.

But I'm puzzled about cases like this one, for example:

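store(Id, K, V) ->
    %% Assuming Id already exists in the table.
    %% lookup copies the whole list out of ETS onto the process heap ...
    [{_, HugeList}] = ets:lookup(table, Id),
    %% ... and insert copies the whole list back into the table.
    ets:insert(table, {Id, [{K, V} | HugeList]}).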
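One possible way around copying the huge list on every update is to store each entry as its own ETS row under a composite `{Id, K}` key. The sketch below is just one option, not something proposed in this thread: the module name `kv_rows` and its functions are made up for illustration. Note that it changes the semantics slightly, since storing the same `K` twice overwrites the old value instead of prepending a duplicate.

```erlang
-module(kv_rows).
-export([new/0, store/4, fetch/3, fetch_all/2]).

%% One ETS row per {Id, K} pair instead of one huge list per Id.
new() ->
    ets:new(kv_rows, [ordered_set, public]).

%% Insert or overwrite a single entry; only the small {{Id, K}, V}
%% tuple is copied into the table, not everything stored for Id.
store(Tab, Id, K, V) ->
    true = ets:insert(Tab, {{Id, K}, V}),
    ok.

%% Fetch a single entry without touching the other entries of Id.
fetch(Tab, Id, K) ->
    case ets:lookup(Tab, {Id, K}) of
        [{_, V}] -> {ok, V};
        [] -> not_found
    end.

%% Return all {K, V} pairs stored for Id. This still copies all of them,
%% so it should be the rare operation, not the common one.
fetch_all(Tab, Id) ->
    ets:select(Tab, [{{{Id, '$1'}, '$2'}, [], [{{'$1', '$2'}}]}]).
```

With this layout a single `store/4` or `fetch/3` costs roughly the same no matter how many entries an `Id` has; the price is that reading everything for one `Id` needs a select instead of a single lookup.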

max-au · October 13, 2022, 2:01am

Think of ETS as a way to access process state circumventing message exchange.

When doing a lookup you're essentially accessing the state of the process that owns (hosts) the table, but without the synchronisation that message exchange provides. Hence ETS operations are usually much faster (up to an order of magnitude) compared to message exchange (via gen_server:call).

But that also means you're running into all kinds of race conditions (especially for public ETS tables that can be modified by any process in the system). There is a whole chapter of @MononcQc's book showing how easy it is to make a cache that would eventually fail (BTW that chapter is a great read/exercise for anyone implementing caches).
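To make the race concrete, here is a small sketch (not taken from the book mentioned above; the module name `ets_race` is made up) in which several processes bump two counters in a public table. One counter uses lookup followed by insert, the other uses the atomic `ets:update_counter/3`.

```erlang
-module(ets_race).
-export([run/2]).

%% Spawn Procs workers that each bump both counters N times and
%% return the final {Racy, Atomic} values.
run(Procs, N) ->
    Tab = ets:new(counters, [set, public]),
    true = ets:insert(Tab, [{racy, 0}, {atomic, 0}]),
    Parent = self(),
    Pids = [spawn_link(fun() -> worker(Tab, N), Parent ! {done, self()} end)
            || _ <- lists:seq(1, Procs)],
    %% Wait until every worker has reported back.
    [receive {done, Pid} -> ok end || Pid <- Pids],
    [{racy, Racy}] = ets:lookup(Tab, racy),
    [{atomic, Atomic}] = ets:lookup(Tab, atomic),
    ets:delete(Tab),
    {Racy, Atomic}.

worker(_Tab, 0) ->
    ok;
worker(Tab, N) ->
    %% Read-modify-write: another process can update the row between
    %% the lookup and the insert, and that update is then overwritten.
    [{racy, C}] = ets:lookup(Tab, racy),
    true = ets:insert(Tab, {racy, C + 1}),
    %% Single atomic operation inside ETS; no update can be lost.
    _ = ets:update_counter(Tab, atomic, 1),
    worker(Tab, N - 1).
```

Calling `ets_race:run(8, 10000)` always yields 80000 for the atomic counter, while the racy counter may come out lower on a multi-core machine because some increments get overwritten. Routing all writes through the owning process (for example via gen_server:call) avoids the same problem at the cost of the message exchange.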