We had some (or rather a lot of) issues using Mnesia for MongooseIM clusters. We were using it as an in-memory cache (i.e. no persistent on-disk data storage).
The first idea was to move to Redis, but it had its own issues.
Another idea was to write a bare-bones implementation of Mnesia's dirty operations, but in a way we could tweak if we ran into problems. The issues with Mnesia include:
- Dirty operations could get stuck on other nodes if a remote node was restarted quickly.
- A joining node could crash the cluster if there were a lot of write operations.
- Issues with mismatched Mnesia table cookies.
So, we wrote our own mini-library, which allows writing into several ETS tables across the cluster.
We also added pluggable discovery backends, so nodes can find each other. We use an RDBMS to find the list of nodes in the cluster, and there is also a way to keep the list of nodes in a file. Theoretically, it should be easy to add more backends, such as the k8s API or mDNS.
Performance is similar to Mnesia dirty operations. Instead of mnesia_tm, there is one gen_server per table handling the remote messages (so it should be less of a bottleneck).
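To illustrate the shape of that design (this is not the actual CETS code, just a minimal sketch with made-up module and message names), one process owns one ETS table, applies local writes, and fans the same operation out to its peers on other nodes:

```erlang
-module(replicated_ets_sketch).
-behaviour(gen_server).

-export([start_link/1, insert/2, add_peer/2]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link(Tab) ->
    gen_server:start_link({local, Tab}, ?MODULE, Tab, []).

%% Write locally, then replicate asynchronously to all known peers.
insert(Tab, Object) ->
    gen_server:call(Tab, {insert, Object}).

add_peer(Tab, PeerPid) ->
    gen_server:call(Tab, {add_peer, PeerPid}).

init(Tab) ->
    Tab = ets:new(Tab, [named_table, public, set]),
    {ok, #{tab => Tab, peers => []}}.

handle_call({insert, Object}, _From, #{tab := Tab, peers := Peers} = State) ->
    ets:insert(Tab, Object),
    %% Fire-and-forget replication, similar in spirit to a dirty write.
    [gen_server:cast(Peer, {remote_insert, Object}) || Peer <- Peers],
    {reply, ok, State};
handle_call({add_peer, PeerPid}, _From, #{peers := Peers} = State) ->
    {reply, ok, State#{peers := [PeerPid | Peers]}}.

handle_cast({remote_insert, Object}, #{tab := Tab} = State) ->
    ets:insert(Tab, Object),
    {noreply, State}.
```

The real library adds the joining/leaving logic and acknowledgements on top of this basic fan-out.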
There are still some issues with global and the prevent_overlapping_partitions=true setting in Erlang/OTP, though. Both dislike situations where some nodes cannot easily establish connections to other nodes. So, discovery is not 100% bullet-proof.
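For reference, that flag is a kernel application parameter, so it can be tweaked in the release config (whether relaxing it is a good idea depends on your topology; this is just a sketch of where it lives):

```erlang
%% sys.config: relax global's partition handling so that nodes with
%% flaky connections are not kicked out of the cluster. Trade-off:
%% global's name registry may become inconsistent during partitions.
[
 {kernel, [{prevent_overlapping_partitions, false}]}
].
```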
An example of how the library is used can be found in MongooseIM.
The link to the library: https://github.com/esl/cets
Khepri is a tree-like, replicated, on-disk database library for Erlang and Elixir (it uses Raft).
CETS is just in-memory ETS, but writes are sent to all nodes, plus some extra logic to join nodes and handle losing nodes.
Basically, we needed to store a list of connected clients (their sessions), so we can convert a JID (username) to a Pid (process ID). When we lose a node, we remove the records with Pids from that node.
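As a rough illustration of that cleanup (a sketch against a plain ETS table, not the actual MongooseIM code; the table layout and function name are made up):

```erlang
%% Sketch: sessions are stored as {Jid, Pid}. When DownNode leaves the
%% cluster, drop every record whose Pid lived on that node.
remove_node_sessions(SessionTab, DownNode) ->
    MatchSpec = [{{'_', '$1'},
                  [{'=:=', {node, '$1'}, {const, DownNode}}],
                  [true]}],
    ets:select_delete(SessionTab, MatchSpec).

%% Usage: remove_node_sessions(sessions, 'mongooseim@node2').
```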
CETS is probably faster because it is simpler, but the stored data has to be simpler too.
Also, partition tolerance is still a tricky topic, especially with prevent_overlapping_partitions=true. It feels like global and prevent_overlapping_partitions are a bit broken in OTP; some tests show that unrelated nodes can be kicked from the cluster when a disconnect happens. And of course, DNS discovery is a bit flaky with k8s, because sometimes a DNS name cannot be resolved right away from all nodes in the cluster when a new node appears… So, there are some tests and logic related to this (i.e. cets/src/cets_dist_blocker.erl at main · esl/cets · GitHub and MongooseIM/src/mongoose_epmd.erl at master · esl/MongooseIM · GitHub). Basically, we had some “fun” testing Erlang distribution with k8s. So, Mnesia is not the only source of grief; dist logic and global are another source of interesting race conditions and networking bugs.
I guess I need a separate post describing the k8s DNS issues and the default full-mesh configuration.
Generally, you want to use a custom disco backend, but cets_disco_file is good as an example: it reloads the file automatically when you add a new node.
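If I remember the format correctly (treat this as an assumption and check the backend's source), the file is just a list of Erlang node names, one per line:

```
mongooseim@host1.example.com
mongooseim@host2.example.com
mongooseim@host3.example.com
```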
I’m curious about this issue. I read somewhere that dirty operations just send a message to another node and forget about it. What is the problem you’re referring to?