We had some (or rather a lot of) issues using Mnesia for MongooseIM clusters. We were using it as an in-memory cache (i.e. no persistent on-disk data storage).
The first idea was to move to Redis, but that had its own issues.
Another idea was to write a bare-bones implementation of Mnesia's dirty operations, but in a way that we could tweak if we ran into problems. The issues with Mnesia include:
Dirty operations could get stuck on other nodes if a remote node was restarted quickly.
A joining node could crash the cluster if there were a lot of write operations.
Issues with mismatching Mnesia table cookies.
So, we wrote our own mini-library, which allows writing into several ETS tables across the cluster.
We also added pluggable discovery backends, so nodes can find each other. We use an RDBMS to get the list of nodes in the cluster, and there is also a way to keep the list of nodes in a file. Theoretically, it should be easy to add more backends, like the k8s API or mDNS.
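Roughly, usage looks like this (a minimal sketch of the idea, not copied from the repo, so the exact function names and options may differ — see the repo for the real API):

```erlang
%% Sketch: start the same table on two nodes, join them, and a write on
%% either node shows up in the local ETS copy on both.
%% (The function names here are an approximation of the CETS API.)
{ok, Pid1} = cets:start(sessions, #{}).
{ok, Pid2} = rpc:call('mongooseim@host2', cets, start, [sessions, #{}]).
ok = cets_join:join(sessions_lock, #{}, Pid1, Pid2).
cets:insert(sessions, {{<<"alice">>, <<"example.com">>}, self()}).
%% Reads are plain local ETS lookups.
ets:lookup(sessions, {<<"alice">>, <<"example.com">>}).
```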
Performance is similar to Mnesia dirty operations. Instead of mnesia_tm, there is one gen_server per table handling the remote messages (so it should be less of a bottleneck).
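To illustrate the architecture (a hypothetical sketch of the idea, not the actual CETS code): each node runs one process per table that owns the local ETS copy and applies write operations arriving from the other nodes, so there is no single transaction manager in the hot path.

```erlang
-module(table_replica_sketch).
-behaviour(gen_server).

%% Hypothetical sketch: one gen_server per table per node. It owns the
%% local ETS copy and applies writes sent from remote nodes. This is an
%% illustration only, not the real CETS implementation.
-export([start_link/1, remote_insert/3]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link(Tab) ->
    gen_server:start_link({local, Tab}, ?MODULE, Tab, []).

%% Called on the writing node for every remote replica of the table.
remote_insert(Node, Tab, Record) ->
    gen_server:cast({Tab, Node}, {remote_insert, Record}).

init(Tab) ->
    %% The table is public and named, so local reads bypass the server.
    Tab = ets:new(Tab, [named_table, public, ordered_set]),
    {ok, Tab}.

handle_call(_Request, _From, Tab) ->
    {reply, ok, Tab}.

handle_cast({remote_insert, Record}, Tab) ->
    true = ets:insert(Tab, Record),
    {noreply, Tab}.
```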
There are still some issues with global and the prevent_overlapping_partitions=true setting in Erlang/OTP, though. Neither of them likes conditions where some nodes cannot easily establish connections to other nodes. So, discovery is not 100% bullet-proof.
An example of usage of the library is in MongooseIM.
The link to the library:
Khepri is a tree-like, replicated, on-disk database library for Erlang and Elixir (it uses Raft).
CETS is just in-memory ETS, but writes are sent to all nodes.
Plus some extra logic to join nodes and handle losing nodes.
Basically, we needed to store a list of connected clients (their sessions), so we can convert a JID (username) to a Pid (process ID). When we lose a node, we remove the records with Pids from that node.
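As an illustration of that cleanup (a hypothetical sketch, not MongooseIM's actual schema): if every session record also stores the node of its Pid, dropping the sessions of a lost node is a single match_delete.

```erlang
-module(session_cleanup_sketch).
-export([register_session/3, lookup_pid/2, handle_nodedown/2]).

%% Hypothetical record layout: {Jid, Pid, Node}.
register_session(Tab, Jid, Pid) ->
    ets:insert(Tab, {Jid, Pid, node(Pid)}).

%% JID -> Pid lookup is a plain local ETS read.
lookup_pid(Tab, Jid) ->
    case ets:lookup(Tab, Jid) of
        [{Jid, Pid, _Node}] -> {ok, Pid};
        [] -> {error, not_found}
    end.

%% When a node leaves the cluster, drop every session it owned.
handle_nodedown(Tab, DownNode) ->
    ets:match_delete(Tab, {'_', '_', DownNode}).
```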
CETS is probably faster, because it is simpler. But the stored data has to be simpler too.
Also, partition tolerance is still a tricky topic, especially with prevent_overlapping_partitions=true. It feels like global and prevent_overlapping_partitions are a bit broken in OTP; some tests show that unrelated nodes can be kicked from the cluster when a disconnect happens. And of course, DNS discovery is a bit broken with k8s, because sometimes a DNS name cannot be resolved right away from all nodes in the cluster when a new node appears… So, there are some tests and logic related to it (i.e. cets/src/cets_dist_blocker.erl at main · esl/cets · GitHub and MongooseIM/src/mongoose_epmd.erl at master · esl/MongooseIM · GitHub). Basically, we had some “fun” testing Erlang distribution with k8s. So, Mnesia is not the only source of grief; the dist logic and global are another source of interesting race conditions and networking bugs.
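For reference, prevent_overlapping_partitions is a kernel application parameter (it defaults to true since OTP 25). It can be set explicitly, e.g. in vm.args; shown here disabled, which restores the older behaviour where global does not force-disconnect nodes (an illustration of the knob, not a recommendation):

```
-kernel prevent_overlapping_partitions false
```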
I guess I need a separate post describing the k8s DNS issues and the default full-mesh configuration.
Generally, you want to use a custom discovery backend, but cets_disco_file is good as an example; it reloads the file automatically when you add a new node.
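The file itself is just a list of Erlang node names (assuming the one-node-name-per-line format; check the module for the exact details), something like:

```
mongooseim@host1.example.com
mongooseim@host2.example.com
mongooseim@host3.example.com
```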
I’m curious about this issue. I read somewhere that dirty operations just send a message to another node and forget about it. What is the problem you’re referring to?
An async dirty operation just sends the operation to all nodes and returns immediately.
A sync dirty operation waits for all nodes to return an ack.
But it does not create monitors for the remote processes; instead it wakes up every 5 seconds and checks whether the remote nodes are still in the list of alive Mnesia nodes.
So, if a node reappears in that list very quickly, the dirty write thinks it still needs to wait. But it will never receive anything, because the reincarnated node does not know anything about the fact that it should ack something.
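In Mnesia terms (an illustrative snippet only; the session table and record layout here are made up):

```erlang
%% Illustrative only: assumes a Mnesia table named session with
%% records of the form {session, Jid, Pid}.
Jid = <<"alice@example.com">>,
Pid = self(),

%% Async dirty: replicate-and-forget, returns immediately.
ok = mnesia:dirty_write({session, Jid, Pid}),

%% Sync dirty: returns only after the other replicas have acknowledged the
%% write, which is where waiting on a quickly restarted node can get stuck.
ok = mnesia:sync_dirty(fun() -> mnesia:write({session, Jid, Pid}) end).
```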