Memory usage
Khepri stores everything on disk but also loads everything into memory. That’s how the underlying Ra library, which is responsible for replication and consensus, works. In the future, we might change this to keep only a cache in memory and leave the rest on disk.
Mnesia uses ETS underneath, so I believe everything lives in memory too.
MySQL can do clever things here and will certainly have a lower memory footprint for a large data set.
For your chat service, keeping the entire chat history of every room in memory may be a waste of resources, as users won’t need it often, I suppose. You may want to mix Khepri/Mnesia and MySQL (or some other service): keep the most important data at hand in Mnesia/Khepri, and store the rarely accessed historical data elsewhere.
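To illustrate, here is a rough (untested) sketch of that split with Mnesia for the hot data. `archive_store:write/1` is a hypothetical wrapper around whatever MySQL client library you end up using, and the record layout is just an example:

```erlang
%% chat_storage.erl -- sketch only; archive_store:write/1 is a
%% hypothetical wrapper around your MySQL client of choice.
-module(chat_storage).
-export([store_message/1, archive_old_messages/1]).

-record(chat_message, {id, room, sender, body, timestamp}).

store_message(Msg = #chat_message{}) ->
    %% Hot path: recent history stays in Mnesia/Khepri, cheap to read.
    {atomic, ok} = mnesia:transaction(fun() -> mnesia:write(Msg) end),
    ok.

archive_old_messages(OlderThan) ->
    %% Cold path: periodically move old messages to the external store
    %% so the in-memory footprint stays bounded.
    {atomic, Old} =
        mnesia:transaction(
          fun() ->
                  mnesia:select(
                    chat_message,
                    [{#chat_message{timestamp = '$1', _ = '_'},
                      [{'<', '$1', OlderThan}],
                      ['$_']}])
          end),
    lists:foreach(
      fun(Msg) ->
              ok = archive_store:write(Msg),
              {atomic, ok} =
                  mnesia:transaction(fun() -> mnesia:delete_object(Msg) end)
      end,
      Old).
```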
Network topology
MySQL is a full-featured standalone service and you communicate with it through a network connection or perhaps a Unix socket. It is easy to deploy separately from your Erlang application.
Mnesia and Khepri are Erlang libraries which must be started and managed from an Erlang application. If you want to host the database on a subset of your service’s Erlang nodes, you need to manage that yourself inside your Erlang application.
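For example, something along these lines in your application start code (names are placeholders, and it is only a sketch; double-check the return values against the versions you use):

```erlang
%% Minimal sketch of starting both libraries from your own application
%% code, e.g. the application start callback or a supervised process.
-module(chat_db_start).
-export([start_databases/0]).

start_databases() ->
    %% Mnesia ships with OTP; the schema must exist on disk before
    %% disc_copies tables can be created. The call fails harmlessly if
    %% the schema already exists from a previous run.
    _ = mnesia:create_schema([node()]),
    ok = mnesia:start(),

    %% Khepri is a regular dependency of your release; khepri:start/0
    %% starts a default store on the local node.
    {ok, _StoreId} = khepri:start(),
    ok.
```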
Clustering
W.r.t. clustering, I can’t tell for MySQL as I don’t know how it works.
For Mnesia, you cluster nodes for the entire set of tables, but you can tune how and where each table is replicated on a per-table basis.
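For example (node names and table layouts are placeholders):

```erlang
%% Sketch: one Mnesia cluster, replication tuned per table.
-module(chat_tables).
-export([create_tables/0]).

create_tables() ->
    AllNodes = ['a@host1', 'b@host2', 'c@host3'],
    %% Chat history: kept on disk (and in memory) on two nodes only.
    {atomic, ok} =
        mnesia:create_table(
          chat_message,
          [{attributes, [id, room, sender, body, timestamp]},
           {disc_copies, ['a@host1', 'b@host2']}]),
    %% Presence: RAM only, replicated to every node, lost on restart.
    {atomic, ok} =
        mnesia:create_table(
          presence,
          [{attributes, [user, status, last_seen]},
           {ram_copies, AllNodes}]),
    ok.
```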
For Khepri, you cluster nodes for an entire store. Everything in that store is replicated to all clustered nodes and written to disk. However, you can configure multiple stores, each with its own data directory and its own set of clustered nodes (or no clustering at all).
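A rough sketch of what that could look like, assuming khepri:start/2 takes a data directory and a store id, and khepri_cluster:join/2 takes a store id and a remote node; double-check the exact functions against the Khepri version you use, as the paths and node names here are placeholders:

```erlang
%% Rough sketch: two Khepri stores, each with its own data directory
%% and its own cluster membership.
-module(chat_khepri).
-export([start_stores/0]).

start_stores() ->
    %% Store 1: clustered with node b@host2, holds the important data.
    {ok, _} = khepri:start("/var/lib/myapp/replicated", chat_store),
    ok = khepri_cluster:join(chat_store, 'b@host2'),

    %% Store 2: local only, never clustered, e.g. node-local caches.
    {ok, _} = khepri:start("/var/lib/myapp/local", local_store),
    ok.
```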
When you stop and start nodes, Mnesia and Khepri will behave very differently.
Mnesia will still serve data as long as at least one node is running in the cluster. When you stop the cluster and start it again, it will start to serve data again only after the last stopped node is back online.
Khepri, relying on Ra/Raft, will stop processing writes when fewer than a quorum of nodes are running. Reads will still be possible though. When a cluster is restarted, writes become possible again once a quorum of nodes is back online.
Conflicts handling and network partition recovery
After a network partition, Mnesia and Khepri will behave differently, for the same reasons as in the section above. I can’t tell for MySQL.
Mnesia usually leaves that responsibility to the caller. It emits events to warn the application that a network partition occurred, but that’s about it.
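For example, one of your processes can subscribe to Mnesia’s system events and decide what to do when a partition is reported. A minimal sketch as a gen_server:

```erlang
-module(partition_watcher).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    %% Ask Mnesia to send us its system events.
    {ok, _Node} = mnesia:subscribe(system),
    {ok, #{}}.

handle_info({mnesia_system_event, {inconsistent_database, Context, Node}},
            State) ->
    %% Mnesia detected that Node ran as a separate partition (Context is
    %% typically running_partitioned_network). It is up to the application
    %% to reconcile the copies, e.g. by picking a side with
    %% mnesia:set_master_nodes/1 and restarting the other one.
    logger:warning("Mnesia reports a partition with ~p (~p)", [Node, Context]),
    {noreply, State};
handle_info(_Other, State) ->
    {noreply, State}.

handle_call(_Request, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.
```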
Khepri follows Raft principles and there is no recovery to perform. Changes to the database can’t happen if there is no quorum in the cluster.
Let’s take an example of a 3-node cluster. There is a network partition where node A can’t reach nodes B and C.
- Mnesia: An event is emitted on both sides to warn about the lost node(s). Changes can still be made on all nodes during the network partition. When the network is repaired, another event is emitted and the application is responsible for resolving any conflicts.
- Khepri: Changes can still be made on nodes B and C; however, only (inconsistent) reads are allowed on node A. When the network is repaired, node A applies the backlog of changes it missed from nodes B and C. No intervention is required from the application; however, the service was degraded on node A during the network partition.
It’s difficult to give advice here; it really depends on where you want to put the cursor between availability and consistency.
Conclusion
I think I covered several parts already. Does it help you understand which one or which combination might be best for your project?