Mnesia new feature for eventual consistency with CRDT

CuSO4 · October 17, 2022, 9:59pm

Hi,

I am thinking about adding a new eventual consistency model for Mnesia. In particular, this is trying to address the question that when we are doing dirty async operations on Mnesia, according to the doc: “notice that there is a risk that the database can be left in an inconsistent state if dirty operations are used to update it”, and from mnesia documentation " Determining what data to keep after a communication failure is outside the scope of Mnesia". Both of these problems can be solved to some extend with CRDT (conflict-free replicated data types), and this has already been implemented by databases like Riak and Redis.

I wanted to start by treating each mnesia table as, say a set (we can think about bag, etc later on). And this can be a G-set or some other similar type of CRDT set. In this way we allow two replicas to converge even when they receive asynchronous operations. Later on, we can do more fine grained resolution such as allowing a filed of a record in a mnesia table to be a CRDT, etc.

I will start working on this on a fork of the otp repo, and then if this is goes well, consider PR it to the main repo. If anyone has any feedback/suggestions, please do let me know.

Thanks!

CuSO4 · May 2, 2023, 5:20pm

The project is now at a stage of working research prototype. In case any one wants to take a look at it (GitHub - Vincent-lau/otp: Erlang/OTP).

I am also considering submitting a PR for this feature to be included in the main OTP branch, although I am aware this might involve quite a lot of polishing. If anyone has advice on this, please let me know.

jchrist · May 5, 2023, 7:58am

I just watched your demo linked in your issue and I just wanted to say I’m super impressed by your project

Having this integrated into OTP would IMO be a great addition for mnesia. There’s also an older ticket for the built-in distributed application functionality becoming tolerant to netsplits and I wonder whether CRDTs could be used for that as well. But I have to admit I don’t fully understand the science behind it, Riak was like magic to me when I used it

One question I have: If the nodes that are affected by the netsplit are restarted, will the CRDTs be reloaded from disk?

CuSO4 · May 5, 2023, 2:31pm

Hi, thanks for watching the demo. Glad you liked it. Regarding your question on the distributed application and net split, I am not that familiar with that part of Erlang, but I am guessing a stronger consistency model provided by protocols like Raft or Paxos would be appropriate. My implementation of Mnesia uses CRDTs, which always favours availability and then allow them to merge later, maybe this could be used as well, depending on how much guarantee is needed for the distributed application.

Regarding your question on restarting the Mnesia nodes. I am currently only considering net splits and not node failures so there is actually no need to stop the node, and the resolution will happen automatically when the partition heals. I would imagine it is fairly straightforward to add support for persisting data onto the disk, etc.

Let me know if you have more questions, and I plan to write something to explain the principle behind this “magic” in the future and will post it here at some point.

jchrist · May 18, 2023, 2:54pm

I have a short question about your pull request: I have been looking into a way to make an OTP-“native” (i.e. no dependency to inet_tcp_proxy like here) way to test network partitions in the test suite - I want to see how mnesia’s majority mode deals with recovery from it. Have you tested some other solutions apart from inet_tcp_proxy? I saw some hacks with changing cookies around, like here in Riak’s test library but I’m not sure that it’s comparable, because in that case I believe you have a “clean” disconnection…

CuSO4 · May 18, 2023, 8:43pm

No I was not able to find anything along that line either for simulating network partition. That’s why I opted for inet_tcp_proxy rather than doing it natively in OTP. I think most of the answers online are to do with either distribution protocol or using cookies.

I guess another choice is to do this at OS level (drop ip tables, etc) as well rather than inside OTP, which could be a “less clean” way.

rfk23 · May 20, 2023, 10:18am

We opted to use OS level commands to emulate network partitions and network latency. This script may be used for inspiration: mria/slowdown.sh at main · emqx/mria · GitHub

jchrist · May 20, 2023, 11:57am

Thank you two very much, tc from iptables looks like a great way to test