How does data replication work in Mnesia?

Hello everybody,
I have a couple of questions concerning Mnesia:
1- How does data replication work by default? Does writing require a full lock on the table?
If so, we would lose all the performance gained by dirty operations.
2- It is known that each table's maximum size is 2 GB (each table fragment is also considered an isolated table), so we have to keep adding fragments as the data scales. What is the best practice for that, please? Starting the table with many fragments? Manually adding fragments as the database grows? Automating the fragmentation process?
I will be very thankful for your help.

  1. In a transaction, a write or read only takes a lock on the {Table, Key} combination.
    In dirty operations, no locks are taken.
  2. The 2 GB limit is only relevant for disc_only_copies; there is no limit for disc_copies (only memory and disc space).
    If you don't want to use disc_copies, then you can plug in another backend for disc storage, as in the sketch below.
    See for example https://github.com/aeternity/mnesia_rocksdb (a RocksDB backend plugin for Mnesia, based on mnesia_eleveldb).
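
A minimal sketch of plugging in such a backend, assuming the register/0 call and the rocksdb_copies alias as described in the mnesia_rocksdb README (the table name is made up):

```erlang
%% A disc schema is required once per node before disc-backed tables can be used.
mnesia:create_schema([node()]).
mnesia:start().

%% Make the rocksdb backend known to mnesia. The register/0 call and the
%% rocksdb_copies alias follow the mnesia_rocksdb README; check the version
%% you use for the exact names.
mnesia_rocksdb:register().

%% The alias is then used like disc_copies/disc_only_copies.
mnesia:create_table(my_table,
                    [{rocksdb_copies, [node()]},
                     {attributes, [key, value]}]).
```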

Thank you @dgud for your reply and for your suggestion too. However, it is still unclear to me how data is replicated across the network when a dirty write is called. I found this in the docs:

Dirty operations are written to disc if they are performed on a table of type disc_copies or type disc_only_copies. Mnesia also ensures that all replicas of a table are updated if a dirty write operation is performed on a table.

But nothing is mentioned about which locking mechanism is used for the replicas, or whether no locking is used at all.

In dirty operations no locks are taken.

A dirty write just sends a message to all nodes that says "write this to the table".
No guarantees about order or anything, really.
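
To make the difference concrete, a small sketch (the table and record are made up): a transactional write locks only {Table, Key} and is committed on all replicas, while a dirty write is pushed to each replica with no locks and no ordering guarantee.

```erlang
-record(person, {id, name}).

%% Transactional write: takes a write lock on {person, 1} only, and the
%% commit updates all replicas.
write_tx(Name) ->
    mnesia:transaction(fun() ->
        mnesia:write(#person{id = 1, name = Name})
    end).

%% Dirty write: no locks; the update is sent to every replica on a
%% best-effort basis, with no ordering guarantees.
write_dirty(Name) ->
    mnesia:dirty_write(#person{id = 1, name = Name}).
```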


I have given some presentations about Mnesia in the past, which might provide some answers.

Mnesia for the CAPper
Video: Ulf Wiger - Mnesia for the CAPper on Vimeo
Slides: https://www.erlang-factory.com/upload/presentations/286/MnesiaEFLondon2010.pdf

Mnesia_rocksdb 2 - Mnesia backend plugin on steroids:

Note that if the direct access API in mnesia_rocksdb is used, it bypasses all of mnesia's checks and replication. If you want replication, you use mnesia's transactions.
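
For example, replication is declared per table, and transactional writes are what commit on all of those replicas (the node names and table layout below are made up):

```erlang
%% A table replicated as disc_copies on two nodes.
mnesia:create_table(account,
                    [{disc_copies, ['a@host1', 'b@host2']},
                     {attributes, [id, balance]}]).

%% A write inside a transaction is committed on every replica.
mnesia:transaction(fun() ->
    mnesia:write({account, 42, 100})
end).
```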

I don’t think you should assume automatically that performance is killed when using transactions. For one thing, you presumably replicate because you want redundancy guarantees, and as Dan has mentioned, dirty writes don’t really give you those - they provide a form of best-effort replication.

There are things that can be done, such as sticky locks, which will speed things up for certain update patterns.
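
A small sketch of what that can look like (the counter table and the bump/2 function are made up): a sticky_write lock stays on the node that took it, so repeated updates of the same key from that node skip the remote lock negotiation.

```erlang
%% Assumes a table 'counter' with attributes [key, value].
bump(Key, Incr) ->
    mnesia:transaction(fun() ->
        N = case mnesia:read(counter, Key, sticky_write) of
                [{counter, Key, Old}] -> Old;
                []                    -> 0
            end,
        %% The sticky lock remains on this node after commit, which speeds up
        %% update patterns where the same node keeps writing the same keys.
        mnesia:write(counter, {counter, Key, N + Incr}, sticky_write)
    end).
```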

For extra credit, the backend plugin system has a hook for table sync. You could try using it to implement a more scalable table sync protocol. To my knowledge, no one has tried this for real.

BR,
Ulf W


@dgud thanks for the clarification, this is exactly what I was looking for. If so, then atomicity is not guaranteed across the network, and I think the best practice when using dirty operations is to run Mnesia on a centralised node and replicate the data manually to a stand-by second node that takes over when the first one fails (two nodes per database).
@uwiger great suggestion, I agree that dirty operations should only be used when there is a real need for them.
I have seen your video, and as I understand it, you used a backend plugin for Mnesia other than the default dets because the latter has the 2 GB limit. I can understand the needs of æternity in building a distributed ledger, where full nodes only have to download the database and start participating. But what about regular databases managed by admins: why should I use a different backend instead of constantly adding fragments (which also improves lookup performance), given that one of the main reasons to use Mnesia is to stay inside ERTS? I would be thankful for your answer.

@Abdelghani There are other reasons to use mnesia, not least that it offers a very small semantic gap, such that the database access feels very well integrated into the language.

But all solutions have their pros and cons. This is certainly true for database libraries. There is no solution that is right for everyone.

If it fits your problem to run a fragmented database, then by all means do so. You can of course combine this with an alternative backend if you want, e.g. keeping fragments in rocksdb.
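
For reference, a sketch of the fragmentation side (the table name and fragment counts are made up); access goes through the mnesia_frag activity module so that each key is mapped to the right fragment:

```erlang
%% Create a table split into 8 fragments over the current node pool.
mnesia:create_table(orders,
                    [{attributes, [id, data]},
                     {frag_properties, [{n_fragments, 8},
                                        {node_pool, [node() | nodes()]}]}]).

%% Add another fragment later, as the data set grows.
mnesia:change_table_frag(orders, {add_frag, [node() | nodes()]}).

%% Reads and writes go through the mnesia_frag access module.
mnesia:activity(transaction,
                fun() -> mnesia:write({orders, 17, some_data}) end,
                [],
                mnesia_frag).
```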

Being able to keep all data in RAM is great, and disc_copies have wonderful performance. But if your data set outgrows your existing hardware, simply adding RAM can be a costly proposition.


Thank you @uwiger,
yes, that's right: we should go with the appropriate solution, and what is good for one case is not necessarily good for another; as you said, both solutions have their pros and cons.