Mnesia_rocksdb - fast and scalable mnesia backend

uwiger · June 16, 2025, 12:54pm

The mnesia_rocksdb backend plugin offers support for very large disk-based tables in mnesia. It also offers a low-level access API, enabling efficient use of the RocksDB access APIs while using Mnesia as the management framework: tables are created via Mnesia, and can be accessed either through the low-level API or through the normal Mnesia API.
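As a rough illustration of the "created via Mnesia, accessed either way" idea, here is a minimal sketch. The table name `t`, its attributes and the module name are made up for the example, and it assumes mnesia_rocksdb is available as a dependency, that no schema exists yet on this node, and that starting the application explicitly is harmless.

```erlang
-module(mrdb_sketch).
-export([demo/0]).

%% Hypothetical table `t` with attributes [key, value].
demo() ->
    ok = mnesia:create_schema([node()]),   % assumes no schema exists yet
    ok = mnesia:start(),
    %% Assumption: explicitly starting the app is harmless if register/0
    %% already takes care of it.
    {ok, _} = application:ensure_all_started(mnesia_rocksdb),
    %% Make the rocksdb_copies backend type known to Mnesia.
    {ok, _Alias} = mnesia_rocksdb:register(),
    %% The table is created through Mnesia, as usual.
    {atomic, ok} = mnesia:create_table(
                     t, [{rocksdb_copies, [node()]},
                         {attributes, [key, value]}]),
    %% Write and read through the low-level mrdb API ...
    ok = mrdb:insert(t, {t, 1, a}),
    ViaMrdb = mrdb:read(t, 1),
    %% ... and read the same object through the normal Mnesia API.
    ViaMnesia = mnesia:dirty_read(t, 1),
    {ViaMrdb, ViaMnesia}.
```

Both reads should return the same object, since both APIs operate on the same underlying table.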

We at QPQ AG are improving mnesia_rocksdb, in a new location:

uwiger · June 19, 2025, 8:35pm

The latest PR adds fold_reverse() and select_reverse().

uwiger · July 2, 2025, 12:50pm

New PR: #7 - Add mrdb_index:select() et al - mnesia_rocksdb - QPQ AG

The basic idea is this:

Index plugins allow for derived index values, operating on the whole primary object

mnesia_rocksdb indexes are ordered sets

To efficiently traverse indexed tables, the mrdb_index module already provides fold and iterator functionality

With mrdb_index:select[_reverse](Tab, Ix, MatchSpec[, Limit]), the match spec operates on a set of tuples {IndexValue, ActualObject}, so that filtering can be done simultaneously on the index value and the object itself.

With structured index values, prefix matching is done on the bound prefix of the value, allowing for efficient searching.
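To make the shape of such a match spec concrete, here is a small sketch that runs one against a hand-built list of {IndexValue, ActualObject} pairs using only `ets:match_spec_run/2` from stdlib. It does not call mnesia_rocksdb at all; the object layout `{tx, Hash, Type, Height}`, the index value `{Type, Height}` and the module name are hypothetical.

```erlang
-module(ix_select_sketch).
-export([demo/0]).

%% Hypothetical object layout:  {tx, Hash, Type, Height}
%% Hypothetical index value:    {Type, Height}
demo() ->
    %% Stand-in for what an index traversal hands to the match spec:
    %% {IndexValue, ActualObject} pairs, listed in index (term) order.
    Pairs = [{{contract_call,   141000}, {tx, <<"a">>, contract_call,   141000}},
             {{contract_create, 120000}, {tx, <<"b">>, contract_create, 120000}},
             {{contract_create, 141648}, {tx, <<"c">>, contract_create, 141648}}],
    %% The head binds on both halves of the pair:
    %%   - contract_create is the bound prefix of the index value,
    %%   - '$1' is the unbound remainder (the height), filtered by the guard,
    %%   - '$2' is picked out of the object itself (the hash).
    MS = [{{{contract_create, '$1'}, {tx, '$2', '_', '_'}},
           [{'>=', '$1', 141000}],
           [{{'$1', '$2'}}]}],
    ets:match_spec_run(Pairs, ets:match_spec_compile(MS)).
```

Calling `ix_select_sketch:demo()` returns `[{141648, <<"c">>}]`. Presumably the same `MS` could be handed to mrdb_index:select(Tab, Ix, MS) from the PR, where the bound `contract_create` prefix would let the traversal start directly at the matching index entries; the exact return shape of that call is not shown in this thread.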

uwiger · July 11, 2025, 10:17am

To illustrate the utility of this, here are some snaps from an indexing plugin in one of our blockchain nodes. The plugin uses the indexing plugin feature in Mnesia, registering callback functions that derive secondary keys from a given table. The callbacks can operate on the whole object.

Combined with the ordered set nature and efficient iterators, one can create complex secondary keys that are suitable for range queries.

In the example here, an index helps locate smart contract creation transactions. This particular transaction resided at height 141648 on the chain, so a sequential search would have been unworkable. The select operation, combining an index traversal with object lookup, took about 10 ms.

uwiger · July 11, 2025, 11:16am

Of particular note: The index above is a sparse index. When the callback returns [], no index entry is created.
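A sparse index callback of this kind might look roughly like the sketch below. The module name, function name and object layout are invented for illustration and are not the plugin from our node; the (Table, Pos, Object) argument list follows the Mnesia index plugin convention as far as I know it.

```erlang
-module(tx_index).
-export([ix_contract_create/3]).

%% Hypothetical object layout: {tx, Hash, Type, Height, Payload}.
%% The callback sees the whole object and returns a list of index values.

%% Contract creations get one structured, range-friendly index value.
ix_contract_create(_Tab, _Pos, {tx, _Hash, contract_create, Height, _Payload}) ->
    [{contract_create, Height}];
%% Everything else returns [], so no index entry is created (sparse index).
ix_contract_create(_Tab, _Pos, _Other) ->
    [].
```

For example, `tx_index:ix_contract_create(tx, 0, {tx, <<"h">>, spend, 7, <<>>})` returns `[]`, so spend transactions take up no space in the index. Registration would go through Mnesia's index plugin mechanism (`mnesia_schema:add_index_plugin/3`, to my knowledge), which is not shown here.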

zabrane · July 11, 2025, 1:42pm

@uwiger Hi Ulf

Thanks for the update!

We moved from Mnesia to Khepri a couple years ago and are happy with it, but I’m exploring different approaches. The prefix key optimization and sext-based ordering look compelling for our query patterns.

Are there any comprehensive tutorials or real-world examples beyond the code snippets in the docs? I’m particularly interested in migration strategies, performance tuning, and applications that demonstrate the RocksDB backend at scale.

Any substantial applications or case studies you’d recommend looking at?
Thanks for the continued work on this!

uwiger · July 11, 2025, 2:26pm

So little time …

Apart from the stuff I’ve written in the mnesia_rocksdb docs, in part trying to address some blind spots in the mnesia docs, I did give a presentation about mnesia_rocksdb, where I covered some consistency and performance issues and explained the approach of using mnesia mainly as scaffolding and then using a mnesia-like API against RocksDB.

In terms of using RocksDB at scale, there are some users out there that use RocksDB directly, i.e. not from Erlang. In the blockchain world, I guess the Ethereum blockchain is one of the biggest examples I know, with a total database size of about 1 TB if you sync the whole blockchain. I believe most of the big implementations use LevelDB, but at least the Rust client uses RocksDB.

In blockchains, the performance-critical part is syncing the database, i.e. catching up to the top from scratch. This can lead to very high write pressure, which gave us problems when we treated mnesia_rocksdb as a simple backend and used the Mnesia transaction support. This is explained in the presentation above.

I would mainly recommend mnesia_rocksdb for single-node databases. While the backend plugin system does have hooks for more sophisticated table sync protocols, this is not something I’ve explored, since I’ve worked on blockchains for the last 7-8 years, and we have no use for that type of distribution.