How does Mnesia compare to Postgres?

So I’m coming from a world of almost exclusively PERN stacks and, when comparing the “P” (Postgres) to Mnesia… Postgres just seems like a superior, more catastrophe-tolerant system.

What do I mean by that?

Well, based on what I have seen with Mnesia so far, if the supervisor crashes? Or the VM? It’s game over. Data gone. It seems like in order to really take advantage of Mnesia, you have to run some form of data backup scheduler every minute or two that sends the current table contents externally to an S3 bucket, or even to Postgres itself (something like the sketch below).
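
For concreteness, here’s the rough shape of what I mean, in Erlang. This is just a sketch: the module name, the two-minute interval, and the upload_to_s3/1 helper are all made up, and it assumes Mnesia is already running.

```erlang
%% Sketch only: periodically dump Mnesia tables to a file and ship it off-node.
%% upload_to_s3/1 is a hypothetical helper; the interval is arbitrary.
-module(mnesia_backup_sketch).
-export([start/0, backup_now/0]).

start() ->
    %% Call backup_now/0 every two minutes.
    {ok, _TRef} = timer:apply_interval(timer:minutes(2), ?MODULE, backup_now, []),
    ok.

backup_now() ->
    File = "/tmp/mnesia_" ++ integer_to_list(erlang:system_time(second)) ++ ".bup",
    ok = mnesia:backup(File),   %% standard Mnesia backup to a file
    upload_to_s3(File).

upload_to_s3(_File) ->
    %% Placeholder: push the backup file to S3, Postgres, etc.
    ok.
```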

Postgres, on the other hand, survives even if the server it is running on literally shreds itself and needs a full hard restart: the data will still be there 99% of the time, backup or not.

Maybe I am misunderstanding the use case of Mnesia, or missing redundancy features/best practices that go beyond the supervisor tree/the VM itself?

Would love to learn more from experts that live and breathe this stuff. I’m open minded and ready to learn where Mnesia fits in and where it doesn’t!

If you’re thinking about storing valuable data that you cannot afford to lose in Mnesia: simply DON’T. I knew some companies that did just that, but they had heavily modified Mnesia, to the point where it resembled the open-source version in interface only.

The proper use case for Mnesia is as a very fast distributed cache, or as a safer, more convenient alternative to storing data in a file.
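
To make the cache case concrete, a RAM-only table is only a few calls. A rough sketch (module, table, and record names are made up):

```erlang
%% Sketch: a RAM-only Mnesia table used as a fast node-local (or replicated) cache.
-module(cache_sketch).
-export([init/0, put_session/2, get_session/1]).

-record(session, {id, data}).

init() ->
    ok = mnesia:start(),
    {atomic, ok} = mnesia:create_table(session,
        [{ram_copies, [node()]},                      %% RAM only: fast, not durable
         {attributes, record_info(fields, session)}]),
    ok.

put_session(Id, Data) ->
    mnesia:dirty_write(#session{id = Id, data = Data}).

get_session(Id) ->
    case mnesia:dirty_read(session, Id) of
        [#session{data = Data}] -> {ok, Data};
        []                      -> not_found
    end.
```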

I imagine WhatsApp, for example, is one of those companies. But then let’s say we have a situation where we’re handling millions of requests per minute, and we need a VERY fast way (sub 250ms) to store and retrieve user data back and forth… Going the Postgres/epgsql route feels like it would add latency, even when fully optimized with poolboy, cached query formats, etc.

What kinds of queries against what kinds of schemas? Are you ok with eventual (soon) consistency on the inserts? We’ve built our own internal Erlang PersistStore that supports our CQRS architecture that we’re looking to turn into a product. If you’re interested in trying it out I think you might find it has what you need.

We were the SEAsia press contact for the Postgres project and have pushed it quite far. It’s definitely what you need for an “enterprise” database and super high resiliency, but yes, that has costs. Mnesia provides reliability by being distributed, but you have to code that into your system design. It’s a more minimal DB that takes advantage of the distributed capabilities of the BEAM. It could do what you want, but it would require a lot of development to do it right, and then you’ve pretty much built a database product (like we did).

Our solution uses our own fork of RocksDB as the core database tech underneath (Mnesia wasn’t fast enough for our core use case), but it does a lot of things to manage the system and make it a more practical database, especially for time-series and event-oriented data, although we’ve used it for some rather large document stores as well.

– Ben Scherrey

Coming from PostgreSQL, I understand your concerns about Mnesia’s durability. Many teams, including ours, now use Khepri instead. It’s a tree-structured database built on the Raft consensus algorithm (via Ra) that ensures data survives node and VM crashes without needing external backups. You get PostgreSQL-like durability while staying within the Erlang ecosystem, with simpler operations and management.

Check Khepri vs Mnesia benchmark
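
The API surface is small, too. A rough sketch (the path and value are invented, and the exact return shapes vary between Khepri versions):

```erlang
%% Sketch only: paths/values are made up; return shapes differ across Khepri versions.
{ok, _StoreId} = khepri:start(),                %% starts Ra and a default store
ok = khepri:put([stock, wood, <<"oak">>], 200), %% write replicated and fsynced via Raft
{ok, _Oak} = khepri:get([stock, wood, <<"oak">>]).
```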

let’s say we have a situation where we’re handling millions of requests per minute, and we need a VERY fast way (sub 250ms) to store and retrieve user data back and forth…

That doesn’t tell us much. What is the proportion of reads to writes?

  1. In a clustered environment stock mnesia’s write throughput has a fundamental limit, because it has to broadcast transactions to singleton processes at remote nodes (mnesia_tm). This is fine when you use mnesia to store mostly static data, like configuration (I think it was originally designed for that exact purpose), but when writes exceed a certain frequency, mnesia_tm tends to explode, especially in large clusters.
  2. Durability options include disc_copies (a full copy of all the data is kept in RAM and also persisted to disk) and disc_only_copies (never ever use them, because they use dets); see the sketch after this list.
  3. One can use a 3rd party storage backend, RocksDB being the most popular option. But it has limitations as well, like doubling of the WAL.
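
To illustrate point 2, the copy type is just a per-table, per-node option. Sketch only; module, table, and record names are arbitrary:

```erlang
%% Sketch of the copy types from point 2; table/record names are arbitrary.
-module(copies_sketch).
-export([setup/0]).

-record(user, {id, profile}).

setup() ->
    %% A disc schema must exist before any disc_copies tables can be created.
    mnesia:create_schema([node()]),   %% returns {error, _} if it already exists
    ok = mnesia:start(),
    %% disc_copies: full copy kept in RAM *and* logged/persisted to disk.
    {atomic, ok} = mnesia:create_table(user,
        [{disc_copies, [node()]},
         {attributes, record_info(fields, user)}]),
    %% ram_copies would be RAM only; disc_only_copies is the dets-backed one to avoid.
    ok.
```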

We use mnesia because we’ve forked it and fixed it for our specific purposes, but that was not a small feat.

Very, very write heavy: 95/5, front-loaded. Over time, as user data is collected, it would balance out to more of a 50/50 situation. The Postgres equivalent would be INSERT/UPSERT. Eventual consistency on the inserts would be alright, as long as that eventuality is within a couple of seconds.
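
For what it’s worth, my (possibly wrong) understanding is that on a set-type table a plain mnesia:write/1 inside a transaction already behaves like an upsert, which is roughly the shape I need. Sketch, with made-up names:

```erlang
%% Sketch: on a set table, mnesia:write/1 inserts or overwrites by primary key,
%% roughly equivalent to Postgres INSERT ... ON CONFLICT (id) DO UPDATE.
-module(upsert_sketch).
-export([upsert_user/2]).

-record(user, {id, profile}).

upsert_user(Id, Profile) ->
    F = fun() -> mnesia:write(#user{id = Id, profile = Profile}) end,
    case mnesia:transaction(F) of
        {atomic, ok}      -> ok;
        {aborted, Reason} -> {error, Reason}
    end.
```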

To start, 95% would be insert/upsert queries. Over time, this percentage would gradually drift toward a more 50/50 split, although it would likely never actually get to that point. It seems like, when it comes to Mnesia in general, the options are ‘be OK with everything potentially exploding and losing the data’ or ‘fork and modify it into something more resilient’, haha. And even with the latter option, there’s nothing to say things wouldn’t get squirrely after X million records have been added. I’ll take a look at RocksDB. From what I can tell, Mnesia MAY be good for some sort of short-term caching solution? But that would be about it.

I wonder how this would actually compare to Postgres, managed via poolboy.
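
In case it helps frame the comparison, this is roughly the call path I’d be measuring on the Postgres side. The pg_pool name and the {equery, Sql, Params} call are hypothetical; the poolboy worker module that owns the epgsql connection isn’t shown:

```erlang
%% Sketch of the Postgres path via poolboy + epgsql.
%% Assumes a poolboy pool named pg_pool whose gen_server workers each hold an
%% epgsql connection and handle {equery, Sql, Params} calls (worker not shown).
-module(pg_sketch).
-export([upsert_user_pg/2]).

upsert_user_pg(Id, Profile) ->
    Sql = "INSERT INTO users (id, profile) VALUES ($1, $2) "
          "ON CONFLICT (id) DO UPDATE SET profile = EXCLUDED.profile",
    poolboy:transaction(pg_pool,
        fun(Worker) ->
            gen_server:call(Worker, {equery, Sql, [Id, Profile]})
        end).
```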