How are you approaching distributed resilient systems? Any best practices to share?

scherrey · May 4, 2024, 2:37pm

I’m curious how many Erlang orgs have deployed systems that are doing live code upgrades and/or true distributed/resilient systems. If you’re doing this, are you rolling your own distribution model, using riak_core, using raft, something else? Trying to discover what the latest “best practices” in the space are. We’ve got a CQRS architecture that we’re looking to scale out and eliminate all single points of failure (including data centers). Our backend is almost entirely Erlang but we also have Elixir/LiveView for web-facing apps. I don’t find many recent conversations about this kind of resiliency and would definitely like to explore this more with the community. Appreciate any pointers/suggestions/conversation!

(I did see the recent post regarding using raft to distribute a db. It reminded me to ask this question that’s been on my mind for a while before we go committing to something without understanding our options more fully.)

arcanemachine · May 5, 2024, 4:16am

This is a very engaging talk on the subject:

scherrey · May 31, 2024, 6:49am

I thought I replied to this already but don’t see it here. Anyway - yes, Bryan’s efforts are epic and I’ve followed his projects for some time. I’m more curious how others are getting along. Or are people just not taking full advantage of what the BEAM has to offer in terms of resiliency and distribution of compute power? I see riak_core mentioned but it doesn’t seem to be getting much love. We can’t be the only ones out there trying to do this stuff.
thanks,

– Ben Scherrey