Priority-based Leadership / Consensus

adrianroe · February 7, 2022, 4:15pm

We are working on a project that will launch groups of servers in the cloud, across multiple regions worldwide. Each group will be responsible for publishing a stream of data to a third party, and it is extremely important that exactly one server from each group publishes the data: if that cannot be guaranteed, having no publisher is preferable to having two or more.

To complicate things further, each group has a preferred node that, in a perfect world, should be the one publishing. For example, if all of the servers are available I’d rather the one in Frankfurt publish; if it isn’t available, the London server is next, then the Dublin server, and so on. Once a server starts publishing it should remain the publisher throughout (or at least until it fails). So if, for example, the Frankfurt server was offline at the start of the event and London published instead, we should stick with London even if Frankfurt later becomes available.
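To make the selection rule concrete, here is a minimal sketch of the priority-with-stickiness decision, written in Python purely for illustration. The function `choose_publisher` and its parameters are hypothetical. It covers only the choice itself: the "at most one publisher" guarantee would still have to come from applying this decision as a command through a consensus-backed state machine (for example one built on Ra) so that every node agrees on the result, plus whatever liveness detection decides which nodes count as available.

```python
from typing import Optional, Sequence, Set


def choose_publisher(
    priority: Sequence[str],
    available: Set[str],
    current: Optional[str],
) -> Optional[str]:
    """Pick the publisher for one group (hypothetical helper).

    priority: node names, most preferred first
    available: nodes currently believed to be alive (assumed input)
    current: the node already publishing, if any
    """
    # Stickiness: an incumbent keeps the role for as long as it is alive,
    # even if a higher-priority node has since come back.
    if current is not None and current in available:
        return current
    # Otherwise take the most preferred node that is available.
    for node in priority:
        if node in available:
            return node
    # Nobody available: publishing nothing beats publishing twice.
    return None


if __name__ == "__main__":
    order = ["frankfurt", "london", "dublin"]
    # Frankfurt offline at the start, so London is chosen.
    print(choose_publisher(order, {"london", "dublin"}, None))  # london
    # Frankfurt comes back, but London keeps the role.
    print(choose_publisher(order, {"frankfurt", "london", "dublin"}, "london"))  # london
    # London fails, so the highest-priority survivor takes over.
    print(choose_publisher(order, {"frankfurt", "dublin"}, "london"))  # frankfurt
```

Keeping the rule a pure function of replicated state means it can be evaluated deterministically inside a state machine's apply step, which is what makes every replica reach the same answer.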

We previously used etcd for this purpose but found its resource usage extremely high (disk activity in particular), so we are looking to implement our own solution on top of an existing “academically underpinned” library (i.e. one built on Raft, Paxos, etc.).

We’ve been working on a state machine on top of the excellent Ra library from the RabbitMQ team, and it very nearly delivers what we want. Our implementation made assumptions about Ra’s leadership transfer behaviour that turned out not to hold (our bad, not the RabbitMQ team’s!). We can probably work around the issue, but the code will be considerably more complex and the failover timeouts will be longer than we had hoped.

Before ploughing on, I wanted to reach out to the community to see whether we are simply re-inventing the wheel. https://github.com/uwiger/locks certainly looks like a very strong candidate (it has been the subject of academic analysis by Thomas Arts), although there has not been much activity on the project for a while. Its tests also fail on Erlang 24, probably due to a change in how BEAM forms are represented, so the bug may well be in the tests rather than the library. The lack of activity might simply be because it all just works.

But I am well aware that we cannot be alone within the Erlang community in tackling problems of this sort, so all opinions are gratefully welcomed. We are more than happy to put time into an existing project, provided its current authors welcome the idea and are prepared to accept PRs and offer some guidance.

Thanks in advance,

Adrian