I have a question about the WhatsApp backend implementation. In this talk, @alavrik talked about plain hashing rather than consistent hashing to interact with the backend cluster. I wonder how we can scale this backend, since we can't resize the total number of partitions the way we can in riak_core, for example, with consistent hashing?
This is a very old talk, describing even older things that are no longer in use.
That said, to answer your question: there is no problem with (semi-automated) expansion done through re-partitioning (re-sharding):
- Start with "double writes": all writes go to both the "old" and the "new" partitions.
- Run a crawler that reads all data from all "old" partitions and writes it into the "new" partitions.
- Switch reads to the "new" partitioning schema and stop double-writing.
- Remove the old partitioning.
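The steps above can be sketched roughly like this (a minimal in-memory illustration; the partition stores, counts, and modulo routing are my own assumptions, not WhatsApp's actual code):

```python
# Hypothetical sketch of re-sharding via double writes.
OLD_N, NEW_N = 4, 8  # partition count before and after expansion
old_parts = [dict() for _ in range(OLD_N)]
new_parts = [dict() for _ in range(NEW_N)]

double_writing = False
read_from_new = False

def write(key, value):
    old_parts[hash(key) % OLD_N][key] = value
    if double_writing:                      # step 1: write to both schemas
        new_parts[hash(key) % NEW_N][key] = value

def read(key):
    if read_from_new:                       # step 3: reads switch to new schema
        return new_parts[hash(key) % NEW_N].get(key)
    return old_parts[hash(key) % OLD_N].get(key)

def crawl():                                # step 2: back-fill old data into new
    for part in old_parts:
        for key, value in part.items():
            new_parts[hash(key) % NEW_N][key] = value

# Migration sequence:
write("a", 1)                # written before the migration starts
double_writing = True        # step 1
write("b", 2)                # lands in both old and new partitions
crawl()                      # step 2: back-fills "a" into the new partitions
read_from_new = True         # step 3
assert read("a") == 1 and read("b") == 2
```

After the reads have been switched and verified, the old partitions (step 4) can simply be dropped.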
This method is usually applied to split a single shard in a consistent hash ring, but nothing prevents you from using it at scale to re-shard everything at once.
Wow, genius way to keep availability, but so expensive, because we're talking about migrating billions of records. Though I think it doesn't matter, since that won't happen constantly (only when we want to add more servers). And that brings us back to the benefit of the STAND BY secondary partition on the peer node from the same island when a node crashes suddenly. But is this more efficient than consistent hashing?
It is indeed a very rare event. And it's surprisingly easy to automate, thanks to the simplicity of this method. Which is the best thing about Erlang: it is so simple that it is indeed beautiful!
I'm not sure how consistent hashing can help with fleet expansion. You'd still need some way to move the data (physically!) from one server to a bunch of others.
Yes, you're right: even with consistent hashing you still need to move data from one node to another, but the benefit is that it minimizes the amount of data that has to be moved while balancing load.
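To illustrate the "minimize moved data" point, here's a minimal consistent-hash ring sketch (the node names, virtual-node count, and key set are made up for illustration). Adding a fifth node to the ring remaps only about one key in five, whereas naive modulo hashing remaps most of them:

```python
import hashlib
from bisect import bisect

def h(s):
    # stable 64-bit hash (Python's built-in hash() is randomized per run)
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

def ring(nodes, vnodes=64):
    # each node owns several virtual points spread around the ring
    return sorted((h(f"{n}:{i}"), n) for n in nodes for i in range(vnodes))

def lookup(points, key):
    # walk clockwise to the first point at or after the key's hash
    idx = bisect([p for p, _ in points], h(key)) % len(points)
    return points[idx][1]

keys = [f"user:{i}" for i in range(10_000)]

before = ring(["n1", "n2", "n3", "n4"])
after = ring(["n1", "n2", "n3", "n4", "n5"])
moved_ring = sum(lookup(before, k) != lookup(after, k) for k in keys)

# naive modulo hashing for comparison: 4 nodes -> 5 nodes
moved_mod = sum(h(k) % 4 != h(k) % 5 for k in keys)

print(moved_ring / len(keys))  # roughly 1/5 of the keys move
print(moved_mod / len(keys))   # most of the keys move
```

Either way, the keys that do move still have to be physically copied, which is the point made above.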
If we take riak_core as a good example of consistent hashing, it's very similar to the WhatsApp backend in many respects, e.g. vnode master => API server, plus database workers, except for the nature of the persistent data and the hashing mechanism. But I think the problem with it (I haven't read enough of the code to know) is that it can't tell the difference between a node going down and a node being removed: in both cases it will rebalance vnode load via the new hash ring and move the data… but as I said, since the nature of the persistent data is not the same, I can't be sure of anything.
You can take a look at my project; I just implemented the idea and it works fine, though I haven't tried it under high concurrency. The only difference is that I didn't use the API server, which can be a bottleneck; instead, the frontend server picks a worker directly by double-hashing.
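The double-hashing pick-up could look roughly like this (a sketch under my own assumptions about that project, not its actual code): one hash selects the node, a second independent hash selects the worker on that node, so the frontend needs no intermediate API server to route a request.

```python
import hashlib

def h(s, salt):
    # stable hash with a salt, so the two levels are independent
    return int.from_bytes(hashlib.md5(f"{salt}:{s}".encode()).digest()[:8], "big")

NODES = ["node1", "node2", "node3"]   # hypothetical cluster members
WORKERS_PER_NODE = 8                  # hypothetical worker pool size

def pick_worker(key):
    node = NODES[h(key, "node") % len(NODES)]        # first hash: pick the node
    worker = h(key, "worker") % WORKERS_PER_NODE     # second hash: pick the worker
    return node, worker

# The frontend routes straight to (node, worker) for a given key,
# and every frontend computes the same answer deterministically.
assert pick_worker("user:42") == pick_worker("user:42")
```

The trade-off is the same one discussed above: routing stays cheap and bottleneck-free, but any change to the node list still requires moving data.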