Consistent Hashing: Algorithmic Tradeoffs
Like this article? Buy me a coffee.
Here’s a problem. I have a set of keys and values. I also have some servers for a key-value store. This could be memcached, Redis, MySQL, whatever. I want to distribute the keys across the servers so I can find them again. And I want to do this without having to store a global directory.
One solution is called mod-N hashing.
First, choose a hash function to map a key (string) to an integer. Your hash function should be fast. This tends to rule out cryptographic ones like SHA-1 or MD5. Yes they are well distributed but they are also too expensive to compute — there are much cheaper options available. Something like MurmurHash is good, but there are slightly better ones out there now. Non-cryptographic hash functions like xxHash, MetroHash or SipHash1–3 are all good replacements.If you have N servers, you hash your key with the hash function and take the resulting integer modulo N.
This setup has a number of advantages. First, it’s very easy to explain. It’s also very cheap to compute. The modulo can be expensive but it’s almost certainly cheaper than hashing the key. If your N is a power of two then you can just mask off the lower bits. (This is a great way to shard a set of locks or other in-memory data structure.)
What are the downsides of this approach? The first is that if you change the number of servers, almost every key will map somewhere else. This is bad.
Let’s consider what an “optimal” function would do here.
When adding or removing servers, only 1/nth of the keys should move.
Don’t move any keys that don’t need to move.
0 Comments