Chapter 5: Design Consistent Hashing

0:00 / 0:00
Report an issue

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement not replaced the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome back to the Deep Dive.

Today we're looking at a stack of research focused on, well, one of the biggest challenges in modern infrastructure.

How do you scale without, you know, breaking everything?

We're diving into distributed systems and our goal here is to really understand the genius behind consistent hashing.

It's this mechanism that lets massive systems add or remove servers without everything just collapsing.

Right.

Without that, that data migration storm you always hear about.

And that storm, it's pretty much the inevitable failure point in almost any simple traditional scaling solution.

And that's called the rehashing problem.

That's the one.

It's based on the surprisingly simple but really flawed load balancing idea where you figure out which server gets which data using a basic formula.

It's just server index equals hash of the key, modulo n.

And n is just the number of servers you have at that moment.

The current total, yeah.

And that formula works perfectly.

I mean, as long as your system never ever changes.

Right.

Which is never.

Let's use the sources example because it makes it so clear.

Imagine you've got four servers.

So n is four.

You also have, say, eight different keys for your data.

OK.

When you hash those keys and do module four, they spread out really nicely.

Key zero goes to server one.

Key one goes to server zero.

It's all balanced.

Everybody's happy.

It is perfectly balanced.

Yeah.

But only until you have to change n.

And in the real world, things are never fixed.

Servers fail or hopefully your traffic grows and you need to add a new one.

OK.

Let's take the failure scenario first.

It's always more dramatic.

So server one just crashes and drops from four down to three.

Right.

And the hash values for the keys, they haven't changed at all.

They're still the same.

But when you run that same hash value, modulo three,

everything changes.

Almost every single key gets a brand new server index.

And this is the moment it all falls apart.

This is it.

The clients, the apps trying to get data for, say, key zero, they're still using that same old hash value.

They run the math with the new n equals three.

And the formula tells them key zero is now on server zero.

But it's not.

The data hasn't physically moved there.

Exactly.

So the clients connect to a server that has no idea what they're talking about.

And this isn't just one confused client.

This is system wide because n changed.

Ninety percent, maybe more, of your data has been, well, computationally reassigned to the wrong home.

It results in this sudden cascading failure.

A massive storm of cash misses.

The clients just cannot find their data.

So if you're building anything that needs to scale, this hash module and approach is basically dead on a rifle.

We need a better way, a way to change the number of servers without forcing all the data to move at once.

And that need is what leaves us straight to the elegance of consistent hashing.

It tackles this problem head on.

If you look at the core benefit, it's just dramatic.

When you resize the server pool,

only n divided by n keys need to be mapped on average.

Right.

I want to really underline that gap.

We're going from a world where almost all the keys have to move, which is this huge disruptive data transfer that costs bandwidth and CPU.

A total system stall, basically.

To a world where only a tiny fraction is affected.

I mean, that's the difference between a system crashing and a system just smoothly scaling.

It is.

So how do we do it?

The trick is to stop using modular math and start thinking

geometrically.

It all starts with defining the hash space.

Okay, so we pick a strong hash function, something like SHA1.

It gives us this gigantic theoretical number line.

It runs from zero all the way up to two to the 160 minus one.

An astronomical range that basically guarantees no two things will hash to the same spot.

And then instead of a line, you take the two n's, zero and the max value, and you connect them.

You make a circle, the hash ring.

And that rings the foundation.

This is where we get rid of that modular operation.

So instead of key percent n, we just assign positions on this ring.

To the servers themselves.

To the servers and to the keys.

Yeah.

You take a server's IP address or its name, you hash it with the same function, and it lands on a specific fixed point on that ring.

You do the same thing for every single data key.

Everything gets a coordinate on the circle.

So no more division, just mapping points.

Just mapping points.

Okay, so now we have our servers and keys all scattered around this, giant clock face.

So how do we know which server owns which key?

This gives us the core rule for data lookup.

To find out which server stores a key, you start at the key's position on the ring, and you just move clockwise.

The very first server you encounter.

That's the owner.

So key zero lands at, say, 12 o 'clock.

We move clockwise, and we hit server zero at one o 'clock.

Server zero owns key zero.

That's it.

And if key one owns key one, each server becomes responsible for that whole arc of the ring right behind it, anti -clockwise.

All right, let's put this to the test.

The whole point is how it handles change.

Let's scale up first.

We add a new server, server four.

We hash its ID, and it happens to land on the ring between, let's say, key zero and the existing server zero.

Okay, so before server four showed up, key zero was owned by server zero because it was the first stop clockwise.

Correct.

But now server four is sitting in that path.

So when a client looks up key zero, they start at its position, move clockwise, and boom, the first server they see is now server four.

So only key dear has to move to the new server four.

Exactly.

But what about key one, key two, and key three?

They're further down the ring.

Their clockwise paths to their original servers is completely unchanged by server four's arrival.

So they stay put.

Only that tiny slice of keys right before the new server gets moved.

And removing a server is just as clean, I assume.

Let's say we pull server one off the ring.

So all the keys that were assigned to server one are now

orphaned, essentially.

But the rule still works.

Clients looking for those keys start moving clockwise.

They pass the empty spot where server one used to be, and they just keep going until they hit the next server.

Which might be server two.

Let's say it's server two.

So now all the keys that server one used to own are moved over to server two.

But, and this is the key, the data on server zero, three, and the others.

Completely untouched.

They don't even know anything happened.

That gain is just, it's monumental.

But the basic approach we've just described, it has a couple of pretty big flaws, right?

It's not quite production ready.

Right.

We've solved the rehashing problem, but we haven't actually guaranteed that the load will be balanced.

Wait, why not?

If you're hashing things onto the ring, shouldn't it be random and, you know, balanced by default?

You'd think so, but it's a problem of probability when you have a small number of items, in this case servers.

The first major issue is non -uniform partition size.

Okay.

So when we remove server one, its entire section of the ring, its partition, got absorbed by server two.

So server two's workload could suddenly double while server zero and server three have the same amount as before.

That sounds like you're just creating a new hot spot.

That's exactly what happens.

Yes.

And the second issue is just down to bad luck.

Non -uniform key distribution.

Think about it.

We only have four servers.

When you hash just four items onto this enormous ring, there's no guarantee they'll be spread out nicely.

Oh, I see.

You could have server one and server three hash to positions that are right next to each other by pure chance.

Right.

And then you have this huge empty gap on the other side of the ring before you get to server two.

If most of your data keys happen to land in that huge gap.

And now server two gets almost all the work and servers one and three are just sitting there idle.

Total imbalance.

It's a recipe for disaster,

but there's a really clever solution for both of these problems.

It's a technique called virtual nodes.

Sometimes you'll see them called replicas.

And this is what really makes consistent hashing practical.

So instead of mapping one physical server to one point on the ring, you map it to many points.

A single real server is represented by say a hundred virtual nodes, all scattered randomly around the ring.

So server zero isn't just one point.

It's as zero, zero, zero, as zero, one, as zero, two, and so on all over the place.

Ah, so you're using statistics to smooth out the randomness.

You're using statistics to beat bad luck with hundreds of points for each server.

They're much more likely to be spread evenly.

So each physical server now owns lots of little tiny partitions from all over the ring instead of one big one.

And the sources have data on this.

The every server has about the same load.

Yeah.

And it works with say a hundred virtual nodes per server.

The standard deviation might be around 10%.

So some servers are still doing 10 % more work than average.

But if you double that to 200 virtual nodes, that standard deviation drops dramatically, maybe down to 5%.

The load gets much, much tighter around the average, fewer hotspots, better performance.

But it's not a free lunch.

There's a trade off.

Of course.

More virtual nodes means you need more memory just to store the metadata for all those points on the ring.

So system designers have to tune that number.

It's a balancing act.

Okay.

Let's get into the mechanics of that.

When a server is added or removed, how does the system figure out exactly which keys need to move?

It uses a directional rule again, but this time applied to the servers.

Let's go back to our clock face visual.

If we're adding a server, let's call it S4, the keys that need to move are in the section immediately anti -clockwise from it.

So S4 gets added at the three o 'clock position.

To find the affected keys, you start at S4 and you move anti -clockwise.

Anti -clockwise until you hit the first existing server.

Let's say that's S3 at the two o 'clock position.

All the keys between two o 'clock and three o 'clock used belong to S3.

Right.

Because S3 was the next clockwise server for them.

But now S4 is there.

So all of those keys in that one segment have to be moved from S3 over to the new server S4.

That's clean.

So you're only stealing keys from one neighbor.

Now, what about removing a server, say server one?

It's similar to define the affected range.

You still start at the remove node S1 and move anti -clockwise until you hit the first server, which we'll call S0.

All the keys between S, error on S1 were owned by the now gone S1.

Okay.

So we've identified the keys that need a new home and their new home is the next server on the ring moving clockwise from the one that was just removed.

In our example, that would be server two.

So the keys from the S0, S1 range all get redistributed to server two.

So the range is defined by moving backwards, but the destination is found by moving forwards.

That ensures every key still has an owner.

And these operational rules are what allow massive systems to scale horizontally with zero downtime.

At the end of the day, consistent hashing gives you a system that just minimizes key redistribution when you scale.

Which is the absolute cornerstone of any modern high availability system.

And with the virtual nodes on top, it also solves the load balancing problem and make sure the data is spread evenly.

And it helps mitigate the hotspot key problem, which is huge for anything that might go viral.

Oh, absolutely.

Yeah.

If you have one key like profile data for a celebrity that gets billions of views without this, all that traffic could just hammer a single server and take it offline.

But with virtual nodes, that one key's load is effectively managed across many small partitions owned by different physical servers, keeping everything stable.

And this isn't just theory.

I mean, this is the backbone of the internet.

Our sources list huge platforms that depend on this.

You're talking about Amazon's Dynamo, Apache Cassandra, the Discord chat application, Akamai's entire content delivery network, and even things like Google's Maglev network load balancer.

It's everywhere.

It really is impressive how a geometric idea solves such a tough algebraic problem.

So as a final thought for you to chew on, we talked about that trade off between perfect balance, low standard deviation, and the memory cost of having tons of virtual nodes.

Think about two very different systems, a low latency chat app like Discord on one hand, and a high durability database like Amazon Dynamo on the other.

Okay, very different requirements.

How would their specific needs,

instant message delivery versus guaranteed data durability, how would that influence where they draw the line on the number of virtual nodes they use?

That's a great question.

Where do they each decide that balance point is?

It really shows that even with a solution as elegant as this, design always comes down to trade offs based on what you're actually trying to build.

A fantastic deep dive into the geometry that runs our distributed world.

It really was.

Thank you for joining us for this deep dive into consistent hashing.

We'll catch you next time.

β“˜ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter SummaryWhat this audio overview covers
Consistent hashing is a fundamental technique for distributing data and traffic uniformly across server clusters while minimizing the computational and operational costs associated with scaling infrastructure. Traditional load-balancing approaches using modulo arithmetic create significant inefficiencies when server pools change size, forcing the remapping of nearly all existing keys and triggering widespread cache invalidation that degrades system performance. Consistent hashing addresses this limitation by mapping both servers and data keys onto a circular hash space, typically generated through cryptographic functions like SHA-1, where key placement is determined by clockwise traversal along the ring until a server node is located. This mechanism dramatically reduces the fraction of keys requiring redistribution, limiting remapping to approximately k/n keys when servers are added or removed, where k represents total keys and n represents the number of server nodes. The basic circular model, however, produces uneven data distribution patterns and creates concentration problems where certain keys experience disproportionate access loads. Virtual nodes resolve these issues by assigning each physical server multiple virtual identifiers distributed strategically across the ring, allowing finer-grained load balancing and reducing variance in partition sizes. The number of virtual replicas per server represents a critical design trade-off between distribution uniformity and memory overhead for maintaining node metadata. When servers join a cluster, affected keys transition anticlockwise from the new node to the adjacent upstream server. During server removal, keys between the departing node and the preceding server clockwise migrate to the next available server. This approach enables predictable, incremental scaling without requiring expensive global recomputation of key assignments. Consistent hashing powers production systems at scale, including Amazon Dynamo, Apache Cassandra, Discord's infrastructure, and Akamai's content distribution network, demonstrating its practical utility across diverse distributed computing contexts.

Using this chapter to study? Last Minute Lecture is free and student-run. If it helped, consider supporting the project.

Support LML β™₯