Welcome to Last Minute Lecture.
This free chapter overview is designed to help students review and understand key concepts.
These summaries supplement, not replace, the original textbook and may not be redistributed or resold.
For complete coverage, always consult the official text.
Welcome back to The Deep Dive.
Today, we are pulling back the curtain on something truly foundational to the modern internet, the news feed.
I mean, if you've ever scrolled a timeline on Facebook, Instagram, Twitter,
you've used one of these.
A massive-scale system designed to keep billions of people hooked.
It really is the ultimate system design challenge.
You've got this incredible volume, and you need incredibly low latency.
Right.
The core job, as our source material lays it out, is building an architecture for that constantly updating list of stories, status updates, photos, videos, you name it.
Okay, so let's unpack this step by step, because before you write a single line of code, you have to, you know, establish the rules of the game.
What's the scope here?
Scope is everything.
You have to start with the basics.
So functionally, users need to be able to publish posts, and obviously they need to see their friends' posts.
And it has to work on your phone and on your desktop.
Both mobile apps and web browsers, for sure.
But the real challenge, the thing that breaks normal databases, is the scale, right?
Exactly.
That's the whole game.
For our design, we're targeting, say, 10 million daily active users.
DAU.
10 million.
Wow.
And each user can have up to 5,000 friends.
So think about that.
Every single post could potentially go out to 5,000 people.
And to keep things simple at first, we're just sorting by time.
Newest first.
Reverse chronological, yeah.
We're going to sidestep all the complex ranking algorithms for now.
But we still have to support rich content, images, videos, text, the whole package.
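To put a rough number on that scope, here is a back-of-the-envelope sketch. The user count and friend limit come from the discussion; the posting rate of one post per user per day is purely an assumption for illustration, so treat the result as an upper-bound estimate rather than a measured figure.

```python
DAILY_ACTIVE_USERS = 10_000_000  # from the discussion
MAX_FRIENDS = 5_000              # from the discussion
POSTS_PER_USER_PER_DAY = 1       # assumption, for illustration only
SECONDS_PER_DAY = 86_400


def worst_case_feed_writes_per_day(dau, max_friends, posts_per_user):
    """Upper bound on feed writes if every post is delivered to every friend."""
    return dau * posts_per_user * max_friends


writes_per_day = worst_case_feed_writes_per_day(
    DAILY_ACTIVE_USERS, MAX_FRIENDS, POSTS_PER_USER_PER_DAY
)
writes_per_second = writes_per_day / SECONDS_PER_DAY

print(f"{writes_per_day:,} feed writes per day in the worst case")
print(f"about {writes_per_second:,.0f} per second on average")
```

Most users have far fewer than 5,000 friends, so real volumes would sit well below this bound, but the order of magnitude shows why the write path needs its own design.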
Okay, so 5,000 friends, 10 million users, rich media.
That sounds like a firehose of writes and an ocean of reads all at the same time.
How do you even begin to structure that?
Well, the design naturally splits.
It kind of breaks itself down into two core flows that are structurally separate.
Okay.
First is feed publishing.
That's the write operation.
It kicks in the second a user posts something.
The second is newsfeed building, which is the read operation.
So retrieval.
When I refresh my timeline.
That's the one.
So when you publish, the system has to save that data and then immediately push it out to potentially thousands of feeds.
And when I'm reading, it has to grab posts from all my friends, sort them and show them to me almost instantly.
Instantly.
Yeah.
And the bridge between the app and all that backend complexity is just a couple of simple HTTP based APIs.
Okay.
So what does the client actually see?
What's that interface look like?
It's pretty standard stuff.
For publishing a post, it's an HTTP POST request: POST /v1/me/feed.
You just need the content and an auth token.
And for reading, a GET request: GET /v1/me/feed.
All you need is that auth token.
Simple interfaces hiding enormous complexity.
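As a minimal sketch of those two calls, the snippet below builds the requests without sending them. The host name, the JSON body shape, and passing the token as a Bearer header are all assumptions for illustration; the discussion only specifies the two endpoints and their parameters.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host, not from the source


def build_publish_request(content: str, auth_token: str) -> urllib.request.Request:
    """Build (but do not send) the publish call: POST /v1/me/feed."""
    body = json.dumps({"content": content}).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/me/feed",
        data=body,
        headers={
            "Authorization": f"Bearer {auth_token}",  # assumed token transport
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_feed_request(auth_token: str) -> urllib.request.Request:
    """Build (but do not send) the retrieval call: GET /v1/me/feed."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/me/feed",
        headers={"Authorization": f"Bearer {auth_token}"},
        method="GET",
    )


publish = build_publish_request("Hello, feed!", "token-123")
refresh = build_feed_request("token-123")
print(publish.get_method(), publish.full_url)
print(refresh.get_method(), refresh.full_url)
```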
Let's try to visualize that complexity for the publishing flow.
The write side: my request hits a load balancer, goes to some web servers, but those servers are really just the entry point, right?
Precisely.
They're the front desk.
They handle initial validation,
authentication, rate limiting, right?
Rate limiting is crucial.
It stops spam and abuse by, you know, limiting how often someone can post.
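One way to picture that check is a sliding-window limiter per user. This is only a sketch: the class name, the limit of three posts per 60 seconds, and the in-memory storage are assumptions, and the discussion does not say which rate-limiting algorithm the web tier uses.

```python
from collections import defaultdict, deque


class PostRateLimiter:
    """Sliding-window limiter: at most max_posts per user per window."""

    def __init__(self, max_posts: int, window_seconds: float):
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # user_id -> timestamps of accepted posts

    def allow(self, user_id: str, now: float) -> bool:
        timestamps = self._recent[user_id]
        # Drop accepted posts that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_posts:
            return False  # over the limit, reject this post
        timestamps.append(now)
        return True


limiter = PostRateLimiter(max_posts=3, window_seconds=60)
decisions = [limiter.allow("alice", now=t) for t in (0, 1, 2, 3, 61)]
print(decisions)  # [True, True, True, False, True]
```

The fourth post is rejected because three posts already sit inside the window; by second 61 the oldest ones have expired, so posting is allowed again.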
After that check, the web servers pass it off to specialized internal services.
You've got a post service, which handles actually saving the content to a database and a cache.
You have a notification service, but the real star of the show here is the fan out service.
Ah, the fan out service.
That's the engine room that pushes the new post out to everyone's feed cache.
That's it.
Now for the retrieval flow, the read operation, the path is much cleaner: user, load balancer, web servers, and then straight to one place.
The newsfeed service, and its only job is to get the right stuff from the newsfeed cache.
Its sole responsibility, right?
It orchestrates that whole retrieval.
Okay.
Let's dive into that fan out service because that sounds like where the scalability monster lives.
I mean, if a post has to go to 5,000 people, the big question is, do you do that work now or later?
Fan out on write versus fan out on read.
Yes.
It feels like choosing your poison.
Do you want slow reads or do you want to overload your database on every write?
How do you even quantify that risk?
That's a great way to put it.
And the quantification, it really comes down to user expectation.
Users expect their feeds to be real time, instant.
Which points you towards fan out on write.
The push model.
It does.
With the push model, we precompute the feed the moment the post is published.
We push it right into the cache for all of that user's friends.
The pro being it's lightning fast to retrieve.
Immense pro.
But the con is a big one.
It's called the hotkey problem.
A celebrity with millions of followers posts something.
And the fan out service could just fall over trying to process 5 million writes all at once.
Plus, you're wasting tons of energy pushing content to users who might not have opened the app in months.
So the alternative fan out on read, the pull model, completely avoids that hotkey problem.
The celebrity posts, the system stores it once and that's it.
Right.
The work only happens when a follower actually refreshes their page.
So the pros are obvious.
No wasted work on inactive users.
No hotkey overload.
But the con must be speed.
It's the deal breaker.
Fetching the news feed becomes incredibly slow.
You're doing all that aggregating posts from thousands of friends, sorting them on demand every single time.
It just violates the low latency requirement.
So we can't have slow reads, but we can't crash the system with celebrity posts.
This sounds like we need a hybrid.
We need a hybrid approach.
It's the only way.
Since speed is so critical, we use the fan out on write, the push model, for the vast majority of users.
So my feed, your feed, we get instant updates.
But for those hotkey users, the celebrities, we flip it.
Their followers use the fan out on read, the pull model.
So when one of their followers refreshes, the newsfeed service has to go, okay, get the precomputed feed, but also go ask for the latest posts from this specific celebrity and merge them in.
That's a really clever division of labor.
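The read-time merge in the hybrid model can be sketched as follows. The function name, the (timestamp, post_id) tuple layout, and the sample data are assumptions for illustration; the only idea taken from the discussion is that the precomputed (pushed) feed is combined with posts pulled from followed celebrities.

```python
import heapq
from itertools import islice


def build_hybrid_feed(precomputed, celebrity_posts, followed_celebrities, limit=20):
    """Merge pushed and pulled posts, newest first.

    precomputed: list of (timestamp, post_id), newest first (push side).
    celebrity_posts: celebrity_id -> list of (timestamp, post_id), newest first.
    """
    pulled = [celebrity_posts.get(c, []) for c in followed_celebrities]
    # heapq.merge streams already-sorted inputs; reverse=True expects each
    # input to be ordered from largest to smallest timestamp.
    merged = heapq.merge(precomputed, *pulled, key=lambda post: post[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]


precomputed_feed = [(105, "p-friend-2"), (100, "p-friend-1")]
celebrity_posts = {"celeb-1": [(110, "p-celeb-b"), (102, "p-celeb-a")]}

feed = build_hybrid_feed(precomputed_feed, celebrity_posts, ["celeb-1"])
print(feed)  # ['p-celeb-b', 'p-friend-2', 'p-celeb-a', 'p-friend-1']
```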
Now you mentioned consistent hashing before.
How does that help with the hotkey problem even within the push model?
Ah, good question.
Consistent hashing is a technique to distribute data, or post IDs in this case, more evenly across your cache servers.
Okay.
See, with normal hashing, if one server goes down, almost all your keys have to be remapped.
It causes a huge storm of cache misses.
Consistent hashing ensures that when you add or remove a server, only a tiny fraction of the data needs to move.
I see.
So it spreads the load better.
It's crucial.
It ensures that even a sudden burst of writes gets distributed across all the available fan out workers and cache nodes, so no single server becomes a bottleneck.
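A minimal consistent-hash ring makes the "only a fraction of the data moves" claim concrete. This is a sketch under assumptions: the node names, the use of MD5 for ring positions, and 100 virtual nodes per server are illustrative choices, not details from the discussion.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable position on the ring (MD5 used for spread, not for security)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes=(), virtual_nodes=100):
        self.virtual_nodes = virtual_nodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.virtual_nodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        for i in range(self.virtual_nodes):
            point = _hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def node_for(self, key: str) -> str:
        if not self._points:
            raise ValueError("ring has no nodes")
        # First ring position clockwise from the key, wrapping at the end.
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[index]]


keys = [f"post:{i}" for i in range(10_000)]
ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
before = {key: ring.node_for(key) for key in keys}

ring.add_node("cache-d")
after = {key: ring.node_for(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved) / len(keys):.1%} of keys moved, all of them to cache-d")
```

Going from three servers to four, roughly a quarter of the keys should move, and every key that moves lands on the new server. With plain modulo hashing, most keys would change servers instead.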
Fantastic.
So let's trace that detailed push workflow.
The request hits the fan out service.
What happens next?
Okay.
So step one, the service fetches the list of friend IDs.
This comes from the graph database, which is built for exactly this kind of relationship query.
Who follows whom?
Precisely.
Step two, it takes those IDs and gets user info from the user cache.
This is a critical filtering step.
To see if I've muted someone or if the post was only shared with certain people.
Exactly.
Then step three, that filtered list of friend IDs and the new post ID gets sent to a message queue.
Ah, so it becomes asynchronous.
It decouples the whole process.
The user gets their post successful message instantly.
The actual distribution happens in the background.
And fan out workers are just pulling jobs from that queue.
They are.
Step four, the workers pull that data and write it into the newsfeed cache.
And this is where that crucial design choice comes in.
To keep memory usage down, the cache isn't storing the whole post, is it?
No, absolutely not.
We only store post ID to user ID mappings in the newsfeed cache.
Just the keys.
Storing the whole post object would be impossibly large.
This keeps the memory footprint tiny and the cache super efficient.
Just the keys, not the whole house.
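The four steps above can be sketched with in-memory stand-ins. Everything named here (the dictionaries, the "muted" field, the 500-entry feed cap, the function names) is a hypothetical placeholder for the graph database, user cache, message queue, and newsfeed cache described in the discussion.

```python
import queue
from collections import defaultdict, deque

# Hypothetical in-memory stand-ins for the stores named in the discussion.
GRAPH_DB = {"alice": ["bob", "carol", "dave"]}  # author -> friend IDs
USER_CACHE = {
    "bob": {"muted": set()},
    "carol": {"muted": {"alice"}},  # carol has muted alice
    "dave": {"muted": set()},
}
FANOUT_QUEUE = queue.Queue()
# user -> recent post IDs, newest first; the cap of 500 is an assumption.
NEWSFEED_CACHE = defaultdict(lambda: deque(maxlen=500))


def fan_out(author_id: str, post_id: str) -> None:
    friend_ids = GRAPH_DB.get(author_id, [])  # step 1: fetch friend IDs
    recipients = [                            # step 2: filter using user info
        friend
        for friend in friend_ids
        if author_id not in USER_CACHE.get(friend, {}).get("muted", set())
    ]
    FANOUT_QUEUE.put((post_id, recipients))   # step 3: enqueue and return


def run_fanout_worker() -> None:
    """Step 4: drain the queue and write post IDs into each feed."""
    while not FANOUT_QUEUE.empty():
        post_id, recipients = FANOUT_QUEUE.get()
        for user_id in recipients:
            NEWSFEED_CACHE[user_id].appendleft(post_id)  # IDs only, no post body
        FANOUT_QUEUE.task_done()


fan_out("alice", "post-1")  # the publisher gets control back right here
run_fanout_worker()         # distribution happens separately
print({user: list(ids) for user, ids in NEWSFEED_CACHE.items()})
```

In this sketch the worker runs in the same process for simplicity; the point of the queue is that it could just as well run later or on another machine.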
Okay.
Let's flip it.
Newsfeed retrieval.
The user pulls to refresh.
What's that process look like?
So the user sends that GET request.
It goes through the web servers and hits the newsfeed service.
Step one, the service fetches that list of post IDs from the newsfeed cache.
And again, it's just a list of pointers for that user's feed.
Right.
But the client can't do anything with just IDs.
This is where the complexity comes back in.
This is the hydration step.
A hydration step.
The newsfeed service sees it only has these bare IDs.
So it has to go out and fetch all the complete details needed to actually render the feed.
A username, profile picture, the post text, like counts.
All of it.
And this means querying multiple specialized cache layers all at once to build that fully hydrated newsfeed.
And that all has to happen in milliseconds.
And the images, the videos,
they're not in the post cache either, are they?
No.
The post data just has a pointer to the media.
The media itself is stored in a content delivery network, a CDN, which has servers distributed all over the world.
So the media loads from a server that's physically close to you.
Exactly.
The system stitches together all that metadata from the different caches with the media from the CDN and then sends the final JSON package to your phone.
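Hydration can be sketched as a series of lookups keyed by post ID. The cache contents, field names, and CDN host below are hypothetical, and this version queries the caches one after another for readability, whereas the discussion describes them being queried all at once.

```python
import json

# Hypothetical in-memory stand-ins for the cache layers named in the discussion.
NEWSFEED_CACHE = {"bob": ["post-2", "post-1"]}  # user -> post IDs, newest first
CONTENT_CACHE = {
    "post-1": {"author_id": "alice", "text": "Hello, feed!", "media_path": None},
    "post-2": {"author_id": "carol", "text": "Sunset", "media_path": "/media/sunset.jpg"},
}
USER_CACHE = {
    "alice": {"name": "Alice", "avatar_path": "/avatars/alice.png"},
    "carol": {"name": "Carol", "avatar_path": "/avatars/carol.png"},
}
COUNTERS_CACHE = {"post-1": {"likes": 3}, "post-2": {"likes": 12}}
CDN_BASE_URL = "https://cdn.example.com"  # hypothetical CDN host


def cdn_url(path):
    """Turn a stored media pointer into a CDN URL; None means no media."""
    return None if path is None else CDN_BASE_URL + path


def hydrate_feed(user_id):
    """Expand the bare post IDs in a user's feed into renderable items."""
    items = []
    for post_id in NEWSFEED_CACHE.get(user_id, []):
        post = CONTENT_CACHE.get(post_id)
        if post is None:
            continue  # a real service would fall back to the post database
        author = USER_CACHE[post["author_id"]]
        items.append({
            "post_id": post_id,
            "author_name": author["name"],
            "author_avatar_url": cdn_url(author["avatar_path"]),
            "text": post["text"],
            "media_url": cdn_url(post["media_path"]),
            "like_count": COUNTERS_CACHE.get(post_id, {}).get("likes", 0),
        })
    return items


feed = hydrate_feed("bob")
print(json.dumps(feed, indent=2))
```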
It's clear that cache is the absolute lifeblood here.
And if the service has to query all those different sources during hydration, it must be a really specialized cache architecture.
It is.
The source material describes it as having five distinct specialized layers.
Why so many?
Why not one big cache?
Specialization buys you performance.
If everything's in one giant cache, reads are slower and a failure is catastrophic.
By separating data types, you can tailor everything, the eviction policies, the consistency models for each specific job.
Okay, let's break down the five layers.
Layer one is...
The newsfeed cache.
That's where we store those simple lists of post IDs for everyone's feed.
High read volume, simple data.
Layer two is the actual content.
The content cache.
This holds the post data itself, text, timestamp, media locations, and really popular viral content often gets its own hot cache inside this layer to protect the database from getting hammered.
Makes sense.
Then you have the social structure.
Layer three, the social graph cache.
All the relationship data.
Who follows whom, who's blocked.
Critical for that fan out filtering step we talked about.
And layer four is for actions.
The action cache, yeah.
Who liked a post, who replied to it, all that specific activity.
And the fifth layer is just for counters.
Why do counters need their own dedicated cache?
The counters cache.
Because counters are unique.
Things like like counts or follower counts.
They're tiny, they're accessed constantly, and they change very frequently.
So they need to be updated really fast.
They need rapid atomic updates, and they need to be read at lightning speed during hydration.
Separating them means we can use highly performant cache mechanisms just for volatile numbers without messing up the stability of, say, the content cache.
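To illustrate what "rapid atomic updates" means, here is a small in-process sketch where a lock makes each increment indivisible. The class and key names are hypothetical; a production counters cache would more likely rely on a store with a built-in atomic increment (Redis INCR is one such command), which this sketch does not attempt to model.

```python
import threading
from collections import Counter


class CountersCache:
    """Tiny counter store where every increment is applied atomically."""

    def __init__(self):
        self._counts = Counter()
        self._lock = threading.Lock()

    def increment(self, key: str, delta: int = 1) -> int:
        with self._lock:  # read-modify-write happens as one unit
            self._counts[key] += delta
            return self._counts[key]

    def get(self, key: str) -> int:
        with self._lock:
            return self._counts[key]  # missing keys read as 0


counters = CountersCache()


def like_many_times():
    for _ in range(1000):
        counters.increment("post-1:likes")


threads = [threading.Thread(target=like_many_times) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counters.get("post-1:likes"))  # 8000: no increments were lost
```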
That makes perfect sense.
Okay, so this design gets us a robust system for 10 million users.
But the work is never really done.
What are the big picture things you'd think about for scaling this to, say, 100 million?
At that point, your focus has to shift more towards data durability and consistency across a massive distributed architecture.
Which means database scaling becomes the main event.
Absolutely.
You'd move past simple replication and get into aggressive database sharding.
How would you shard?
Shard by user ID.
That's essential for both the newsfeed cache and the post database.
It creates what's called data affinity.
Everything related to user A lives on the same shard.
So reading my feed or writing my post only ever has to hit a single database server.
No complex cross-server queries.
Exactly.
Distributes the load beautifully.
It's a key system-specific insight, not just some generic buzzword.
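Sharding by user ID can be sketched as a stable hash of the ID modulo the shard count. The shard count of 16, the CRC32 hash, and the function names are assumptions for illustration; the discussion only specifies the shard key.

```python
import zlib

NUM_SHARDS = 16  # assumption, for illustration only


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard using a hash that is stable across processes."""
    return zlib.crc32(user_id.encode("utf-8")) % num_shards


def route(table: str, user_id: str) -> str:
    """Name the (hypothetical) database shard that holds this user's rows."""
    return f"{table}-shard-{shard_for_user(user_id)}"


# Both the feed and the posts for one user are routed by the same key,
# so a read or a write for that user touches a single shard number.
print(route("newsfeed", "user-42"))
print(route("posts", "user-42"))
```

Python's built-in hash() is randomized per process for strings, which is why the sketch uses CRC32. Plain modulo also remaps most users if the shard count changes, so a ring like the one used for the cache servers is one way to soften that.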
And what about just good system hygiene at that scale?
You double down on the basics.
You keep the web tier totally stateless.
So any server can handle any request.
That's key for scaling and resiliency.
You absolutely have to support multiple data centers for global traffic and disaster recovery.
And more caching.
And you maximize caching everywhere.
And you lean on those message queues for loose coupling between services.
That's your safety net.
If the fan-out workers go down for a minute, users can still post.
The messages just wait in the queue.
And you must be monitoring everything like a hawk.
Non-negotiable.
You're watching queries per second, of course.
Yeah.
But you're also scrutinizing the latency on every newsfeed refresh.
And critically, you're measuring the health of the fan-out workers to make sure those queues aren't backing up.
If they do, that whole real-time feeling is lost.
Okay.
Quick recap.
We established the massive scope: 10 million users, up to 5,000 friends each.
We defined the two core flows, publishing and retrieval.
We navigated the huge trade-off between fan-out on write and read and landed on a hybrid model.
Fortified by consistent hashing.
Yeah.
And we detailed that crucial five-layer caching system.
IDs, content, social graph, actions, and counters.
All of which enables that rapid hydration step.
You know, this raises an interesting final question.
That hybrid push-pull model.
It relies on knowing who the hotkey users are.
But what happens if a regular user goes viral overnight?
We initially put them on that fast push track.
So what specific monitoring metrics would alert us that this user has suddenly become a celebrity?
That we need to migrate them to the pull model, like right now, before the fan-out service fails?
Something for you to mull over as you think about building your own high -scale systems.
Thank you for joining us for this deep dive into newsfeed system architecture.
We hope this knowledge speeds up your own next endeavor.
This has been another deep dive.