Chapter 10: Design a Notification System

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

If you stop right now and, you know, look at your phone or your computer, chances are within the last few hours you've gotten a notification.

Oh, for sure.

A little ping from your bank, a recommendation from a streaming service, or maybe an alert that a delivery is on its way.

They're like the constant heartbeat of our digital lives.

They really are.

And while, you know, getting one notification seems so simple, designing the actual architecture to handle billions of them and to do it reliably and instantly all over the world, that's one of the biggest scaling challenges out there.

And that's exactly what we're digging into today.

Yeah, our mission is to break down the blueprint for a notification system that can scale to a massive level without, and this is key, without losing a single message.

Okay, so let's set the stage.

Let's unpack the scale we're actually dealing with here based on the source material.

We're not just talking about one channel.

No, not at all.

We need a unified system that can handle three totally different formats.

Mobile push notifications, good old SMS messages, and of course emails.

And the volume is pretty intense.

It is staggering.

I mean, look at these numbers: 10 million mobile push notifications, 1 million SMS messages, and 5 million emails all flowing through the system every single day.

The capacity required is just enormous.
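To put those daily volumes in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes traffic is spread evenly across the day, which real traffic is not, so peak rates would be noticeably higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes quoted in the discussion above.
daily_volume = {
    "push": 10_000_000,
    "sms": 1_000_000,
    "email": 5_000_000,
}


def average_per_second(per_day: int) -> float:
    """Average rate, assuming traffic is spread evenly across the day."""
    return per_day / SECONDS_PER_DAY


rates = {channel: average_per_second(n) for channel, n in daily_volume.items()}
total_rate = average_per_second(sum(daily_volume.values()))

for channel, rate in rates.items():
    print(f"{channel}: ~{rate:.0f} per second on average")
print(f"total: ~{total_rate:.0f} per second on average")
```

On average that works out to roughly 116 push notifications, 12 SMS messages, and 58 emails per second, or about 185 messages per second overall.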

And there's a really interesting performance requirement here.

Soft real time.

I like that phrase.

It means speed is critical, like getting a sports score update.

But if we hit, say, a Super Bowl level traffic peak, a little bit of a delay is actually acceptable.

Exactly.

The system has to prioritize stability over a guarantee of immediate delivery when things get crazy.

And on top of all that flexibility, it has to support everything.

Every device you can think of, iOS, Android, desktop browsers, and critically, we have to build respect for the user right into the foundation.

If they opt out of promotional emails, the system has to honor that.

No questions asked.

That sets up the challenge perfectly.

It does.

So to even start meeting those requirements, we first have to get down to the basic mechanics.

I mean, a notification doesn't just magically appear on a device.

It's relying on these big external services.

So walk us through how a message actually gets delivered.

Let's start with iOS.

So for mobile push notifications, we absolutely have to rely on the platform owners.

That's Apple and Google.

For iOS, it's a three-part flow.

First, you have our server, which we call the provider.

Its job is to build the request.

And that request needs two specific things.

Two essential components: the device token and the payload.

The device token is basically the address for the notification, but it's not like an email address.

Not at all.

It's much more fundamental.

It's a unique sort of opaque identifier that Apple itself issues for one single physical device, a specific iPhone, a specific iPad.

And you can't transfer it.

It's non-transferable and it's vital for security.

If our system loses that token, we just can't send a push notification to that device, period.

Got it.

And the payload is the message itself.

Exactly.

It's the actual content, usually structured as a JSON dictionary.

It's got the title, the body, maybe a sound preference, and any specific action keys like buttons you can tap on.
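As an illustration of what such a payload can look like, here is a minimal Python sketch. The `aps`, `alert`, `sound`, and `category` keys follow Apple's documented payload format; the function name `build_apns_payload` and the sample values are assumptions made up for this example.

```python
import json
from typing import Optional


def build_apns_payload(
    title: str,
    body: str,
    sound: str = "default",
    category: Optional[str] = None,
) -> str:
    """Build the JSON payload for an iOS push notification."""
    aps = {
        "alert": {"title": title, "body": body},
        "sound": sound,
    }
    if category is not None:
        # The category tells the app which action buttons to show.
        aps["category"] = category
    return json.dumps({"aps": aps})


# Hypothetical usage: the device token travels alongside the payload in the
# request, it is not part of the payload itself.
request = {
    "device_token": "<hypothetical-device-token>",
    "payload": build_apns_payload("Order shipped", "Your package is on its way"),
}
print(request["payload"])
```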

So the provider packages all that up.

Yes.

And then it sends that request to the second component, the Apple Push Notification Service, or APNS.

Which is Apple's middleman.

It's Apple's intermediary service, yeah.

It manages the connection to the device.

It handles all the security signing.

And then finally, APNS delivers that message to the third component, which is the target iOS device.

And I'm guessing Android follows a very similar path, just with Google's system.

You got it.

Instead of APNS, Android uses Firebase Cloud Messaging, or FCM.

But the structure is identical.

Provider, third-party service, device.

Now, SMS and email, they're handled a bit differently.

Yeah, I was going to ask.

Why not just build our own email or SMS server?

You could try, but the problem is the immense complexity involved in high-volume delivery.

You're dealing with all the different carrier networks, ISP spam filters, international compliance rules.

This sounds like a nightmare.

It is.

So to get high delivery rates and really good analytics, any large-scale system is going to delegate those tasks to a commercial third-party service.

Think Twilio for SMS or SendGrid for email.

They're specialized and just offer way better guarantees than we could ever build in-house.

That makes a lot of sense.

So before any of this delivery can even happen, we need to talk about a prerequisite: the contact info gathering flow.

How does that initial data even get collected and organized?

Right.

So when a user signs up or installs the app for the first time, our API servers are responsible for grabbing all those potential contact points.

We store the information that's pretty static, you know, email addresses, phone numbers, in the main user table.

But a single user could have, what, a phone, a tablet, a laptop. That complicates the push notification side.

It does.

Which is exactly why we need a dedicated, separate device table.

This table stores all of those unique, device-specific tokens.

There's one entry for every single phone, tablet, or desktop login.

And crucially, it links them all back to that one user ID in the user table.

So you can hit all of a user's devices at once.

Precisely.

That data structure is absolutely fundamental before we can even think about moving millions of messages around.
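The two-table layout described here can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample rows are assumptions chosen for illustration, and a production system would use its own database rather than an in-memory SQLite file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        user_id      INTEGER PRIMARY KEY,
        email        TEXT,
        phone_number TEXT
    );
    CREATE TABLE devices (
        id           INTEGER PRIMARY KEY,
        device_token TEXT NOT NULL,
        user_id      INTEGER NOT NULL REFERENCES users(user_id)
    );
    """
)

# One user with two devices (hypothetical sample data).
conn.execute("INSERT INTO users VALUES (1, 'ana@example.com', '+15550100')")
conn.executemany(
    "INSERT INTO devices (device_token, user_id) VALUES (?, ?)",
    [("token-phone", 1), ("token-tablet", 1)],
)


def device_tokens_for(user_id: int) -> list[str]:
    """Return every device token registered to one user."""
    rows = conn.execute(
        "SELECT device_token FROM devices WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return [row[0] for row in rows]


print(device_tokens_for(1))
```

Because each device row carries the user ID, a single lookup returns every device that should receive the push.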

Okay, let's talk about scaling this thing.

The first, simplest architectural idea, what the source calls the initial high-level design, just relied on a single centralized notification server doing everything.

It sounds simple, but I have a feeling it was a disaster waiting to happen.

It was essentially non-viable for the kind of scale we're talking about.

We found three critical failures almost immediately.

First, and the most obvious one: the single point of failure, or SPOF.

Right, if that one server fails.

Everything stops.

A bad deployment, a hardware crash, doesn't matter.

All notifications worldwide just cease instantly.

That is a catastrophic business risk.

Absolutely.

The second problem was scalability.

Because all the components, the database, the cache, the processing logic, they were all tightly coupled together.

You couldn't scale the database without also scaling the notification processing at the same time.

And the third flaw was just a pure performance bottleneck.

Yes.

Certain tasks are just really resource-heavy.

Building a complex, personalized HTML email takes a lot of CPU.

Waiting for a slow response from a third-party service blocks the whole thread.

So trying to do all of that in one server is just a recipe for overload?

It guarantees it.

You'd have crippling latency during any kind of usage spike.

So the solution is what the source calls the improved high-level design.

And it's all about one beautiful principle: decoupling.

How did we surgically break this system apart to make it resilient?

We needed distribution.

So first thing, we moved the database and the cache out into their own services with their own horizontal scaling.

We also added multiple load-balanced notification servers.

But the real game-changer, let me guess, the linchpin of the whole thing, was introducing message queues.

Okay, so the queue acts as a buffer, right?

It smooths out those big traffic spikes.

But the source really emphasizes its role in segregation.

And that segregation is the architectural masterstroke.

We don't just use one big queue for everything.

We assign a distinct queue for each notification type.

So one for iOS push, one for Android push, one for SMS.

Exactly, one for email.

Four separate queues.

Why is that separation so powerful?

Think of them as completely independent pipelines.

Let's say, for example, the email third-party service has a catastrophic global outage.

The email queue is going to back up and fast.

But because it's totally isolated, the mobile push notifications and the SMS deliveries, which use different queues and different vendors, remain completely unaffected.

So a failure in one vendor doesn't cascade and take down the entire system.

It can't.

And we use really robust, high-throughput queue technologies here.

Things like Kafka or RabbitMQ to handle that massive ingestion rate.

That's a huge boost to system stability.
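To see why per-channel queues isolate failures, here is a small Python sketch. It uses the standard library's `queue.Queue` as a stand-in for a real broker such as Kafka or RabbitMQ, and the channel names and the `enqueue` and `drain` helpers are assumptions made up for this example.

```python
from queue import Queue

CHANNELS = ("ios_push", "android_push", "sms", "email")

# One independent queue per notification type.
queues = {channel: Queue() for channel in CHANNELS}


def enqueue(event: dict) -> None:
    """Route an event to the queue that matches its channel."""
    queues[event["channel"]].put(event)


def drain(channel: str, send) -> int:
    """Worker for one channel: send every queued event, return the count."""
    sent = 0
    channel_queue = queues[channel]
    while not channel_queue.empty():
        send(channel_queue.get())
        sent += 1
    return sent


enqueue({"channel": "email", "to": "ana@example.com"})
enqueue({"channel": "email", "to": "bo@example.com"})
enqueue({"channel": "sms", "to": "+15550100"})

delivered = []
# SMS workers keep running even though the email vendor is down.
sms_sent = drain("sms", delivered.append)
# Email workers are paused during the simulated outage, so only the
# email queue accumulates a backlog.
email_backlog = queues["email"].qsize()
print(sms_sent, email_backlog)
```

The email backlog grows on its own queue while SMS delivery carries on, which is the isolation property being described.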

All right, let's trace a notification through this new, resilient architecture.

It's a six-step flow, right?

Starting with whatever service triggers the notification.

Yep, step one.

An internal service, maybe a microservice handling a user purchase, it calls the API on one of our notification servers.

And these APIs have to be protected, strictly internal, to prevent just anyone from spamming our system.

OK, step two.

The notification server gets the call, but it doesn't just send the message right away.

It's more of a gatekeeper.

That's right.

Its job is to fetch all the necessary metadata: user info, the right device tokens, and, very importantly, the user's opt-in settings from the cache or the database.

It does some validation, too, making sure an email is formatted correctly, for instance.

Step three.

The actual notification event is packaged up and sent to the right message queue.

So, for an iPhone alert, it goes to the iOS PN queue.

OK, and step four is where the workers come in.

A dedicated set of workers is just constantly pulling events from that queue.

These are scalable, and they're dedicated only to processing that one channel.

Then step five.

The worker sends the final constructed request to the correct third-party service, APNS, FCM, Twilio, whatever it is.

And finally, step six.

The third-party service delivers it to the user's device.

And that's the full flow.

The notification servers are really the workhorses here, then.

They handle the API, the validation, the data fetching, and putting the event in the right queue. They're the ones making sure the API call has, you know, the required inputs like user ID and subject.

They ensure the integrity of that request before it even gets into the pipeline.
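The kind of basic validation a notification server might run before enqueueing can be sketched like this in Python. The field names in `REQUIRED_FIELDS` and the very loose email check are assumptions for illustration, not the actual API contract.

```python
# Hypothetical set of fields every request must carry.
REQUIRED_FIELDS = ("user_id", "channel", "subject", "body")


def validate_request(request: dict) -> list[str]:
    """Return a list of problems; an empty list means the request can be enqueued."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if request.get(name) in (None, "")
    ]
    # Deliberately loose format check, only meant to catch obvious mistakes.
    if request.get("channel") == "email" and "@" not in request.get("email", ""):
        problems.append("invalid email address")
    return problems


good = {
    "user_id": 1,
    "channel": "email",
    "subject": "Your order has shipped",
    "body": "It should arrive soon.",
    "email": "ana@example.com",
}
print(validate_request(good))
print(validate_request({"channel": "sms"}))
```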

But now, this decentralized flow creates a new, really crucial challenge.

Reliability.

Because the requirement is that notifications can be delayed, but they can never be lost.

So how do you guarantee that persistence in a distributed world?

This is where we need an insurance policy.

That's a great way to put it.

We introduced the notification log database.

This database stores every single notification request that comes in.

It acts as a permanent audit trail.

So if a worker fails for some reason.

Right.

Maybe the third-party service timed out or the worker itself crashed.

The fact that we have that persistent record in the log allows the system to have a really robust retry mechanism.

But retries immediately open up that classic distributed systems problem.

Duplication.

How do we make sure a user doesn't get the same message five times?

Yeah, that's the paradox.

And the technical truth is that getting true exactly-once delivery is, for all practical purposes, impossible across microservices and external vendors.

But we have to stop them from getting five identical promotional emails.

Of course.

So we use what's called a dedup mechanism.

And how does that work?

So when a notification event arrives at a worker, we check its unique event ID against a record of IDs we've processed recently.

And you have to do that check really, really fast.

Super fast.

So we typically use a cache like Redis, which can store these IDs for a short time, maybe 24 hours.

If the worker checks Redis and sees the ID is already there, it just discards the message.

Done.

If it's new, it processes it and then adds the ID to the cache.

So even if the first service sends the event twice by mistake, only the first one gets through.

Exactly.
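Here is a minimal Python sketch of that dedup check. It keeps the seen IDs in an in-memory dictionary with a time-to-live as a stand-in for Redis, and the `Deduplicator` class and its method names are assumptions for this example. One simplification to note: it records the ID at check time rather than after processing.

```python
import time


class Deduplicator:
    """Remembers event IDs for a limited time (in-memory stand-in for Redis)."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen = {}  # event_id -> expiry timestamp

    def is_duplicate(self, event_id: str) -> bool:
        """Return True if event_id was seen within the TTL; otherwise record it."""
        now = self._clock()
        expiry = self._seen.get(event_id)
        if expiry is not None and expiry > now:
            return True
        self._seen[event_id] = now + self._ttl
        return False


dedup = Deduplicator()
for event_id in ["evt-1", "evt-2", "evt-1"]:
    action = "discard" if dedup.is_duplicate(event_id) else "process"
    print(event_id, action)
```

With Redis itself, a single `SET` command with the `NX` option and an expiry is a common way to get the same check-and-record behavior atomically.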

OK, that handles the structural integrity.

Let's shift a bit to internal efficiency with notification templates.

Oh, these are essential.

Since we're sending millions of messages that often have really predictable formats, like your order has shipped, templates are key.

A template gives you a pre-formatted structure, the layout, the call to action button.

And we only have to inject the specific variables, like item name or date.

So the system isn't wasting CPU cycles building an entire HTML email from scratch every single time.

Exactly.

And you guarantee a consistent brand experience.

Templates save a ton of processing time, and they reduce the chance of human error in the content.
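As a small illustration of the template idea, here is a Python sketch using the standard library's `string.Template`. The template text and the variable names `name`, `item_name`, and `date` are assumptions for this example.

```python
from string import Template

# Pre-formatted message; only the $variables change per notification.
ORDER_SHIPPED = Template("Hi $name, your order of $item_name shipped on $date.")


def render(template: Template, **variables: str) -> str:
    """Fill in a template; raises KeyError if a variable is missing."""
    return template.substitute(**variables)


message = render(ORDER_SHIPPED, name="Ana", item_name="headphones", date="May 3")
print(message)
```

Using `substitute` rather than `safe_substitute` means a missing variable fails loudly instead of sending a half-filled message.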

Now we get into what I think is a crucial area, respecting the user.

This is non-negotiable for keeping people engaged long term.

Let's start with notification settings.

We have to give users fine-grained control.

If we don't, they will just silence our application permanently.

So before any message is even sent to a queue, the system has to perform a mandatory check against the notification setting table.

And what's in that table?

It stores the user ID, the communication channel, so push, email, SMS, and a critical opt-in Boolean value.

And that opt-in isn't just one single flag for everything, is it?

Oh, no.

For a system this complex, that table usually handles multiple settings at once.

For example, a user might be opted in to transactional emails, like a password reset.

But opted out of all promotional emails.

Right.

The system has to verify that the message category matches the user's specific preferences before it even thinks about sending it to the queue.
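That preference check can be sketched in Python as a lookup keyed by user, channel, and message category. The dictionary layout, the category names, and the choice to treat a missing row as "not opted in" are all assumptions made for this example.

```python
# (user_id, channel, category) -> opted in?
notification_settings = {
    (1, "email", "transactional"): True,
    (1, "email", "promotional"): False,
    (1, "push", "promotional"): True,
}


def may_send(user_id: int, channel: str, category: str) -> bool:
    """Check the user's preference before the event is put on a queue."""
    # Assumption: a missing row means the user has not opted in.
    return notification_settings.get((user_id, channel, category), False)


print(may_send(1, "email", "transactional"))
print(may_send(1, "email", "promotional"))
```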

And right alongside honoring their preference, we need rate limiting just to prevent user fatigue.

Absolutely vital.

If you bombard a user with five promotional pings in an hour, you're guaranteeing they either uninstall the app or they globally disable your notifications.

So we have to enforce frequency limits.

You must.

This is often done with something like time-stamped counters in Redis, where you can monitor rolling time windows.

For example, limiting promotional emails to one per user per 24 hours.

It's a small architectural decision that dramatically reduces user churn.
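A rolling-window frequency limit like the one described can be sketched in a few lines of Python. This version keeps timestamps in memory instead of Redis, and the `RollingWindowLimiter` class and the key format are assumptions for this example.

```python
import time
from collections import defaultdict, deque


class RollingWindowLimiter:
    """Allow at most max_events per key within a rolling time window."""

    def __init__(self, max_events: int, window_seconds: float, clock=time.monotonic):
        self._max = max_events
        self._window = window_seconds
        self._clock = clock
        self._history = defaultdict(deque)  # key -> timestamps of allowed sends

    def allow(self, key: str) -> bool:
        now = self._clock()
        history = self._history[key]
        # Drop timestamps that have fallen out of the window.
        while history and now - history[0] >= self._window:
            history.popleft()
        if len(history) >= self._max:
            return False
        history.append(now)
        return True


# One promotional email per user per 24 hours.
limiter = RollingWindowLimiter(max_events=1, window_seconds=24 * 60 * 60)
print(limiter.allow("user-1:email:promotional"))
print(limiter.allow("user-1:email:promotional"))
```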

Let's quickly touch on security for the mobile APIs.

Yeah, this is paramount.

Especially since push notifications mean sending data through external platforms.

We use an app key and app secret pair for our iOS and Android API calls.

An authentication layer.

Basically.

It ensures that only our verified, legitimate client apps can send push notifications through our internal system.

It stops attackers from using our infrastructure for spam.

OK, finally, monitoring and tracking.

What happens after the message hits the queue and a worker tries to send it?

We already mentioned the retry mechanism.

Right, the retry mechanism is that feedback loop for when things fail.

If a third-party service sends back an error, maybe the device token is invalid or their service is just down for a minute, the worker doesn't just give up.

It tries again.

It logs the failure in our notification log and then automatically places the message right back into the message queue for another attempt.

If it keeps failing after a few retries, that's when we fire an alert to the dev team for a human to look at it.
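The retry loop just described might look something like this Python sketch. The limit of three attempts, the `process_queue` function, and the plain lists standing in for the notification log and the alerting system are all assumptions for illustration.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical limit before a human is alerted


def process_queue(queue: deque, send, log: list, alerts: list) -> None:
    """Send each event, requeueing failures until MAX_ATTEMPTS is reached."""
    while queue:
        event = queue.popleft()
        event["attempts"] = event.get("attempts", 0) + 1
        try:
            send(event)
            log.append((event["id"], "sent"))
        except Exception as error:
            # Record the failure, then either retry or escalate.
            log.append((event["id"], f"failed: {error}"))
            if event["attempts"] < MAX_ATTEMPTS:
                queue.append(event)
            else:
                alerts.append(event["id"])


def vendor_is_down(event: dict) -> None:
    raise RuntimeError("service unavailable")


log, alerts = [], []
process_queue(deque([{"id": "evt-1"}]), vendor_is_down, log, alerts)
print(len(log), alerts)
```

A real worker would usually wait between attempts, for example with an increasing backoff, rather than retrying immediately.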

And for measuring the overall health of the system, what's our most important metric?

The total number of queued notifications.

That is the pulse of our system.

If that number starts climbing really fast and unexpectedly, it's a clear signal that our workers are processing too slowly.

We need to immediately scale up the number of workers to drain that queue and stop notifications from getting delayed.

And the final piece of the whole puzzle is events tracking, actually measuring customer behavior.

Yeah, we have to integrate with an analytics service to track what happens after delivery.

We track four key event types: sent, delivered (which is confirmed by the third party), opened, and clicked.

And those metrics tell you if the campaign was actually successful.

They're essential for measuring our open rate, our click rate, our overall engagement.

That's the data that informs future product decisions.

And that brings us to the final comprehensive architecture, which ties all these pieces together.

The authentication, rate limiting, the log database, the decoupled queues, and that analytics tracking service.

It's the culmination of it all.

It prioritizes stability through isolation, reliability through persistence, and the user experience through control and rate limiting.

It creates a truly robust system that can handle global scale.

So this deep dive has really shown us that designing a scalable notification system is, I think, less about raw speed and more about strategic resilience.

I think that's a perfect summary.

Decoupling components with queues, ensuring reliability through that notification log and automated retries, and just respecting users through those settings checks and rate limiting.

And if you look forward, the whole design really hinges on extensibility, the ability to easily swap out those third-party services, given how mobile operating systems are always evolving and how complex the global market is. FCM isn't available everywhere, for instance, in China.

A critical long-term thought for any designer is this.

How do you future-proof that integration layer beyond just using separate message queues?

That's a great question.

That need for flexible expansion is the constant challenge in this space.

A profound challenge to leave you with and one that demands constant architectural vigilance.

Thank you for joining us on the deep dive.

We hope this has given you a complete view of the complex engineering it takes to power those simple little pings that orchestrate your digital world.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary: What this audio overview covers
Designing a notification system requires architecting a distributed platform capable of handling millions of messages daily across multiple communication channels while maintaining reliability and user experience. The system must process approximately 10 million mobile push notifications, 5 million emails, and 1 million SMS messages within soft real-time constraints that permit brief delays during peak traffic periods. The foundational layer organizes user contact information including device tokens, email addresses, and phone numbers in a persistent data store that serves as the single source of truth for routing decisions. The architecture decouples notification generation from delivery by introducing message queues between distinct system components, preventing failures in one channel from cascading throughout the entire platform. When internal services trigger notification events, they route through dedicated Notification Servers that perform authentication, validate input data, and enforce rate limiting policies before enqueueing messages. Specialized worker processes continuously consume events from channel-specific queues, transform data into the appropriate format for target platforms, and communicate with external providers such as Apple Push Notification Service, Firebase Cloud Messaging, SMS gateways, and email services. System resilience relies on persisting all notification records in a dedicated log database and implementing intelligent retry logic that gracefully handles transient failures from third-party services. Because achieving guaranteed single delivery across distributed systems presents fundamental technical challenges, the platform employs deduplication mechanisms that track event identifiers to minimize duplicate message delivery. 
Additional features enhance user satisfaction and system efficiency: notification templates standardize message formatting, preference checking respects user opt-out selections, frequency rate limiting prevents notification fatigue, and integrated monitoring and analytics track delivery success rates alongside engagement metrics such as message open and click rates. This comprehensive approach balances the competing demands of scale, reliability, user control, and operational observability.
