Chapter 10: Design a Notification System

The design must handle high daily volumes (10 million mobile push notifications, 5 million emails, and 1 million SMS messages) while meeting a soft real-time delivery requirement, under which slight delays are acceptable at peak load. The first architectural step is to collect user contact information, such as device tokens, email addresses, and phone numbers, and store it in a database.

The improved high-level design adds mechanisms for scalability and reliability, chiefly message queues that decouple the system's components. Internal services trigger notification events and send them to Notification Servers, which handle authentication, basic validation, and rate limiting. The servers then place each event into one of several distinct message queues, so that an outage affecting one communication channel (such as SMS) does not halt the entire system. Specialized workers continuously pull events from these queues, prepare the data (for example, a JSON dictionary payload for an iOS push notification), and call the relevant third-party service: Apple Push Notification Service (APNS), Firebase Cloud Messaging (FCM), or a commercial SMS or email provider.

For resilience, all notification data is persisted in a notification log database, and a retry mechanism handles temporary failures with external providers. Because exactly-once delivery is hard to guarantee in distributed systems, a deduplication mechanism checks event IDs to reduce the chance that users receive duplicate alerts.
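The queue-per-channel, retry, and deduplication ideas can be illustrated with a small in-memory sketch. It is not the chapter's implementation: the names `Event`, `TransientError`, and `NotificationPipeline` are hypothetical, Python deques stand in for real message queues, a set stands in for the deduplication store, a list stands in for the notification log database, and the `providers` callables stand in for APNS, FCM, or an SMS or email vendor. Retry backoff is omitted for brevity.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    event_id: str   # used for deduplication
    channel: str    # e.g. "push", "email", or "sms"
    recipient: str
    body: str


class TransientError(Exception):
    """A temporary provider failure that is worth retrying (hypothetical)."""


class NotificationPipeline:
    def __init__(self, providers, max_attempts=3):
        # providers maps a channel name to a callable that sends one event.
        self.providers = providers
        # One queue per channel, so a stalled channel does not block the others.
        self.queues = {channel: deque() for channel in providers}
        self.seen_ids = set()   # stand-in for a deduplication store
        self.log = []           # stand-in for the notification log database
        self.max_attempts = max_attempts

    def enqueue(self, event):
        """Drop events whose ID was already seen, else route to the channel queue."""
        if event.event_id in self.seen_ids:
            return False
        self.seen_ids.add(event.event_id)
        self.queues[event.channel].append((event, 1))
        return True

    def drain(self, channel):
        """Worker loop for one channel: send, and re-queue on transient failure."""
        queue = self.queues[channel]
        send = self.providers[channel]
        while queue:
            event, attempt = queue.popleft()
            try:
                send(event)
                self.log.append((event.event_id, "sent", attempt))
            except TransientError:
                if attempt < self.max_attempts:
                    queue.append((event, attempt + 1))  # retry later
                else:
                    self.log.append((event.event_id, "failed", attempt))
```

Because each channel has its own queue and worker loop, a provider that keeps raising `TransientError` only delays events in its own queue. In a real deployment the deduplication set and the log would need to live in shared, persistent storage so that multiple servers and workers see the same state.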
The system also uses notification templates to keep formatting consistent and message construction efficient, checks user settings to respect opt-out preferences, limits how frequently each recipient is notified so that users are not overwhelmed, and integrates monitoring and analytics services to track performance and measure user engagement metrics such as open and click rates.
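As a rough illustration of how the opt-out check, frequency limit, and template step might fit together before a message is sent, consider the sketch below. Everything in it is an assumption made for the example: the names `FrequencyLimiter` and `prepare`, the sliding-window limiting policy, the `settings` dictionary keyed by `(user_id, channel)`, and the choice to treat users with no stored setting as opted in. Templates are rendered with the standard library's `string.Template`.

```python
from string import Template


class FrequencyLimiter:
    """Allow at most `limit` notifications per user within `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.sent_at = {}  # user_id -> timestamps of recent sends

    def allow(self, user_id, now):
        # Keep only the sends that still fall inside the sliding window.
        recent = [t for t in self.sent_at.get(user_id, []) if now - t < self.window]
        allowed = len(recent) < self.limit
        if allowed:
            recent.append(now)
        self.sent_at[user_id] = recent
        return allowed


def prepare(user_id, channel, template, params, settings, limiter, now):
    """Return the rendered message, or None if it must not be sent."""
    # Assumption: a missing setting means the user has not opted out.
    if not settings.get((user_id, channel), True):
        return None
    if not limiter.allow(user_id, now):
        return None
    # Fill the $placeholders in the template with the event's parameters.
    return Template(template).substitute(params)
```

For example, with `FrequencyLimiter(limit=2, window=60)`, a third call to `prepare` for the same user within 60 seconds returns `None`, and a call for a channel the user has opted out of returns `None` without counting against the limit.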