Chapter 2: Nearby Friends System Design
ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.
The design must account for the highly dynamic nature of user locations, distinguishing it from systems that handle static addresses, such as proximity services. Key requirements include low latency, acceptable reliability (occasional data loss is fine), and eventual consistency, with each user's list of nearby friends refreshing every few seconds. Based on a scale assumption of 10 million concurrent users, the system is estimated to handle approximately 334,000 location updates per second, and the backend must fan out around 14 million updates per second to friends.

The proposed high-level design uses a shared backend rather than an impractical peer-to-peer structure. A load balancer distributes traffic between stateless RESTful API servers, which handle auxiliary tasks such as profile and friend management, and stateful WebSocket servers, which maintain persistent connections and deliver real-time location updates.

The data layer centers on a Redis location cache, which stores each active user's current position with a time-to-live (TTL) so inactive users are purged automatically, and a horizontally scaled location history database, typically a store optimized for heavy writes such as Cassandra, sharded by user ID.

The communication backbone is the Redis Pub/Sub server, which functions as a lightweight routing layer in which every user has a dedicated channel. When a user submits an update, the WebSocket server publishes it to that user's channel. The WebSocket connection handlers of all online friends subscribe to the channel, compute the distance when an update arrives, and forward the new location and timestamp to the client device only if the distance falls within the search radius (e.g., 5 miles).
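A quick back-of-envelope check of those figures; the 30-second update interval, the 400-friend average, and the 10% online ratio are assumptions chosen to reproduce the quoted numbers, not details stated above:

```python
# Back-of-envelope estimate of the update rates quoted in the text.
# Assumed inputs: 10 million concurrent users, each reporting a
# location every 30 seconds, 400 friends on average, ~10% online.
concurrent_users = 10_000_000
update_interval_s = 30
avg_friends = 400
online_ratio = 0.10

updates_per_s = concurrent_users / update_interval_s
print(f"Location updates/s: {updates_per_s:,.0f}")  # ~333,333, rounded to ~334,000 in the text

fanout_per_s = updates_per_s * avg_friends * online_ratio
print(f"Fan-out messages/s: {fanout_per_s:,.0f}")   # ~13.3 million, rounded to ~14 million in the text
```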
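The per-user-channel flow can be sketched roughly as follows. This is a minimal illustration: the channel and key naming, the 600-second TTL, the hypothetical `ws` connection object, and the redis-py-style client calls are all assumptions, not details from the text:

```python
import json
import math

SEARCH_RADIUS_MILES = 5  # only forward updates from friends within this radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def on_location_update(redis_client, user_id, lat, lon, ts):
    """Publisher side: called by the WebSocket server when a client
    reports a new location. Refreshes the location cache with a TTL
    (so inactive users expire) and publishes to the user's channel."""
    payload = json.dumps({"user_id": user_id, "lat": lat, "lon": lon, "ts": ts})
    redis_client.set(f"loc:{user_id}", payload, ex=600)  # TTL is an assumed value
    redis_client.publish(f"channel:{user_id}", payload)

def on_friend_message(ws, viewer_lat, viewer_lon, raw):
    """Subscriber side: runs in the connection handler of each online
    friend. Forwards the update to the device only if it is in range."""
    update = json.loads(raw)
    dist = haversine_miles(viewer_lat, viewer_lon, update["lat"], update["lon"])
    if dist <= SEARCH_RADIUS_MILES:
        ws.send(raw)  # push the new location and timestamp to the client
```

The distance filter runs on the subscriber's connection handler, matching the description above: the publisher fans out unconditionally, and each friend's handler decides locally whether the update is worth pushing.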
Due to the immense volume of message pushes (14 million per second), CPU capacity, not memory, becomes the Redis Pub/Sub server's primary bottleneck. Handling this load requires a distributed cluster managed as a stateful service through a service-discovery component (such as etcd or ZooKeeper) that maintains a consistent hash ring sharded by publisher user ID.

An alternative to Redis Pub/Sub is Erlang/OTP, which is well suited to highly concurrent applications and can model each of the 10 million active users as an extremely lightweight process. Furthermore, the system can be extended to support a "nearby random person" feature by sharding Redis Pub/Sub channels by geohash grid rather than by user.
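The consistent-hash-ring idea can be sketched as below; this is a minimal stand-in for what a service-discovery-backed cluster would maintain, and the node names, hash function, and virtual-node count are assumptions:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent hash ring mapping publisher user IDs to
    Pub/Sub shards. In the real design the ring membership would live
    in a service-discovery component such as etcd or ZooKeeper."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node gets `vnodes` positions on the ring to
        # smooth out the key distribution.
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, user_id):
        """Walk clockwise from the key's hash to the next ring position."""
        h = self._hash(str(user_id))
        idx = bisect.bisect(self.ring, (h, ""))
        if idx == len(self.ring):
            idx = 0  # wrap around the ring
        return self.ring[idx][1]
```

The payoff of consistent hashing is visible when a shard is removed: only the keys that mapped to that shard move, while every other user keeps its existing Pub/Sub server.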
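For the geohash-sharded variant, the sketch below derives a channel name from a coordinate using the standard base32 geohash encoding; the `geo:` channel prefix and the precision of 6 are assumptions for illustration:

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat, lon, precision=6):
    """Standard geohash: interleave longitude/latitude bisection bits,
    emitting one base32 character per 5 bits."""
    lat_r, lon_r = [-90.0, 90.0], [-180.0, 180.0]
    out, bits, ch, even = [], 0, 0, True
    while len(out) < precision:
        rng, val = (lon_r, lon) if even else (lat_r, lat)
        mid = (rng[0] + rng[1]) / 2
        ch <<= 1
        if val >= mid:
            ch |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        even = not even
        bits += 1
        if bits == 5:
            out.append(_BASE32[ch])
            bits, ch = 0, 0
    return "".join(out)

def geo_channel(lat, lon):
    """Hypothetical Pub/Sub channel name for a location's geohash cell.
    Every user in the same grid cell publishes/subscribes here."""
    return f"geo:{geohash_encode(lat, lon, precision=6)}"
```

In practice a subscriber near a cell boundary would also subscribe to the neighboring cells' channels so that nearby people just across the border are not missed.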