Chapter 12: Digital Wallet System Design




The chapter begins by outlining the wallet service's requirements, including high availability (99.99 percent) and reproducibility. Initial designs explore simple in-memory solutions, such as sharding user accounts across multiple Redis nodes managed by a centralized configuration store like ZooKeeper. This approach fails to guarantee atomicity, however: a single balance transfer requires two separate updates, and both must succeed to prevent an incomplete transaction.

To address this, the design moves to distributed transactions over sharded, transactional relational databases. Two families of solutions are examined. The low-level Two-Phase Commit (2PC) protocol relies on database locks but suffers from poor performance and a single point of failure in the coordinator. Higher-level, application-driven alternatives are Try-Confirm/Cancel (TC/C) and Saga. TC/C is a compensating-transaction method: each phase runs as a separate transaction, the business logic must supply compensation (undo) operations when a failure occurs, and a phase status table tracks progress. Saga, often used in microservice architectures, enforces a linear order of local transactions and rolls back with compensating transactions if any step fails.

The chapter concludes that the ultimate solution for auditability and correctness verification is event sourcing, a philosophy in which all changes are saved as an immutable history of events, so historical states can be reconstructed and reproducibility is guaranteed. A deterministic state machine validates incoming commands and generates events, which are then applied to update the system state. To optimize performance and latency, the system incorporates Command-Query Responsibility Segregation (CQRS), separating the write path (commands and state updates) from the read path (queryable historical views).
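The TC/C flow described above can be sketched in Python. This is a simplified illustration, not the chapter's implementation: `Wallet`, `Phase`, and `transfer` are invented names, and a real system would run each phase as its own database transaction across shards.

```python
from enum import Enum

class Phase(Enum):
    TRYING = "trying"
    CONFIRMED = "confirmed"
    CANCELED = "canceled"

class Wallet:
    def __init__(self, balance):
        self.balance = balance
        self.reserved = 0  # funds held by a pending Try phase

    def try_debit(self, amount):
        # Try phase: reserve funds in its own transaction; money does not move yet.
        if self.balance - self.reserved < amount:
            raise ValueError("insufficient funds")
        self.reserved += amount

    def confirm_debit(self, amount):
        # Confirm phase: a second, separate transaction makes the debit permanent.
        self.reserved -= amount
        self.balance -= amount

    def cancel_debit(self, amount):
        # Cancel phase: a compensating transaction releases the reservation.
        self.reserved -= amount

def transfer(status, tx_id, src, dst, amount):
    status[tx_id] = Phase.TRYING  # the phase status table tracks progress
    try:
        src.try_debit(amount)
    except ValueError:
        status[tx_id] = Phase.CANCELED  # Try never completed; nothing to undo
        return False
    src.confirm_debit(amount)
    dst.balance += amount
    status[tx_id] = Phase.CONFIRMED
    return True
```

The phase status table is what lets a recovering coordinator decide, after a crash, whether a given transfer should be confirmed or compensated.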
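A Saga coordinator can be sketched in the same spirit: local transactions execute in a fixed linear order, and on failure the compensating transactions for completed steps run in reverse. The `(action, compensation)` pair representation is an assumption for illustration.

```python
def run_saga(steps):
    """steps: ordered list of (action, compensation) pairs, each a local transaction."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):  # roll back completed steps in reverse order
                undo()
            return False
        completed.append(compensation)
    return True
```

For a transfer, the steps would be "debit source" and "credit destination", each paired with its inverse as the compensation.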
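The event-sourcing cycle, in which a command is validated, becomes an immutable event, and is applied to state, together with the CQRS idea of rebuilding read views by replay, might look like the following minimal sketch (`handle`, `apply_event`, and `rebuild` are hypothetical names, and commands are simplified to tuples):

```python
events = []    # immutable, append-only event history
balances = {}  # current state on the write path

def apply_event(state, event):
    # Deterministic state transition: the same events always yield the same state.
    kind, account, amount = event
    delta = amount if kind == "credit" else -amount
    state[account] = state.get(account, 0) + delta

def handle(command):
    # Validate the command against current state before emitting an event.
    kind, account, amount = command
    if kind == "debit" and balances.get(account, 0) < amount:
        raise ValueError("rejected: insufficient funds")
    events.append(command)          # validated command is recorded as an event
    apply_event(balances, command)  # then applied to update the write-side state

def rebuild(upto=None):
    """CQRS read path: reconstruct any historical state by replaying events."""
    state = {}
    for event in (events if upto is None else events[:upto]):
        apply_event(state, event)
    return state
```

Because rejected commands never become events, replaying the history reproduces exactly the accepted state transitions, which is what makes the system auditable.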
Further deep dives cover performance and reliability. For persistence, file-based storage such as RocksDB, combined with mmap, maximizes I/O throughput, and periodic snapshots accelerate state reconstruction. Reliability comes from a consensus algorithm, specifically Raft, which replicates the immutable event list across a node group and tolerates faults as long as a majority of nodes remain operational. Finally, to meet real-time latency requirements, the asynchronous event-sourcing framework is adapted with a reverse proxy and a push model, and the system scales horizontally by partitioning data and coordinating transfers between node groups with distributed transaction protocols such as TC/C or Saga.
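The snapshot optimization can be illustrated with a small sketch: reconstruction starts from the latest snapshot and replays only the trailing events instead of the full history. `take_snapshot` and `restore` are assumed names, and events are simplified to `(account, delta)` tuples.

```python
import copy

def apply_event(state, event):
    account, delta = event
    state[account] = state.get(account, 0) + delta

def take_snapshot(events, state):
    # Persist the state together with the event index it reflects.
    return {"version": len(events), "state": copy.deepcopy(state)}

def restore(events, snapshot=None):
    # With a snapshot, replay only the tail of the event list;
    # without one, fall back to replaying the entire immutable history.
    if snapshot is None:
        state, start = {}, 0
    else:
        state, start = copy.deepcopy(snapshot["state"]), snapshot["version"]
    for event in events[start:]:
        apply_event(state, event)
    return state
```

The snapshot interval is a tunable trade-off: frequent snapshots shorten recovery but cost more storage and write I/O.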