Chapter 7: Design a Unique ID Generator (Distributed Systems)

Loading audio…

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

If there is an issue with this chapter, please let us know → Contact Us

Design a Unique ID Generator (Distributed Systems) focuses on the complex challenge of designing a highly scalable, unique ID generator essential for distributed systems, recognizing that traditional methods like database auto-increment functionality are inadequate in multi-server environments. The initial design scope establishes critical requirements: IDs must be strictly numerical, fit within 64 bits, guarantee absolute uniqueness, and crucially, be sortable based on time, with the system needing to sustain a generation rate exceeding 10,000 identifiers per second. Several high-level design approaches are analyzed and dismissed for failing to meet these strict criteria. These include multi-master replication, which uses an increment factor based on the number of database servers but suffers from poor time-based sorting across servers and difficult scaling; the use of Universally Unique Identifiers (UUIDs), which are simple to generate and scale well but violate the 64-bit size limit (being 128 bits long) and lack chronological sortability; and the Ticket Server approach, exemplified by Flickr, which relies on a centralized auto-increment database, presenting a critical single point of failure. Ultimately, the chosen strategy is based on the Twitter Snowflake model, which divides the 64-bit ID into distinct sections. This partition includes a single reserved sign bit, followed by 41 bits dedicated to the timestamp (measured in milliseconds from a custom epoch), which ensures the essential chronological sortability of the IDs and provides approximately 69 years of capacity before requiring migration. The next 10 bits are split equally: 5 bits for the datacenter ID (allowing 32 centers) and 5 bits for the machine ID (allowing 32 machines per center). These values are generally fixed during system initialization. Finally, 12 bits are reserved for the rapidly incrementing sequence number, which resets every millisecond, enabling the generation of up to 4096 unique IDs per millisecond on a single server, thereby successfully meeting the high throughput requirement for a mission-critical system. The chapter concludes by discussing supplementary real-world considerations such as high availability, proper clock synchronization (e.g., using Network Time Protocol), and the optimization of section length based on the application's concurrency needs.