Chapter 8: Design a URL Shortener




Scoping the project reveals high-volume requirements: the service must generate 100 million unique URLs per day, or roughly 1,160 write operations per second. With a 10:1 read-to-write ratio, that means approximately 11,600 read operations per second, and over a ten-year lifespan the system must support about 365 billion records (100 million × 365 days × 10 years).

The architectural plan defines two REST-style API endpoints: a POST request that shortens a URL and a GET request that redirects a short URL to its long original. The key design choice for redirection is the status code. A 301 permanent redirect lets browsers cache the response, so later requests bypass the shortening service and server load drops. A 302 temporary redirect sends every request through the service, which makes ongoing analytics and click tracking possible.

For the deep dive, the system stores the mapping between short and long URLs in a relational database rather than in an in-memory hash table, which would be too memory-intensive at this scale. The short alias, or hash value, is drawn from a 62-character alphabet (0-9, a-z, A-Z). To support 365 billion URLs, the hash value length is set to seven characters, since 62^7 gives approximately 3.5 trillion possible combinations, while six characters would give only about 56.8 billion.

The preferred implementation is base 62 conversion, which converts a unique primary key ID into the short URL string; any ID below 62^7 yields at most seven characters. Because every ID is unique, this approach avoids the collisions inherent in the "hash plus collision resolution" method, in which the output of a standard hash function such as MD5 is shortened to seven characters and each candidate must be checked against the database, repeating until no collision is found. A Bloom filter can make that check cheaper.

The complete URL shortening flow first checks the database for the long URL. If it already exists, the existing short URL is returned; otherwise the system generates a new globally unique ID, converts it to base 62, and saves the new mapping. To serve the high volume of reads, the URL redirecting flow relies on a caching layer: the long URL for a short alias is looked up in the cache first, falling back to the database on a miss, before the client is redirected.
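The base 62 conversion and the two flows described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's implementation: the names `base62_encode` and `Shortener` are hypothetical, plain dictionaries stand in for the relational database and the caching layer, and a simple counter stands in for the globally unique ID generator.

```python
import string

# Alphabet order (0-9, a-z, A-Z) follows the summary; 62 symbols in total.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)

# Capacity check: seven characters must cover ten years of URLs.
TEN_YEAR_RECORDS = 100_000_000 * 365 * 10
assert BASE ** 7 > TEN_YEAR_RECORDS


def base62_encode(n: int) -> str:
    """Convert a non-negative integer ID into its base 62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])
    # Remainders come out least significant first, so reverse them.
    return "".join(reversed(digits))


class Shortener:
    """Hypothetical in-memory stand-in for the shortening service."""

    def __init__(self, first_id: int = 1):
        self._next_id = first_id   # assumption: a counter replaces the ID generator
        self._long_to_short = {}   # assumption: dicts replace the database table
        self._short_to_long = {}
        self._cache = {}           # assumption: a dict replaces the caching layer

    def shorten(self, long_url: str) -> str:
        # Step 1: return the existing alias if the long URL was seen before.
        existing = self._long_to_short.get(long_url)
        if existing is not None:
            return existing
        # Step 2: take a new unique ID and convert it to base 62.
        new_id = self._next_id
        self._next_id += 1
        short = base62_encode(new_id)
        # Step 3: save the new mapping.
        self._long_to_short[long_url] = short
        self._short_to_long[short] = long_url
        return short

    def resolve(self, short: str):
        # Check the cache first, then fall back to the "database".
        cached = self._cache.get(short)
        if cached is not None:
            return cached
        long_url = self._short_to_long.get(short)
        if long_url is not None:
            self._cache[short] = long_url
        return long_url  # None means the alias is unknown
```

Because each ID is unique, two different long URLs can never receive the same alias, which is why no collision check is needed. The alias length grows with the ID, so it only reaches seven characters for IDs close to 62^7.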
Finally, a complete design adds a rate limiter to protect against malicious abuse, keeps the web tier stateless so that it can scale, and applies database replication and sharding.
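The summary does not say which rate-limiting algorithm the design uses. As one possible illustration, the sketch below assumes a fixed-window counter per client; the class name `FixedWindowRateLimiter`, its parameters, and the injectable `clock` are all hypothetical.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per client in each fixed time window."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock    # injectable so the behaviour can be tested
        self._counts = {}      # client_id -> (window index, request count)

    def allow(self, client_id: str) -> bool:
        window = int(self._clock() // self.window_seconds)
        seen_window, count = self._counts.get(client_id, (window, 0))
        if seen_window != window:
            # A new window has started, so the counter resets.
            seen_window, count = window, 0
        if count >= self.limit:
            self._counts[client_id] = (seen_window, count)
            return False
        self._counts[client_id] = (seen_window, count + 1)
        return True
```

Because the web tier is meant to be stateless, a real deployment would likely keep these counters in a shared store rather than in process memory; the dictionary here is only a stand-in.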