Chapter 11: Hash Tables: Hash Functions, Open Addressing, and Collision Handling

Loading audio…

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

If there is an issue with this chapter, please let us know → Contact Us

While direct addressing — using a key as an array index — is fast, it becomes impractical for large key universes, so hash tables compress the key range through a hash function h(k) that maps keys to slots in a table of size m, inevitably producing collisions when two keys map to the same slot. The two primary collision resolution strategies are chaining, where each slot holds a linked list of colliding elements and average-case search time is Θ(1 + α) with load factor α = n/m, and open addressing, where all elements are stored within the table itself and collisions are resolved by probing alternate slots through a sequence h(k, i) — with linear probing offering good cache performance but suffering from primary clustering, and double hashing providing better dispersion by incorporating a second hash function, though deletion in open addressing requires special DELETED markers to preserve probe sequence integrity. For load factor α < 1, open addressing yields expected probes of at most 1/(1–α) for unsuccessful searches and (1/α) × ln(1/(1–α)) for successful ones. Hash function design is critical: the division method uses k mod m with a prime m, the multiplication method applies a fractional constant A, and the multiply-shift method exploits bit operations for speed when m is a power of two. Universal hashing strengthens these guarantees by randomly selecting a hash function from a carefully constructed family, ensuring collision probability of at most 1/m and providing worst-case protection against adversarial inputs through the construction h(k) = ((ak + b) mod p) mod m. Practical considerations round out the chapter, noting that cache-friendly access patterns — favoring linear probing on modern hardware — and specialized handling for variable-length keys like strings are essential for building high-performance hash tables in real systems.