Chapter 19: Data Structures for Disjoint Sets
Loading audio…
ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.
A straightforward linked-list implementation makes MAKE-SET and FIND-SET O(1) but requires UNION to update every element's set pointer, producing Θ(n²) total cost in the worst case over a sequence of operations; the weighted-union heuristic improves this by always appending the shorter list to the longer, limiting each element's pointer updates to O(log n) and reducing total cost for m operations to O(m + n log n). The more powerful implementation uses disjoint-set forests, representing each set as a rooted tree where every node points to its parent and the root points to itself, with FIND-SET following parent pointers to the root and UNION linking one root to another. Two heuristics dramatically improve forest performance: union by rank attaches the tree with smaller rank as a child of the larger, keeping trees shallow, while path compression during FIND-SET rewires every node along the traversal path to point directly to the root, flattening the tree for all future operations. Together these heuristics yield a total running time of O(m × α(n)) for any sequence of m operations, where α(n) is the inverse Ackermann function — a quantity that grows so slowly it never exceeds 4 for any conceivable real-world input, making disjoint-set forests essentially linear in practice — a bound formally proven through amortized analysis using a potential function that tracks the evolving structure of the forest.