Chapter 32: String Matching: Rabin-Karp, KMP, and Finite Automata

Loading audio…

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

If there is an issue with this chapter, please let us know → Contact Us

The naive algorithm checks every possible shift s from 0 to n–m by directly comparing P against the corresponding substring of T, running in O((n–m+1) × m) worst-case time and serving as the baseline against which smarter approaches are measured. The Rabin-Karp algorithm replaces character-by-character comparison with hash value comparison, using a rolling hash to compute each new substring's hash in O(1) from the previous one and performing a full string comparison only on hash matches, achieving Θ(m) preprocessing and efficient average-case performance with a well-chosen prime modulus — particularly advantageous for multi-pattern search and two-dimensional matching. The finite automaton approach builds a state machine from P whose transition function encodes the suffix structure of the pattern, reading T in a single O(n) pass after O(m × |Σ|) preprocessing and declaring a match whenever state m is reached. The Knuth-Morris-Pratt (KMP) algorithm achieves the same Θ(n) matching time without explicitly constructing the automaton, instead computing a prefix function π in Θ(m) time where π[q] stores the length of the longest proper prefix of P[1..q] that is also a suffix, using this to skip redundant comparisons on mismatch by shifting the pattern by q–π[q] positions. Suffix arrays extend the chapter's reach to more general text indexing, storing the sorted starting indices of all suffixes of T and enabling binary search for any pattern in O(m log n) time after O(n log n) construction using radix sort and rank-based merging, with the companion LCP array — computed in Θ(n) via the inverse suffix array — supporting efficient queries for the longest repeated substring and longest common substring between two texts.