Chapter 26: Parallel Algorithms: Fork-Join, Matrix Multiplication, and Merge Sort
ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.
Execution is modeled as a computation DAG whose vertices represent strands of sequential instructions and whose edges capture spawn, call, return, and sync dependencies. This provides a rigorous foundation for the work-span model: work T₁ measures the total number of operations as if the program ran serially, span T∞ measures the longest dependency chain through the DAG, and the ratio T₁/T∞ defines parallelism, the theoretical upper bound on achievable speedup.

Two fundamental laws constrain any P-processor execution: the work law (T_P ≥ T₁/P) and the span law (T_P ≥ T∞). A greedy scheduler, one that at each step runs as many ready strands as processors allow, achieves T_P ≤ T₁/P + T∞ and is therefore guaranteed to be within a factor of two of optimal. Composition rules enable compositional analysis of recursive algorithms: work adds in both cases, while span adds for series compositions but takes the maximum of the subspans for parallel compositions.

Concrete examples illustrate the trade-offs: parallel Fibonacci achieves parallelism Θ(φⁿ/n); recursive parallel matrix multiplication reduces span to Θ(log² n) with parallelism Θ(n³/log² n); a parallelized Strassen's algorithm keeps span Θ(log² n) while delivering even higher parallelism; and parallel merge sort using a recursive parallel merge achieves span Θ(log³ n) and parallelism Θ(n/log² n), far superior to a naïve version that uses a serial merge.

The chapter closes by addressing determinacy races, in which unsynchronized concurrent accesses to the same memory location, at least one of them a write, produce nondeterministic results. Correct parallel programs must ensure that logically parallel tasks operate on disjoint memory or share only read-only data.
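The work and span laws and the greedy bound can be checked concretely by simulating a complete-step greedy scheduler on a small unit-cost DAG. The DAG below and the helper names (`greedy_schedule`, `span_of`) are illustrative choices of mine, not from the chapter; this is a minimal sketch, assuming every strand takes one time step.

```python
def greedy_schedule(deps, P):
    """Complete-step greedy schedule on a unit-cost DAG; returns T_P in steps.

    deps maps each strand to the set of strands it depends on.
    """
    done = set()
    steps = 0
    while len(done) < len(deps):
        # strands whose predecessors all finished before this step began
        ready = [v for v in deps if v not in done and deps[v] <= done]
        done.update(sorted(ready)[:P])  # greedily run up to P ready strands
        steps += 1
    return steps

def span_of(deps):
    """Length of the longest dependency chain (the span T_inf)."""
    memo = {}
    def depth(v):
        if v not in memo:
            memo[v] = 1 + max((depth(p) for p in deps[v]), default=0)
        return memo[v]
    return max(depth(v) for v in deps)

# A small illustrative DAG: 7 unit-cost strands, longest chain 1 -> 2 -> 5 -> 7
DAG = {1: set(), 2: {1}, 3: {1}, 4: {2}, 5: {2, 3}, 6: {3}, 7: {4, 5, 6}}

if __name__ == "__main__":
    T1, Tinf = len(DAG), span_of(DAG)
    for P in (1, 2, 4):
        TP = greedy_schedule(DAG, P)
        # work law, span law, and the greedy upper bound T1/P + Tinf
        print(P, TP, T1 / P, Tinf, T1 / P + Tinf)
```

For this DAG, T₁ = 7 and T∞ = 4; with P = 2 the greedy schedule finishes in 5 steps, which satisfies T₁/P ≤ T_P ≤ T₁/P + T∞ (3.5 ≤ 5 ≤ 7.5).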
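The Θ(φⁿ/n) parallelism of parallel Fibonacci follows directly from its work and span recurrences, which can be evaluated numerically. This sketch assumes unit-cost strands: work follows T₁(n) = T₁(n−1) + T₁(n−2) + Θ(1), while span takes the maximum over the two parallel branches, T∞(n) = max(T∞(n−1), T∞(n−2)) + Θ(1).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def work(n):
    # serial execution performs both recursive calls, so their work adds
    if n < 2:
        return 1
    return work(n - 1) + work(n - 2) + 1

@lru_cache(maxsize=None)
def span(n):
    # the two calls run in parallel, so only the longer branch counts
    if n < 2:
        return 1
    return max(span(n - 1), span(n - 2)) + 1

if __name__ == "__main__":
    for n in (10, 20, 30):
        # parallelism = work / span, which grows like phi^n / n
        print(n, work(n), span(n), work(n) / span(n))
```

The span is linear in n (the longest chain follows the n−1 branch all the way down), while the work grows exponentially like φⁿ, so the parallelism T₁/T∞ is Θ(φⁿ/n) as the chapter states.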
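The fork-join pattern behind the naïve parallel merge sort, spawn one half, sort the other in the current strand, sync, then run a serial merge, can be sketched with Python threads standing in for spawned strands. This is my own illustrative version of the serial-merge variant the chapter improves upon (its Θ(n) merge dominates the span); the depth cutoff `MAX_DEPTH` is an assumption to keep the thread count bounded, not something from the text.

```python
import threading

MAX_DEPTH = 3  # spawn real threads only near the top of the recursion

def merge(left, right):
    # standard serial two-way merge: this is the span bottleneck
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def merge_sort(a, depth=0):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    if depth < MAX_DEPTH:
        result = {}
        # "spawn": sort the left half in a child thread...
        t = threading.Thread(
            target=lambda: result.update(left=merge_sort(a[:mid], depth + 1)))
        t.start()
        # ...while this strand sorts the right half concurrently
        right = merge_sort(a[mid:], depth + 1)
        t.join()  # "sync": wait for the spawned child before merging
        return merge(result["left"], right)
    # below the cutoff, recurse serially
    return merge(merge_sort(a[:mid], depth + 1), merge_sort(a[mid:], depth + 1))
```

Note that the two recursive sorts write to disjoint data (`result["left"]` versus the local `right`), which is exactly the disjoint-memory discipline the chapter's discussion of determinacy races demands; replacing the serial merge with the recursive parallel merge is what brings the span down to Θ(log³ n).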