Chapter 7: Log-Structured Storage & LSM Trees

Log-structured storage treats the store as append-only: data is never modified in place, much as an immutable ledger preserves historical records while adding new entries. Incoming writes first land in a memory-resident structure called a memtable, typically implemented as a skip list so that keys stay sorted for efficient range queries. When the memtable reaches capacity, it is flushed to disk as an immutable Sorted String Table (SSTable), and over time these flushes build up a layered hierarchy of files. Because the same key can appear in several files, a background process called compaction merges overlapping files to reclaim space and keep reads fast. The system must navigate the trade-offs described by the RUM conjecture: read, update, and memory overheads pull against one another, so optimizing for any two generally comes at the expense of the third. Deletions cannot remove data in place, so they are recorded as tombstone markers that logically mark a key as removed; the underlying data is physically discarded only later, during compaction. Reads must reconcile multiple components, performing a multiway merge across the memtable and the on-disk tables and returning the most recent version of each requested key. To avoid unnecessary disk access, the system uses Bloom filters, probabilistic structures that can definitively rule out a key's presence in a file (they may return false positives but never false negatives) with little memory overhead. The chapter also covers specialized designs, including Bitcask for high-throughput point lookups and WiscKey, whose key-value separation strategy is optimized for modern storage hardware.
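
The write, flush, delete, read, and compaction steps summarized above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not how any particular engine implements them: the class name `TinyLSM`, the `memtable_limit` parameter, and the `TOMBSTONE` sentinel are hypothetical, the memtable is a plain dict rather than a skip list, and "SSTables" are in-memory sorted lists rather than files.

```python
import heapq
from bisect import bisect_left

TOMBSTONE = object()  # hypothetical sentinel marking a deleted key


class TinyLSM:
    """Toy LSM tree: a dict memtable plus immutable sorted runs (newest first)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}  # assumption: a dict stands in for a skip list
        self.memtable_limit = memtable_limit
        self.sstables = []  # each run is a sorted list of (key, value) pairs

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just a write of a tombstone; nothing is removed in place.
        self.put(key, TOMBSTONE)

    def flush(self):
        # Freeze the memtable into a new sorted run placed ahead of older runs.
        if self.memtable:
            self.sstables.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check the newest data first: memtable, then runs from newest to oldest.
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is TOMBSTONE else value
        for run in self.sstables:
            # (key,) sorts just before (key, value), so this finds the key's slot.
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                value = run[i][1]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        """Multiway-merge all runs into one, keeping only the newest versions."""
        # Tag entries with the run's age so equal keys come out newest-first.
        tagged = [
            [(key, age, value) for key, value in run]
            for age, run in enumerate(self.sstables)
        ]
        merged, last_key = [], object()
        for key, _age, value in heapq.merge(*tagged, key=lambda e: (e[0], e[1])):
            if key == last_key:
                continue  # older version shadowed by a newer one
            last_key = key
            # All runs are merged here, so tombstones can be dropped safely.
            if value is not TOMBSTONE:
                merged.append((key, value))
        self.sstables = [merged] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # memtable full: flushed as the first run
db.put("a", 3)
db.delete("b")   # tombstone for "b"; second flush
db.put("c", 4)   # stays in the memtable
print(db.get("a"), db.get("b"), db.get("c"))  # 3 None 4
```

Note that tombstones are dropped here only because `compact` merges every run at once; a compaction that merges a subset of runs would have to keep them, since an older run could still hold the deleted key.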
Additionally, the chapter addresses log stacking, in which several log-structured layers (the database engine, the filesystem, and the SSD's internal flash translation layer) operate on top of one another. Because each layer can add its own write amplification, the chapter emphasizes hardware-aware design decisions that prevent cascading write amplification and the performance degradation it causes.
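
To make the cascade concrete, the following sketch multiplies per-layer write amplification factors. It is a simplification that assumes every byte written by one layer becomes input to the layer below it; the function name `device_bytes_written` and all factor values are hypothetical and chosen only for illustration, not measurements of any real system.

```python
def device_bytes_written(app_bytes, layer_factors):
    """Estimate bytes reaching the flash medium for a given application write.

    layer_factors maps a layer name to its write amplification factor,
    defined as bytes the layer writes divided by bytes it receives.
    """
    total = app_bytes
    for factor in layer_factors.values():
        total *= factor
    return total


# Hypothetical factors, for illustration only.
layers = {
    "lsm_compaction": 10.0,          # database engine rewriting data during compaction
    "filesystem": 1.5,               # journaling or log-structured filesystem overhead
    "flash_translation_layer": 2.0,  # SSD-internal garbage collection
}

print(device_bytes_written(1, layers))  # 30.0 bytes on flash per application byte
```

Under these assumed factors, modest amplification at each layer compounds to a 30x total, which is why tuning one layer in isolation can leave most of the problem in place.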