Chapter 6: Partitioning & Distributed Data Layouts
The primary goal of partitioning is to spread data and query load evenly across a shared-nothing cluster, avoiding skew and preventing any single partition from becoming an overloaded hot spot.

For simple key-value data, two main approaches are used. Key range partitioning assigns a contiguous range of sorted keys to each partition, which makes range scans easy but risks hot spots when sequential keys (such as timestamps) attract a disproportionate share of writes. Hash partitioning applies a hash function to the key so that data is distributed uniformly across partitions, mitigating skew, though it makes range queries inefficient because key ordering is lost. Cassandra takes a hybrid approach with compound primary keys: only the first element of the key is hashed for partitioning, while the remaining columns are kept sorted within the partition for efficient retrieval of related records.

The chapter also explores strategies for secondary indexes, which complicate partitioning. Document-partitioned indexes (local indexes) are stored alongside the documents within each partition, making writes efficient but requiring a potentially slow scatter/gather query across all partitions on reads. Term-partitioned indexes (global indexes) are instead partitioned by the indexed value, so a read can be served from a single index partition, but a single document write may then require complex, often asynchronous updates across multiple index partitions.

When nodes are added to or removed from a cluster, rebalancing redistributes the load fairly, and it must happen while the database remains operational. Common rebalancing strategies include using a large fixed number of partitions that are moved whole between nodes, or dynamic partitioning (common in key range systems like HBase), where a partition automatically splits when it exceeds a maximum size, so the number of partitions adapts to the total dataset volume.
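The Cassandra-style compound key idea above can be sketched in a few lines: hash only the partition key to choose a partition, and keep rows locally sorted by the clustering column. This is an illustrative sketch, not Cassandra's actual implementation (which uses a Murmur3-based partitioner); the partition count and key names are hypothetical.

```python
import hashlib

N_PARTITIONS = 8  # hypothetical cluster-wide partition count


def partition_for(partition_key: str, n: int = N_PARTITIONS) -> int:
    """Hash only the first element of the compound key (the partition key),
    so all rows sharing it land on the same partition."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


# Rows for one user: (partition key, clustering column). They all map to the
# same partition and are kept locally sorted by the clustering column, which
# is what makes range scans over one user's records efficient.
rows = [("user_42", ts) for ts in (3, 1, 2)]
sorted_rows = sorted(rows, key=lambda r: r[1])
```

Because the clustering column never influences the partition choice, a range scan over one user's timestamps touches a single partition.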
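The scatter/gather cost of document-partitioned (local) secondary indexes can be made concrete with a toy model, where each partition keeps its own term-to-document mapping and a read must query every partition. The index layout and terms here are invented for illustration.

```python
# Each partition maintains its own local secondary index: term -> doc ids.
# Hypothetical two-partition cluster with a "color" index.
partitions = [
    {"color:red": [1, 5], "color:blue": [2]},
    {"color:red": [7], "color:green": [9]},
]


def scatter_gather(term: str) -> list[int]:
    """Query every partition's local index (scatter), then merge the
    partial results (gather). Latency is bounded by the slowest partition."""
    results: list[int] = []
    for local_index in partitions:
        results.extend(local_index.get(term, []))
    return sorted(results)
```

A term-partitioned (global) index would instead route `"color:red"` to a single index partition, trading this read amplification for multi-partition work at write time.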
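The fixed-number-of-partitions rebalancing strategy can be sketched as follows: the key-to-partition mapping never changes, and only partition ownership moves when nodes join. The round-robin assignment used here is a deliberate simplification; real systems reassign as few partitions as possible to minimize data transfer.

```python
def assign(n_partitions: int, nodes: list[str]) -> dict[int, str]:
    """Assign a fixed set of partitions to nodes round-robin.
    (Simplified: production systems move only the minimum needed.)"""
    return {p: nodes[p % len(nodes)] for p in range(n_partitions)}


# A node joins: keys stay in the same partitions; some partitions
# simply change owners and are copied whole to the new node.
before = assign(12, ["n0", "n1", "n2"])
after = assign(12, ["n0", "n1", "n2", "n3"])
moved = [p for p in before if before[p] != after[p]]
```

The key property is that rebalancing is a bulk copy of whole partitions between nodes, with no per-key rehashing.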
Finally, efficient request routing is required to direct a client's query to the node that owns the relevant partition. The partition-to-node assignment is often tracked as cluster metadata by a dedicated coordination service such as ZooKeeper, although systems like Cassandra instead use a decentralized gossip protocol to disseminate state changes. The partitioning fundamentals discussed here underpin both transactional systems and the more sophisticated massively parallel processing (MPP) systems used for complex analytical workloads.
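A minimal sketch of such a routing tier, assuming the partition-to-node map has already been fetched from a coordination service (the node names, partition count, and hash choice are all hypothetical):

```python
from zlib import crc32

N_PARTITIONS = 8


def partition_of(key: str, n: int = N_PARTITIONS) -> int:
    """Stable key-to-partition mapping (crc32 chosen only for determinism)."""
    return crc32(key.encode()) % n


# Cluster metadata: partition -> owning node, e.g. mirrored from ZooKeeper.
assignments = {p: f"node-{p % 3}" for p in range(N_PARTITIONS)}


def route(key: str) -> str:
    """Return the node that owns the partition holding `key`."""
    return assignments[partition_of(key)]
```

When rebalancing changes `assignments`, only this metadata map needs updating; clients (or a routing tier in front of them) learn the new ownership and requests follow automatically.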