Chapter 2: Data Models & Query Languages
ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.
The chapter analyzes data models and query languages, arguing that data models are the abstractions that shape how developers think about and solve a problem. Applications are built by layering one data model on top of another, each hiding the complexity of the layer below, from application objects down to bytes represented as electrical currents. The chapter contrasts the three main general-purpose data models for storage and querying: relational, document, and graph.
The relational model, proposed by Edgar Codd in 1970 and best known through SQL, organizes data into tables of rows. Normalization removes duplicated data by storing each fact once and referring to it by ID, and joins over foreign keys make the model effective for many-to-one and many-to-many relationships.
The newer document model was popularized by the NoSQL movement, which was driven by the need for greater scalability and by developers' preference for more flexible data structures. It stores data as self-contained documents, often in JSON. This reduces the object-relational impedance mismatch and gives good data locality, which speeds up retrieval of an entire document, but the model struggles with highly interconnected data that requires joins. Document systems typically take a schema-on-read approach, in which the structure is implicit and interpreted when the data is read, whereas traditional relational databases enforce a schema on write. Historically, the document model resembles the older hierarchical model (as in IBM's IMS), which was largely superseded by the relational model because it handled many-to-many relationships poorly.
The discussion then turns to declarative query languages such as SQL, Cypher, and SPARQL. These let developers specify what data they want rather than how to retrieve it, which leaves the database's query optimizer free to choose the most efficient access path and makes parallel execution easier. This contrasts sharply with older imperative approaches such as CODASYL, where the application had to navigate access paths through the data itself.
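To make the declarative versus imperative contrast concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `ANIMALS` sample data and both function names are hypothetical and chosen only for illustration; they are not taken from the chapter summary above.

```python
import sqlite3

# Hypothetical sample data: (id, name, family)
ANIMALS = [
    (1, "Great white", "Sharks"),
    (2, "Lion", "Cats"),
    (3, "Hammerhead", "Sharks"),
]


def sharks_imperative(rows):
    """Imperative style: spell out how to walk the data, step by step."""
    result = []
    for _id, name, family in rows:
        if family == "Sharks":
            result.append(name)
    return result


def sharks_declarative(rows):
    """Declarative style: state the condition and let the engine plan access."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE animals (id INTEGER PRIMARY KEY, name TEXT, family TEXT)"
    )
    conn.executemany("INSERT INTO animals VALUES (?, ?, ?)", rows)
    cur = conn.execute(
        "SELECT name FROM animals WHERE family = ? ORDER BY id", ("Sharks",)
    )
    names = [row[0] for row in cur.fetchall()]
    conn.close()
    return names
```

Both functions return the same names for this data. The difference is that the SQL version says nothing about iteration order or access method, so the database could add an index on `family` or scan in parallel without the query changing.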
Finally, the chapter introduces graph data models, including the property graph and triple-store models, which suit highly interconnected data and complex many-to-many relationships that become awkward in the other models. Query languages such as Cypher and SPARQL support concise pattern matching and variable-length traversals (for example, following a relationship zero or more times). Standard SQL can express such traversals only through recursive common table expressions, which are considerably clumsier. Datalog, an older and more foundational language, shows how complex queries can be built up step by step from reusable rules. Despite their differences, the models are converging: relational databases increasingly support document features such as JSON data types.
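The variable-length traversal described above ("follow a relationship zero or more times") can be sketched with a recursive common table expression, again using Python's built-in `sqlite3` module. The `WITHIN` edge list, the `within_edge` table, and the `locations_within` function are hypothetical names invented for this illustration. A graph query language would typically state the same traversal as a single path pattern.

```python
import sqlite3

# Hypothetical edges: (child, parent) means "child is within parent".
WITHIN = [
    ("Idaho", "United States"),
    ("United States", "North America"),
    ("London", "England"),
    ("England", "Europe"),
]


def locations_within(region, edges):
    """Return region plus everything reachable by following 'within' edges
    backwards zero or more times, sorted by name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE within_edge (child TEXT, parent TEXT)")
    conn.executemany("INSERT INTO within_edge VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE reachable(name) AS (
            SELECT ?                                 -- zero hops: the region itself
            UNION
            SELECT e.child
            FROM within_edge AS e
            JOIN reachable AS r ON e.parent = r.name -- one more hop each round
        )
        SELECT name FROM reachable ORDER BY name
        """,
        (region,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

For example, `locations_within("North America", WITHIN)` collects the region itself, then `United States`, then `Idaho`. The `UNION` (rather than `UNION ALL`) removes duplicates, which also keeps the recursion from looping forever if the edge data contains a cycle.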