Chapter 11: Music & Speech Perception
ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.
Music & Speech Perception begins by defining the unique dimensions of musical pitch: tone height, which corresponds to a note's frequency, and tone chroma, which describes the shared quality of notes separated by octaves. These dimensions are often visualized as a musical helix to illustrate how we perceive melodic shifts. The text then turns to the mathematical foundations of harmony, explaining how simple frequency ratios create consonant, pleasing chords, while more complex ratios result in dissonance. Musical structure is further defined by melody, tempo, and rhythm, with a focus on how the human brain is naturally predisposed to group sounds into rhythmic patterns, even when they are perfectly uniform.

Cultural studies, such as those involving the Tsimane’ people, highlight that while some aspects of hearing are universal, many musical preferences, including the ability to recognize octaves, are heavily influenced by environmental exposure. The phenomenon of absolute pitch is examined as a rare intersection of genetic predisposition and early childhood training.

Transitioning to speech, the chapter outlines the three-part production process: respiration in the lungs, phonation at the vocal folds, and articulation within the vocal tract. Manipulating the shape of the oral and nasal cavities creates formants (concentrations of acoustic energy at specific frequencies) that allow listeners to distinguish among vowels and consonants. A significant challenge in speech research is coarticulation: the overlapping movements of the speech organs cause individual sounds to change with their context, producing a "lack of invariance" in the acoustic signal. Despite this variability, humans use categorical perception to snap ambiguous sounds into distinct phonemic groups.
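The height/chroma distinction and the role of simple frequency ratios can be sketched numerically. This is a minimal illustration, not from the chapter itself; it assumes twelve-tone equal temperament and middle C (≈261.63 Hz) as an arbitrary reference point:

```python
import math

def pitch_height_and_chroma(freq_hz, ref_hz=261.63):
    """Split a frequency into tone height (octaves above the reference)
    and tone chroma (position within the octave, 0-11 semitones)."""
    height = math.log2(freq_hz / ref_hz)   # log scale: +1 per octave
    chroma = round(12 * height) % 12       # 12-tone equal-temperament assumption
    return height, chroma

# Notes an octave apart share a chroma but differ in height,
# which is why they sit directly above one another on the musical helix:
h_c4, c_c4 = pitch_height_and_chroma(261.63)   # middle C (C4)
h_c5, c_c5 = pitch_height_and_chroma(523.25)   # C one octave up (C5)

# Consonant intervals correspond to simple frequency ratios
# (octave 2:1, perfect fifth 3:2, perfect fourth 4:3), while
# dissonant intervals have complex ones (e.g., minor second 16:15):
fifth_ratio = 392.00 / 261.63   # G4 over C4, close to 3:2
```

Here `c_c4` and `c_c5` come out equal (same chroma), while their heights differ by one octave, mirroring the helix geometry described above.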
The chapter also discusses the multisensory nature of communication, exemplified by the McGurk effect, in which visual lip-reading cues can fundamentally alter the sound a person perceives. Developmentally, infants are born as "universal listeners," capable of distinguishing all human speech sounds, but they gradually tune their perception to their native language through statistical learning. Finally, the neurological basis for these abilities is mapped out: while general auditory processing occurs in the primary auditory cortex, the interpretation of intelligible language is lateralized primarily to the anterior and ventral regions of the left superior temporal lobe.
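The statistical-learning idea can be illustrated with a toy simulation: if tokens of a speech cue cluster into two modes, a learner can split them into two categories without any labels. All numbers here are hypothetical, loosely modeled on short- versus long-lag voice-onset times (as in /b/ vs. /p/); this is a sketch of distributional learning, not the chapter's own model:

```python
import random
import statistics

random.seed(0)

# Synthetic voice-onset-time (VOT) tokens in ms: a bimodal mixture,
# one mode near 10 ms (/b/-like), one near 60 ms (/p/-like).
tokens = ([random.gauss(10, 5) for _ in range(200)] +
          [random.gauss(60, 8) for _ in range(200)])

# Minimal 1-D two-means clustering: assign each token to the nearer
# center, then move each center to the mean of its tokens.
lo, hi = min(tokens), max(tokens)
for _ in range(25):
    short = [t for t in tokens if abs(t - lo) <= abs(t - hi)]
    long_ = [t for t in tokens if abs(t - lo) > abs(t - hi)]
    lo, hi = statistics.mean(short), statistics.mean(long_)

# The learned category boundary falls between the two modes.
boundary = (lo + hi) / 2
```

The point of the sketch is that the boundary emerges from the distribution of the input alone, which is the sense in which infants are said to tune their categories through statistical learning.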