Chapter 10: Hearing in the Real World


Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome back to another Deep Dive.

Today we are opening up a really interesting stack of material because we are tackling chapter 10 of Sensation and Perception, sixth edition.

And this one I think marks a really important shift in how we've been looking at the senses.

It really does, because up until now, when we've talked about hearing, it's been very microscopic.

We get into the ear canal, the eardrum, the ossicles, the hair cells.

The nuts and bolts.

It's all been about the mechanics, the plumbing and wiring inside your head.

How does the cell fire?

How does that membrane vibrate?

Exactly.

It's about the machinery.

But this chapter, titled Hearing in the Environment, takes this huge step back.

It's like we're finally walking outside of the workshop.

And we're asking, okay, we have this incredible biological device, but what is it for?

How does it actually help us navigate a really, really chaotic world?

And that's our mission for this deep dive.

We're moving beyond just how a hair cell fires.

We're now asking how your brain takes all those signals and constructs a coherent three-dimensional reality from them.

And when you really stop and look at the challenges, the sheer physics of it, it honestly seems impossible.

It does.

It feels like a computational miracle.

And the chapter opens with this analogy that I thought was just perfect for setting the stage.

It calls it the world of glass.

I love this.

It's such a great way to frame the fundamental problem of hearing.

So walk us through it.

What is this world of glass?

Okay.

So to get it, you first have to think about vision.

If you're looking out a window right now, you might see a tree, a car, and maybe a person walking by.

Right.

Three separate objects.

They're separate in space.

Light bounces off the tree and hits one part of your retina.

Light from the car hits a different part.

The key thing is they don't overlap.

Your retina is basically a map of the world.

Yeah.

If something is on my left, it hits the right side of my eye.

The location is sort of baked into the input from the very beginning.

Precisely.

The spatial information is preserved.

But now imagine if that entire world was made of transparent glass.

The tree, the car, the person, all ghostly and see-through.

And they could pass right through each other.

Okay.

That's a bit trippy.

A jumble.

If you looked at that scene, you wouldn't see separate objects anymore, would you?

You would just see a complex overlapping pattern of light, shadows, and edges all summed together.

You wouldn't know where the car ended and the person began.

It'd just be a mess of information.

And that mess is exactly what your eardrum gets all the time.

Ah, so that's the connection.

That's the connection.

Think about being at a park.

You've got a bird chirping over there, a kid laughing over here, a car horn in the distance, wind rustling the leaves.

All separate sound sources.

All separate.

But those sound waves travel through the air.

And by the time they reach your ear canal, they've all added together into one single fluctuating pressure wave.

Just one wiggly line.

Wow.

So my ear doesn't get a bird signal and a car signal.

It just gets pressure change.

Just one stream of data.

And the brain's job, the absolute mystery of this chapter, is how in the world does it take that single jumbled signal and unmix it?

How does it pull it back apart into bird, child, and car?

And not just that, but know where they are and what they are.

It's like trying to taste a cake and not just identify every single ingredient, but also say, ah, the flour came from this farm and the sugar came from that one.

It's a staggering computational feat.

So the chapter kind of organizes this big journey for us.

How are we going to tackle it?

We're going to follow the book's path.

First, we'll tackle sound localization, which is the where problem.

Then we'll get into complex sounds, the what problem.

And finally, auditory scene analysis, which is how the brain groups it all together into a coherent picture.

Okay.

So let's start with the where.

Because as you said, the eye has a map.

The ear does not.

That's the fundamental hurdle.

The cochlea, our hearing organ, is amazing, but it organizes sound by frequency.

High pitches at one end, low pitches at the other.

It's a frequency analyzer, not a spatial one.

So a sound from my left or my right or even from above me.

It all enters the same ear canal.

It stimulates the same receptors in the same way.

The cochlea itself has absolutely no idea where that sound came from.

So it's up to the brain.

The brain has to do the math.

It has to calculate the location based on clues.

Exactly.

It has to solve for X.

And the text gives us this really great intuitive scenario to start with.

The owl in the dark.

Right.

The camping trip.

You're out camping.

It's pitch black.

You're sitting by the fire.

You can't see a thing beyond the flames.

And then suddenly an owl hoots.

And you instantly know where it is.

Not precisely maybe, but you know if it's to your left or your right or behind you.

It's immediate.

It feels immediate.

It feels effortless.

But in that fraction of a second, your brain just ran some seriously high speed physics calculations.

And to figure out the azimuth, which is just the technical term for that left-right angle around your head, it uses two main clues.

And the first one is all about time.

Interaural time difference, or ITD.

It's probably the most intuitive one.

We have two ears and they're separated by a head.

So if a sound comes from your left.

It's got a shorter trip to my left ear.

It hits that one first.

Simple as that.

But we really need to appreciate the scale we're talking about here.

Sound travels incredibly fast.

About 340 meters per second.

And your head is relatively small.

So that delay has got to be minuscule.

It is.

If a sound is directly to your side at 90 degrees azimuth, the time difference is at its absolute maximum.

And that maximum difference is only about 640 microseconds.

Microseconds.

So millionths of a second.

640 millionths of a second.

That is the biggest possible delay your brain will ever get from natural sound.

If the sound is just a tiny bit off center, like one degree to the left, the delay is much, much smaller.

I mean, can our brains even register something that fast?

Does that even make a difference?

It's shocking how sensitive we are.

The textbook points out that humans can reliably detect interaural time differences as small as 10 microseconds.

10.

10 millionths of a second.

That seems physically impossible.

It's what allows us to distinguish the angle of a sound source to within about one degree.

It's genuinely one of the most precise temporal judgments the entire nervous system can make.
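To see how these numbers fit together, here is a minimal sketch that assumes the ITD grows with the sine of the azimuth and peaks at the 640 microseconds quoted above. The sine model and the function name `itd_microseconds` are illustrative assumptions, not the textbook's formula; real heads deviate from it.

```python
import math

# Maximum ITD quoted in the discussion (source directly to one side, 90 degrees).
MAX_ITD_US = 640.0


def itd_microseconds(azimuth_deg):
    """Approximate ITD for a source at the given azimuth (0 = straight ahead).

    Assumes a simple sine model, which is only a rough approximation.
    """
    return MAX_ITD_US * math.sin(math.radians(azimuth_deg))


for angle in (0, 1, 10, 45, 90):
    print(f"{angle:>2} deg -> {itd_microseconds(angle):6.1f} microseconds")
```

Under this assumption, a source one degree off center produces an ITD of about 11 microseconds, which is in line with a 10 microsecond detection threshold giving roughly one degree of angular resolution.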

Hold on.

I remember from our other deep dives on neuroscience that neurons themselves aren't that fast.

A single action potential.

A neuron firing.

That takes about a millisecond, right?

That's right.

A neural spike is about a thousand microseconds long.

So 10 microseconds is one hundredth of the duration of a single neuron's signal.

How can a system built with slow components measure something so incredibly fast?

It feels like trying to time an Olympic sprint with a grandfather clock.

That is the million dollar question, isn't it?

And this is where the chapter takes us deep into the physiology, down into the brain stem.

We have to look at a structure called the medial superior olive, or MSO.

The MSO.

And this is the first stop in the brain where the signals from the left ear and the right ear actually meet up.

It's the first point of convergence.

Before the MSO, the auditory nerve from the left ear goes to the left cochlear nucleus and the right goes to the right.

They're on separate tracks.

But at the MSO, they come together.

So this is where the comparison has to happen.

This is it.

Now, for a very long time, since 1948, actually, the leading theory for how the MSO did this was called the Jeffress model.

This was the neural ladder idea, right?

I've seen diagrams of this.

It's a really cool concept.

It's a beautiful, elegant concept.

Lloyd Jeffress proposed that the MSO is set up like a ladder.

You have neurons from the left ear coming in one side and neurons from the right ear coming in the other.

The signals basically race towards each other along the rungs of this ladder.

Exactly.

And here's the clever part.

The axons, the neural wires, have different lengths.

The axon acts as a delay line.

So if the sound came from the far left, the signal from the left ear gets a head start.

It travels further down its side of the ladder before the signal from the right ear, which started later, finally catches up.

So they collide.

They meet at a specific rung.

And the neurons on those rungs are what he called coincidence detectors.

They only fire if they get a signal from the left and the right at the exact same time.

So if rung one fires, the brain knows the sound was far left.

If the middle rung fires, it knows the sound was dead ahead.

It brilliantly turns a timing problem into a which-neuron-fired problem, a place code.
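The delay-line idea can be sketched in a few lines. This is a toy illustration of the Jeffress scheme as described above, not a physiological model: the nine detectors, their delay values, and the names `DELAYS_US` and `jeffress_place_code` are all hypothetical choices made for the example.

```python
# Hypothetical bank of coincidence detectors. Each one adds a fixed internal
# delay (in microseconds) to the left-ear signal; negative values mean the
# right-ear signal is the one being delayed.
DELAYS_US = list(range(-640, 641, 160))


def jeffress_place_code(itd_us, internal_delays_us=DELAYS_US):
    """Return the index of the detector whose inputs arrive closest together.

    itd_us > 0 means the sound reached the left ear first. The detector whose
    internal delay cancels that head start sees both signals coincide.
    """
    mismatch = [abs(itd_us - delay) for delay in internal_delays_us]
    return mismatch.index(min(mismatch))


for itd in (-640, 0, 320, 640):
    print(f"ITD {itd:>5} us -> detector {jeffress_place_code(itd)} responds")
```

The point of the sketch is the conversion itself: the input is a time difference, and the output is simply the position of the most active detector.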

It's so elegant.

It's simple.

It explains all the psychophysical data perfectly.

It's a textbook perfect model.

But I can hear a but coming.

A big but.

The text is very clear on this.

While this ladder-like structure is definitely found in some birds (barn owls have a beautiful system that looks just like the Jeffress model), the evidence for it in mammals is, well, it's elusive.

Elusive is a very polite scientific term for we can't find it.

Pretty much.

Scientists are now quite skeptical.

And the book details some of the newer evidence, particularly from researchers like Joris and van der Heijden, that suggests mammals might be doing something a bit different.

Something that actually involves the physics of the cochlea itself.

This is the traveling wave idea.

Yes.

Remember that when a sound enters the ear, it creates a physical wave that travels down the basilar membrane inside the cochlea.

And that wave takes time to move from the base, where we hear high frequencies, to the apex for low frequencies.

OK, so there's a built in mechanical delay right there in the ear itself.

And the new theory is that the brain is clever enough to use the timing of that mechanical wave as part of its calculation.

It might be tuning into what are called interaural frequency differences.

OK, you're going to have to break that one down for me.

It's a bit tricky, but think of it this way.

Instead of just measuring the start time of the sound in each ear, like a stopwatch, the brain might be comparing the phase of the sound waves.

The phase, like where the wave is in its up and down cycle.

Exactly.

If the left ear hears the sound first, the wave hitting it will be at a slightly different point in its cycle compared to the wave that hits the right ear a moment later.

The brain might be comparing those phase differences, which are themselves created by the traveling wave in the cochlea.

So it's not just a simple race between two nerve signals.

It's a much more integrated system where the physics of the ear are part of the calculation.

It suggests the whole system is working together.

It's not just "ear sends data, brain calculates."

The ear is helping with the calculation before a signal is even sent.

That is heavy.

OK, so that's our first clue, time or ITD handled by the MSO.

But the book says that's not enough, especially for certain kinds of sounds.

Right.

Time is great, but it has a weakness.

So the brain has a second clue, loudness, or more technically the interaural level difference, or ILD.

This one seems a lot more straightforward.

If a sound is on my left, it's going to be louder in my left ear than my right.

It is.

But the reason why is really important.

It's not just because of the tiny extra distance to the far ear.

It's because your head is in the way.

My head casts a sound shadow.

Your head physically blocks the sound waves from getting to the far ear.

But, and this is a critical point the chapter makes, the shadow only works for certain sounds.

It's completely dependent on the frequency.

And the text has a great analogy for this with the ocean waves.

I love this one.

It makes it so clear.

Imagine you're at the beach and there's a big wooden piling like a thick post sticking out of the water.

OK.

Now picture a huge long ocean swell coming in, a very low frequency wave with a huge distance between the crests.

When that giant wave hits that skinny little piling, what happens?

Nothing really.

The wave just kind of wraps around it and keeps going.

The piling is too small to stop it.

Exactly.

The water on the other side is barely disturbed.

Now imagine tiny choppy little ripples on the water, very high frequency waves.

What happens when they hit the piling?

They get blocked, they bounce off, and the water directly behind the piling stays calm.

It creates a shadow.

And that is a perfect model for how your head interacts with sound.

Low frequency sounds like a deep bass note or a tuba have very long wavelengths.

They just wrap right around your head as if it wasn't even there.

No sound shadow at all.

So for low pitched sounds, the loudness in both ears is basically the same.

The ILD is useless.

Completely useless.

The book says it's pretty much ineffective for any frequency below a thousand hertz.

But high frequency sounds, like a cymbal crash or a high pitched whistle, those are the ripples.

They get blocked by your head.

So my head casts a shadow for high frequencies, making the sound quieter in the far ear.
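The piling analogy comes down to comparing wavelength with the size of the obstacle. As a rough sketch, using the 340 meters per second quoted earlier and an assumed head width of 0.18 meters (a hypothetical round figure chosen for illustration):

```python
SPEED_OF_SOUND_M_S = 340.0  # value quoted earlier in the discussion
HEAD_WIDTH_M = 0.18         # assumed head width, for illustration only


def wavelength_m(frequency_hz):
    """Wavelength of a sound wave in air: speed divided by frequency."""
    return SPEED_OF_SOUND_M_S / frequency_hz


for freq in (100, 1000, 8000):
    ratio = wavelength_m(freq) / HEAD_WIDTH_M
    print(f"{freq:>5} Hz: wavelength {wavelength_m(freq):.3f} m, "
          f"{ratio:.1f} x head width")
```

A 100 Hz wave is many times longer than the head, like the ocean swell meeting the piling, while an 8,000 Hz wave is a fraction of the head's width, like the ripples.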

And this explains a really common experience with home audio systems.

You can put a subwoofer, the bass speaker, pretty much anywhere in the room, behind the couch, in a corner, and you can't tell where the bass is coming from.

Right.

It just seems to fill the room.

Because the low frequency waves are so long, they're just wrapping around your head.

Your brain gets no ILD clue, and the ITD clues for continuous low tones are also ambiguous.

Your brain literally cannot localize it.

But the tweeter, the little speaker that makes the high pitched sounds, you know exactly where that is.

Because my head is casting a shadow for it, and this whole process is handled by a different brain structure, right?

The lateral superior olive or LSO.

Correct.

If the MSO is for time, the LSO is for level.

And it works on a really cool principle of competition.

It gets excitatory go signals from the ear on the same side, and inhibitory stop signals from the ear on the opposite side.

So it's like a tug of war.

A neural tug of war.

If the sound is louder on the left, the go signal from the left ear is really strong, and the stop signal from the right is weak.

The left side wins the tug of war, and the brain reads that as the sound is on the left.
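The tug of war can be written as a subtraction. This is a deliberately simplified sketch: the function names are hypothetical, the inputs are sound levels in decibels rather than firing rates, and real LSO neurons are not this linear.

```python
def lso_response(same_side_db, opposite_side_db):
    """Toy LSO unit: excitation from the same-side ear minus inhibition
    from the opposite ear, floored at zero because a firing rate cannot
    be negative."""
    return max(0.0, same_side_db - opposite_side_db)


def lateralize(left_ear_db, right_ear_db):
    """Compare the left and right LSO units and report which side wins."""
    left_lso = lso_response(left_ear_db, right_ear_db)
    right_lso = lso_response(right_ear_db, left_ear_db)
    if left_lso > right_lso:
        return "left"
    if right_lso > left_lso:
        return "right"
    return "center"


print(lateralize(70, 60))  # louder in the left ear
print(lateralize(60, 60))  # equal level in both ears
```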

So we've got two complementary systems.

ITD for the low frequencies, and ILD for the high frequencies.

It's called the duplex theory of sound localization.

And between the two of them, we should be able to pinpoint any sound on the horizontal plane.

We should be able to.

But of course, nature has a flaw.

The textbook introduces this problem called the cone of confusion.

I love that name.

It sounds like something out of a spy movie.

But it's a very real geometric problem for the brain, isn't it?

It's a huge problem.

It's an ambiguity that comes from trying to map a 3D world with essentially 2D inputs.

Imagine a sound source that's say 45 degrees to your left and slightly in front of you.

Okay, got it.

Now imagine another sound source that's 45 degrees to your left, but slightly behind you.

Right.

If you were to take out a tape measure and measure the distance from each of those two points to your left ear and your right ear, the difference in distance would be identical.

So the interaural time difference is exactly the same for both spots.

Exactly the same.

And the sound shadow cast by your head, also exactly the same.

So the brain receives the exact same ITD and ILD values for a whole cone of locations in space.

It can't tell the difference between front left and back left, or between high left and low left.
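The tape-measure argument can be checked with a little geometry. The sketch below treats the ears as two points with nothing between them and uses an assumed ear spacing and source distance; both values and the function name are hypothetical.

```python
import math

EAR_OFFSET_M = 0.09  # assumed half-distance between the ears (illustrative)


def extra_path_to_right_ear_m(azimuth_left_deg, distance_m=2.0):
    """How much farther the source is from the right ear than from the left.

    The head faces +y, with the left ear at x = -EAR_OFFSET_M and the right
    ear at x = +EAR_OFFSET_M. Azimuth is measured from straight ahead and
    increases toward the listener's left, so 135 is behind and to the left.
    """
    angle = math.radians(azimuth_left_deg)
    x = -distance_m * math.sin(angle)
    y = distance_m * math.cos(angle)
    to_left = math.hypot(x + EAR_OFFSET_M, y)
    to_right = math.hypot(x - EAR_OFFSET_M, y)
    return to_right - to_left


front_left = extra_path_to_right_ear_m(45)
back_left = extra_path_to_right_ear_m(135)
print(f"front-left: {front_left:.6f} m, back-left: {back_left:.6f} m")
```

The two path differences agree to within floating-point rounding, so any cue that depends only on that difference cannot separate the two positions.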

That seems like a pretty major design flaw.

I mean, knowing if the predator is in front of me or behind me is kind of important for survival.

Critically important.

And the solution that our brains have evolved is, as the text points out, almost embarrassingly simple.

We just move our heads.

We move.

The moment you make even a tiny turn of your head, the geometry completely changes.

If the sound was in front of you and you turn your head to the right, your left ear gets a little closer to the sound and the ITD changes.

If the sound was behind you and you make the same head turn, your left ear might get further away.

So the brain isn't just taking a single snapshot of the data.

It's looking at how the data changes over time as we move.

It's integrating sensation with action.

It resolves the ambiguity by getting more data points.
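A small sketch shows why a head turn helps. It assumes the ITD scales with the sine of the source angle relative to the head, with the 640 microsecond maximum quoted earlier; the sine model and the function name are illustrative assumptions.

```python
import math

MAX_ITD_US = 640.0  # maximum ITD quoted earlier in the discussion


def itd_after_turn(source_azimuth_left_deg, head_turn_right_deg):
    """ITD (positive = left ear leads) after the head turns to the right.

    Turning right by some angle moves a source on the left that much
    farther around toward the left, relative to the head.
    """
    relative = source_azimuth_left_deg + head_turn_right_deg
    return MAX_ITD_US * math.sin(math.radians(relative))


for label, azimuth in (("front-left", 45), ("back-left", 135)):
    before = itd_after_turn(azimuth, 0)
    after = itd_after_turn(azimuth, 10)
    print(f"{label}: {before:.0f} us before, {after:.0f} us after a 10 degree turn")
```

Before the turn the two sources give the same ITD. After it, the ITD grows for the source in front and shrinks for the one behind, and that change is the extra data point that breaks the tie.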

This is why you see a dog cock its head when it hears a strange noise.

It's actively sampling the environment to break the cone of confusion.

That makes perfect sense.

But there's another piece to this puzzle, isn't there?

Particularly for telling up from down, we don't just have two holes in our head.

We have these weird folded crinkly things on the side, our pinnae.

The pinnae.

Yeah, for the longest time, people just thought of them as, you know, funnels.

They just catch more sound.

Like when you cup your hand behind your ear to hear better.

But they're way more sophisticated than that.

Oh, much more.

If they were just simple funnels, they'd be smooth, like a cone.

But look at your ear.

It's got all these ridges and folds and this bowl called the concha.

It's a complex bumpy landscape.

And all those bumps and ridges actually do something to the sound before it goes in.

They filter it.

This effect is called the directional transfer function or DTF.

When a sound comes from above you, like a bird in a tree, it bounces off the top ridge of your pinna in a very specific way before it enters the ear canal.

When sound comes from below, it bounces off different parts.

So the actual shape of my ear changes the quality of the sound, depending on what angle it's coming from.

Precisely.

It acts like a tiny built-in equalizer.

A sound coming from directly above might have certain frequencies boosted and others cut.

The text explains that it often creates a spectral notch, a sharp drop in intensity at a specific frequency.

Maybe a sound from above has a notch at 8,000 Hz, while a sound from below has a notch at 6,000 Hz.

So up literally sounds different from down in a way that has nothing to do with pitch.

Exactly.

And your brain learns that code.

It knows that spectral notch at 8,000 Hz means the sound source is elevated.

This finally explains something that's always bothered me about headphones.

When I listen to music on headphones, it sounds like it's coming from inside the middle of my head.

It doesn't sound like it's out in the room with me.

That's a perfect example of this.

Headphones bypass your pinna entirely.

They shoot the sound directly down your ear canal.

So you lose all those DTF cues.

Your brain gets the sound, but it's missing the spatial signature that your outer ear provides.

So it defaults to this inside the head perception.

Unless you listen to one of those binaural recordings.

Right.

Those are recordings made with tiny microphones placed inside the ears of a dummy head that has realistic pinnae.

So they capture all those reflections and spectral notches.

And when you listen to that on headphones, it is spooky.

It's uncanny.

You'll hear a sound and swear someone is standing behind you.

And speaking of the pinna and how important it is, we have to talk about the Vulcan Ear Experiment.

Hofman, 1998.

This is one of my favorite studies in the whole book.

It is an absolute classic in neuroplasticity.

Hofman and his team wanted to ask a really fundamental question.

Is this DTF map, this code where certain notches mean up, hardwired from birth, or do we learn it from experience?

So how did they test that?

They couldn't just change people's ears, so they did the next best thing.

They took adult volunteers and had them fitted with these custom plastic molds that sat inside the concha, the bowl of the ear.

So they basically filled in all the natural nooks and crannies.

They smoothed them out.

They effectively gave these people new, differently shaped ears.

They completely changed the physics of how sound entered the ear canal.

And what happened when they first put them in?

Immediate chaos.

The participants' ability to tell up from down, their elevation judgment, was completely gone, wiped out.

They could still tell left from right because the ITD and ILD were unaffected.

But their up-down localization was no better than chance.

But the key part of the experiment is that they made them wear these molds for weeks on end.

They lived with their new Vulcan ears.

And every day, the researchers brought them into the lab and tested their localization ability.

And amazingly, day by day, they started to get better.

So the brain was adapting.

The brain was learning the new rules.

It started to notice the new patterns of spectral notches created by the plastic molds and began to associate them with specific locations in space.

It was writing a new map.

And after about six weeks, they were pretty much back to normal.

Almost as good as they were before.

They learned to hear with their new ears.

But then comes the twist.

The part that really just blows your mind.

They took the molds out.

And logically, you'd expect them to be confused again, right?

You'd think, OK, the brain overwrote the old map with the new one.

So now they have to relearn how to use their original ears.

That would make sense.

It's what everyone expected.

But it's not what happened.

The moment the molds were removed, the participants could localize perfectly with their old natural ears immediately.

So the brain didn't overwrite the old map.

No.

It just added a second one.

The brain was holding two completely separate spatial maps in memory at the same time and could switch between them instantly.

That is absolutely incredible.

It really implies that what we perceive as space isn't a direct measurement.

It's a learned software model running in our heads.

That's it.

Exactly.

Our auditory reality is unbelievably plastic and constantly being calibrated by experience.

It makes you wonder what else we perceive is just a habit that our brain has learned over time.

OK, so we've covered left-right, azimuth, and up-down, elevation.

But we live in a 3D world.

There's one dimension left.

Distance.

How do we know how far away that owl is?

And this is where we're weakest.

The textbook is pretty clear that humans are generally worse at judging absolute distance than we are at direction.

We tend to have this compression effect.

We overestimate how far away close sounds are and underestimate how far away distant sounds are.

We kind of squish the world.

But we do have a few cues we can use.

They're more like rules of thumb, heuristics.

And the first one is the most obvious, relative intensity.

Louder means closer.

Quieter means farther away.

Simple.

It seems simple, but it relies on a fundamental law of physics called the inverse square law.

Let's break down the math on that because the book gives a really clear example with a number that's easy to remember.

The physics states that as sound travels out from a source, its intensity decreases with the square of the distance.

But the simple rule of thumb is this.

Every time you double your distance from a sound source, the sound pressure level drops by 6 decibels.

6 decibels.

That's a pretty significant and consistent drop.

It is.

And the everyday life box in the chapter uses the example of a wind farm to illustrate this.

If you're standing right at the base of a giant wind turbine,

it's pretty loud.

Maybe 100 decibels.

Like being next to a lawnmower.

Okay, loud.

Now, walk about 300 meters away.

The sound level has dropped to around 43 decibels.

Double that distance again to 600 meters.

And it drops another 6 decibels down to 37.
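The 6 decibel rule follows directly from the inverse square law. Here is a minimal sketch, assuming ideal free-field spreading from a point source with no air absorption, reflections, or wind. The function name `level_at` is hypothetical.

```python
import math


def level_at(distance_m, ref_level_db, ref_distance_m):
    """Sound pressure level at distance_m, given a known level at
    ref_distance_m. Assumes ideal inverse-square spreading only."""
    return ref_level_db - 20 * math.log10(distance_m / ref_distance_m)


# Starting from the 43 dB at 300 meters mentioned above:
for distance in (300, 600, 1200):
    print(f"{distance:>5} m: {level_at(distance, 43, 300):.1f} dB")
```

Each doubling of distance subtracts 20 times log10 of 2, which is about 6.02 decibels, so 43 decibels at 300 meters becomes roughly 37 at 600 meters.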

And 37 decibels is very quiet.

That's like a library.

It's extremely quiet.

And this is where the text brings up the practical issue of noise complaints.

At 600 meters away, your kitchen refrigerator, which hums along at about 40 decibels, is actually objectively louder than that wind turbine.

So the fridge masks the sound of the turbine.

It can, but there's a complication.

Remember our ocean wave analogy.

That 6 decibel rule works great for the high and mid frequencies.

But the really low frequencies, the infrasound, they don't get absorbed by the air as easily.

They wrap around obstacles like trees and hills and even the walls of your house.

So even if the overall decibel level is low, you might still perceive that low frequency thrumming or pulse.

Exactly.

But loudness isn't our only cue.

We also use spectral composition to judge distance.

This is the muddiness factor.

The air itself acts like a filter, right?

It's not perfectly transparent to sound.

No, it's a low-pass filter.

It lets the low frequencies pass through, but it absorbs and scatters the high frequencies.

So over very long distances, the air literally strips the high frequencies out of a sound.

And the book uses the perfect example for this.

Thunder.

It's the best example.

If lightning strikes a tree right across the street from you, what do you hear?

A really sharp crack.

A snap.

All high frequencies.

A very sharp, sudden sound.

But if that same lightning bolt strikes 5 miles away, what do you hear then?

A low, rolling rumble.

A boom.

All the crack is gone.

The air has absorbed it.

So your brain has learned this rule.

If a sound is muddy and lacks high-frequency detail, it must be far away.

It's the auditory equivalent of aerial perspective in painting.

You mean how distant mountains are painted as being blue and hazy?

Same exact concept.

The atmosphere is filtering the light, just like it filters the sound.

And the last cue for distance is more for indoor spaces, right?

The ratio of direct versus reverberant energy?

Yes.

If you're standing a foot away from me in a room, most of the sound that reaches your ears has traveled in a straight line from my mouth to you.

That's direct energy.

But if you're standing in the back of a large church, the sound from my mouth first travels up to the ceiling, bounces off the back wall, the side walls, the floor, and then it gets to you.

It's all echo and reverberation.

Exactly.

So your brain is constantly calculating the ratio.

A high proportion of direct sound means the source is close.

A high proportion of reverberant sound means the source is far away.
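As a rough sketch of that ratio, assume the direct energy falls off with the square of the distance while the reverberant energy stays about the same everywhere in the room. Both the constant-reverberation assumption and the value used for it are hypothetical simplifications, as is the function name.

```python
import math


def direct_to_reverberant_db(distance_m, reverberant_energy=0.05):
    """Ratio of direct to reverberant energy, in decibels.

    Direct energy follows the inverse square law (arbitrary units);
    reverberant energy is treated as a fixed, made-up constant.
    """
    direct_energy = 1.0 / distance_m ** 2
    return 10 * math.log10(direct_energy / reverberant_energy)


for distance in (0.3, 2.0, 20.0):
    print(f"{distance:>4} m: {direct_to_reverberant_db(distance):+.1f} dB")
```

Close to the talker the ratio is strongly positive, meaning mostly direct sound. At the back of a large room it turns negative, meaning mostly reverberation.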

OK, so that really covers the where problem.

We've built this amazing 3D map around ourselves, using time differences, loudness differences, the shape of our ears, and even echoes.

But knowing where the owl is doesn't do you much good if you don't know that it is an owl.

Which brings us to the second major part of our journey, complex sounds.

The what problem?

Because the real world isn't made of pure tones from a hearing test.

It's made of voices, music, cars.

All complex sounds.

And the building blocks of these sounds are harmonics.

Most natural sounds have what's called a fundamental frequency.

That's the lowest frequency in the sound, and it's what determines the pitch that you perceive.

So if I play a C on a piano, that note's frequency is the fundamental.

Right.

But the piano string is also vibrating at other frequencies at the same time.

These are the harmonics.

And they're always integer multiples of the fundamental.

So if the fundamental is 200 hertz, the harmonics will be 400 hertz, 600 hertz, 800 hertz, and so on.

A neat mathematical stack.

A very neat stack.

And this mathematical relationship leads to one of the most bizarre and wonderful illusions in all of hearing called the missing fundamental effect.

This is the phantom pitch.

I remember reading about this.

And it feels like a genuine magic trick.

It really does.

So let's say I play you a sound that is made of three pure tones.

400 hertz, 600 hertz, and 800 hertz.

And that's it.

What pitch would you expect to hear?

Well, the lowest tone I'm actually hearing is 400 hertz.

So I guess my brain would say the pitch is 400 hertz.

That's what you'd think.

But you don't hear 400 hertz.

You hear a clear, unambiguous pitch of 200 hertz.

Even though there is no 200 hertz sound wave physically present in the sound.

Even though it is completely physically absent from the signal, your brain just invents it.

Why?

Why would the brain just make up a note that isn't there?

The book explains this using the temporal code.

It's all about timing.

Let's look at the numbers.

A 400 hertz wave has a peak every 2.5 milliseconds.

A 600 hertz wave peaks every 1.67 milliseconds.

An 800 hertz wave peaks every 1.25 milliseconds.

Okay, so they're all firing at different rates.

Yeah.

But look at when their patterns align.

If you map them all out, they all sync up and peak at the exact same moment every 4 milliseconds.

And a period of 4 milliseconds corresponds to a frequency of?

250 hertz.

Wait, my math is off.

The example from the book is 500, 750, and 1,000 hertz.

My apologies.

So a 500 hertz wave peaks every 2 milliseconds.

750 every 1.33 milliseconds.

1,000 every 1 millisecond.

They all align every 4 milliseconds.

And a 4 millisecond period is 250 hertz.

Got it.

So the neurons in your auditory nerve are firing in a synchronized volley every 4 milliseconds because that's the common repeating pattern of the combined waves.

The brain doesn't care about the individual ingredients as much as it cares about the overall pattern.

It's pattern matching.

The brain says, hey, I'm getting a signal that repeats every 4 milliseconds.

That means the source must have a fundamental frequency of 250 hertz.

It's filling in the blank.

It reconstructs the most likely cause of the signal it's receiving.

It's an inference.
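The arithmetic behind the shared repetition period can be checked with a greatest common divisor. This is only a convenient shortcut for whole-number frequencies, not a claim about how the auditory nerve computes pitch, and the function names are hypothetical.

```python
from functools import reduce
from math import gcd


def implied_fundamental_hz(harmonics_hz):
    """Largest frequency of which every component is a whole-number multiple."""
    return reduce(gcd, harmonics_hz)


def common_period_ms(harmonics_hz):
    """Time after which the combined waveform repeats, in milliseconds."""
    return 1000 / implied_fundamental_hz(harmonics_hz)


for tones in ([500, 750, 1000], [400, 600, 800]):
    print(f"{tones}: repeats every {common_period_ms(tones)} ms, "
          f"implied pitch {implied_fundamental_hz(tones)} Hz")
```

For 500, 750, and 1,000 hertz the pattern repeats every 4 milliseconds, giving 250 hertz. For 400, 600, and 800 hertz it repeats every 5 milliseconds, which matches the 200 hertz pitch described for that combination.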

So that explains pitch.

But what about timbre?

Why does a saxophone playing a middle C sound so different from a trombone playing the exact same note at the same loudness?

That is timbre.

It's the quality, the character, the flavor of a sound.

And it comes from the relative strength or amplitude of all those harmonics we just talked about.

The recipe.

It's the recipe.

A trombone might have a really strong third harmonic and a weak fourth one.

A saxophone might be the opposite.

That unique spectral shape is the instrument's fingerprint.

But it's not just the static recipe, is it?

It's also about how the sound changes over time, the attack and decay.

This is such a crucial point.

We often think of sounds as static things, but they are dynamic events.

The backward piano example in the chapter is the perfect way to understand this.

If you record a single piano note and then play the recording backward.

The frequency content is identical.

The fundamental is the same.

The harmonics are the same.

Their relative amplitudes are the same.

The recipe hasn't changed at all, but it doesn't sound like a piano anymore.

It sounds like an organ or an accordion or something.

Exactly.

And the question is why?

It's because you've reversed the attack and decay.

A piano has a very sharp percussive attack.

A hammer hits a string, you get a bang, and then the sound slowly fades away.

But an accordion is a wind instrument.

It has a gentle onset.

The sound swells in.

A whoosh instead of a bang.

By playing the piano note backward, you've turned its sharp attack into a gentle swell.

And the brain hears that gentle swell and says, that's not a percussive instrument.

That must be a wind or bowed instrument.

The classification of the entire object changes just based on the first few milliseconds of the sound.

It shows that timbre is not just about the spectrum of frequencies.

It's about the temporal envelope, how that spectrum changes over time.
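
A small sketch can make the backward piano point concrete. It uses a synthetic decaying tone as a stand-in for a piano note (an assumption, not a real recording), and the helper `magnitude_spectrum` is a naive discrete Fourier transform written just for this illustration. It shows that reversing the samples leaves the magnitude of every frequency component unchanged while moving the loudest moment from the start to the end.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes, O(N^2); adequate for a short toy signal."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n)
    ]


N = 128
# Toy "percussive" note: fast onset, exponential decay, two harmonics.
note = [
    math.exp(-t / 20) * (math.sin(2 * math.pi * 8 * t / N)
                         + 0.5 * math.sin(2 * math.pi * 16 * t / N))
    for t in range(N)
]
reversed_note = note[::-1]  # "play the recording backward"

# Same recipe: the magnitude spectra match up to rounding error.
max_spectral_diff = max(
    abs(a - b)
    for a, b in zip(magnitude_spectrum(note), magnitude_spectrum(reversed_note))
)

# Different envelope: the loudest sample moves from early to late.
peak_forward = max(range(N), key=lambda t: abs(note[t]))
peak_backward = max(range(N), key=lambda t: abs(reversed_note[t]))

print(max_spectral_diff, peak_forward, peak_backward)
```

The two versions differ only in phase and in the shape of the envelope over time, which is the part of timbre the hosts are describing.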

So we know where an object is, and we know what it is.

Now comes what feels like the hardest task of all.

Separating that object from all the other sounds happening at the same time.

This is auditory scene analysis.

We're right back in the world of glass.

We have all these sound sources overlapping, combining into that one jumbled waveform.

The brain's job is to segregate them into meaningful auditory streams.

And to do it, it uses a set of rules very similar to the Gestalt principles of vision.

Like the principle of similarity.

Right.

In vision, things that look similar group together.

In hearing, sounds that have a similar timbre or pitch tend to group together.

This is the basis of what's called auditory stream segregation.

And the book uses the galloping illusion to demonstrate this.

This is a really fun one.

If you play a high tone and a low tone and you alternate between them slowly, high, low, high, low.

What do you hear?

One thing, a single stream of sound that's jumping up and down, a galloping rhythm.

One auditory stream.

But now if you speed up that alternation faster and faster, something amazing happens.

The brain can't keep up and it splits them.

The perceptual experience just breaks apart.

The brain decides that the jump in frequency is too large to be happening that quickly from a single object.

So it splits them into two separate streams.

You suddenly hear a high-pitched stream beeping along up here, and a completely separate low-pitched stream beeping along down there.

The galloping rhythm disappears entirely.

What's amazing is the text mentions that composers knew this intuitively for centuries.

Johann Sebastian Bach.

Bach was an absolute master of auditory hacking.

In his famous Toccata and Fugue in D minor, he's playing on a single instrument, a pipe organ.

But there are passages where he alternates between very high and very low notes so rapidly that the listener perceives two independent melodies playing at the same time.

He's creating virtual polyphony, using the limitations of the listener's brain to create a richer musical texture.

He's playing our auditory system like an instrument.

It's genius.

Then there's also the principle of common fate.

This is a huge one.

In vision, things that move together belong together.

In hearing, it often means things that start and stop together, or change together, belong together.

It's about common onset.

The text mentions a study by Rasch showing that it's easier to distinguish notes in a chord if their onsets are slightly staggered.

Even by just 30 milliseconds. If all the notes start at the exact same instant, they fuse into one sound.

If you stagger them just a tiny bit, you can hear the individual notes.

And the bottle example from the book illustrates this really well for more complex sounds.

Yes, if you drop a glass bottle and it bounces on the floor, you hear it as one object, a bouncing bottle.

Because all the frequencies, all the harmonics created by the glass, are rising and falling together. They have a common fate.

But if the bottle shatters?

The common fate is broken.

Now, you have dozens of little shards of glass, each bouncing independently.

They have chaotic, uncorrelated onsets and decays, and you instantly hear that chaos as a shatter.

Your brain's grouping mechanism has failed, and it correctly interprets that as the object breaking apart.

Now, I want to touch on how hearing interacts with our other senses, especially vision, because sometimes our eyes and ears can tell us different things.

And the bouncing balls illusion is the key experiment here.

It's a fantastic demonstration of multisensory integration.

The setup is simple.

You see two discs or balls on a screen.

They move toward each other.

They meet in the middle.

They pass through each other, and they continue on their way.

OK, so without any sound, it just looks like they're passing through each other like ghosts.

Right.

The visual system defaults to the simplest interpretation, which is continuous motion.

But now you play a sharp click sound at the exact moment the two discs meet in the middle.

And suddenly you see them bounce.

Your perception of the visual event completely changes.

You don't just think they bounced.

You physically see them collide and retreat.

The sound has completely altered your visual reality.

That is so powerful.

The ear is literally hacking the eye.

It makes a lot of sense when you think about it.

Our auditory system is the master of time.

It has incredible temporal precision.

Our visual system is a bit slower.

So when there's an ambiguity about the exact timing of an event, the brain trusts the ear over the eye.

OK, so we're grouping sounds.

We're building this scene.

But what happens when parts of a sound are covered up by other noises?

This brings us to the continuity and restoration effects.

This is the auditory version of what's called occlusion in vision.

Like seeing a person standing behind a picket fence.

You don't perceive them as being sliced into vertical strips.

Your brain assumes the person is a whole continuous object that is simply being blocked.

So our ears do the same thing.

They do.

The classic experiment involves a tone glide.

Imagine a sound like a slide whistle, smoothly going up in pitch.

Now, in the middle of that glide, imagine I play a loud burst of static, of white noise.

So during the noise, I shouldn't be able to hear the tone.

It's being masked.

Exactly.

The tone is physically absent or completely covered.

But that's not what you perceive.

You hear the tone gliding smoothly and continuously through the noise, as if the noise were just a brief interruption in front of it.

And the text makes it clear this isn't just you guessing.

Your brain actually hears the missing part.

It does.

They've done signal detection studies where they asked listeners to tell the difference between two sounds.

In one, the tone glide is physically continuous behind the noise.

In the other, there's a physical gap in the tone that is filled with noise.

And people can't tell the difference.

They're at chance level.

The perceptual restoration is so complete, so convincing, that it's indistinguishable from reality.

And this doesn't just work for simple tones.

It works for complex sounds like speech, too.

The novel versus nozzle experiment.

This is a landmark study because they didn't just ask people what they heard.

They looked directly at their brain activity.

They were working with surgical patients who had electrodes on the surface of their brains.

And the stimulus was a word like novel or nozzle.

But they replaced the middle consonant with a burst of noise.

So the person just heard no-shh-el.

Exactly.

And depending on the context of the sentence, the listeners would report hearing the full complete word.

They'd hear novel or nozzle.

But here's the crazy part.

The brain activity in the auditory cortex didn't look like the activity for the incomplete no-shh-el sound.

It looked like the brain activity for the restored word.

If they perceived novel, their brain lit up as if it had actually heard the V sound.

The brain isn't just passively receiving the sound.

It's actively creating a reality.

It's filling in the neural gaps based on its expectations and the context of the world.

It's why you can have a conversation in a noisy restaurant.

You aren't actually hearing every single sound of every word.

Your brain is restoring the missing pieces in real time.

Even animals do this.

The book mentions a study with starlings.

Yes.

Starlings are songbirds.

And the studies show they could perceptually restore missing parts of their own species' songs.

But, and this is a key point, they were much better at it for familiar songs than for unfamiliar ones.

Which brings us back to the role of learning and familiarity.

The McDermott study on novel sounds.

This is a great one.

It shows that our ability to pull a sound out from a mixture is massively dependent on experience.

Listeners were played a mix of sounds and couldn't pick out a novel sound until they had heard it presented by itself a few times.

We need a template.

We need to know what we're listening for.

We need a template to latch on to.

Our ability to analyze a scene isn't just bottom-up physics.

It's heavily top-down, based on what we've learned to expect.

Okay, this brings us to the final section of the chapter.

We have all this rich complex information coming in, a fully constructed scene.

How do we decide what to pay attention to?

Auditory attention.

And the chapter frames this as having two modes.

The sentinel and the selector.

The sentinel sounds dramatic.

That's the startle reflex.

It's a deeply primitive, hardwired response in the brainstem.

A sudden, loud, unexpected sound triggers a full-body motor response in as little as 10 milliseconds.

It's faster than conscious thought.

It's the brain's alarm system screaming something big and potentially dangerous just happened.

Get ready.

But most of the time, we're not in sentinel mode.

We're in selector mode.

The cocktail party effect.

Our remarkable ability to focus our auditory attention.

To stand in a noisy room and choose to listen to one conversation while filtering out all the others.

But with this amazing ability comes a cost, which the text calls inattentional deafness.

This is the auditory version of that famous invisible gorilla video.

It is the exact same principle.

The study they describe is fantastic.

It had people listen to a recording of Thus Spoke Zarathustra, the iconic theme from 2001: A Space Odyssey.

And their task was very specific.

Count the number of times the timpani drums are struck.

OK, so they're hyper-focused on the drums.

Very focused.

Now, during the piece, the researchers digitally mixed in a completely new instrument that wasn't in the original recording.

They mixed in a wailing electric guitar solo.

An electric guitar solo in the middle of a classical orchestra piece.

That should be unbelievably obvious.

You would think it would be impossible to miss.

But a huge number of the participants who were busy counting the drum beats completely missed it.

They were asked afterward, did you hear anything unusual?

And they said no.

They were inattentionally deaf to a guitar solo because their attentional spotlight was pointed elsewhere.

That just goes to show that hearing isn't just about what comes into our ears.

It's about what our brain decides to process.

Attention is the gatekeeper.

And if the gate is closed to a particular sound stream, for all intents and purposes, that sound doesn't exist for your conscious mind.

And the chapter makes it clear we can't really multitask here, can we?

We can't actually listen to two conversations at once.

No, we absolutely can't.

We can switch our attention between streams very quickly, like at a speed dating event where you're kind of listening to your date, but also trying to overhear the person at the next table.

But there's always a switch cost.

You lose a bit of information during the transition.

We are fundamentally serial processors of complex information like speech.

So to bring this all together, we started this journey with a complete mess.

A single jumbled pressure wave, the world of glass.

And throughout this deep dive, we've seen how the brain systematically deconstructs it.

It uses pure physics, the ITD and ILD, to build a spatial map.

It uses the custom geometry of our ears, the DTF, to figure out up and down.

It uses mathematical pattern matching, like the missing fundamental, to figure out pitch.

And it uses top-down knowledge and prediction to fill in the gaps and restore sounds that are hidden by noise.

It is a massive, active, constructive process happening every single millisecond that we are awake.

It really is.

And I think if there's one final thought to leave you with, it's to go back to that Vulcan-ear study.

The plasticity, the brain holding two maps.

The fact that our perception of space is not a fixed, hardwired thing, but a learned map that can be rewritten and updated is profound.

It suggests that our auditory reality isn't a direct window onto the world.

It's a constantly updating software model.

If your ears changed shape tomorrow, your world would be chaos.

But in six weeks, you would have learned a new reality.

It really makes you wonder what other parts of our perception are just deeply ingrained habits that could, in theory, be relearned.

A fascinating thought to end on.

That's Chapter 10, Hearing in the Environment.

A huge thank you for tuning in.

This has been the Last Minute Lecture Team.

We'll see you in the next deep dive.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Sound localization operates through a sophisticated system of binaural processing that allows the auditory cortex to pinpoint acoustic sources in three-dimensional space. The brain extracts timing differences between the two ears, or interaural time differences, which are processed in the medial superior olive and provide precise information about horizontal positioning relative to the listener. Simultaneously, the lateral superior olive analyzes interaural level differences to extract additional directional cues, particularly effective for higher frequencies. Together these mechanisms enable azimuth calculation, though they encounter inherent limitations known as cones of confusion, where different spatial locations yield identical timing and intensity signatures. Resolution of this ambiguity requires the directional transfer function, a mechanism that leverages the acoustic filtering properties of the pinnae, head, and torso to extract elevation information through spectral cues unique to each location. Distance perception relies on multiple acoustic indicators, including the inverse square law that governs sound intensity attenuation with distance, the progressive loss of high-frequency energy during propagation, and the balance between direct sound energy and reverberant energy bouncing off environmental surfaces. Complex acoustic signals derive their identity from harmonic spectra, where the fundamental frequency typically anchors pitch perception while overtones establish timbre. The auditory system demonstrates remarkable flexibility through the missing fundamental effect, wherein the brain constructs pitch perception from temporal patterns alone even when the lowest frequency component is physically absent from the acoustic stimulus. Auditory scene analysis describes how the brain segregates overlapping acoustic streams through principles including frequency and timbre similarity, common onset timing, and spatial consistency across sources. 
Perceptual restoration effects enable the brain to bridge gaps in acoustic information when loud noise temporarily masks a signal, allowing seamless continuation of heard sequences through predictive completion. Auditory attention operates along a spectrum from automatic, unselective responses like the acoustic startle reflex to deliberate selective listening that enhances focused perception in complex environments, though this focused attention can paradoxically produce inattentional deafness where unexpected background sounds escape awareness entirely.
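
As a worked example of the inverse square law named in the summary, the sketch below computes how much the level of a sound drops as its source moves away. It assumes an idealized point source in a free field with no reflections, which real rooms do not satisfy, and the function name `level_change_db` is hypothetical.

```python
import math


def level_change_db(near_m, far_m):
    """Level change in decibels when a point source moves from near_m to far_m.

    Assumes intensity falls off as 1 / distance**2 (free field, no reverberation).
    """
    intensity_ratio = (near_m / far_m) ** 2
    return 10 * math.log10(intensity_ratio)


doubling = level_change_db(1.0, 2.0)   # about -6 dB per doubling of distance
tenfold = level_change_db(1.0, 10.0)   # about -20 dB for ten times the distance

print(round(doubling, 2), round(tenfold, 2))
```

Because the drop depends only on the ratio of distances, level alone cannot separate a quiet nearby source from a loud distant one, which is one reason the summary lists other distance cues alongside it.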
