Chapter 7: Attention & Scene Perception

Welcome to Last Minute Lecture!

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Hello everyone and welcome back to the Deep Dive.

Hey there!

We are back with the Last Minute Lecture team, ready to tackle another dense topic and hopefully make it make sense.

Yeah and this one is a big one.

It really is.

Yeah.

Today we are looking at something that is literally happening to you right now, in this very second, but you probably aren't even aware of how incredibly complicated it is.

It's one of those things that feels completely effortless until you try to explain how it works.

Right.

And then you realize it's kind of a miracle your brain doesn't just short circuit every time you open your eyes.

We are talking about attention and scene perception.

We're working our way through chapter seven of the Sensation and Perception textbook, the sixth edition.

And honestly, this material is a bit of a reality check.

We walk around thinking we are, you know, recording the world in 4K video, seeing everything, noticing everything, but the research says, well, not so much.

Not even close.

No.

If we're being honest, your perception of the world is less like a 4K video camera and more like a flashlight beam bouncing around a pitch-black room while your brain furiously paints in the details it thinks should be there.

That is slightly terrifying.

It's a little unsettling, yeah.

But that's the mission for this deep dive.

We want to understand the speed limits of the human mind.

We want to figure out why we can find a friend in a crowd but, you know, lose our keys on a messy table.

We're going to talk about how your brain decides what matters, the weird glitches that happen when it gets overwhelmed, and yes, we are going to talk about spitting fish.

The spitting fish are essential.

Can't skip the fish.

I can't wait for the fish.

But before we get there, we have to start with the problem.

The sheer, unadulterated noise of the world.

The chapter opens by setting the stage with this huge problem of information overload.

It's the fundamental constraint of the world.

And the text opens with a really, really clever demonstration of this.

It asks you to imagine a simple task.

Reading.

Which, presumably, if you're using this textbook, you know how to do.

Hopefully, yeah.

One would hope.

But then it poses a challenge.

It presents a visual demonstration.

Let's just call it the two book problem.

Imagine you have two columns of text side by side.

Okay.

The text is large.

It's perfectly clear.

Your eyesight is fine.

The challenge is to read both columns at the exact same time.

And just to be clear, we don't mean looking back and forth really fast.

No, not at all.

We mean processing the meaning of the left sentence and the right sentence simultaneously in a single mental instant.

I actually tried this with the text provided in the notes.

I stared right at the center of the page and I could see the letters.

You know, I could see the shapes.

But as soon as I tried to actually read "the cat sat on the mat" on the left, the right side just became visual noise.

That is the bottleneck.

That's it right there.

Acuity isn't the problem.

Your eyes are physically capable of receiving the photons.

The problem is processing capacity.

The brain cannot extract meaning from two distinct streams of syntax at the same time.

It's just a hard limit.

And there's another illustration in the chapter that really drives this home visually.

It's a Japanese screen painting.

It's this gorgeous but just totally chaotic scene.

The Tale of the Heike.

Right.

It's incredibly busy.

I mean, you have crowds of people, samurai, horses, carriages, umbrellas, clouds, buildings.

It is a visual feast.

But if I asked you to recognize everything in that painting all at once, you'd freeze.

It's impossible.

You can see a crowd, but you don't recognize the individual faces, the patterns on the kimonos, and the type of swords all in a single instant.

No way.

However, and this is the key, if I tell you, "Find the horse."

Well, then I can do it.

Instantly.

Suddenly, you can slice through that chaos.

Your brain can just ignore the umbrellas, ignore the samurai, and lock onto the horse.

So the question is, why is the brain built this way?

I mean, why can't I just have a supercomputer brain that processes the samurai and the horse and the clouds all at once?

Well, it comes down to biological cost.

The text breaks down the math of perception a little bit.

Recognizing a single object, let's take that horse or, I don't know, an elephant, isn't free.

It costs energy.

Your brain has to process its edges, its color, its lighting, its 3D orientation, its size.

It takes a sizable chunk of neural computing power.

A lot of work for just one thing.

And if you tried to do that for every pebble and every leaf and every person in your field of view simultaneously?

The text suggests that to handle that kind of computational load, you would literally need a brain larger than your own head.

And since we are constrained by the size of our skulls, we had to come up with a workaround.

And that workaround is attention.

Attention is the solution to information overload.

It's the filter.

But we have to be really careful with definitions here.

Attention is a bit of a suitcase word.

We just throw a lot of different things inside it.

The chapter breaks this down nicely.

It's not just one switch.

It's more like a whole family of mechanisms.

Exactly.

You have distinctions like internal versus external attention.

Internal attention is when you're, say, debating what to eat for dinner inside your head.

You're attending to a thought.

Right.

External attention is attending to the world, the traffic light, the text on a screen.

And for this deep dive, we are focusing almost entirely on external.

Then there's overt versus covert.

I feel like overt is the easy one.

Overt is what you do naturally all the time.

You want to look at something so you move your eyes.

You point your fovea, that high resolution center of your eye, right at the target.

But covert is the sneaky one.

Covert attention is looking out of the corner of your eye.

It's a great way to put it.

Imagine you're at a really boring dinner party.

Okay.

You are staring politely at the person talking to you, but your attention is actually focused on the much more interesting conversation happening at the next table.

Your eyes haven't moved, but your mind's eye has shifted.

I do that way too often.

Guilty.

And then we have divided versus sustained.

Divided attention is basically the multitasking myth, you know, trying to read the script while listening to a song with lyrics.

Doesn't work?

No.

And sustained attention is vigilance, just watching a pot waiting for it to boil or a radar operator watching a screen for a tiny blip.

But the main character for today, the one that really governs how we navigate the world, is selective attention.

Oh, selective attention.

That's the ability to pick one stimulus out of all that noise and give it priority processing.

And to understand how that works, psychologists came up with this metaphor that has stuck around for decades, the spotlight.

The spotlight.

I like this because it's so easy to visualize.

But to prove how this spotlight works, we have to go into the lab.

There's a classic experiment discussed in the text called the Posner cueing paradigm.

This is foundational.

I mean, if you take a Psych 101 class, you will probably do this.

Imagine you are sitting in front of a screen with a small cross in the center.

Fixate your eyes there.

Don't move them.

Okay, I'm staring at the cross.

My eyes are locked.

Good.

There are two boxes on the screen, one to the left of the cross and one to the right.

Your job is incredibly simple.

A target, let's say an X, is going to appear in one of those boxes.

As soon as you see it, you hit a button.

That's it.

We are measuring your reaction time.

Hit.

Simple.

But we're going to mess with you, of course.

Before the X appears, we give you a cue, a hint.

A hint about where the X is going to be.

Yes.

But the hint can be true or false.

We call them valid or invalid cues.

A valid cue highlights the correct location.

For example, the right box might flash briefly and then boom, the X appears in the right box.

And I assume that makes me faster.

Much faster.

Your attention was already summoned to that location.

Your spotlight was already shining there, waiting.

Makes sense.

But then we give you an invalid cue.

The right box flashes, but the X appears in the left box.

Ooh.

That's dirty.

That's mean.

It is.

And your reaction time suffers.

You are significantly slower than if we hadn't given you a hint at all.

Why?

What's the cost?

Because you have to disengage your spotlight from the wrong location, sweep it across the screen, and engage it on the correct location.

That whole process takes time.
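
To make the benefit and the cost concrete, here is a minimal Python sketch of a single cueing trial. The three timing constants are illustrative assumptions, not values from the text; the only thing the sketch is meant to show is the ordering the hosts describe: valid cue fastest, no cue in the middle, invalid cue slowest.

```python
# Toy model of the Posner cueing task. All timing values are illustrative
# assumptions, not measurements from the textbook.
BASELINE_MS = 300       # assumed reaction time with no cue at all
VALID_BENEFIT_MS = 30   # assumed speed-up when attention is already at the target
INVALID_COST_MS = 40    # assumed time to disengage, shift, and re-engage


def reaction_time(cue_side, target_side):
    """Return a simulated reaction time in ms for one trial.

    cue_side is "left", "right", or None (no cue); target_side is "left" or "right".
    """
    if cue_side is None:
        return BASELINE_MS
    if cue_side == target_side:
        return BASELINE_MS - VALID_BENEFIT_MS  # spotlight was already waiting there
    return BASELINE_MS + INVALID_COST_MS       # spotlight has to be moved first


valid = reaction_time("right", "right")
neutral = reaction_time(None, "right")
invalid = reaction_time("right", "left")
print(valid, neutral, invalid)
```

A real experiment would average many noisy trials per condition; the fixed numbers here only stand in for those averages.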

That makes perfect sense.

It's like when you're driving and the car in front of you puts on their left turn signal, but then they suddenly turn right.

You freeze for just a second.

You are primed for left turn.

That's a perfect real world example of an invalid cue.

That little moment of confusion.

Now the text adds a layer of complexity here that's really interesting.

There are two different ways we can cue you.

Exogenous and endogenous.

I always get these mixed up.

Okay.

Think of the roots.

Exo means outside.

An exogenous cue is driven by the external world.

In the experiment, this is when the box flashes.

A bright flash of light summons your attention automatically.

You don't have to think about it.

It just grabs you.

Okay.

So that's a reflex, a bottom-up thing.

What about endogenous?

Endo means inside.

An endogenous cue comes from within.

It requires your internal interpretation.

So in the experiment, instead of a flash, we might put a red dot in the center of the screen.

And we tell you beforehand, red means the target will be on the right.

So I see the red dot and I have to think, okay, red, that means right.

Move attention right.

It's a choice.

Precisely.

It's a symbolic cue.

It's voluntary.

You have to push your attention there.

It's not pulled.

And does that difference matter?

I mean, besides one being a choice and one being a reflex?

It matters a lot for timing.

The text shows a graph of the time course of these shifts.

We call it SOA, stimulus onset asynchrony.

Which is just a fancy way of saying how much time passes between the cue and the target.

Right.

And what we find is that exogenous cues, the flashes, are incredibly fast.

They work best when the target appears almost immediately, like 100 to 150 milliseconds later.

Super quick.

But endogenous cues, the symbols, they take longer to come online.

You need time to interpret the symbol so the benefit doesn't really kick in for a few hundred milliseconds.

But there is one weird exception mentioned in the text, isn't there?

Eye gaze.

Yes.

This is my favorite part of this section.

If you put a face in the center of the screen and the eyes on that face look to the left, that is technically a symbolic cue.

It's just an image.

It shouldn't summon your attention like a flash of light.

But it does.

It does.

Eye gaze triggers a shift in attention that is almost as fast and automatic as a peripheral flash.

It suggests that humans are evolutionarily hardwired to follow gaze.

If I look scared and look over your shoulder, you don't stop to interpret, hmm, his eyes are angled at 45 degrees, which likely indicates a threat.

You just look.

Because if you don't and it's a tiger, you're dead.

Exactly.

It's a survival mechanism baked right in.

Now, we've been using this spotlight metaphor.

Does the spotlight actually sweep across space?

Like, does it illuminate the space in between the left box and the right box as it moves?

That's a huge debate in the field.

The spotlight metaphor implies a moving beam, right?

But there are other theories.

Some propose a zoom lens where attention zooms out to cover the whole screen and then zooms in tight on the target.

Like a camera lens.

Exactly.

Others suggest a melting model where the spotlight just extinguishes at point A and instantly reappears at point B.

It essentially teleports.

I like the teleporting idea.

That feels more efficient.

But regardless of how it moves, there is one rule the spotlight follows that prevents us from going in circles.

The text calls it inhibition of return.

This is a crucial, crucial mechanism for searching.

Imagine you are looking for your keys.

You look on the coffee table.

They aren't there.

You move your eyes to the couch.

Inhibition of return is a brain mechanism that makes it harder for your attention to go back to the coffee table immediately.

It tags it as already checked.

Essentially, yes.

It inhibits that location for a short time.

If we didn't have this, we might get stuck in a loop, checking the most salient thing, like the bright white coffee table over and over again.

You'd be stuck on the coffee table forever.

Inhibition of return forces our search to move forward, to explore new territory.

It's like a built-in breadcrumb trail of "been there, done that."

Okay, so that's how we select a point in space.

But usually, the world isn't just flashing boxes.

Usually, we are looking for something specific.

We are doing a visual search.

This is where we leave the simple cueing labs and get into the complex grids.

Visual search is the act of looking for a target among distractors.

Like looking for the red sock in the pile of white laundry.

Or where's Waldo?

Exactly.

And psychologists measure this using set size, the total number of items, and efficiency.

Efficiency is measured by the slope of the reaction time.

Basically, for every extra sock I add to the pile, how many milliseconds longer does it take you to find the red one?

And there are two main types of search here.

The easy one and the hard one.

The easy one is called a feature search.

Imagine looking for a single red bar in a field of blue bars.

That sounds trivial.

It is.

It exhibits the pop-out effect.

Because the target is defined by a single unique attribute.

In this case, color.

It doesn't matter if there are five blue bars or 500.

The red one just pops out at you.

So the search time is always the same?

Pretty much.

The slope is near zero.

We process all the items in parallel all at once.

So I can scan the whole pile at once.

It's pre-attentive.

You got it.

But then we have inefficient search.

And for this, imagine looking for a letter T hidden in a pile of letter Ls.

Oh, that sounds annoying.

A T is a horizontal line and a vertical line.

An L is a horizontal line and a vertical line.

Exactly.

They share the same basic features.

So you can't just look for verticalness or horizontalness.

You have to check how the lines are connected.

This forces you into what's called a serial self-terminating search.

Serial meaning one by one?

Yes.

You have to put your spotlight on item one.

Is this a T?

No, item two.

Is this a T?

No.

One after another.

That sounds exhausting.

It is.

The cost is high.

Adding just one more L to the pile adds about 20 to 30 milliseconds to your search time.

And if the target isn't there, it takes twice as long on average because you have to check every single item to be 100% sure.
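
The "twice as long" figure falls out of the arithmetic of a serial self-terminating search, and a small simulation shows it. This is a sketch under stated assumptions: the 25 ms per-item cost is simply picked from the 20 to 30 ms range quoted above, and the function names are made up for illustration.

```python
import random

MS_PER_ITEM = 25  # assumed cost per item, within the 20 to 30 ms range quoted above


def items_checked(set_size, target_present, rng):
    """Count how many items a serial self-terminating search inspects."""
    items = ["L"] * set_size
    if target_present:
        items[rng.randrange(set_size)] = "T"  # hide one T at a random position
    order = list(range(set_size))
    rng.shuffle(order)  # inspect items one at a time in random order
    for checked, index in enumerate(order, start=1):
        if items[index] == "T":
            return checked  # stop as soon as the target is found
    return set_size  # target absent: every item had to be checked


def mean_search_ms(set_size, target_present, trials=10000, seed=0):
    """Average simulated search time in ms over many trials."""
    rng = random.Random(seed)
    total = sum(items_checked(set_size, target_present, rng) for _ in range(trials))
    return MS_PER_ITEM * total / trials


present = mean_search_ms(20, target_present=True)
absent = mean_search_ms(20, target_present=False)
print(present, absent)
```

With the target present you check about (N + 1) / 2 items on average before stopping; with it absent you check all N. So the cost per added item is roughly doubled on target-absent trials, and the ratio of the total times approaches two as the pile grows.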

Wow.

And the text points out that familiarity plays a role here, too.

If you ask me to search for a specific Chinese character in a page of Chinese text.

If you don't read Chinese, it's a nightmare.

It's a serial search of complex shapes.

But if you are fluent, the character might pop out a bit more because it has meaning.

But this is where the chapter takes a delightful detour into the animal kingdom.

Yes.

The archer fish.

I was waiting for this.

This is such a cool study.

So the archer fish is this fish that hunts by spitting jets of water at insects hanging on branches above the surface.

Right.

It knocks them into the water.

And eats them.

It has to be incredibly visually accurate to do that.

So researchers in Israel thought, hey, let's see if fish have the same attention span as undergrads.

Pretty much.

That's the gist of it.

They train these fish to spit at targets on a computer screen placed above the tank.

If the fish spit at the right target, a little dispenser dropped a food pellet.

That is amazing.

So what did they find?

Did the fish pass the test?

They tested the fish on feature search, the easy one, and conjunction search, the hard one.

And the results, which are shown in figure 7.10, were nearly identical to humans'.

Identical.

Really?

Well, the fish were slower overall.

I mean, they're fish.

But the pattern was the same.

When the target was a unique color or size, it popped out for the fish.

The number of distractors didn't matter.

The search was efficient.

Flat line on the graph.

Flat line.

But when the fish had to look for a target that combined features, like red and moving, it became inefficient.

They had to check item by item, just like us.

So this suggests that the way we search, this bottleneck of attention, isn't just a human quirk.

It's a fundamental biological rule.

It seems so.

Whether you are a human looking for your car keys or a fish looking for a beetle, you are running the same basic software.

Now, in the real world, we rarely do these pure searches.

We use what the text calls guided search.

Right.

Real life isn't usually "find the T among the Ls."

It's find the tomato in the salad.

We use shortcuts.

We don't look at every single leaf of lettuce.

We restrict our attention to things that are red and round.

So we use a basic feature to narrow the field and then do a serial search on just the red, round things.

Exactly.

That's guided search.

It's a two-stage process.

But we also use scene-based guidance.

This is the kitchen faucet example.

Oh, this one I relate to so much.

If I ask you to find the faucet in a photo of a kitchen.

You don't scan the ceiling.

You don't scan the floor tiles.

You don't even scan the stove.

You look for the sink.

Because faucets live near sinks.

That's their natural habitat.

The sink acts as an anchor object.

Our knowledge of the world, our scene schema, guides our eyes.

We predict where things should be, which allows us to ignore 90% of the visual field.

We are constantly cheating the system to save energy.

But sometimes that system has to do something really, really difficult.

It has to put the world back together.

And this brings us to section three.

The binding problem.

This is one of those philosophical "whoa" moments in perception.

We know from neuroscience that different parts of your brain process different things.

You have a part that sees color, a separate part that sees motion, a separate part that sees orientation, you know, vertical or horizontal.

So my brain has chopped the world up into little ingredients, like a deconstructed meal.

Right.

But you don't see a pile of ingredients.

You don't see redness and roundness.

You see a red round rolling ball.

You see a unified object.

The binding problem is how does the brain take all those separate signals and tie them back together to the correct object?

How do you know the red goes with the ball and not, say, the car next to it?

Exactly.

And this is where Anne Treisman's feature integration theory comes in.

Yes, she proposed that attention is the glue.

She argues there are two stages.

First, the pre-attentive stage.

This is before you look directly at something.

The features are just floating around in your brain soup.

You register redness and roundness and moving leftness, but they aren't attached to anything yet.

They're just free floating attributes.

Exactly.

And then comes the attentive stage.

You shine your spotlight of attention on a specific location.

That spotlight grabs the red, grabs the round and binds them together into a red ball.

So attention literally builds the object for us.

It does.

It's the assembly line.

And we can prove this by breaking it.

There's a phenomenon called illusory conjunctions.

This is the hallucination part.

It is, in a way.

In the experiment, they flash a display of colored letters really, really fast.

Let's say a green H and a red X.

Then they hide it and ask, what did you see?

And people get it wrong.

They don't just guess random colors.

That's the key.

They often report seeing a red H and a green X.

They swap the colors.

They put the red on the H and the green on the X.

Exactly.

The brain saw redness and it saw H-ness, but it didn't have enough time to use the attention glue to stick them together properly.

So it just grabbed the nearest available feature and slapped it on.

That is wild.

It means that without attention, our reality is just a loose collection of parts.

We are actively constructing the world moment by moment.

And we do it with other senses, too.

The text briefly mentions speech perception.

You bind the visual image of lips moving with the auditory sound of a voice.

If they don't match up, you get weird illusions there, too.

It's all about binding features together.

Okay.

So we've talked about searching through space, but we also have to search through time.

Attending in time.

This is section four.

The researchers use a method called RSVP, which does not stand for please respond.

No, no.

It stands for rapid serial visual presentation.

Imagine I show you a stream of photos or letters flashing in the center of the screen, one after another, extremely fast, like 10 items per second.

That's just a blur.

I can't imagine seeing anything.

It is.

But humans are surprisingly good at it.

If I tell you look for the letter X, you can spot it even at that speed.

Wow.

However, things fall apart if I ask you to spot two things.

This is the attentional blink.

It's a great name for it.

Here's the setup.

You watch the stream of letters.

You have to spot a white letter.

That's target one.

And then later in the stream, an X.

That's target two.

Okay.

White letter, then X.

Got it.

If the X appears immediately after the white letter, no problem.

You see both.

But if the X appears about 200 to 500 milliseconds after the white letter, you miss it.

I just don't see it.

Like, at all.

You don't see it.

It's as if your mind blinked.

Your eyes were open.

The image hit your retina.

But your brain was busy.

You have no conscious awareness of it.

Marvin Chun has a fantastic metaphor for this in the text.

He compares it to fishing.

It's the perfect analogy.

Imagine you are standing in a rushing river.

You have a net.

You are looking for fish.

You spot fish number one.

That's the white letter.

You dip your net in and scoop it up.

Success.

I got my fish.

But now you have a fish in your net.

You have to lift the net out of the water, take the fish out, and put it in your bucket.

That takes time.

And while you are doing that, while you are processing fish number one, fish number two swims by.

And I can't catch it because my net is busy.

Exactly.

That is the attentional blink.

It's a refractory period where your cognitive resources are tied up processing the first target.

But wait, why do I catch it if it comes immediately after the first one?

Ah, because if fish number two is right on the tail of fish number one, you catch them both in one scoop.

Oh, I see.

You process them as a single batch.

It's only when there is that slight gap that you have to reset the net.

And that's when you miss out.
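
The timing in the fishing story can be laid out as a tiny Python sketch. It is a deliberately crude toy: it treats the 200 to 500 millisecond window quoted above as an all-or-nothing cutoff, whereas real observers miss the second target only some of the time, and it assumes the 10-items-per-second stream from the RSVP example. The function name is hypothetical.

```python
ITEM_MS = 100                 # 10 items per second, as in the RSVP example
BLINK_WINDOW_MS = (200, 500)  # window quoted above in which target two is missed


def second_target_reported(lag_items):
    """Return True if the toy observer reports target two at this lag.

    lag_items is how many positions after target one the second target appears.
    """
    lag_ms = lag_items * ITEM_MS
    start, end = BLINK_WINDOW_MS
    # Inside the window the "net" is still busy with target one.
    return not (start <= lag_ms <= end)


for lag in range(1, 9):
    print(lag, second_target_reported(lag))
```

In this toy, a second target one position later is caught in the same scoop, positions two through five fall inside the blink, and later positions are caught again once the net has been reset.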

Now, are we all doomed to be bad fishermen?

Or can we get better at this?

The text mentions video game players.

Yes, the gamers get a win here.

Specifically, players of first-person shooters.

The chaotic ones, the really fast-paced ones.

Yes.

Research shows that these players have a significantly smaller attentional blink.

Their net is faster.

They can reset and scoop again much, much quicker than non-gamers.

So all those hours playing Call of Duty were actually cognitive training.

My parents were wrong.

There is legitimate evidence for that.

They even did a training study.

They took non-gamers and made them play shooters for weeks.

Their attention improved.

Wow.

They tried it with Tetris players and their attention did not improve.

Sorry, Tetris.

You're good for spatial reasoning, but not for the blink.

Okay, let's peel back the skull.

We've talked about behavior.

What is happening physically in the brain?

This is section five.

The physiological basis of attention.

This is where we see that the spotlight is real.

It's not just a metaphor.

Using fMRI, we can look at the visual cortex.

We know the visual cortex is mapped out spatially.

Part of it handles the left visual field.

Part handles the right.

So it's a literal map of the world in your head.

It is.

And figure 7.18 shows that when a person shifts their attention to a specific location, even without moving their eyes, the corresponding part of the visual cortex lights up.

It becomes more active.

So the brain is literally boosting the signal from that patch of the world.

It's turning up the volume.

Yes.

And the text describes a priority map in the brain.

It's like a command center.

It takes input from your eyes, what's bright, what's red, what's moving, and input from your goals, looking for keys.

It combines them to create a map of where should I look next.
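
One way to picture that combination is a weighted sum per location, followed by picking the highest score. This is only a sketch: the linear weighting, the scores, and the location names are all assumptions made for illustration, not the model in the text.

```python
def priority_map(salience, goal_relevance, goal_weight=0.5):
    """Combine bottom-up salience and top-down relevance for each location.

    Both inputs map a location name to a score between 0 and 1. The linear
    weighting is an assumption made for illustration.
    """
    return {
        place: (1 - goal_weight) * salience[place] + goal_weight * goal_relevance[place]
        for place in salience
    }


# Hypothetical scores: the bright coffee table is salient, but keys usually live on the shelf.
salience = {"coffee table": 0.9, "couch": 0.3, "shelf": 0.4}
goal_relevance = {"coffee table": 0.2, "couch": 0.5, "shelf": 0.9}

combined = priority_map(salience, goal_relevance)
next_look = max(combined, key=combined.get)
print(next_look)
```

Setting `goal_weight` to 0 makes the map purely stimulus-driven, so the brightest location wins; raising it lets the current goal pull the next look somewhere less flashy.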

Where is this command center located?

It's a network.

The text highlights a couple of key areas.

The LIP, lateral intraparietal area in the parietal lobe, and the FEF, frontal eye fields in the frontal lobe.

They act like the directors telling the visual cortex where to shine the spotlight.

And this effect trickles down to how we see complex objects too.

We've talked about the face area and place area of the brain before.

Right, the FFA, fusiform face area, and the PPA, parahippocampal place area.

There is a brilliant experiment shown in figures 7.20 and 7.21.

They show a participant a composite image.

It's a photo of a face superimposed on top of a photo of a house.

So it looks like a ghostly face floating over a ghostly house.

A bit weird.

Exactly.

The retinal image is identical in both conditions.

But they tell the participant, pay attention to the face.

When they do, the FFA lights up like a Christmas tree and the PPA stays quiet.

And if they say, pay attention to the house.

The PPA fires up and the FFA goes quiet.

That is mind-blowing.

The input to the eyes hasn't changed at all.

The only thing that changed is the thought.

Look for the face.

Attention literally modulates the volume knob of specific brain regions.

It chooses what gets processed.

And it goes even deeper than that, down to the single cell.

The shrink wrapping effect.

I love that name.

It's a great description.

This is figure 7.23.

We used to think a neuron's receptive field, the part of the world it sees, was fixed, static.

But this research shows that if you attend to a specific object, the neuron's receptive field will actually shrink and shift to hug the contours of that object.

It effectively blocks out the neighbors.

It puts blinders on.

Yes.

It isolates the target at the cellular level.

It's incredible precision engineering.

But like any complex machine, it can break.

And when the attention system breaks, things get really, really strange.

Section 6.

Disorders of visual attention.

This is usually the result of a stroke or trauma to the parietal lobe, specifically the right hemisphere.

And it causes a condition called neglect.

Neglect is so hard to wrap your head around.

It's not blindness, right?

The eyes are working.

No, the eyes work fine.

The visual cortex works fine.

But the patient behaves as if the left side of the world simply isn't there.

It has ceased to exist for them.

Give us the examples from the text.

They're so powerful.

There's the line cancellation test.

You give a patient a sheet of paper with lines drawn all over it.

You say, cross out all the lines.

The patient will diligently cross out every single line on the right side of the page and leave the left side completely untouched.

And they think they're done.

They put the pen down and smile.

As far as they're concerned, they finished the task perfectly.

Or ask them to draw a clock.

They will draw a circle and then they'll crowd all the numbers 1 through 12 onto the right half of the clock face.

Because the left side of the clock doesn't exist to them.

Exactly.

But here is the experiment that completely changed how we understand this.

The barbell experiment by Tipper and Behrmann.

This is figure 7.27.

This is the one that proves attention isn't just about left and right in the room.

Right.

It's a game changer.

So you have a neglect patient.

They ignore things on the left.

You show them a barbell.

Two balls connected by a bar.

They see the right ball.

They neglect the left ball.

Standard stuff so far.

That's what we expect.

But then, watch this.

While the patient is watching, you rotate the barbell 180 degrees.

So the ball that was on the left moves over to the right side of space.

And the ball that was on the right moves to the left.

So physically, the invisible ball is now in the good zone.

They should be able to see it now, right?

You would think so.

That's the logical conclusion.

But they don't.

They still neglect the ball that started on the left, even though it is now sitting on the right side of the room.

Wait.

What?

How is that possible?

The neglect stuck to the object.

So they aren't neglecting the left side of space.

They're neglecting the left side of the object.

Precisely.

It proves that attention is object-based.

Once attention locks onto an object and tags it, it tracks it.

And if your attention system is damaged, that blind spot can travel with the object as it moves through the room.

That is deeply unsettling.

It implies that left and right aren't fixed coordinates in the world, but relative coordinates that we attach to things.

And it gets even more subtle with a condition called extinction.

This is like a mild form of neglect.

Imagine a doctor holds up a fork on your bad side, the left.

You can see it.

You say, that's a fork.

So I'm not neglecting it.

My attention's working.

Not yet.

But now the doctor holds up a fork on your left and a spoon on your right at the exact same time.

What happens?

You only see the spoon.

The object on the good side wins the competition for attention and completely extinguishes your awareness of the object on the bad side.

You literally can't see the fork anymore just because the spoon is there.

It's a winner-take-all battle in the brain.

Exactly.

Before we move on, we should probably touch on ADHD because the text mentions it.

People often ask if ADHD is a disorder of these visual mechanisms.

Yeah.

Is it a broken spotlight?

Surprisingly, no.

When you give people with ADHD these visual search tasks, finding the T among L's, they perform very similarly to neurotypical people.

Their raw visual attention system is intact.

Oh, that's interesting.

The struggle in ADHD is more about executive control (impulsivity, staying on task, managing distractions) than the basic visual processing itself.

That's a really important distinction.

Okay, we've been very focused on objects: finding the keys, finding the T.

But we don't just see objects.

We see places.

We see scenes. That's Section 7, perceiving and understanding scenes.

This is where we zoom out.

The text proposes that we have two pathways for this.

The selective pathway.

That's the spotlight we've been talking about.

It's slow, it's detailed, and it recognizes individual objects.

And the other one.

The non-selective pathway.

This is the background processor.

It takes in the entire field of view at once.

It doesn't identify individual objects.

It computes ensemble statistics.

Ensemble statistics?

That sounds complicated. Like an average?

Exactly like an average. Figure 7.3 shows a school of fish.

When you glance at it, you don't look at fish number one, then fish number two.

You instantly perceive the average size and the average direction of the group.

You get the gist of the school without attending to any single fish.
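As a rough illustration of what "ensemble statistics" means, here is a minimal Python sketch that summarizes a group by its average size and average heading without describing any single item. The function name, the fish sizes, and the headings are made up for this example and do not come from the textbook. It is a toy calculation, not a model of how the brain computes these summaries.

```python
import math

def ensemble_summary(sizes, headings_deg):
    """Return (mean size, mean heading in degrees) for a group of items."""
    mean_size = sum(sizes) / len(sizes)
    # Average headings as unit vectors so that 350 and 10 degrees
    # average to 0 degrees rather than 180.
    x = sum(math.cos(math.radians(h)) for h in headings_deg)
    y = sum(math.sin(math.radians(h)) for h in headings_deg)
    mean_heading = math.degrees(math.atan2(y, x)) % 360
    return mean_size, mean_heading

# Hypothetical school of four fish: sizes in cm, headings in degrees.
school_sizes = [4.0, 5.0, 6.0, 5.0]
school_headings = [80, 90, 100, 90]

size, heading = ensemble_summary(school_sizes, school_headings)
print(f"average size: {size:.1f} cm, average heading: {heading:.0f} degrees")
```

The output describes the school as a whole, and no individual fish is singled out, which is the sense in which the non-selective pathway is said to deliver a gist.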

And this happens incredibly fast.

Blindingly fast.

Potter did these studies showing you can grasp the gist of a scene, like this is a picnic or this is a city, in about 19 milliseconds.

A single eye blink is like 100 milliseconds.

Exactly.

It's subliminal speed.

You can verify if an animal is present in a picture in 120 milliseconds.

That is way too fast for you to have scanned the image object by object.

So how is the brain doing it if it's not looking at objects?

What's the shortcut?

It's looking at spatial frequency.

This is the work of Oliva and Torralba.

Break that down for me.

Spatial frequency sounds like physics.

It is, basically.

Think of it as the texture of the image.

Imagine squinting your eyes until everything gets blurry.

OK, everything is just blobs of light and dark.

Right.

A beach scene has distinct horizontal blobs.

The horizon, the sand, the water.

A city scene has lots of vertical spiky blobs, the buildings.

A forest is a chaotic mess of high-frequency texture.

So the brain analyzes these broad patterns of light and dark, the sine waves, and guesses the category based on that texture.

Yes.

It calculates properties like openness, roughness, expansion.

It knows this is a city before it has identified a single building.

It provides the context, and then the spotlight goes in to fill in the specific details.
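To make the squinting idea concrete, here is a small Python sketch. It blurs a tiny grayscale grid by block averaging, then checks whether brightness changes mostly from row to row (horizontal bands, like a beach) or from column to column (vertical structures, like a skyline). Everything in it is an assumption made for illustration: the function names, the 8 x 8 toy images, and the decision rule. It is not Oliva and Torralba's actual model, which works with spatial frequency content rather than simple pixel differences.

```python
def squint(img, block=2):
    """Average non-overlapping block x block patches: a crude low-pass filter."""
    rows, cols = len(img) // block, len(img[0]) // block
    return [
        [
            sum(img[r * block + i][c * block + j]
                for i in range(block) for j in range(block)) / block ** 2
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def layout_energy(img):
    """Return (change going down the rows, change going across the columns)."""
    down = sum(abs(img[r + 1][c] - img[r][c])
               for r in range(len(img) - 1) for c in range(len(img[0])))
    across = sum(abs(img[r][c + 1] - img[r][c])
                 for r in range(len(img)) for c in range(len(img[0]) - 1))
    return down, across

def guess_layout(img):
    """Guess the coarse layout from the blurred image alone."""
    down, across = layout_energy(squint(img))
    return "horizontal bands" if down > across else "vertical structures"

# Hypothetical toy scenes: 1.0 is bright, 0.0 is dark.
beach = [[1.0] * 8 for _ in range(4)] + [[0.0] * 8 for _ in range(4)]
city = [[1.0, 1.0, 0.0, 0.0] * 2 for _ in range(8)]

print(guess_layout(beach))  # horizontal bands
print(guess_layout(city))   # vertical structures
```

Neither guess requires identifying a single object, which is the point of the shortcut: the coarse pattern of light and dark is enough to suggest a category.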

Which brings us to the final and perhaps most humbling section, section eight, memory and the limits of perception.

Because despite everything we've just said about bottlenecks and limits, I still feel like I see everything all the time.

That is the grand illusion.

And our memory contributes to it in a huge way.

We have shockingly good picture memory.

The text cites the Standing experiment.

They showed people 10,000 images.

10,000.

That's hours and hours of slideshows.

And when tested later, people had 85% accuracy in recognizing which ones they'd seen before.

Another study by Brady showed we can even remember the specific state of an object, like remembering that we saw a coffee cup that was half full, not empty.

So we are geniuses.

Case closed.

Our brains are amazing.

If only.

The paradox is, if our memory is so good, why are we so blind to changes happening right in front of our eyes?

This is the phenomenon of change blindness.

The flicker task.

This is a fun game you can play online.

And by fun, I mean infuriating.

It's maddening.

Figures 7.35 and 7.36 show the setup.

You see a photo of a harbor.

Then for a split second, a blank gray screen appears.

Then the photo comes back.

Then the blank screen.

It flickers back and forth.

And something has changed in the photo.

A huge object.

A boat railing is missing.

Or a massive jet engine on a plane has vanished.

And you stare at it.

And you stare at it.

And you cannot see it.

It can take seconds, sometimes minutes.

And the reason is the blank screen disrupts the motion signal.

Usually, if an object vanishes, there is a transient, a flash of motion that summons your exogenous attention.

It screams, hey, look here.

But the blank screen masks that motion cue.

Exactly.

So now you are forced to use your slow serial spotlight.

You have to check object by object.

Is the cloud the same?

Yes.

Is the water the same?

Yes.

You are blind to the massive change until your spotlight happens to land on it.
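One way to picture why the blank screen matters is a toy frame comparison in Python. When the original and changed pictures follow each other directly, the difference between frames is confined to the spot that changed, which plays the role of the transient. When a blank frame sits in between, every location differs from the blank, so the difference signal no longer points anywhere in particular. The 3 x 3 "images" and the function name are invented for this sketch, and real motion transients are not computed this way.

```python
def changed_pixels(frame_a, frame_b):
    """List the (row, column) positions where two same-sized frames differ."""
    return [
        (r, c)
        for r, row in enumerate(frame_a)
        for c, value in enumerate(row)
        if value != frame_b[r][c]
    ]

# Hypothetical 3 x 3 frames; the numbers stand in for brightness values.
original = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
modified = [[1, 2, 3], [4, 0, 6], [7, 8, 9]]  # the center item has vanished
blank = [[0] * 3 for _ in range(3)]

direct = changed_pixels(original, modified)
via_blank = changed_pixels(original, blank)

print(direct)          # [(1, 1)]: the change announces its own location
print(len(via_blank))  # 9: everything changed, so nothing stands out
```

With the cue gone, the only option left in this toy setup is to compare the two pictures location by location, which mirrors the slow serial search described above.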

This leads to the most famous experiment in this entire field.

The gorilla.

Inattentional blindness.

Simons and Chabris.

If you haven't seen this video, you have to go watch it.

They ask participants to watch a video of people playing basketball.

Your job is to count how many times the team in white passes the ball.

So you are focusing hard.

You're really concentrating.

One, two, three.

In the middle of the game, a person in a full gorilla suit walks into the center of the frame, faces the camera, thumps their chest, and walks off.

It takes about nine seconds.

About half the people never see the gorilla.

They swear it wasn't there.

I've seen videos of this.

They get angry.

They are flabbergasted when you show them the replay.

Their attention was so bound to the white shirts and the ball that the gorilla was filtered out as irrelevant noise.

But surely this is just regular people.

Experts wouldn't make this mistake in their field.

That's the comforting lie we tell ourselves.

But let me tell you about the expert blindness study mentioned in figure 7.39.

Trafton Drew and colleagues tested radiologists.

These are people trained for years to spot tiny anomalies in X-rays.

The best of the best.

They gave them CT scans of lungs and asked them to look for cancer nodules.

On the last slide, they inserted a picture of a gorilla shrunk down to about the size of a matchbox into the lung scan.

A tiny gorilla in the lung.

It was 48 times larger than the average cancer nodule.

It was huge by comparison.

And did they see it?

Please tell me they saw it.

83% of the radiologists missed it.

83%.

Eye tracking showed they looked right at it.

Their fovea landed on the gorilla.

But their brain didn't see it.

They were attending to cancer, not primates.

That is terrifying.

But it really drives home the point of this entire chapter.

What we see isn't the world.

It's a construction.

Perception is an inference.

We take the gist from the non-selective pathway: I'm in a kitchen.

We use our spotlight to verify a few key objects.

There's the coffee maker.

And we just assume the rest stays the same.

The text calls the world an external memory.

I love that phrase.

Right.

Why store the exact location of every fork in your brain?

That takes energy.

If you need a fork, you just look at the drawer.

The world stores the information for you.

We only perceive what we attend to.

And we hallucinate the rest as a complete stable picture.

So the grand illusion is that we are aware of everything.

When really we are looking through a soda straw at a vast complex world, while our brain furiously paints in the wallpaper around the edges to make it feel whole.

Well, on that slightly existential note, I think we have reached the limit of our own attentional resources for today.

I think so.

My net is definitely full.

We've learned that our spotlight is narrow.

Our search can be slow.

Our fish are just like us.

And we should definitely double check our blind spots,

especially if we are radiologists.

Always look for the gorilla.

Thank you so much for listening.

This has been the Last Minute Lecture Team, helping you hack your learning.

Go give your attention a rest.

You've earned it.

Goodbye, everyone.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Attention operates as a selective filtering system that enables the brain to manage the overwhelming volume of sensory information by prioritizing certain inputs while suppressing others, a necessity arising from the brain's limited processing capacity. Rather than a single unified mechanism, attention functions as a collection of distinct cognitive processes that can be directed inward toward mental representations or outward toward environmental stimuli, and can involve obvious behavioral responses like eye movements or subtle shifts in mental focus that leave no visible trace. Classic experimental paradigms such as Posner's cueing task demonstrate that peripheral cues appearing at stimulus locations automatically capture attention through exogenous pathways, whereas symbolic cues require more deliberate endogenous processing and produce different patterns in reaction time costs and benefits. The spatial deployment of attention has been conceptualized through multiple metaphors, including the spotlight model suggesting a focused beam that illuminates attended locations and dims unattended ones, though zoom lens and other alternative frameworks propose different mechanisms for how attention scales across visual space. Visual search efficiency depends critically on the complexity of the target representation: searches for simple features like a red object among green distractors occur rapidly in parallel across the entire visual field, but searches requiring the integration of multiple features demand serial examination of individual items and show substantial increases in reaction time as set size grows. Real-world search is not random but guided by knowledge of object features, learned associations from experience, and structural regularities within scenes that provide predictable spatial relationships between objects. 
The binding problem—the neural challenge of linking disparate visual attributes such as color, shape, and location into a coherent object representation—requires focused attention to prevent the formation of illusory conjunctions where features become incorrectly combined. Temporal dynamics of attention emerge through rapid serial visual presentation paradigms, revealing the attentional blink phenomenon in which detection of one target temporarily impairs the perception of a second target due to bottlenecks in attentional resources. Neurophysiologically, attention modulates the firing rates of individual neurons, generates priority maps in parietal and frontal regions, and reshapes the spatial receptive fields of sensory neurons to enhance processing at attended locations. Clinical populations with neglect or extinction following parietal damage reveal the devastating consequences when attention fails to encompass entire regions of space or sides of objects. Scene perception relies on parallel pathways: a selective route supporting detailed object identification and a nonselective route enabling rapid extraction of scene structure and statistical regularities. Despite exceptional long-term memory for visual scenes, phenomena such as change blindness and inattentional blindness expose profound limitations in conscious awareness, demonstrating that perception represents a constructive interpretation rather than a faithful recording of the sensory world.
