Chapter 2: Iconic Storage & Verbal Coding in Visual Cognition

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome back to the Deep Dive.

Today we are dusting off a heavyweight.

We certainly are.

We are cracking open the classic edition of Cognitive Psychology, and we are going to laser-focus on iconic storage and verbal coding.

Exactly.

Now, I know what you're thinking.

That sounds like something you'd read in a manual for a computer server from 1980.

It does sound a bit dry on the surface, doesn't it?

It really does.

But honestly, as I was reading this, I realized this isn't just about storage or coding.

No, not at all.

This is the blueprint for how we experience reality.

This chapter is effectively the user manual for your own eyes.

That is a perfect way to frame it.

It is foundational.

If you want to understand how the mind processes information, which is the core mission of cognitive psychology, you have to start here.

Right.

This chapter lays out the architecture of how we get information from "out there" to "in here."

And what I love about this text is that it immediately comes out swinging.

It completely shatters this assumption that we all, you know, walk around with.

One we don't even know we have.

Right.

The text calls it naïve realism.

Right.

Naïve realism is the default setting for human beings.

We tend to believe that our eyes are just clear, passive windows.

We think the world is out there, the light hits our eyes, and boom, we see it exactly as it is the moment it happens.

It feels instantaneous.

It feels effortless.

Like we're just a mirror reflecting reality.

If a bird flies by, I see the bird, end of story.

Exactly.

But the cognitive approach, which is the specific lens we are using for this deep dive, says not so fast.

And I mean that literally, not so fast.

This chapter argues that perception is not a passive mirror.

It is an active, constructive process.

It's like a construction site inside your brain.

Wow.

And crucially, that construction takes time.

So when I look at a coffee mug on my desk, I'm not just seeing it.

My brain is actively building the image of the mug.

Yes.

And that construction process has distinct stages.

The central thesis we're exploring today is that information reaches your eye, but before you can say that's a coffee mug, that input has to go through a complex analysis.

That's a whole assembly line.

A whole assembly line.

It gets stored briefly, scanned, selected, and then translated into language.

And the really wild part, the part that kept me up thinking last night, is that because this takes time, this chapter suggests we are essentially living in the past.

We are always seeing the immediate past.

We are lagging behind.

Okay, let's unpack that, because that is a huge concept.

Our mission for this deep dive is to track that journey.

Right.

We're going to follow a visual input from the split second it hits the retina, the icon, to the moment it turns into words in your head.

And along the way, we're going to talk about ghost images, how to erase a memory before it even happens, and whether subliminal messaging is actually real.

It's a full roadmap of the visual system's architecture.

It really is.

So let's start with the tools.

This research relies heavily on a device with a fantastic retro-futuristic name: the tachistoscope.

The tachistoscope.

It sounds like something out of a Jules Verne novel, but it's actually the classic instrument of mid-century psychology.

So what is it exactly?

Think of it as a very precise, high-speed slide projector.

It's designed to flash an image, a letter, a word, a shape, for an incredibly short amount of time.

We're talking milliseconds.

So it's a way to feed the eye a specific byte of information and see what happens.

Exactly.

And for decades, researchers ran into a specific, frustrating problem with the tachistoscope.

They'd flash a group of letters, say a grid of 12 letters, for a tiny fraction of a second, and the subjects would get annoyed.

Annoyed?

Why annoyed?

Because the subjects would insist, I saw more than that.

They'd report maybe four or five letters, but they would claim that for a split second they saw the whole grid.

They felt like they had the whole picture, but it faded away before they could grab it all.

It's like trying to describe a dream as you're waking up.

That's a perfect analogy.

You have the whole complex image in your head, but as you start speaking, it just dissolves like smoke.

And for a long time, psychologists just thought, well, that's the limit.

That's just how it is.

The span of apprehension, how much we can grasp at once, is four or five items.

That's all the brain can handle.

They thought the limit was on seeing.

Until 1960, when George Sperling entered the picture.

Right.

Sperling changed the game.

He suspected that the subjects weren't lying.

He believed them.

He did.

He realized that maybe the problem wasn't that the subjects didn't see the letters, but that the memory of them was fading too fast to report.

The mouth was too slow for the eye.

Exactly.

So he designed an incredibly clever experiment to prove it.

I love this setup.

Paint the picture for us.

How did he isolate the memory?

Okay, visualize a rectangular array of letters.

Let's say three rows of letters, four letters in each row.

Okay, got it.

Sperling flashes this grid on the screen for just 50 milliseconds.

That is one-twentieth of a second.

It's a literal blink.

If you asked me to name all 12 letters, I'd fail miserably.

You would.

You'd get maybe four or five.

Right.

That's the whole report method.

You try to grab everything and you fail because the image fades.

But Sperling introduced the partial report.

He flashed the letters, the screen went blank, and then immediately after the image was gone, he played a tone.

A sound signal?

Yes.

Yeah.

A high pitch meant, tell me the top row.

A medium pitch meant middle row.

Low pitch, bottom row.

But wait, the image is already gone when the tone plays?

The screen is black?

Exactly.

The stimulus is over.

But here's the breakthrough.

The subjects could do it.

No way.

If the tone was high, they'd rattle off the top row perfectly.

If it was low, they got the bottom row perfectly.

But they didn't know which tone was coming until after the letters were gone.

Precisely.

The fact that they could report any row perfectly on command proved that for a brief moment, they had access to the entire grid.

Wow.

They were effectively reading from a fading photograph in their mind.

That is fascinating.

So the limit wasn't on seeing, the limit was on reporting.

The information was all there sitting in a buffer.

It was all there.

Sperling proved that the bottleneck wasn't the eye, it was the translation into speech.
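Sperling's reasoning reduces to a short calculation. Because the tone sounds only after the letters are gone, the subject cannot prepare for one particular row, so performance on the cued row is treated as a sample of every row: letters available ≈ (proportion correct on the cued row) × (total letters in the array). Here is a minimal Python sketch of that estimate, assuming the three-rows-of-four grid described above; the function name `estimate_available` and the trial scores are hypothetical illustrations, not Sperling's data.

```python
from __future__ import annotations

from statistics import mean


def estimate_available(cued_row_scores: list[int], row_length: int, n_rows: int) -> float:
    """Partial-report estimate of how many letters were briefly available.

    cued_row_scores holds the number of letters reported correctly from the
    cued row on each trial (hypothetical values in the example below).
    Because the cue comes after the display ends, the proportion correct on
    that row is taken as a sample of every row in the array.
    """
    if not cued_row_scores:
        raise ValueError("need at least one trial")
    if any(s < 0 or s > row_length for s in cued_row_scores):
        raise ValueError("each score must lie between 0 and row_length")
    proportion_correct = mean(cued_row_scores) / row_length
    return proportion_correct * row_length * n_rows


# The 3 x 4 grid from the episode; the trial scores are made up for illustration.
print(estimate_available([4, 4, 4], row_length=4, n_rows=3))     # 12.0
print(estimate_available([4, 3, 4, 3], row_length=4, n_rows=3))  # 10.5
```

Whole report, by contrast, just counts what the subject manages to say before the icon fades, which is why it tops out around four or five even when the partial-report estimate is much higher.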

And this fading buffer.

This is what the chapter calls the icon.

That's the term Neisser uses.

He calls it iconic memory.

He distinguishes it from after images, which are just retinal quirks like when you stare at a light bulb and see a green spot.

I see.

The icon is different.

It's a transient visual memory.

It is defined behaviorally by what the subject can actually do, like in Sperling's experiment, and introspectively.

Subjects report that they are still looking at the letters, even though the screen is blank.

They have the sensation of reading it off a screen that isn't there.

But it doesn't last long.

It's a very leaky bucket.

It is.

Sperling mapped this decay curve out beautifully.

The chapter shows a graph of it.

Describe the graph for us.

Imagine a graph where the vertical axis is accuracy and the horizontal axis is time.

Okay.

Accuracy is nearly a hundred percent immediately after the flash, at time zero.

But as you delay that signal tone, 100 milliseconds, 300 milliseconds, the line drops sharply.

Like a cliff.

A very steep cliff.

By the time you get to one full second, the advantage is gone.

The curve flattens out.

The icon has faded.

So we have this one second window where the world is still live in our brains before it disappears.

Roughly one second.

Yes.

And here's where it gets really interesting for the environment you're in.

That duration isn't fixed.

Oh.

It depends on what happens next.

What do you mean?

What happens in the real world?

It's what the chapter calls the post-exposure field.

This comes from McWirt's findings.

If you flash the letters and then plunge the room into total darkness.

The dark field.

Right.

Then the icon can last much longer.

Up to five seconds.

It glows in the mind's eye because nothing new is overriding it.

But we rarely live in total darkness.

Usually there's light.

Exactly.

Usually the visual field remains bright.

If the screen stays bright white after the letters vanish, a light field, the icon gets washed out in less than half a second.

That explains why we don't walk around seeing trails of everything we look at.

The next image washes out the previous one.

Precisely.

We need to clear the buffer so we can process the next moment.

And that leads us directly into the dark side of iconic memory.

Backward masking.

This was the part of the chapter that felt like a magic trick.

How can a second image erase the first one?

It sounds like we're violating the laws of physics.

It's counterintuitive, isn't it?

We call it backward masking because the second stimulus seems to work backward in time to obscure the first one.

But it's not time travel.

No, it's not time travel.

It's about the finite processing speed of the brain.

So break down the types for us.

The chapter distinguishes between Type A and Type B.

Type A is simpler.

It's often called brightness masking or noise masking.

Imagine you flash a faint letter A.

Then immediately you flash a massive bright burst of light.

And the A disappears.

It gets buried.

Eriksen explained this with a theory called summation.

The visual system integrates energy over a short period.

It adds them together.

It adds the light of the flash to the light of the letter.

It reduces the contrast.

It's like trying to see a star when the sun comes up.

The signal is drowned out by the noise.
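The summation account can be made concrete with Weber contrast, (L_target − L_background) / L_background. If the visual system simply adds the flash's light to both the letter and its background within one brief integration window, the luminance difference stays the same while the baseline grows, so contrast falls. A minimal Python sketch; the luminance values are arbitrary illustrative units, and treating the integration as a plain sum is a simplifying assumption rather than a fitted model.

```python
def weber_contrast(target: float, background: float) -> float:
    """Weber contrast of a target against its background luminance."""
    if background <= 0:
        raise ValueError("background luminance must be positive")
    return (target - background) / background


def contrast_after_flash(target: float, background: float, flash: float) -> float:
    """Contrast if a flash's light is summed into the same brief window.

    The flash adds equally to the letter and to its background, so their
    difference is unchanged while the baseline it is judged against rises.
    """
    return weber_contrast(target + flash, background + flash)


# Arbitrary units: a faint letter only 10 units brighter than its background.
print(weber_contrast(110, 100))             # 0.1
print(contrast_after_flash(110, 100, 900))  # 0.01
```

A tenfold brighter baseline cuts the contrast tenfold, which is the "star at sunrise" effect in numbers.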

Okay, that makes sense physically.

You're overwhelming the sensor.

But Type B, that's the weird one.

Type B is metacontrast.

This isn't about brightness.

It's about structure and contours.

The classic experiment here uses Werner's disk and ring.

I want the listeners to really visualize this, so walk us through it.

Okay.

Imagine a white screen.

First, you see a solid black disk appear for a split second.

Then after a short delay, a black ring appears.

A ring that fits around it?

Yes.

The ring is sized perfectly, so it fits right around where the disk was.

They don't overlap.

The inner edge of the ring touches where the outer edge of the disk was.

So: disk, delay, then ring.

What do I see?

If the timing is right, usually a delay of about 75 milliseconds, you see the ring.

But you do not see the disk.

The center is just white.

The ring erases the disk.

It suppresses it completely.

And what's fascinating is the U-shaped function described in the graphs in the book.

What does that mean, U-shaped?

It means masking isn't strongest when the delay is zero.

If the ring appears immediately, zero delay, you see both the disk and the ring.

And if it appears way later, like 200 milliseconds later, you see both.

But at that specific delay, around 75 milliseconds, masking is at its peak.

That's the bottom of the U-shape on the graph.

That's so strange.

Why does that happen?

Why does a ring appearing later wipe out the disk that was already there?

The leading theory comes from Werner.

He argued that contours, the edges of a shape, aren't instantaneous.

The brain has to construct them.

It has to draw them.

It takes time to draw the edge of that disk in your visual system.

So the brain is sketching the disk.

It starts sketching the disk.

But before it can finish the sketch, the ring appears.

The ring shares that same border.

The new information interrupts the construction of the old information.

Wow.

The brain effectively abandons the disk to process the ring.

Werner called it contour absorption.

That is wild.

It implies that seeing is a job.

It's labor.

And if you interrupt the worker, the job doesn't get done.

It is work.

And the chapter makes a great point about Averbach and Coriell's erasure experiment, which is similar.

What did they do?

They found that if you flash a letter and then, about 100 milliseconds later, flash a circle right around where the letter was...

It erases the letter.

It made the letter harder to report than if you had simply pointed at its position with a bar marker.

The surrounding shape literally ate the previous shape.

So why does this matter?

Is this just a lab trick or does it happen in real life?

It happens constantly.

Think about how your eyes move.

We don't just glide them.

No, we jump.

These jumps are called saccades.

Between jumps, our eyes stay still.

They fixate for about 200 milliseconds.

200 milliseconds?

That number isn't random.

This chapter suggests 200 milliseconds is the safety buffer we need to construct a scene before our eyes move and the image changes.

So we have enough time to finish the job.

Right.

If we moved our eyes faster, the new scene might mask the old scene before we understood it.

We'd be erasing the world as we looked at it.

Exactly.

Masking ensures clear, discrete snapshots of reality.

Now, this brings up a really controversial topic.

If something is masked, if I technically saw it, but the mask erased it from my conscious awareness, did it still get in?

Are we talking about subliminal perception?

Ah, yes.

The subliminal debate.

This was a very hot topic in the '50s and '60s, and Neisser devotes a significant section of this chapter to dismantling it.

Because the claim was that advertisers or, I don't know, governments could flash words like "buy" or "obey" so fast or so perfectly that we wouldn't see them, but our unconscious would obey them.

Or in the context of psychoanalysis, that threatening words would be repressed.

There was a famous study by Smith, Spence and Klein, the Angry Happy experiment.

Walk us through that one.

They showed subjects a neutral face, just a drawing without emotion.

But right before the face, they flashed the word angry or happy, followed by a mask so the subjects couldn't consciously see the word.

And their result?

The subjects tended to describe the neutral face as more hostile if it was preceded by the hidden word angry.

So they saw an angry face?

They described it that way.

And the researchers claimed this proved unconscious perception of meaning.

They argued the semantic meaning of angry leaked into the brain.

But our expert here, Neisser, isn't buying it.

He is not.

He invokes the Clever Hans effect.

Oh, I love the Clever Hans story.

Tell the listeners about the horse.

Clever Hans was a horse in the early 1900s who could supposedly do math.

Right.

You'd ask Hans, what is two plus two?

And he'd tap his hoof four times.

Everyone was amazed.

Even scientists were baffled.

But he wasn't doing math.

No.

It turned out Hans didn't know math.

He was reading the body language of his trainer.

The horse was reading the human, not the numbers.

Exactly.

When the horse reached the right number of taps, the trainer would suddenly relax or exhale often without realizing it.

Hans sensed that tension release and stopped tapping.

It wasn't math.

It was social cues.

And Neisser thinks the same thing happened in the Angry Happy experiment.

He does.

He points out a major flaw.

The interrogator, the person asking the questions, usually knew the sequence of the words.

They knew when angry was coming.

So they might have subtly frowned or shifted their tone?

Or leaned forward.

Those are what we call demand characteristics.

Subjects want to please the experimenter.

They pick up on the expectation.

They do.

If the experimenter expects a hostile description, the subject often unconsciously provides one.

Neisser argues that when you control for this, when the experimenter is truly blind to the condition, these subliminal meaning effects usually disappear.

So no secret mind control through masked images?

Not in terms of complex meaning like "buy popcorn" or "this guy is angry."

Neisser argues we don't process the meaning of a word before the visual construction is complete.

Okay.

However, and this is a big however, he does concede one very important point.

Which is?

Fehrer and Raab's reaction time findings.

This is crucial.

They found that even if a shape is masked and invisible, so that the subject swears they didn't see it, their reaction time to press a button was just as fast as if they had seen it.

Wait, so I can't tell you what I saw.

I don't even know I saw anything, but my finger presses the button anyway, at the same speed.

Yes.

This suggests a split between the motor system and the sensory system.

A split in the brain.

The motor system gets the signal that something happened and triggers a reaction before the sensory system finishes constructing the shape.

That's unnerving.

My body knows before I know.

It creates a layered model of cognition.

"Something happened" is a fast signal. "What is it?" is a slow construction.

You can react to the first without the second.

Okay.

Let's move further down the assembly line.

We've had the icon, that raw visual flash.

We've managed to avoid masking.

Right.

Now we need to turn this ghost image into something we can use.

Words.

Verbal coding.

This is the bottleneck.

The icon is a vast reservoir of visual data, but it's leaking.

We have to scan it, select information, and turn it into a verbal code to keep it.

And the chapter mentions auditory confusions.

This is the smoking gun for verbal coding, right?

It is.

When subjects make mistakes in these experiments, they don't usually swap letters that look alike.

So they don't mistake an O for a Q.

Rarely.

They swap letters that sound alike.

So they might see a D and report a B.

Exactly.

Visually, D and B are quite different.

One has a curve.

One has a straight back.

But phonetically,

D and B are twins.

This proves that by the time the error happened, the visual image was gone.

The subject was relying on a verbal rehearsal loop in their head.

The error was auditory, not visual.

They were talking to themselves.

We all are, constantly, when we perceive.

We are translating the world into a script.

Now the scanning process itself, how we choose what to read from that fading icon, is interesting because it seems to be culturally learned.

Yes.

The reading habit.

If I flash a row of letters, you, as an English speaker, are going to be much more accurate on the left side than the right side.

Because I read left to right.

I start there.

Right.

For a while, people thought maybe the left side of the retina was just sharper.

But Heron and others proved it's about scan order.

You prioritize the left because that's how you've learned to process information.

And the proof comes from Yiddish.

The Mishkin and Forgays experiment was so clever. They tested English readers and Yiddish readers.

As you know, Yiddish is read right to left.

And what did they find?

The Yiddish readers were better at identifying words on the right side of the flash.

Their scanning habit was reversed.

So we literally see the world differently based on the language we read.

In a tachistoscopic flash, at least.

Yes.

We prioritize different parts of the visual field based on our training.

We learn how to look at our own iconic memory.

It's a learned skill.

There's also this detail about eye movements that I found almost poetic.

I think I know the one you mean.

The text says our eyes move to where the letters were after they're gone.

It is poignant.

Bryden and others found that if you track the eyes, about 150 milliseconds after the flash, the eyes jump to the location of the letters.

But they aren't there anymore.

The letters are gone.

The screen is blank.

But the subject is scanning the icon in empty space.

They're physically looking at the memory.

Chasing the ghost image.

It shows how tightly coupled our attention and eye movements are, even when the stimulus is internal.

This idea of scanning brings us to the concept of set: getting set, expectation.

Can we program ourselves to see better?

We can program the order of encoding.

This is the key takeaway from the Harris and Haber experiments.

OK.

What did they do?

They showed subjects cards that had multiple attributes, say color, shape, and number of items.

So a card might have two red circles.

Right.

Now you can code that in your head two ways.

You can use an objects code.

Two red circles.

That's a natural phrase.

Yeah.

Or you can use a dimensions code.

Red circle two.

Just a list of attributes.

Does it matter?

They contain the same information.

It matters hugely.

If I tell you beforehand, pay attention to color.

OK.

So I'm prepped to look for color.

If you're a dimensions coder, you can just put red at the front of your list.

Red circle two.

You prioritize the color in your verbal code.

I grab it first.

You grab it from the fading icon first.

Yeah.

But if you are an objects coder, you're stuck.

Why am I stuck?

Because you can't easily say "red, two, circles."

The syntax of natural language fights you.

You have to say two red circles.

The color word is in the middle.

So the grammar of our internal language dictates what we can rescue from the visual memory.

That's a great way to put it.

Set works by prioritizing which part of the icon you translate into speech first.

If you try to grab everything at once, or if your coding strategy is clumsy, the icon fades before you get to the important stuff.
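The effect of set on what survives can be sketched as a simple race in Python: attributes are put into words one at a time, and whatever is not encoded before the icon fades is lost. The timing numbers, the names `encoding_order` and `rescued`, and the idea of a fixed cost per word are all hypothetical assumptions for illustration; the only part taken from the episode is that a dimensions code can be reordered while an objects code is locked into English word order.

```python
from __future__ import annotations

# Hypothetical attribute orders for a card such as "two red circles".
OBJECTS_ORDER = ("number", "color", "shape")     # natural phrase: "two red circles"
DIMENSIONS_ORDER = ("color", "shape", "number")  # attribute list: "red, circle, two"


def encoding_order(code: str, cued: str) -> tuple[str, ...]:
    """Order in which attributes are put into words under each coding strategy."""
    if code == "objects":
        # English word order is fixed, so a cue cannot pull color forward.
        return OBJECTS_ORDER
    if code == "dimensions":
        # A bare list of attributes can be rearranged to put the cued one first.
        if cued not in DIMENSIONS_ORDER:
            raise ValueError(f"unknown attribute: {cued!r}")
        return (cued,) + tuple(a for a in DIMENSIONS_ORDER if a != cued)
    raise ValueError(f"unknown code: {code!r}")


def rescued(code: str, cued: str, ms_per_word: int, icon_ms: int) -> list[str]:
    """Attributes fully put into words before the icon fades (a simple race)."""
    done: list[str] = []
    elapsed = 0
    for attribute in encoding_order(code, cued):
        elapsed += ms_per_word
        if elapsed > icon_ms:
            break  # the icon is gone; later attributes are lost
        done.append(attribute)
    return done


# Hypothetical timing: 150 ms per word and 250 ms of usable icon.
print(rescued("dimensions", "color", 150, 250))  # ['color']
print(rescued("objects", "color", 150, 250))     # ['number']
```

With these made-up numbers only one word fits in the window, so the dimensions coder rescues the cued color while the objects coder spends the window on the number.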

This leads us to the final limit.

The magical number.

The span of apprehension.

The classic finding.

We can only grasp four to five items in a single glance.

Why four or five?

Why not ten?

Why not three?

It comes back to speed.

It's a race.

The icon is decaying.

We have to read from it.

OK.

The chapter discusses two ways we handle numbers.

For small numbers, like one, two, or three, we subitize.

Subitize.

That's a great word.

It comes from the Latin for sudden.

It means we recognize the pattern instantly.

Three dots make a triangle.

We don't count them.

We just see threeness.

But I can't do that with seven dots.

No.

Once you get past five or six, the pattern isn't obvious.

We have to start counting.

And counting takes time.

The reaction time data suggests counting takes about a hundred milliseconds per item.

So if I have ten items, that's a full second of counting.

And what happens after one second?

The icon is dead.

It's faded.

Gone.

You count one, two, three, four, five, and then the rest of the image has faded from your visual buffer.

Stop.

That's the limit.

So it's not a memory limit.

It's a speed limit.

It's not that your brain is full.

It's that your eyes ran out of time.
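The speed-limit argument can be sketched by combining two numbers quoted in this episode: roughly 100 milliseconds to count each item, and an icon that washes out in under half a second when the screen stays lit. Treating report as a race against that fading (an assumption, and a simplification of the chapter's argument) gives a span of about four or five. The function name `items_reported` is hypothetical, and subitizing of very small sets is deliberately left out.

```python
def items_reported(n_items: int, icon_ms: float, ms_per_item: float = 100.0) -> int:
    """How many items can be read out of a fading icon before it is gone.

    A simple race: each item costs a fixed counting/naming time, and once
    the icon has faded nothing more can be read, however much was registered.
    Subitizing of very small sets is deliberately not modeled.
    """
    if n_items < 0 or icon_ms < 0 or ms_per_item <= 0:
        raise ValueError("counts and durations must be non-negative, ms_per_item positive")
    readable = int(icon_ms // ms_per_item)
    return min(n_items, readable)


# A 12-letter display followed by a lit field (icon assumed to last up to ~0.5 s).
for icon_ms in (400, 450, 500):
    print(icon_ms, items_reported(12, icon_ms))
# 400 4
# 450 4
# 500 5
```

Plugging in a dark post-exposure field, where the icon lasts much longer, would let this toy model read far more items, so it is best taken as an illustration of "speed limit, not memory limit" rather than a fitted account.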

So naturally, people tried to cheat the system.

The chapter mentions super coding.

The octal system.

This was a great attempt.

Clemmer tried to teach people to group binary dots.

How does that work?

So instead of seeing dot, dot, blank, which is three things, you learn to call that three.

You turn three bits of information into one word.

Compressing the file.

In theory.

But it failed miserably in the tachistoscope.

Why?

If I have a better code, shouldn't I be faster?

Because you still have to visually recognize the pattern dot dot blank before you can apply the code word three.

So I still have to see it first.

Exactly.

The visual recognition is the speed limit.

You can't use a verbal shortcut to bypass a visual bottleneck.

The brain still has to do the construction work on the raw pixels before you can label it.
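The recoding idea can be sketched in Python. Nine dots become three words, which is the compression the octal scheme was after, but every group of three still has to be recognized as a visual pattern before its name can be retrieved, which is where the episode says the scheme broke down. The 'o'/'.' symbols, the digit names, and the bit-order convention (leftmost position least significant, chosen so that dot, dot, blank comes out as "three" as in the episode) are assumptions made for illustration.

```python
DIGIT_NAMES = ("zero", "one", "two", "three", "four", "five", "six", "seven")


def to_octal_words(pattern: str) -> list[str]:
    """Recode a row of binary dots into one digit name per group of three.

    'o' marks a dot and '.' a blank. Each group's leftmost position is
    treated as the least significant bit (an assumed convention), so
    "oo." (dot, dot, blank) is named "three".
    """
    if len(pattern) % 3 != 0 or set(pattern) - {"o", "."}:
        raise ValueError("pattern must be groups of three made of 'o' and '.'")
    words = []
    for start in range(0, len(pattern), 3):
        group = pattern[start:start + 3]
        # The group still has to be recognized visually before it can be
        # named; recoding shortens the verbal list, not the seeing.
        bits = group[::-1].replace("o", "1").replace(".", "0")
        words.append(DIGIT_NAMES[int(bits, 2)])
    return words


print(to_octal_words("oo.o.o..o"))  # ['three', 'five', 'four']
```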

So let's zoom out.

We've covered a lot of ground.

We've gone from the raw light hitting the eye, to the ghostlike icon, through the dangers of masking, past the scanning habit, into the verbal coder, and finally to the limit of the span of apprehension. It's quite the machine.

It is.

What is the big takeaway for you from this chapter?

What's the one thing you want people to remember?

For me, it's the definition of the cognitive approach itself.

This chapter proves that perception is an event spread over time.

Right.

We tend to think of vision as a camera taking a snapshot, but it's more like a painter staring at a scene and frantically trying to paint it on a canvas before the sun goes down.

And if someone turns on the lights too fast, that's masking, and the painter loses the scene.

Or if the painter gets distracted by the wrong detail, that's set, and they miss the main subject.

We are active participants in our own vision.

We construct our reality millisecond by millisecond, and we are always technically living in the past by about a fifth of a second.

Yes.

I want to leave the listeners with that thought from the masking section, the idea that our motor system, our physical body can react to a stimulus that our sensory system hasn't finished building yet.

The Fehrer and Raab finding.

Yeah.

Yeah.

It suggests that there's a version of you that is faster than you.

There's a pilot in the cockpit who hits the brakes before you even realize there's a car in front of you.

And you might never consciously see the car if the masking is right.

Right.

You'd just be sitting there, heart racing from the braking, wondering why you stopped.

That is both comforting and terrifying.

That's cognitive psychology for you.

Well, on that note, thank you for joining us on this deep dive into the architecture of your own vision, from the Last Minute Lecture team.

Keep your eyes open even if you can't always trust them.

See you next time.

This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary
What this audio overview covers
Iconic memory represents the initial stage of visual information processing, characterized by the brief retention of visual information after a stimulus disappears from view. Rather than accepting the notion that perception passively mirrors external reality, this examination demonstrates that visual perception actively constructs meaning and requires measurable processing time. Tachistoscopic experiments reveal a striking dissociation between what observers can consciously report from a brief visual exposure and the total volume of information actually available in their visual system. Through partial report techniques, where selective cues like auditory tones or spatial pointers direct attention to specific regions before the visual trace fades, researchers documented that far more information is initially registered than subjects can articulate. The persistence and quality of these visual icons depend significantly on factors including light intensity and the characteristics of stimuli that follow the original exposure. Backward masking presents a particularly important phenomenon in this domain, especially metacontrast, wherein a subsequent visual stimulus interferes with or entirely prevents conscious awareness of a preceding one. Though some interpretations suggest such masking enables perception of meaning outside conscious awareness, the evidence more strongly supports explanations grounded in basic visual properties, such as shape angularity, or methodological artifacts including experimenter expectations. The transition from this ephemeral visual state to more durable memory storage operates through verbal coding, the process by which visual information becomes translated into internal language representations. This encoding mechanism accounts for position-dependent memory advantages, such as superior retention of leftward display elements attributable to established reading conventions. 
Perceptual set also influences this stage, enabling subjects to selectively encode particular visual attributes like color or form before the icon becomes inaccessible. The traditional concept of span of apprehension requires reinterpretation not as a fixed limitation of visual capacity itself, but rather as a constraint on the speed at which observers can name or enumerate items from the decaying visual trace. Visual cognition ultimately emerges as a multilayered system incorporating parallel processing streams, where basic motor responses to stimuli can be initiated before the mind consciously completes analysis of the stimulus's intricate details and contours.

Using this chapter to study? Last Minute Lecture is free and student-run. If it helped, consider supporting the project.