Chapter 3: Pattern Recognition: How the Mind Identifies Visual Forms

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome back to The Deep Dive.

We have a fascinating stack of research on the desk today.

We really do.

It's all centered around a question that sounds, I don't know, almost deceptively simple.

Oh, absolutely.

In fact, if I asked you this at a dinner party, you'd probably, you know, look at me like I'd completely lost my mind.

Right, like it's a trick question.

But as we've been reading through chapter three of Neisser's Cognitive Psychology, it turns out this question is actually one of the most complex riddles in, well, in the history of studying the human mind.

It really is.

I mean, this is a problem that baffled philosophers for centuries.

Wow.

It frustrated the early computer scientists to no end.

And well, technically, we're still refining answers to it today.

The question is simply this.

How do you know that an A is an A?

It sounds so trivial.

It feels incredibly trivial.

You look at the page, you see the shape, your brain says A.

Done.

Next problem.

Exactly.

But our mission today is to unpack the mechanics of that split second decision.

We're looking at pattern recognition.

Or what the text also calls stimulus equivalence.

Right.

And we have a lot of ground to cover.

I mean, from the early theories of mental templates to this chaotic pandemonium model.

And then all the way down to the biology of all things, frog eyes.

It's going to be a wild ride.

The goal here really is to bridge the gap between the raw data that's hitting your eyeballs.

It's just light.

It's just meaningless flashes of light and dark.

We want to bridge that gap to the actual concept in your mind.

How does a splash of photons become a letter, a face, or a word?

It's really the gateway to understanding visual cognition as a whole.

It is.

So let's start with some of the visual evidence provided in the source material just to prove why this is even a problem in the first place.

Okay.

We're looking at figure 10 from the text.

And for those of you listening, it's a collection of hand-printed A's.

And they are an absolute mess.

They really, really are.

I mean, some are slanted so far to the right, they're basically falling over.

Some have open tops.

Yeah.

And some are just scribbles or like blurs of ink.

There's one here that looks like a triangle that just had a really bad day.

A very bad day.

And yet the text makes a crucial point.

In the experiment where these were collected, not a single subject misidentified them.

Not one.

Everyone looked at these messes and saw an A.

And this highlights the core problem immediately.

You can't just say, well, it looks like an A.

Because they don't.

Right.

They don't look like each other.

An A in a fancy wedding invitation script has almost no physical geometry in common with a big block A on a billboard.

Which looks nothing like the chicken scratch you'd write on a Post-it note.

Exactly.

Yet your brain somehow categorizes them all as identical in meaning.

That's the puzzle.

So, okay, we need to understand the mechanism.

The text begins with a concept that I hadn't heard named before, but it makes perfect sense.

The Höffding step.

Yes.

Named after the Danish psychologist Harald Höffding.

He identified this massive logical gap in early association theory.

The old idea was super simple.

You see bread and you think of butter.

Because they're associated in your memory.

Bread leads to butter.

Simple.

But Höffding pointed out a missing link.

The light bouncing off the bread and hitting your retina doesn't know anything about butter.

Of course not.

It doesn't even know anything about bread.

It's light.

Right.

So before the visual of the bread can trigger the thought of butter, that visual input has to first make contact with the memory of bread.

You have to recognize the bread as bread first.

Exactly.

That is the Höffding function or the Höffding step.

It's the handshake, you could say, between the eyeball's picture and the brain's library.

A handshake.

I like that.

Without that step, that moment of recognition, you just have meaningless shapes, and no associations can fire.

You just stare at the bread and see a brown lump.

So the entire deep dive today is effectively investigating that handshake.

How does it happen?

And the first major theory the text walks us through is probably the most intuitive one.

I think if you stop someone on the street and ask how their brain recognizes letters, this is what they'd come up with.

I agree.

It's called template matching.

Template matching.

This is the theory that just feels right to most people at first.

Think of it like a detective trying to match a fingerprint found at a crime scene.

Oh, I love a good procedural drama analogy.

Right.

So the detective, they pull a file print from the database, that's the template, and they superimpose the crime scene print right over it.

If they overlap perfectly or close enough, it's a match.

So the theory is our brains work the same way.

We have a stored template for the letter A.

Yep.

And when we see a new shape, we mentally slide it over our stored template. If it fits...

Click.

We recognize it.

That's the idea.

And figure 11A in our source shows this very clearly.

You have an input A made of little dots, and it lines up perfectly with the stored template A in the system.

It's very clean.

It's clean.

It's simple.

It feels like how a computer would do it.

It is simple.

But simplicity is often where theories go to die in psychology.

Uh-huh.

I can imagine.

As the text points out pretty quickly, template matching is arguably way too simple to function in the real world.

Because life isn't tidy.

Not at all.

And the source immediately starts poking holes in this with the next few figures, 11B, C, and D.

These are the classic failures of the template model.

Let's walk through them because they really expose the limitations here.

Well, first, just consider position.

If I have a template for an A in the exact center of my visual field.

Okay, like a target zone.

Exactly.

But the A I'm actually looking at is off to the left.

It misses the template.

Completely.

No overlap, no recognition.

To a brain using this system, it would just be an unknown blob because it didn't land on the designated A spot.
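
A minimal sketch of the template idea and its position problem, under stated assumptions: the 5x5 grid, the `overlap_score` measure, and the 0.8 "close enough" threshold are all hypothetical choices for illustration, not anything the text specifies.

```python
# Hypothetical 5x5 letter grid: '#' is ink, '.' is blank.
TEMPLATE_A = [
    "..#..",
    ".#.#.",
    "#####",
    "#...#",
    "#...#",
]


def to_cells(rows):
    """Return the set of (row, col) positions that contain ink."""
    return {
        (r, c)
        for r, row in enumerate(rows)
        for c, ch in enumerate(row)
        if ch == "#"
    }


def shift(cells, d_row, d_col):
    """Move every inked cell by the same offset, as if the letter landed elsewhere."""
    return {(r + d_row, c + d_col) for r, c in cells}


def overlap_score(input_cells, template_cells):
    """Shared ink divided by total ink: 1.0 is a perfect overlay, 0.0 is no overlap."""
    return len(input_cells & template_cells) / len(input_cells | template_cells)


def matches(input_cells, template_cells, threshold=0.8):
    """Assumed rule: call it a match when the overlap is 'close enough'."""
    return overlap_score(input_cells, template_cells) >= threshold


stored_a = to_cells(TEMPLATE_A)
print(matches(stored_a, stored_a))               # same position as the template
print(matches(shift(stored_a, 0, 5), stored_a))  # same letter, off to the side
```

The shifted letter is identical in shape, yet it shares no cells with the stored template, so a pure overlay comparison rejects it.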

Or what about size?

Same problem.

If my template is a big capital A, and I'm looking at a tiny little footnote A.

The tiny one is just going to float inside the big template without ever touching the sides.

So no match.

And the most critical one, I think, is rotation.

This is the biggest killer.

If the A is tipped over on its side, it won't align with an upright template at all.

So if our brains work purely on templates, we'd need a separate stored template for every letter.

In every possible size.

In every possible position on our retina.

At every single possible angle of rotation.

Which would require an infinite amount of storage space in our brains.

It's just impossible.

You'd need a template for A at 45 degrees, A at 46 degrees, A at 47 degrees.

It's incredibly inefficient.

Wildly inefficient.

And we know for a fact this isn't how it works because of a very vivid example the text gives.

Which I think is just brilliant.

I love this one.

The trace on the back example.

Yes.

It effectively debunks the whole idea that recognition is tied to the retina.

It does.

So if I close my eyes and I ask you to use your finger to trace a letter on my back.

You would instantly recognize it.

You'd say that's a B or that's a Z.

No problem.

But think about what that implies.

I have no eyes in my back.

None that I'm aware of.

I have no retinal templates stored for the skin between my shoulder blades.

The pattern is appearing in a completely novel location through a completely different sensory modality: touch, not vision.

Right.

And I still recognize the pattern.

That proves that pattern recognition has to be independent of where the input happens.

It's not about a specific location on the retina matching a specific stored image.

It's about the relations within the pattern itself.

Exactly.

The Z-ness is in the zigzag motion, not in the location.

OK.

So the proponents of template matching, they didn't give up easily.

I have to respect their tenacity.

They didn't.

They tried to save the theory by borrowing ideas from computer technology.

Right.

They introduced the concepts of preprocessing and normalization.

Which to me sounds a lot like using Photoshop before you run a recognition program.

That's a very apt analogy.

It really is.

Preprocessing is like the cleanup phase.

OK.

If you have a grainy image of a letter, you might write a program to fill in the little gaps or erase stray dots, all that visual noise, before you try to match it.

So if I scribble a messy A, my brain effectively smooths out the lines before checking the file.

That's the hypothesis, yes.

But normalization is really the heavy lifter here.

And what's that?

Normalization suggests that before we even try to match a template, we have an internal mechanism that standardizes the image.

It centers it, it scales it to a standard size, and it rotates it to a standard upright orientation.

Hold on.

So no matter how I see the A, upside down, tiny, off to the left, my brain grabs it, drags it to the center, blows it up, spins it upright, and then checks the template.

Hypothetically, yes.

It converts the input into what they call a canonical form, a standard form.
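
A rough sketch of what converting to a canonical form could look like for position and size, assuming letters are sets of inked grid cells. The `normalize` function, the 5x5 canonical size, and the nearest-neighbor sampling are illustrative assumptions; rotation is deliberately left out.

```python
# Hypothetical stored template: inked (row, col) cells of a 5x5 letter A.
STORED_A = {
    (0, 2),
    (1, 1), (1, 3),
    (2, 0), (2, 1), (2, 2), (2, 3), (2, 4),
    (3, 0), (3, 4),
    (4, 0), (4, 4),
}


def normalize(cells, size=5):
    """Crop to the bounding box, then resample onto a size x size grid."""
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    top, left = min(rows), min(cols)
    height = max(rows) - top + 1
    width = max(cols) - left + 1
    canonical = set()
    for i in range(size):
        for j in range(size):
            # Nearest-neighbor sample: which input cell does (i, j) come from?
            source = (top + i * height // size, left + j * width // size)
            if source in cells:
                canonical.add((i, j))
    return canonical


# The same A drawn twice as large and moved down and to the right.
big_shifted_a = {
    (2 * r + a + 3, 2 * c + b + 7)
    for r, c in STORED_A
    for a in (0, 1)
    for b in (0, 1)
}

print(normalize(big_shifted_a) == STORED_A)
```

Once the input has been centered and rescaled this way, a single stored template is enough for any position or size, which is the appeal of the normalization fix.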

That's a clever fix.

But does it hold up to reality?

Well, the text moves on to a section called empirical reality check, and things start getting a little complicated when we look at how humans actually handle rotation.

Right.

We have to look at whether people actually do this mental rotation automatically.

And the text cites a famous study by Rock in 1956.

The chef dog figure.

The chef dog figure, yes.

This is figure 12 in the text.

And for everyone listening, it's an ambiguous blob.

If you look at it upright, it looks like a dog.

But if you turn it 90 degrees, it looks like the profile of a chef wearing a big hat.

Exactly.

Now Rock found something really interesting.

We can recognize rotated figures, but our perception depends heavily on what we consider to be up.

What do you mean by that?

It's about phenomenal orientation.

What your mind perceives as the top of the object, not just how it's oriented on your retina.

So like if I tilt my head sideways, the image hits my retina sideways, but I still see the world as being upright.

Exactly.

And Rock found that if you tilt your head, recognition is fine because your brain compensates.

You know where up is.

But if you tilt the image without you knowing which way is up, recognition can fail.

This suggests that normalization isn't just an automatic mechanical process like a computer program.

It requires us to actively understand the orientation.

It seems so.

And this gets even stranger when we talk about reading.

There was this study by Kolers on reading transformed text.

This section genuinely surprised me.

That's figure 13.

Yeah.

So they took a sentence and they transformed it in different ways.

One way was to flip individual letters upside down or backwards.

Another was to rotate the entire sentence 180 degrees.

So the whole line is upside down.

Logically, you'd think flipping the whole line is a more drastic, more difficult change.

That's what I thought.

But the result was the complete opposite.

It's much easier to read a sentence that's been rotated 180 degrees than it is to read one where the letters are reversed left to right or inverted individually.

Why on earth is that?

It seems so counterintuitive.

It comes down to relationships.

When you rotate the whole sentence 180 degrees, the relationship between the letters stays consistent.

A B still looks different from a D in the same way relative to the line of text.

The internal logic of the pattern is preserved.

So the brain cares more about the pattern relative to itself than it does about its absolute orientation in space.

Precisely.

But here is where the plot thickens.

We see a real split between how adults and children handle this.

The developmental differences.

This is something any parent has seen.

I have a niece who draws pictures upside down sometimes and she doesn't seem to care at all.

It's a classic observation.

Preschoolers seem indifferent to orientation.

They'll draw a face sideways or upside down.

No problem.

But paradoxically, when you actually test them, as a study by Gibson did, they are actually worse at distinguishing rotated shapes than adults are.

Right.

That feels like a contradiction.

They don't care about rotation.

Shouldn't they be good at recognizing that an upside down A is still an A?

You would think so.

It's a great question.

But the text explains that children aren't mentally rotating the image the way adults do.

They're not.

So what are they doing?

They are just failing to discriminate.

Break that down for me.

Failing to discriminate.

So when a child sees a triangle and an upside down triangle, they might treat them as the exact same object, not because they've mentally flipped one to match the other, but because they are just focusing on the features that didn't change.

A triangle has three corners and three lines.

Upside down, guess what?

Still three corners and three lines.

Exactly.

So the kid just sees sharp pointy thing and thinks, match?

Pretty much.

They are looking at the features, the sharp points, a crossbar, a closed loop.

An adult, on the other hand, uses what's called active compensation.

We mentally spin the object to check if it matches.

The child just checks the list of parts, the ingredient list.

Right.

The adult is doing work.

The child is just looking at the ingredients.

That is a crucial distinction.

And this whole discussion of ingredients leads us perfectly into the next major problem for templates, the messiness of reality.

Ill-defined categories.

We mentioned earlier that handwriting is messy, but the text goes deeper.

It talks about things like beauty or cats.

Or games or even handwritten letters.

These are all categories that don't have strict mathematical boundaries.

There is no single mathematical formula that defines every single cat in the world.

Right.

And templates fail so hard here because you can't have a single template for a concept that is inherently fuzzy.

Impossible.

What's amazing is how our brains manage to fill in the gaps anyway, which brings us to expectancy.

This is the influence of set, your mindset, or your expectation at that moment.

And the text describes these fantastic experiments by Leeper.

They showed subjects a blurry image.

It's figure 15.

If you just look at it cold, it's a mess of splotches.

You can't make anything out.

If I whisper in your ear, it's a musical instrument.

Suddenly you see a violin.

Snap!

Just like that.

And once you see the violin, you can't unsee it.

Your expectation has organized all that messy visual data into a coherent object.

There's also the rat-man example, figure 16.

A classic ambiguous drawing.

It looks a bit like a rat and looks a bit like a man with glasses.

It really does.

And they found that if you showed people a series of animal pictures first.

They see the rat.

But if you show them a series of human faces first.

They see the man.

This tells us something profound.

Recognition isn't just bottom up, meaning it's not just data coming from the eyes up to the brain.

It's also top down.

It's top down.

The brain sends expectations down to organize the incoming data.

We literally see what we expect to see.

Which is, I think, the nail in the coffin for simple template matching.

Because a template doesn't care what you were looking at five minutes ago.

A template is a static file.

Expectancy is dynamic.

Exactly.

So if templates are out, or at least they're deeply insufficient, what replaces them?

Where do we go from here?

The text shifts gears here to the dominant theory in modern cognitive psychology.

Feature analysis.

This is the idea that we don't recognize the whole object all at once.

We recognize the parts, the features, and we assemble them.

But before we get to the cool models, we need to talk about how the text proves this with something called search experiments.

Right.

And this brings us to Neisser's own work on visual search.

Okay.

Imagine I give you a sheet of paper that's just covered in random letters, rows and rows of them.

And I tell you, find the letter Z.

Okay, I'm scanning.

Now, there are two ways your brain could theoretically do this.

One is called serial processing.

One by one.

Exactly.

You look at the first letter, is it a Z?

No.

You look at the second letter, is it a Z?

No.

You go one by one, sequentially.

Like checking a Z template against every single letter on the page.

Yes.

The alternative is parallel processing.

This means you are taking in a whole chunk of the visual field at once and processing all the information simultaneously.

So how do we know which one we actually use?

Neisser did something very clever.

First, he asked people to search for one target, say Z.

Then he asked them to search for multiple targets at the same time.

Like find a Z or a K.

Exactly.

Now, if I'm checking serially one by one, checking for two things should logically take longer.

Yeah, you have to do twice the work for every letter.

Is it a Z?

Is it a K?

That is exactly what the sequential serial theory would predict.

But that is not what happened.

What happened?

With experienced searchers, it took the same amount of time to search for 10 different targets as it did to search for one.

Wait, what?

Searching for 10 different letters took no longer than searching for just one.

Correct.

That seems impossible.

It seems impossible if you think in terms of templates.

But if you think in terms of features, it makes perfect sense.

This suggests parallel processing.

We aren't checking for a whole Z template and then a whole K template.

The brain is processing the whole input for features simultaneously.

You aren't looking for Z.

You are looking for its features like slanted lines or horizontal lines.

So if you were looking for Z, K, and X, you are still just scanning for slanted lines.

It doesn't add any extra work.
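
The two predictions can be made concrete by counting comparison steps. This is an idealized sketch, not a model of real search times: the function names, the sample letters, and the idea that one step per letter stands in for parallel feature analysis are all assumptions for illustration.

```python
def serial_scan(letters, targets):
    """Check each letter against each target template in turn; count the checks."""
    steps = 0
    for letter in letters:
        for target in targets:
            steps += 1
            if letter == target:
                return steps
    return steps


def feature_scan(letters, targets):
    """Idealized parallel search: one step per letter, however many targets."""
    target_set = frozenset(targets)  # all targets are consulted at once
    steps = 0
    for letter in letters:
        steps += 1
        if letter in target_set:
            return steps
    return steps


page = "OCQGDUSBPZ"                  # hypothetical row of letters, Z at the end
one_target = ["Z"]
ten_targets = list("ZKXVWYMNAT")     # hypothetical ten-target set

print(serial_scan(page, one_target), serial_scan(page, ten_targets))
print(feature_scan(page, one_target), feature_scan(page, ten_targets))
```

Under the serial account the step count grows with the number of targets, while under the idealized parallel account it stays flat, which is the pattern reported for experienced searchers.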

That is a huge aha moment.

We are feature detecting machines.

We scan for traits, not for whole objects.

We are.

And this leads to some very creative models to explain how this feature analysis might work.

The most famous one mentioned is the pandemonium model.

By Oliver Selfridge.

I love this model.

It's the most metal thing in psychology.

It describes the mind as a literal pandemonium filled with shouting demons.

Metaphorical demons.

Let's be clear about that.

Of course.

But let's paint the picture.

You have these layers of demons.

At the very bottom you have image demons.

All they do is record the raw image.

They're like the retina.

Then they pass that information up to the computational demons.

These are the feature analyzers.

Right.

So one demon might be responsible only for vertical lines.

Another for curves.

A third for acute angles.

And these demons are loud.

If the vertical line demon sees a vertical line in the image, he starts shouting, hey, I see one.

I see one over here.

Exactly.

And above them, you have the cognitive demons.

These represent the letters, the alphabet, the A demon, the B demon, and so on.

And what are they doing?

They're listening.

They're listening to the shouting from the computational demons down below.

So the A demon is listening for specific shouts.

He's waiting to hear from the slanted line demon and the crossbar demon.

If he hears them both screaming, the A demon starts screaming too.

It's me.

It's an A.

Pick me.

Meanwhile, the O demon is totally quiet because the curve demon isn't shouting at all.

Right.

And finally, at the very top, you have the decision demon.

His job is simple.

He just listens to all the cognitive demons and picks the one that's shouting the loudest.

Okay.

The A demon is screaming his head off, so it must be an A.

It sounds comical, but it's an incredibly robust model.

Why does it work so well?

Well, for one, it explains parallel processing.

All the computational demons can scream at the same time.

Right.

It also explains why we can read messy handwriting.

The A demon might still scream the loudest, even if the slanted line demon is only whispering a little bit because the line is crooked.

It allows for fuzzy matches.

It's flexible.

It's not a pass fail system like a template.

It's more like a weighted vote.
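
The weighted vote can be sketched in a few lines. The feature names, the three letters, and every weight below are made up for illustration; this is a toy in the spirit of Pandemonium, not Selfridge's actual program.

```python
# Hypothetical cognitive demons: which feature shouts each one listens for.
COGNITIVE_DEMONS = {
    "A": {"slanted_line": 1.0, "crossbar": 1.0},
    "H": {"vertical_line": 1.0, "crossbar": 1.0},
    "O": {"curve": 1.0},
}


def cognitive_shouts(feature_shouts):
    """Each letter demon shouts as loudly as the weighted features it hears."""
    return {
        letter: sum(
            weight * feature_shouts.get(feature, 0.0)
            for feature, weight in weights.items()
        )
        for letter, weights in COGNITIVE_DEMONS.items()
    }


def decision_demon(feature_shouts):
    """Pick whichever cognitive demon is shouting the loudest."""
    shouts = cognitive_shouts(feature_shouts)
    return max(shouts, key=shouts.get)


# A clean A: both features are detected at full volume.
clean_a = {"slanted_line": 1.0, "crossbar": 1.0}
# A messy A: the slanted line is only a whisper, plus a faint false vertical.
messy_a = {"slanted_line": 0.4, "crossbar": 1.0, "vertical_line": 0.2}

print(decision_demon(clean_a), decision_demon(messy_a))
```

Because the decision is a comparison of totals rather than a pass or fail test, the messy input still comes out as an A even though one of its features is weak.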

Exactly.

And the text makes a useful distinction here between types of parallel processing.

There's spatially parallel analyzing the whole retina at once and operationally parallel, which is what pandemonium is.

The curve demon doesn't need to know what the vertical line demon is doing.

They all work independently and at the same time.

Now you can contrast that with the other model mentioned, the decision tree or EPAM.

Right.

EPAM.

It stands for elementary perceiver and memorizer.

This is a sequential model.

So it's not parallel?

No.

It's more like a game of 20 questions.

Okay.

So is the shape round?

No.

Does it have a straight line?

Yes.

Is the line vertical?

Yes.

And so on down the tree.

It's very efficient if everything goes perfectly right, but it has a big weakness.

Which is?

If you make one mistake, if you answer no to "is it round?" when you're looking at a messy O, you're sent down the wrong branch of the decision tree and you can never get back.

You will misidentify the letter.

Guaranteed.

Whereas in pandemonium, if one little demon misses a cue, the others might still shout loud enough to win the vote.

So pandemonium is messy, but it's robust.

EPAM is efficient, but it's brittle.
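
The brittleness is easy to see in a toy decision tree. The questions, letters, and tree shape here are hypothetical and far smaller than anything EPAM used; the sketch only shows how one wrong answer early on fixes the outcome.

```python
# Hypothetical tree: (question, branch_if_yes, branch_if_no); strings are leaves.
TREE = ("is_round", "O", ("has_crossbar", "A", "V"))


def classify(answers, node=TREE):
    """Ask one yes/no question at a time and follow the branch it picks."""
    while isinstance(node, tuple):
        question, yes_branch, no_branch = node
        node = yes_branch if answers[question] else no_branch
    return node


# A tidy O is judged round and identified immediately.
tidy_o = {"is_round": True, "has_crossbar": False}
# A messy O is wrongly judged "not round" at the first question.
messy_o = {"is_round": False, "has_crossbar": False}

print(classify(tidy_o), classify(messy_o))
```

After the first wrong answer, the O leaf is no longer reachable, so the messy O has to come out as some other letter.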

A very good way to put it.

So you have the theory, feature analysis, but you know, theories are just stories until you can actually cut something open and find the proof.

Which brings us to the biological evidence, the hard science.

The frog's eye.

This is an incredible study.

Lettvin and his colleagues, they actually recorded signals from the optic nerve of a frog.

And what did they find?

They found individual nerve fibers, single neurons that responded to very, very specific things.

They called them bug detectors.

They did because these particular neurons would fire like crazy only if a small, dark object moved into the frog's field of view.

So like a fly.

Exactly like a fly.

They didn't care about the background brightness changing.

They didn't care about stationary shadows.

They only cared about small, dark moving dots.

So the frog's eye isn't just a camera sending a raw picture to the brain.

Not at all.

The eye itself is doing the analyzing.

It's basically saying, I don't need to tell the brain everything.

I just need to say bug at two o'clock.

The feature analysis is built right into the hardware of the eye.

Precisely.

And Hubel and Wiesel found something very similar in cats.

They found cells in the cat's visual cortex that respond only to lines at specific orientations.

So there is literally a vertical line demon in the cat's brain.

A neuron that does that job.

In a sense, yes.

There are neurons that fire like crazy when they see a vertical line and they go completely silent if you tilt that line just 45 degrees.

That's physiological proof that feature analysis is real.

It is.

It's concrete evidence.

That is mind-blowing.

We actually have the hardware for the demons.

But there's one last piece of evidence the text covers that I found the most, well, the most unsettling.

The stopped image fragmentation.

This one is a bit spooky.

I agree.

Okay.

So what is it?

We know that the eye is constantly making kind of jittery movements.

They're called saccades.

Right.

If you use a special contact lens rig to stabilize an image on the retina so the image moves perfectly with the eye, the image stops moving across the photoreceptors.

And when that happens, the neurons get, what, tired?

And they stop firing?

That's the idea.

The image disappears.

It fades away.

But the text points out that it doesn't just fade randomly like a bad TV signal.

It fragments into meaningful parts.

Yes.

And this is the spooky part.

So this is figure 23.

They showed a subject a profile of a face.

When it faded, the face didn't just turn into gray slush.

No, it broke apart.

The front of the face might disappear, but the curly hair at the back would stay.

Or just the eye would stay.

Or with the word beer.

This was the best one.

The word beer.

As it faded, it would turn into "peep."

Or "bee."

Or maybe "be."

It faded into other, different meaningful units, other words.

It didn't fade into, like, B-E-partial-R.

It reorganized itself into something that still made sense.

And McKinney's findings reinforced this.

Meaningful shapes, like whole letters, hold together longer than meaningless lines do when they fade.

So the brain is fighting to maintain the pattern even as the sensory input is dying.

It's clinging to the concept.

It's clinging to the A-ness.

Or the beer-ness.

Wow.

Okay, so let's unpack what we've learned today.

This has been quite a journey.

It has.

We started with the mystery of the A.

How we recognize it despite the infinite variety of shapes it can take.

We looked at the Höffding step.

That essential handshake between seeing and memory.

Then we dismantled the simple template matching theory.

It's just too rigid.

It can't handle rotation or size changes or messy handwriting.

Not without an impossible amount of storage, anyway.

Right.

But we did learn that we use some normalization.

We mentally rotate things, provided we know which way is up.

We saw how our own expectations shape reality.

If you think it's a violin, you see a violin.

That's top-down processing.

And we finally landed on feature analysis.

The idea that we process the world in parallel, hunting for specific traits: lines, curves, angles.

Just like the demons in Pandemonium.

And finally, we saw that biology backs all of this up.

Frogs and cats have specific feature detectors hardwired right into their nervous systems.

It all connects.

So what does this all mean for us, for the listener?

What's the big takeaway?

I think it means that our perception of the world is a construction.

We don't just passively record reality like a video camera.

Right.

We take messy, incomplete data.

We filter it through our built-in feature detectors.

We organize it based on what we expect to see.

And from all that, we build a coherent picture.

So we are active participants in our own vision.

We are.

The A isn't really on the paper.

The A is a conclusion that your brain reaches after a very complex, very fast argument between a lot of shouting demons.

That is a fascinating thought to leave everyone with.

The world you see is the world you build.

Indeed it is.

Well, we want to thank you for joining us on this deep dive into cognitive psychology.

It's been a wild ride through the mechanics of the mind.

Thank you for listening.

This is the Last Minute Lecture Team signing off.

Keep looking closely.

You never know what features you'll find.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Pattern recognition represents a fundamental cognitive capability that enables humans to identify objects and symbols despite variations in presentation, context, size, and orientation. The foundation of this process rests on the Höffding step, the critical moment when a current sensory experience connects with stored memory traces to produce identification. Two major theoretical frameworks attempt to explain how the visual system achieves this feat. Template-matching theory proposes that incoming stimuli are compared against stored prototypes or mental templates, yet this approach encounters significant limitations when confronted with changes in retinal position, image size, or rotational angle, as well as with loosely defined categories that lack sharply defined boundaries. Feature-analysis theory offers an alternative by proposing that recognition occurs through the detection and integration of basic visual features rather than whole-form matching. The normalization concept addresses the template-matching difficulties by suggesting the mind standardizes images before comparing them to stored representations. Empirical investigation distinguishes between sequential processing, where attention moves serially from one stimulus to another, and parallel processing, where multiple targets can be scanned simultaneously without speed loss, particularly among practiced observers. The Pandemonium model exemplifies hierarchical feature-detection architecture, employing independent feature analyzers sensitive to specific attributes such as angles, curves, and discontinuities. Physiological investigations using microelectrode recordings in animal visual systems reveal the existence of innate feature detectors tuned to complex properties including motion direction and edge orientation. Developmental studies demonstrate that perceptual learning involves gradually discriminating distinctive features rather than simply storing prototypes or templates. 
The phenomenon of perceptual fragmentation, wherein stabilized retinal images spontaneously organize into coherent meaningful units rather than random segments, provides behavioral evidence for underlying feature-based processing mechanisms. Collectively, behavioral experiments, computational modeling, and neurophysiological data converge on a framework where visual recognition operates through a sophisticated hierarchy of feature analyzers, enabling the cognitive system to resolve the inherent ambiguity and variability present in natural visual environments.
