Chapter 9: Visual Imagery & Spatial Thinking

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome back to the Deep Dive.

Today, we're taking on one of the most fundamental, yet surprisingly controversial topics in the study of the mind,

the mental picture.

It sounds so simple, but it's a real battleground in cognitive science.

It really is.

And I want to start you off with a quick cognitive task.

So just settle in for a moment.

Picture your kitchen, the one in your primary residence, the one you know inside and out.

Got it.

Okay.

Now, starting from your stove and moving clockwise around the room, I want you to mentally scan that image and count the number of cabinet doors you have.

Just take a moment.

Try to get a real count.

Exactly.

Now, if you're like most people, you didn't have that specific number just, you know, stored away as a fact in your memory.

No, absolutely not.

I definitely had to sort of walk through it.

You had to stop.

You had to perform an act of internal visualization, right?

Moving through the scene and mentally ticking off the locations.

That seemingly simple process, that internal representation we call a visual image, and the effort required to use it, is the central mystery we're unpacking today.

We are diving right into the core of what we call cognitive representation.

And these internal representations, they don't just exist for sight.

They could be auditory images like imagining the sound of your phone ringing.

Oh, yeah.

Or even olfactory images like the smell of coffee.

But, you know, visual images are the ones cognitive psychology has really fixated on.

Why is that?

Why the focus on vision?

Well, it's primarily because we spend so much of our academic effort, and quite rightly, trying to understand visual perception in the first place.

It's our dominant sense, so it's a natural starting point.

Okay, let's unpack this a bit because for decades,

the study of mental imagery was, well, it was strictly off limits.

It wasn't just neglected.

It was actively rejected.

The dominant school of thought at the time, behaviorism, looked at that internal picture you just made of your kitchen and basically said, no, thank you.

Why was that?

Why was it so problematic for science?

The problem really is one of scientific rigor.

A visual image is a private subjective experience.

If you tell me you're imagining a green elephant, I have no way to verify that.

Right.

I could be lying.

You could be.

I can't see its existence, its color, its clarity.

I can't see it, count it, or control it externally.

And for behaviorists who only trusted publicly observable stimuli and responses, this was a critical flaw.

It had to be measurable.

It had to be.

If the experience could only be reported by the person having it, it risked distortion, bias, or even outright fabrication.

So for them, it simply wasn't a valid subject for scientific investigation.

That sounds, you know, from a pure scientific methodology standpoint, that sounds reasonable.

If you can't measure it, how can you trust it?

But clearly, the interest didn't just die out.

Not at all.

By the 1960s, imagery staged a huge comeback.

What changed?

What brought it back into the fold?

What changed was the data.

It always comes back to the data.

Researchers began accumulating evidence from memory tasks, from spatial reasoning, from complex problem solving, and it just could not be explained if the mind only operated using abstract, you know, linguistic symbols.

So words weren't enough.

Words weren't enough.

The only way to account for people's performance on tasks, like your cabinet counting example, was to accept that people must be using some form of internal, nonverbal, spatial representation.

The mind had to be storing something more than just words.

And this isn't just some lab curiosity anymore.

This is where it gets really interesting on the application side.

I mean, think about the sports world.

We always hear about athletes visualizing a perfect run or a flawless shot.

It's a huge part of high-level training.

That practice of mental imagery.

It's not just, you know, a motivational trick.

It's a proven cognitive mechanism.

Studies in sports psychology have shown again and again that visualizing a smooth, well-timed performance can improve real-world results later.

It acts almost like a mental rehearsal.

It's like you're priming the necessary motor and cognitive circuits without ever moving a muscle.

And the application extends even further into emotional regulation.

There's some really powerful research showing that if you ask participants to visualize the cool aspects of an emotionally negative event.

So for instance,

focusing on their physical surroundings during a moment of rejection.

So not the feeling itself, but, like, the wallpaper in the room.

Exactly.

They were better able to reduce hostile feelings afterward compared to those who were told to focus on the visceral emotions, the hot aspects.

So you're turning a passive memory trace into an active tool for self-regulation.

That's incredible.

So our mission today is to take a deep dive into the very nature of these internal pictures.

We're going to unpack how they boost memory.

We'll look at the ingenious experiments designed to prove that mental images behave like physical objects.

Explore that fierce philosophical and scientific debate over the whole images-as-pictures metaphor.

And finally, connect all of this to the larger system of spatial cognition.

The complex way we mentally map and navigate the real world.

To begin, let's look at the oldest and I think most practical use of visual imagery, memory aids.

If you really want to dramatically improve your ability to recall information, you use what are called mnemonics.

Right.

These are techniques specifically designed to aid recall and the most powerful ones, the ones that give you these incredible memory feats.

They rely heavily on constructing mental pictures.

And the classic example has to be the method of loci.

Or the method of places.

Its origin story, which dates back to the Greek poet Simonides around 500 BC,

it's just legendary.

Tell us that story.

It's a good one.

Well, Simonides was attending a banquet when he stepped outside for a moment.

And just moments later, the roof of the hall collapsed, crushing all the other guests, making them completely unrecognizable.

A total tragedy.

But Simonides saved the day, so to speak, because he realized he could identify the bodies by recalling the specific location or locus where each person had been sitting.

He used his memory of the space.

And the mechanism behind it is wonderfully straightforward.

You use a series of well-known, established places, like the rooms in your house or landmarks on your commute.

And then you mentally picture the items you need to remember interacting vividly with those specific locations.

Let's use your office route as a concrete example.

Say you have five items for a meeting, a tablet, a pen, some printouts, a book, and a calculator.

You would mentally place them along that route.

So you might imagine the tablet propped against your office door.

That's locus one.

Then the pen is jammed into the potted plant on your colleague's desk.

That's locus two.

And the printouts are fluttering from the fire alarm pull station.

Exactly.

And so on.

When you need to recall the items, you just mentally walk the route and see what's there.
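The route-based procedure just described can be sketched as a small data structure: an ordered list of loci, each paired with one item, and recall as a walk along that list. This is only an illustrative sketch. The first three loci come from the example above; the last two loci and the function names are hypothetical, added so that all five items have a place.

```python
# Illustrative sketch of the method of loci as an ordered route of places.
# Loci 4 and 5 are hypothetical; only the first three are named in the example.
route = [
    "office door",
    "potted plant on colleague's desk",
    "fire alarm pull station",
    "water cooler",      # hypothetical locus
    "conference room",   # hypothetical locus
]
items = ["tablet", "pen", "printouts", "book", "calculator"]


def place_items(route, items):
    """Pair each item with the next locus along the route, in order."""
    if len(items) > len(route):
        raise ValueError("need at least one locus per item")
    return list(zip(route, items))


def mental_walk(placements):
    """Recall the items by visiting the loci in their original order."""
    return [item for _locus, item in placements]


placements = place_items(route, items)
for locus, item in placements:
    print(f"{locus}: {item}")
print(mental_walk(placements))
```

The ordering of the route is what does the work here: because the places are already well learned, walking them in sequence returns the items in sequence.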

And the power of this is just undeniable.

There's a study in 1968, Ross and Lawrence, that showed college students trained in this method could recall an astonishing 38 out of 40 words after only one presentation.

That level of recall, I mean, particularly after a single viewing, is just so far beyond what you can achieve with simple rote memorization alone.

Which brings us to the critical rule, right?

The thing that makes it all work.

A researcher named Bower formalized it, and it's principle six.

Use interactive images.

This is the absolute key.

It is not enough to just place a goat near a pipe in your mind.

Right.

You must picture the goat actively smoking the pipe, maybe with a little top hat on and coughing.

The active, bizarre and unusual interaction is what makes that memory link stick.

So the magic isn't just in the image itself, but in the organizational effort you put into making it weird.

Precisely.

And this leads directly to the technique of interacting images.

We know from very early work back in the 1890s that just telling participants to form images of concrete nouns improves recall.

But the real game, the big leap, comes from that interaction.

And Bower showed this really clearly in a 1970s study on paired-associate learning, didn't he?

He did.

He had groups who were asked to form interactive images, like our goat smoking a pipe.

They recalled almost double the number of pairs compared to the control group, who were just told to rely on rote repetition.

And what about a group that just imagined them side by side?

That's the crucial comparison.

If they just imagined a goat next to a pipe, the memory benefit largely vanished.

The image forces the two items to be bound together conceptually, but only if they're interacting.

A related technique, which is a little more structured, is the peg word method.

So instead of using geographic locations, this method relies on a pre-memorized rhyming list as the pegs.

Exactly.

It's a classic.

One is a bun, two is a shoe, three is a tree, and so on.

The new items you need to remember are then pegged to these cues via an interaction.

So if I need to remember salt and pepper, I might picture a giant bun, for one, completely covered in salt.

And then for two, a shoe kicking a giant pepper shaker across the room.

Got it.

And studies confirm this works really well as long as the participant is given enough time, I think it was four seconds or more per item, to actually construct those interactive mental scenes.

The construction takes time.

It's not an automatic process.

Now, just to be thorough, we should note that not all mnemonics rely on pictures.

There are other methods.

Absolutely.

You have methods like recoding, where you take the first letter of words to form a new word or phrase.

HOMES for the Great Lakes is the classic example.

Or Every Good Boy Deserves Fudge for the musical staff.

Right.

These non-imagery techniques act as what we call mediators.

They're internal linguistic codes connecting the item to the desired response.

But you're right, the real power players often seem to be the imagery-based ones.

Which brings us to the big philosophical question here.

Why does imagery work so well?

I mean, what's actually going on under the hood?

And we have two powerful theoretical explanations that really tried to settle this back in the 1970s.

The first one is the dual coding hypothesis, pioneered by Allan Paivio.

Right.

Paivio's big idea was that long-term memory doesn't use just one code.

It uses two distinct parallel systems.

There's the verbal system for linguistic meaning and abstract concepts.

Like the word truth.

Exactly.

And then there's the imagery system for mental pictures and sensory data.

Like the word table.

Precisely.

And the key idea is retrieval redundancy.

A concrete word, like table, gets both a verbal code, the definition, the sound of the word, and an imagery code, the mental picture of a table.

But an abstract word, like truth, usually only gets the verbal code.

And having two separate independent paths to retrieval, verbal, and visual makes that concrete word much, much easier to recall.

You have two chances to find it in your memory.
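The "two chances" idea can be made concrete with a little arithmetic. The sketch below assumes the two retrieval paths succeed or fail independently, and it uses made-up probabilities purely for illustration; neither the numbers nor the function name come from the research being discussed.

```python
def recall_probability(p_verbal, p_image=0.0):
    """Chance that at least one of two independent codes is retrieved.

    p_verbal and p_image are hypothetical per-code retrieval probabilities.
    """
    return 1 - (1 - p_verbal) * (1 - p_image)


# A concrete word is assumed to have both codes; an abstract word only one.
concrete = recall_probability(p_verbal=0.5, p_image=0.5)
abstract = recall_probability(p_verbal=0.5)

print(f"concrete word: {concrete:.2f}")
print(f"abstract word: {abstract:.2f}")
```

Under these assumptions the word with two codes is always at least as retrievable as the word with one, which is the redundancy argument in miniature.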

And the evidence for this is, well, it's crystal clear.

Paivio ran an experiment with four types of noun pairs.

Concrete-concrete, like book-table.

Concrete-abstract, like book-truth.

Abstract-concrete, like beauty-table.

And finally, abstract-abstract, like beauty-truth.

And the results were exactly what you'd predict.

Recall was dramatically highest for concrete-concrete pairs.

That makes perfect sense.

Two strong, visually anchorable concepts.

And the recall for abstract-abstract was predictably the lowest.

But the really interesting part was the mixed pairs.

The concrete-abstract pairs were recalled better than the abstract-concrete pairs.

Okay, hold on.

Why is that distinction important?

Both of those pairs have one concrete noun.

Why did the order matter so much?

That's where Paivio introduced this really elegant idea of the conceptual peg.

He argued that the first word, the stimulus noun, acts as the mental anchor, or the peg.

If that first noun is concrete, it provides a strong visual image that can act as a hook, a conceptual peg, onto which the second item, even an abstract one, can be tied.

But if the initial peg is abstract, like beauty, it provides a really weak foundation, and the concrete word that comes after it struggles to stick as effectively.

Exactly.

The visual power of that first word is so strong that it almost doesn't matter how abstract the second word is.

It really confirms the importance of that visual system.

But Paivio's theory, as powerful as it was, wasn't the only game in town.

No, there was another one, the relational-organizational hypothesis, proposed by Gordon Bower.

And it offered a very different explanation for why imagery is so successful.

So Bower didn't deny that we form images.

Not at all.

But he argued the benefit wasn't because the image is some inherently richer dual code.

Instead, he claimed that imagery improves memory because it forces you to create more associations or hooks between the items.

The image is just a mechanism to facilitate better organization.

To test this against Paivio's dual-coding prediction,

Bower ran this really crucial experiment comparing three conditions.

First, rote memorization.

Second, non-interactive imagery.

So just picturing things side by side.

And third, the interacting imagery, our goat smoking a pipe.

Right.

Now, if dual coding was the whole story, both of those imagery groups should have performed significantly better than the rote group, simply because they had two codes available to them.

But the results told a much more nuanced story.

Rote memorization yielded about 30% recall.

The non-interactive imagery group?

It was almost identical, about 27%.

Wow.

So just making a picture did nothing.

It did almost nothing.

However, the interacting imagery group surged to 53% recall.

That's a huge difference.

It's massive.

And it showed pretty definitively that the key insight isn't just make a picture, it's that the image works because it forces you to think about how A and B are linked together.

The act of generating that bizarre interactive scene forces organization and connection, and that relational organization is what boosts memory, not just the mere presence of a visual code.

So the mnemonics research strongly suggests that images exist and they're useful, but, you know, it doesn't really tell us what they are.

Do they behave like language or do they behave more like physical pictures?

And that's the question that leads us into the classic lab experiments, which were designed to truly test the nature of the mental image.

Lee Brooks' seminal work in 1968 provided some of the earliest, most compelling evidence that images and verbal materials are, in fact, distinct cognitive systems, likely using different processes.

This was a critical step in moving beyond that behaviorist rejection of the whole topic.

So Brooks used two main tasks.

First, there was the F task.

Participants were asked to imagine a capital letter F and mentally trace it, moving clockwise.

And for each corner they reached, in their mind, they had to indicate whether it was the extreme top or the extreme bottom of the letter.

Right.

So this is fundamentally a visual spatial task.

You have to see the F.

The second task was the sentence task.

Participants memorize a sentence, something like a bird in the hand is not in the bush.

And then for each word, they had to indicate whether it was a concrete noun or not.

So a purely verbal linguistic task.

Exactly.

Now, here's where the brilliance of the experimental design really comes in, the response mode.

How they give their answers.

Yes.

Sometimes they responded verbally just by saying yes or no.

Other times they responded spatially by pointing to Y or N labels that were scattered irregularly on a sheet of paper.

And the finding was just so powerful.

If you are holding a visual image in your head, like the F, and you try to respond spatially by pointing, your response time slows down dramatically.

We're talking two and a half times longer.

Conversely, if you are holding a verbal memory, like the sentence,

responding verbally is what causes the most interference and the greatest slowdown.

Wait, I have to play devil's advocate here for a second.

That interference is huge.

But is there any possibility that they were just getting tired or bored with the pointing task?

I mean, why couldn't simple fatigue be the reason for the slowdown?

That's a fair challenge, and it's one researchers have to consider.

But the experimental design really controls for that.

If it were just mere fatigue, the pointing response should have been slow for both tasks, right?

Okay, yeah.

It was only dramatically slow when the response modality, the spatial pointing,

matched the internal code being used, the visual image of the F.

It demonstrates that the visual image and the spatial pointing are drawing on the same limited visual spatial buffer or mental workspace.

Which confirms that these cognitive systems are distinct from the verbal system.

Exactly.

They interfere with each other because they're competing for the same mental resources.

That clarity helps.

We also see images behaving like pictures when we have to retrieve information about them.

I'm thinking about the symbolic distance effect discovered by a researcher named Moyer.

Right.

So if I ask you, which is larger, a house cat or a hog?

I quickly summon a mental picture of the two animals and I sort of read the size comparison off the image in my head.

A hog is bigger.

Moyer found that people are much, much faster to compare two items when they differ greatly in size, say a whale versus a cockroach, than when they're very similar in size, like the hog and the cat.

And the critical insight, which Paivio showed later, is that this exact same speed difference, this symbolic distance effect, is also obtained when participants are looking at actual photographs of the objects.

And that's the key.

If our mind were just retrieving abstract verbal facts, like a dictionary definition that says, whale is much bigger than cockroach, the degree of size difference shouldn't slow us down so dramatically.

Right.

It would just be a fact lookup.

Exactly.

The fact that it does suggests the mental image is functioning spatially, like a picture where you have to visually inspect and compare.

So moving beyond these static images, the next big question became,

can we transform these images dynamically?

Can we move them around in our heads?

And this is where we encounter the classic, truly foundational work on mental rotation.

Shepard and Metzler, 1971, they created these perspective line drawings of three-dimensional, kind of strange, asymmetrical objects.

Like Tetris pieces made out of cubes.

Yeah, exactly.

And participants were presented with two of these figures at different angles, and they just had to judge.

Are they the same object rotated, or are they mirror reversals?

And the finding, which is honestly one of the most famous linear relationships in all of cognitive science, was that the amount of time required for the decision was directly and linearly proportional to the angular difference between the two drawings.

So if the figures were separated by 140 degrees, it took about twice as long to decide as if they were only separated by 70 degrees.

Precisely.

This strongly implies that participants were performing a mental rotation continuously, passing through all the intermediate orientations, just like you would if you were rotating a physical object in space.

And importantly, this linearity held true whether the rotation was in the flat picture plane or rotated deep into the third dimension.

They were manipulating a full 3D internal representation.
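The linear relationship described here can be written as a simple model: decision time equals a fixed baseline plus a constant amount of time per degree of rotation. The sketch below uses hypothetical parameter values (a 1.0 second baseline and 0.02 seconds per degree), not figures from the study. It also shows one caveat worth keeping in mind: if the baseline is nonzero, doubling the angle doubles the rotation portion of the time, while the total decision time grows by less than a factor of two.

```python
def rotation_time(angle_deg, slope_s_per_deg=0.02):
    """Time spent on the rotation itself (hypothetical slope)."""
    return slope_s_per_deg * angle_deg


def predicted_rt(angle_deg, intercept_s=1.0, slope_s_per_deg=0.02):
    """Total decision time: fixed baseline plus rotation time.

    intercept_s stands in for encoding, comparison, and response time;
    both parameter values are illustrative assumptions.
    """
    return intercept_s + rotation_time(angle_deg, slope_s_per_deg)


for angle in (0, 70, 140):
    print(f"{angle:>3} degrees: rotation {rotation_time(angle):.2f} s, "
          f"total {predicted_rt(angle):.2f} s")
```

The slope of this line is what later studies treat as the rate of mental rotation, so comparing slopes across conditions is a way to ask whether something changes the speed of the rotation itself.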

And follow-up work just kept confirming the dynamic nature of this process.

Cooper and Shepard, a couple years later, used more recognizable stimuli, like letters.

Like a letter R.

Exactly.

And they showed that if you informed participants of the orientation of the test stimulus a full second before it appeared, they could pre-rotate the image in anticipation.

So they got their mental R into position ahead of time.

And they got it ready.

And as a result, their response times were consistently fast, regardless of the angle of the test stimulus.

They were ready when the test came.

And the complexity of the object didn't even matter, which is another crucial detail.

Right. Cooper, in 1975, used these irregular polygons with a varying number of points.

But the rate of rotation, so the slope of that time over angle line, was exactly the same, whether the polygon had six points or 24.

Which argues that the entire object is being rotated as a single coherent unit, rather than being mentally analyzed piece by piece.

It's a holistic rotation.

Now, the modern debate around this is pretty critical for understanding how we recognize objects in the real world.

Right.

When we see a chair that's rotated at some unusual angle, do we always have to mentally rotate it back to a standard upright view to know it's a chair?

That's the question.

Or, as some researchers suggest, can we sometimes recognize objects based purely on their component parts, what are called geons, which are essentially fundamental geometric shapes like blocks, cylinders, and arches, without needing to rotate the whole image?

So that's the debate.

Is recognition based on rotating the whole view, or just analyzing these basic geometric building blocks, these geons, regardless of their orientation?

That's the core of it.

But regardless of that debate, the sheer fact that reaction time is so tightly tied to angular distance provides immense support for the spatial, picture-like nature of the mental image.

Next, we move into the domain of spatial organization with Stephen Kosslyn's work on imaginal scanning.

And the hypothesis here was so elegant.

It really was.

If mental images truly preserve spatial relations, then the time it takes to scan between two points in the image should be directly proportional to the physical distance between those points.

Kosslyn first showed this using simple elongated drawings, like a boat or a flower.

If participants had to mentally scan from one end of the object to find a specified part, say, from the anchor of a boat to the motor at the back.

The longer the mental trip, the longer the reaction time.

Exactly.

But wait, a natural challenge arose from another researcher, Lea, who pointed out that maybe the increased time wasn't about the distance at all.

What was the alternative explanation?

Lea suggested it might be the number of intervening items that the participant mentally had to pass over while they were scanning.

Maybe there were more things on the longer routes that slowed them down.

That makes sense.

It's a good confound to point out.

It's a great confound.

And this challenge demanded a definitive test, which Kosslyn, Ball, and Reiser delivered in 1978 with the famous island map experiment.

I remember this one.

They had participants memorize a fictional map of an island that had seven distinct objects on it.

A well, a tree, a hut.

And crucially, the distances between the objects varied widely, from two centimeters all the way up to 19 centimeters on the map.

But the paths were specifically designed to not contain any intervening items.

So that took Lea's critique right off the table.

Completely.

And the result was overwhelmingly in favor of the spatial hypothesis.

The reaction times for scanning from one object to another were extremely highly correlated with the physical distance on the original map.

Which really reinforces the idea that the mental image is functionally a spatial representation, preserving relations like distance and location.
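To see what "highly correlated with distance" means in practice, here is a small sketch that computes a Pearson correlation between map distance and scanning time. The distances fall within the 2 to 19 centimeter range mentioned above, but every data point is invented for illustration; these are not the experiment's results.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: distance between two map objects (cm) and the
# time taken to mentally scan between them (seconds).
distances_cm = [2, 5, 8, 11, 14, 17, 19]
scan_times_s = [1.1, 1.25, 1.45, 1.5, 1.75, 1.9, 2.0]

r = pearson_r(distances_cm, scan_times_s)
print(f"r = {r:.3f}")
```

A value of r close to 1 means scanning time rises almost in lockstep with distance, which is the pattern the spatial account predicts.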

So the image preserves space.

But here is the crucial caveat, the thing that prevents us from calling it a perfect internal photograph.

And this is Barbara Tversky's work on systematic distortions.

It provides really strong counter evidence that our mental maps are far from accurate.

Far from it.

So if you rely on your mental map,

you'll almost certainly get these two classic questions wrong.

First,

is Seattle farther north than Boston?

My gut says Boston is north of Seattle.

And second,

is San Diego farther east than Reno?

My gut again says Reno is east of San Diego.

And your gut is wrong on both counts.

Reality is that Seattle is significantly further north than Boston, and San Diego is actually east of Reno.

That's so counterintuitive.

Why are our mental maps so wrong?

Tversky argued that our mental maps are systematically distorted because we rely on simplifying heuristics, or rules of thumb, to make the map neater and easier to store.

We're basically cleaning it up in our heads.

We are.

We mentally align things orthogonally on perfect north-south and east-west axes.

We think of the continents as neatly stacked, so we imagine South America is directly south of North America, even though it's physically quite far to the southeast.

These distortions show that the image is an interpretation that prioritizes simplicity over physical accuracy.

Another crucial finding that challenges that perfect picture notion came from a study that used the classic ambiguous figure, the duck rabbit.

Chambers and Reisberg, 1992.

They showed participants the figure very briefly, asked them to form a clear image, and then they took the picture away.

And they found that almost none of the participants could spontaneously reverse the image in their mind to see the other figure.

That's right.

If they saw the duck first, they couldn't just will themselves to see the rabbit from their mental image.

They had to be shown the actual picture again.

Yet if you're looking at the physical drawing, reversal is easy.

So what's the takeaway from that?

The takeaway is profound.

The meaning or the construal that you give to the stimulus determines the image that gets formed.

And once that meaning is locked in, the image resists spontaneous reinterpretation, unlike a true ambiguous picture.

Finally, we should introduce the really crucial finding that imagery is not always a good thing.

It can sometimes be a burden.

It can be a cognitive load.

Knauff and Johnson-Laird studied how people solve what are called three-term series problems.

So things like: Tandy is furrier than Bussy, and Bussy is furrier than Scruffy. Who is the furriest?

Exactly.

And they varied the relational terms used in the problem.

They found that problems requiring purely visual relations,

like judging cleaner or dirtier, where you have to visually imagine the dog in the mud.

Oh, that's a very visual one.

It is.

And that significantly slowed performance compared to problems that used either visual-spatial relations, like in back of or in front of, which are easier to map spatially, or abstract control problems, like better or worse.

So what does that imply?

It implies something very important.

The mental effort you use to construct a detailed visual image, imagining the precise amount of dirt on those dogs,

actually consumes cognitive capacity.

And that capacity is then unavailable for the logical reasoning required to solve the problem.

So in that context, imagery becomes a cognitive load, not an aid.

Precisely.

It gets in the way.

To synthesize all this overwhelming evidence that supports the picture-like nature of imagery, Ronald Finke, back in 1989, proposed five foundational principles of visual imagery.

Right.

These principles really aim to describe its fundamental properties.

The first principle is implicit encoding.

And this means that mental imagery can help us retrieve information that was never explicitly or intentionally stored.

Our opening task, counting the cabinet doors, is the perfect illustration of this.

We never intended to store the number 16 or whatever it was, but that information was implicitly encoded along with all the other visual details, which allowed us to construct and scan the image later.

The second principle is perceptual equivalence.

This states that imagery is functionally equivalent to perception.

Meaning it activates similar, if not identical, mechanisms in the visual system.

We saw early evidence of this with a study by Perky way back in 1910, where participants literally couldn't tell the difference between their faint mental image and an actual faint picture being projected onto a screen.

And a researcher named Farah provided the controlled modern evidence for this in 1985.

Imagining a letter, say an H or a T, acted as a prime for the visual system.

What does that mean, it acted as a prime?

It means it significantly improved the participants' ability to detect a very low-contrast actual letter that was presented just a moment later.

The imagery here acts like a kind of perceptual rehearsal.

It gets the visual system ready to see.

Finke's third principle is spatial equivalence.

This is the idea that the spatial arrangement, meaning location, distance, and size, within the mental image corresponds to the actual physical arrangement.

Kosslyn's scanning experiments are the primary behavioral evidence here, showing those strong correlations between distance and time.

But the novel evidence for this is even more striking because it proves the spatial component is distinct from any visual input.

You're talking about the Kerr study.

Exactly.

Kerr in 1983 conducted map scanning studies on congenitally blind participants who had only ever learned a map by touch.

And they showed the exact same correlation between distance and scanning time as sighted participants did.

Which demonstrates that spatial properties are preserved in mental images, even when vision has never been available.

It confirms that spatiality is a core feature of the mental representation itself.

The fourth principle is transformational equivalence.

This principle is drawn directly from all the mental rotation and transformation studies we've already discussed.

Right.

It posits that imagined transformations and physical transformations obey the same dynamic characteristics and laws of motion.

They're continuous, they pass through intermediate stages, and they're time-dependent based on the degree of transformation required.

And the final principle is structural equivalence.

This suggests that the structure of an image corresponds to the structure of a perceived object.

So images are coherent, well-organized, and built up in pieces.

And we see evidence of this when we look at the time it takes to actually create an image.

Kosslyn and his colleagues found that the time required for image generation increases directly with the complexity of the object being imagined.

But it's not just physical complexity, right?

It's also how you think about the object.

Exactly.

The assembly time depends on the conception of the object.

So imagining a pattern that you conceive of as five squares in the shape of a cross took longer than imagining the exact same physical pattern when you conceived of it as two overlapping rectangles.

The mental structure dictates the assembly time.

Okay, so if we have these five robust principles, all backed by decades of behavioral data, why is this field still so controversial?

Well, the controversy largely centers on the very powerful critiques that were put forth by Zenon Pylyshyn, which challenged the core assumption that we're even dealing with a distinct visual code.

His first critique centers on tacit knowledge and demand characteristics.

Right, so Pylyshyn argued that the findings of those image scanning studies, where scan time is linear with distance, don't necessarily reflect actual image manipulation.

He said, instead, participants might just be accessing their tacit knowledge.

So you're saying that participants just know that physically scanning a long distance takes more time than scanning a short distance.

And they consciously or unconsciously slow down their mental scanning because they think the experimenter expects that outcome?

That's the argument.

And that is the definition of demand characteristics, where the experimental situation itself cues the participant on how they should behave.

So the results become artificial.

Potentially.

And this critique is strengthened by findings like those from Intons-Peterson in 1983, who showed that experimenter expectancy effects, these subtle unintentional cues from the person running the study, could actually influence the results of mental rotation experiments.

So this presents a real methodological hurdle.

It suggests that maybe these findings aren't about how the image works internally, but about how people think images are supposed to work.

The second critique argues that the picture metaphor is flawed.

Pylyshyn argued that a picture and a mental image are fundamentally different things.

How so?

Well, a picture can be viewed without any prior knowledge.

An image, on the other hand, is an internal construction that's formed with intention and pre-existing knowledge.

And we noted the duck-rabbit example earlier.

Pictures allow for spontaneous reversal, but images do not.

Plus, images are highly susceptible to interpretation.

There was that classic study from the 1930s where the label given to an ambiguous drawing, like eyeglasses versus dumbbells, systematically distorted how participants later recalled and drew the figure.

The image is filtered through meaning.

And if images were really these perfect internal photographs, we should be able to recall minute details of objects we see every single day.

Right.

But we can't.

Nickerson and Adams famously showed that people make numerous errors when they try to reproduce an image of a U.S. penny.

Details like which direction Lincoln is facing or what the mottos say are often completely forgotten.

It shows there's far less information stored than in a clear photograph.

And the third and most fundamental critique is propositional theory.

This is the big one.

This theory rejects the idea of two distinct codes, visual and verbal, entirely.

It posits that all mental representation is stored in a single, abstract, non-visual, non-verbal propositional code.

So not a picture, not a word, but something else entirely.

Something more like the language of thought or logic.

Propositions are statements that specify relationships, often captured in a formal way like CITY (New York) or WEST-OF (New York, Boston).

And Pylyshyn argued that any result from an imagery experiment could be explained by people accessing and manipulating these underlying propositional representations.

Which would make the whole concept of a mental picture unnecessary and, in his view, misleading.
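To make the idea of a propositional code a little more concrete, here is a minimal sketch, not anything from Pylyshyn's own work. It stores the two example propositions from the discussion as predicate-plus-arguments records and checks whether a relation is present. The `Proposition` class and the `holds` helper are hypothetical names chosen only for this illustration.

```python
from typing import NamedTuple, Tuple


class Proposition(NamedTuple):
    """An abstract relation: a predicate applied to ordered arguments.

    Hypothetical structure for illustration; it is neither a picture
    nor a sentence, just a labeled relationship.
    """
    predicate: str
    arguments: Tuple[str, ...]


# The two example propositions mentioned in the discussion.
knowledge = {
    Proposition("CITY", ("New York",)),
    Proposition("WEST-OF", ("New York", "Boston")),
}


def holds(predicate: str, *arguments: str) -> bool:
    """Return True if exactly this proposition is stored."""
    return Proposition(predicate, tuple(arguments)) in knowledge


print(holds("WEST-OF", "New York", "Boston"))
print(holds("WEST-OF", "Boston", "New York"))
```

Notice that nothing in this structure has a size, a distance, or a layout. That is the point of the propositional view, and it is also why findings tied to physical size and distance are hard for it to explain.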

So how did the imagery camp respond to that powerful challenge?

I mean, if everything is just abstract propositions, how do you explain the speed differences that are tied to physical size and distance?

Kosslyn provided a great counter-rebuttal in 1976.

He tested verification time based on association strength versus visual size.

Okay.

So without being told to use imagery, verification was faster for high-association words.

For example, verifying that a cat has claws, which is a high association but a small visual feature, was fast.

And that result aligns perfectly with the propositional prediction.

But, and I sense a but coming.

A very big but.

When participants were explicitly forced to use imagery, the results flipped.

Verification was faster for visually larger parts, like verifying that a cat has a head.

That has a lower association value than claws, but it's a large visual feature.

So the requirement to engage imagery switched the cognitive priority to a physical property, visual size.

Exactly.

And propositional theory really struggled to explain why manipulating abstract code would suddenly prioritize physical characteristics like that.

This deep and ongoing controversy explains why the move into neuropsychology was so vital for this field.

Absolutely.

If we can monitor the physical brain while images are being formed and manipulated, maybe we can finally resolve the debate over their true nature.

And the findings here have been incredibly robust.

They show a really clear physiological link between mental imagery and actual vision.

Roland and Friberg, back in 1985, they monitored cerebral blood flow, which is a proxy for brain activity, during three different tasks.

Mental arithmetic, auditory memory scanning, and a visual imagery task.

Which was imagining a walk through a familiar neighborhood.

Right.

And the comparison was just stark.

Only the visual imagery task showed this massive activation in the occipital lobe and other posterior regions.

The very parts of the brain that are dedicated to processing actual visual input.

So the brain was acting as if the person were literally seeing the scene.

It was.

And Kosslyn and his colleagues took the specificity even further a decade later.

Using PET scans, they asked volunteers to imagine line drawings at different sizes.

Small, as if held in their hand.

Or large, as if it were filling their entire visual field.

And while the visual cortex activated during all the imagery tasks, the critical detail was that the specific area of the occipital lobe showing maximal activation depended on the perceived size of the image.

When the image was small, the activation peaked in the posterior part of the visual cortex.

When the image was large, the activation peaked more anteriorly.

This directly mirrors how the visual cortex processes actual retinal input based on size and location.

Proving that imagery engages the actual retinotopic or map-like organization of the visual system.

And this shared neural substrate isn't just limited to vision.

Other studies show that imagining a pitch change in a song activated the secondary auditory cortex in the temporal lobes, just as actually hearing the song did.

And perhaps the most specific evidence came from a 2000 study using fMRI, which showed highly targeted activation.

Right.

When participants imagined faces, the fusiform face area, or FFA, was active.

When they imagined a place or a scene, the parahippocampal place area, or PPA, was active.

And these are the exact specialized regions that are activated when you're viewing actual photographs of faces or scenes.

It's an incredible level of specificity.

So let's bring this neurodata back to the two main critiques from Pylyshyn.

How does this overwhelming brain evidence challenge his arguments?

Well, it delivers a really powerful blow to both.

Regarding the demand characteristics critique, Farah pointed out that the blood flow results are virtually immune to that.

How so?

For a participant to artificially produce these specific patterns of localized, size-dependent activation, they would need to have conscious, detailed knowledge of their own brain anatomy.

And then voluntarily control their cerebral blood flow to precise regions of the visual cortex.

Which is an extremely unlikely scenario.

The activation appears to be involuntary and genuine.

And for propositional theory, the evidence is nearly a death blow.

It really is.

Propositional theory claimed a single, abstract, non-visual code.

The fact that the most fundamental parts of the visual processing system, the occipital lobe, the FFA, the PPA, all become active during image formation strongly reinforces the existence of a distinct, functional visual-spatial code.

So the conclusion, which Farah synthesized back in 1988, is vital.

Yes.

Imagery is not necessarily visual in its sensory input.

We know this from the congenitally blind participants.

Rather, it is visual in the sense that it uses the same neural representational machinery as vision.

It's the visual hardware being repurposed for a new job.

Exactly.

Let's broaden our view now.

Beyond the simple mental image to the system that uses it to navigate the world.

Spatial cognition.

This is defined as the whole process of how people acquire, store, and use mental representations of spatial entities to get around.

So this is the system that allows you to point accurately toward a location, like the building where you first took your cognitive psychology exam, even if you can't see it from where you are.

And the knowledge you're accessing is stored in this complex internal structure, what Tversky prefers to call a cognitive collage rather than a perfect cognitive map.

Why a collage?

Because it reflects the systematic distortions we've been talking about.

It's pieced together.

It's not a perfect seamless photograph.

Tversky also emphasized that we don't process space uniformly.

Our brain uses different organizations and reference frames, depending on the scale of the space involved.

She distinguished three crucial kinds of space.

First is the space of the body.

This is the most localized knowledge: where your body parts are, how they are oriented, and what they're interacting with, like knowing your elbow is resting on the armrest or your left foot is slightly cramping.

And this is crucial for directing fine motor actions, like reaching for a cup or ducking under a branch.

Second is the space around the body.

This is the immediate, perceivable environment, the room you're in, the region you can easily act upon.

And we organize objects in this space based on three axes radiating from our body: front-back, up-down, and left-right.

And research has shown a surprising hierarchy in this immediate space.

A very surprising hierarchy.

When people are asked to locate or verify objects along these axes, retrieval times for objects along the up-down axis (head-feet) are consistently the fastest.

And objects along the left-right axis are the slowest.

That's so counterintuitive.

Why would up-down be prioritized over left-right?

It suggests an inherent organizational priority.

Your up-down relations are fixed relative to gravity and your own body structure.

They are stable.

But left-right relations are highly unstable.

They're constantly changing as we turn or move.

Exactly.

Left-right requires constant mental tracking against external cues, while up-down is fixed.

So retrieving up-down information might just be computationally easier and faster for the brain.

And the third kind of space is the space of navigation.

This involves vast, large-scale spaces, like cities, countries, or a large campus, that are too big to be perceived from a single vantage point.

These representations have to be integrated from various memories and experiences, creating those cognitive collages.

And we use two primary perspectives here.

The route perspective relies on sequential landmarks.

Turn right at the gas station, then left at the big tree.

And the survey perspective, which is the bird's-eye view, the overall layout.

But the problem is, because this space is built from so many different pieces, the representation is highly susceptible to those systematic errors and distortions.

And this systematic distortion is beautifully illustrated by a study on the Carleton College campus.

Students who were asked to arrange cut-out shapes of buildings on a map systematically distorted reality.

They didn't remember the actual diagonal positions of the buildings at all.

No.

Instead, they mentally forced them to align along neat, orthogonal, perfect north-south and east-west lines.

This just powerfully reinforces Tversky's findings.

In the vast space of navigation, our cognitive system uses these heuristics for simplicity, making our mental collages neater and easier to access, even if they're physically inaccurate.

We prioritize organization over raw physical fidelity.

It's a trade-off.

Ultimately, another researcher, Montello, concluded that navigation itself involves two processes.

Locomotion, which is the physical movement over terrain.

And wayfinding, which is the planning and decision-making about where to go.

So spatial cognition is really the grand synthesis of everything we've discussed.

Perception, attention, memory, and knowledge representation, all working together to keep us oriented and moving forward effectively.

So let's try to recap.

What does this all mean?

We started with a simple act, counting cabinet doors.

And we've followed that trail all the way to the core of cognitive representation.

We saw that visual imagery is central to those powerful memory techniques, like the method of loci.

And it seems to be driven by that organizational linking described by Bower.

We established that images are distinct from verbal processing, thanks to Brooks's interference tasks.

And we saw that they function dynamically, like pictures, capable of continuous mental rotation and imaginal scanning that's proportional to distance.

Finke's five principles, supported even by the evidence from the congenitally blind, really solidify the image-as-mental-picture metaphor, in terms of its functional equivalence and its spatial properties.

And yet, we acknowledge the intense controversies, particularly Pylyshyn's powerful challenge that the results might be explained away by tacit knowledge or a simple propositional code.

But the weight of the evidence, particularly the neuropsychological findings showing robust, size-dependent activation in the exact visual cortex areas, provides a powerful, involuntary counter-argument to those critiques.

So the conclusion, therefore, feels pretty firm.

Imagery is visual in its mechanism and its neural representation, even if it isn't always visual in its sensory origin.

And finally, we wrapped our discussion by understanding spatial cognition, learning that our navigation system operates across three distinct scales, the body, the immediate space, and the space of navigation, where we systematically distort large-scale representations to create simpler, more organized mental collages.

So here's a final provocative thought for you to explore as you go about your day.

Given the finding by Knauff and Johnson-Laird that forming a detailed visual image, like imagining how dirty a dog is, can actually consume valuable cognitive capacity and impede higher-level logical reasoning, should we always encourage visualization in education or in specialized training?

That's a great question.

If a task requires high-level abstract logic or proposition manipulation, might the effort spent on constructing a detailed visual image actually be detrimental?

We always tell people to visualize for memory, but for certain complex logical or technical tasks, maybe the best method is one that explicitly avoids engaging the resource-heavy visual cortex.

Right, thereby preserving that cognitive capacity solely for processing the abstract propositions.

Something to mull over as you navigate your way through your own cognitive collages today.

Thank you so much for joining us for the Deep Dive.

We hope this has given you a helpful new perspective on how your brain pictures and maps the world around you.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Mental imagery refers to the creation of internal visual representations that closely resemble actual sensory experiences, allowing the mind to construct and manipulate mental pictures without external stimuli. The human capacity to generate these images serves as a powerful tool for memory enhancement, particularly through mnemonic strategies like the method of loci, which anchors new information to visualized spatial locations, and the pegword method, which associates items with rhyming cues to strengthen retention. Dual coding theory proposes that people encode information through both verbal and visual systems, creating multiple pathways for retrieval, while the relational-organizational hypothesis emphasizes that imagery's effectiveness stems from the rich associative networks it builds among encoded items. Empirical research on mental rotation demonstrates that individuals can manipulate three-dimensional objects in their minds using processes analogous to physical rotation, and imaginal scanning studies reveal that mental representations preserve metric properties such as distance and spatial layout. Finke's five principles offer a comprehensive framework for understanding how visual imagery operates through implicit encoding and achieves functional equivalence to actual perception. The debate between imagery advocates and propositional theorists centers on whether mental representations are fundamentally visual or whether all knowledge exists in an abstract, language-like code, with methodological concerns such as demand characteristics potentially influencing experimental outcomes. Neuropsychological investigations using brain imaging provide empirical support for the imagery perspective by demonstrating that visualizing objects recruits the same visual cortex regions activated during genuine perception.
Beyond individual imagery processes, spatial cognition extends to how people navigate and understand their environments at multiple scales, from bodily awareness to immediate surroundings to large-scale navigation. Cognitive collages function as integrated spatial representations that synthesize diverse pieces of environmental information to guide movement and wayfinding. The chapter also acknowledges that mental images are subject to systematic distortions influenced by organizational heuristics and conceptual knowledge, as illustrated by ambiguous figures, highlighting that imagery is not a perfect internal photograph but rather a reconstructive and interpretive process.
