Chapter 8: Motion Perception in Vision

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome back to the Deep Dive today.

Today we are going to try to break your brain a little bit.

Just a little, but you know, in a good way, a constructive way.

We're looking at something that feels like the most obvious thing in the world.

I move my hand, I see it move. A car drives by, I see it go. It just, it feels instant.

It feels automatic.

Right.

It feels like you're just opening a shutter, your eyes, and letting the whole of reality pour in perfectly formed.

But we've got a stack of research here, and for this deep dive, we are focusing strictly on chapter eight of Sensation and Perception, sixth edition.

And it suggests that what's actually happening is, well, it's kind of a miracle.

It really is.

It's a biological and computational miracle built layer by layer.

The central mystery that caught my eye immediately is this contradiction.

Think about your eyes.

They are not cameras on some steady professional tripod.

They are twitchy.

They are constantly darting around, saccading, what, three, four times a second?

Constantly.

Even when you think you're staring perfectly still at a single point on the wall, your eyes are making these tiny jittery movements.

So if you think about it, the image hitting your retina should be a blurry, chaotic mess.

It should look like you're running down a staircase with a GoPro strapped to a jackhammer.

That's a vivid image.

Yeah.

But yes, the raw feed, the actual input, is incredibly chaotic.

But the world doesn't look chaotic.

Yeah.

It looks stable, solid. And then you flip that problem on its head: we go to the movies, we look at a screen, and nothing on that screen is actually moving.

It's just a bunch of still photographs flashing one after another.

24 of them per second, yeah.

Just static images.

But we perceive it as smooth, continuous, fluid action.

Exactly.

So the stationary world looks stable despite our moving eyes, and static images look like they're in motion even though they aren't.

It's a complete paradox.

And our mission today is to follow the chapter and unpack exactly how the biological machinery in our heads solves that paradox.

And we're going deep.

We're starting with individual neurons, the absolute basic wiring, and we're going all the way up to how a baseball player can possibly hit a 90 mile an hour fastball.

But we have to start where the text starts, with a ladybug.

The humble ladybug.

So paint the picture for us.

This is figure 8.1 in the text.

It's a very simple concept.

Super simple.

Imagine a green leaf, and there's a ladybug sitting on the left side of that leaf.

Let's call that position A.

Okay, position A.

A moment later, let's say a few seconds pass, you look again, and the ladybug is now on the right side of the leaf.

That's position B.

And here's my first question, and it's one the book poses right away.

Why do I need a special sense for that?

I have eyes, I have a brain, I saw the bug at A, now I see the bug at B.

My brain can just say, hey, logic dictates the bug must have moved.

Why do I need dedicated motion sensors in my brain for something that simple?

And that's a perfectly reasonable question.

It's what we could call the logical inference model.

It was here, now it's there, ergo, it moved.

And for a slow meandering ladybug, you could probably get away with that.

Your higher level brain can conceptually bridge that gap.

Right, it's not a hard problem to solve with logic.

But the text immediately asks you to raise the stakes.

You aren't just a person watching a ladybug on a sunny afternoon.

Let's go back in time.

You're a primordial vertebrate.

You're a lizard or a tiny early mammal.

And something is moving in the tall grass.

Could be a predator.

Or if I'm the predator, it could be a snack.

Exactly.

Now, if you have to wait to see the rustling at position A, then wait again to see it at position B.

And then your brain has to run a cognitive calculation.

OK, what's the distance between A and B?

What was the time elapsed?

Distance divided by time equals velocity.

By the time you get the answer, you are either dead or you're starving because your lunch got away.

Precisely.

The math is just too slow.

Evolution couldn't rely on figuring it out or inferring change.

It had to be hardwired.

We needed to be able to detect motion as a raw, fundamental quality of the universe.

Just like we detect the color red or the pitch of a sound.

It needs to be immediate.

So you're saying motion isn't a conclusion we draw after the fact.

It's a primary sense like sight or hearing.

That's the argument the chapter makes right from the start.

And we see this in modern humans all the time.

Think about a hockey goalie.

A puck is flying at them at 100 miles an hour.

Or the example the book loves, the baseball batter.

Specifically, a batter facing a knuckleball.

Right.

A knuckleball is chaos.

It dances and dives unpredictably.

A batter has something like 400 milliseconds, less than half a second, to see the pitch, judge its path and decide to swing.

There is absolutely no time for conscious thought there.

None.

If they were using that position tracking logic, okay, the ball is at 60 feet, now the ball is at 50 feet, let me calculate the new trajectory, they wouldn't just miss, they'd be humiliated.

They are bypassing higher thought entirely.

They're tapping into a much older, faster, more fundamental system.

A system that is literally wired into the hardware of the brain.

Correct.

And the most compelling evidence that it is hardware is that we can trick it, we can fatigue it, we can break it, just like a muscle.

And this brings us right to the biology of motion section, and my favorite part of the intro, the waterfall illusion.

The motion aftereffect, or MAE. It's a classic, almost like a party trick, but it reveals so much about how the brain is actually put together.

So the story goes back to 1834 with a guy named Robert Addams.

Yeah, let's set the scene.

He's in Scotland, visiting the Falls of Foyers.

It's this dramatic landscape, water pouring down, and he just stands there for about 15, maybe 30 seconds.

He just stares at the water crashing down.

He's locked in on that constant downward motion.

OK, so key detail here.

His eyes are fixated on one spot.

He's not tracking a piece of water down the falls.

He's staring at a point and letting the motion rush through his field of vision.

Exactly.

That's a crucial part of the setup.

Then after that time, he looks away.

He shifts his gaze to the rocks right next to the waterfall.

And these are solid, heavy, absolutely stationary stones.

But what does he see?

He sees them drifting upward, floating.

The rocks look like they're defying gravity and moving up.

I've done this with those YouTube videos, the spiral illusions.

You stare at the center of the spinning spiral for a minute, then you look at your hand and your skin looks like it's crawling or melting.

It's the exact same mechanism.

But the question is, why does that happen?

What is actually going on in the brain?

Is your brain just confused?

Is it broken?

The text calls it an opponent process, which sounds fancy, but walk us through the mechanics of what that means.

OK, so imagine your visual system is a democracy or maybe more like a parliament.

You have a bunch of neurons and they're all voting on what's happening in the world.

You have a specific set of neurons whose only job is to shout down when they see downward motion.

OK, so that's the down party.

They have one platform.

And right across the aisle, you have another set that shouts up.

They're the opposition party.

Normally, when you're looking at a stationary object like a rock, both parties are pretty quiet.

The down guys are firing at a low baseline rate, just kind of whispering.

The up guys are whispering in their baseline rate, too.

They effectively cancel each other out.

So the brain, the speaker of the house, sums the inputs and hears nothing.

Zero net movement.

Silence.

Stability.

But you go and stand at that waterfall for 30 seconds.

Those down neurons are screaming their heads off.

Down, down, down.

They're firing at a massive rate.

They're doing a ton of work.

They are working too hard.

And just like a muscle, they get tired.

Neural fatigue sets in.

They adapt to the constant stimulation and their firing rate slows way down.

They've essentially run out of steam.

OK, so the down team is exhausted.

They're panting in the corner.

Then you shift your eyes over to the stationary rocks.

Now, physically, there is no motion.

So logically, both teams should go back to just whispering.

But the down team is so exhausted, they can't even manage that.

They are firing below their normal resting rate.

They're basically silent.

But the up team, they're fresh as a daisy.

The up team hasn't been doing anything.

They've been resting this whole time.

So they're just whispering at their normal baseline rate.

But now that whisper is louder than the exhausted silence from the down team.

And the brain just compares the two.

It hears more noise from the upside than the downside.

And it concludes, against all logic, the rocks must be moving up.

It's an illusion created by an imbalance in the system.

That is, it's kind of hilarious.

The brain is just listening to whoever is shouting the loudest at any given moment.

It's all about the relative balance of signals.
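To make the parliament picture concrete, here is a minimal sketch of an opponent readout in Python. Everything in it is an illustrative assumption rather than anything from the chapter: the baseline rate, the linear fatigue rule, and the names `adapted_gain` and `net_motion` are made up for the example. It only shows the logic described above, that a tired "down" population plus a rested "up" population yields a net "up" signal from a stationary scene.

```python
BASELINE = 10.0  # hypothetical resting firing rate (spikes per second)


def adapted_gain(seconds_of_stimulation, fatigue_per_second=0.02):
    """Gain of a detector after being driven hard; 1.0 means fully rested.

    The linear fatigue rule is an assumption chosen for simplicity.
    """
    return max(0.0, 1.0 - fatigue_per_second * seconds_of_stimulation)


def net_motion(up_rate, down_rate):
    """Opponent readout: positive reads as 'up', negative as 'down', zero as still."""
    return up_rate - down_rate


# Looking at stationary rocks with both populations rested.
rested = net_motion(BASELINE, BASELINE)

# Same rocks after 30 s of waterfall: only the 'down' population is fatigued.
down_gain = adapted_gain(30)
after_waterfall = net_motion(BASELINE, BASELINE * down_gain)

print(rested)               # zero net signal: the rocks look still
print(after_waterfall > 0)  # True: the imbalance reads as upward drift
```

Nothing in the scene changes between the two readouts. Only the state of the "down" detectors does, which is the whole point of the illusion.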

But here's where we get to the really clever detective work.

We can use this silly illusion to pinpoint exactly where in the brain this whole process is happening, because there's a big debate to be had.

Is this happening in the eye itself, in the retina, or is it happening deeper in the brain?

OK, so how do we figure that out?

With a very elegant trick called interocular transfer.

It sounds complex, but you can do it at home.

Listener, if you're not driving, you can try this later.

You cover your left eye completely.

You stare at the waterfall or a spiral video on your phone with just your right eye.

So I'm fatiguing the down detectors in my right eye only.

My left eye is in the dark.

Its neurons are totally fresh.

Correct.

You adapt your right eye for 30 seconds.

Then you switch.

You close your now fatigued right eye and you open your fresh left eye to look at a stationary wall.

OK, let me think this through logically.

If the fatigue is happening in the eyeball in the retina, then my left eye, which was closed, shouldn't have any tired neurons.

They should be perfectly balanced.

So I shouldn't see the illusion at all.

That would be the logical assumption, yes.

If motion was processed entirely in the retina, the effect would be locked to one eye.

But that's not what happens.

You do see the illusion.

Wait, I tire out the right eye, but my fresh rested left eye hallucinates the upward motion.

Yes, the aftereffect transfers from the adapted eye to the unadapted eye.

So that can only mean one thing.

The fatigue isn't happening in the eyeball.

It must be happening somewhere the two eyes have already merged their signals.

Somewhere deeper.

Somewhere after the two optic nerves have combined their information.

It's happening in the brain's cortex.

And we now know this happens in a specific region called the middle temporal area, or MT.

It's sometimes also called V5.

So just by covering one eye, we've proven that motion is fundamentally a brain thing, not an eye thing.

Precisely.

We've located the site of the adaptation.

MT is like the motion center of the visual brain.

But, and this is a big but, MT is looking at the big picture.

We need to zoom in.

How does an individual neuron, a single cell, know that something moved in the first place?

This takes us into the computing motion section.

And I'll be honest, when I first looked at the diagrams here, figure 8.3 in the text, I got a little intimidated.

It looks like an electrical engineering schematic.

It essentially is.

It's a neural circuit.

It's called the Reichardt detector, named after Werner Reichardt.

And it's the proposed solution to a really, really difficult problem.

The problem being that a single neuron is basically looking at the world through a straw.

That's a perfect way to put it.

A neuron in the early visual system has what's called a receptive field.

It only sees a tiny, tiny patch of space.

Imagine you are that neuron.

You're watching one minuscule spot on the wall.

Suddenly, a bug appears in your spot.

Then it disappears.

Okay.

What happened?

Did the bug move to the left?

Did it move to the right?

Did it just flicker into existence and then vanish?

You have no idea.

All you know is something was here, now it's gone.

So a single receptor is useless for detecting motion.

On its own, yes.

So Reichardt realized you need to link them up.

You need a circuit.

So let's start simple.

Imagine two receptors.

We'll call them receptor A and receptor B.

They're right next to each other in the retina.

A is on the left.

B is on the right.

A bug flies past.

It triggers A.

Then a split second later, it triggers B.

Simple enough.

A then B.

But here's the catch.

How do you wire them up?

If we just connect both of those receptors to a single motion cell, that cell will fire when the bug hits A and it will fire again when the bug hits B.

It will also fire if two different bugs happen to land on A and B at the exact same time.

Right.

It just becomes a something is happening out there detector.

It doesn't tell you anything about direction or sequence.

It's all noise.

Exactly.

So Reichardt proposed adding two clever components to the circuit to make it smart.

The first component is a delay unit.

A delay.

Just imagine the wire coming from receptor A takes a little detour.

It's a slightly longer path so it slows the signal down just a little bit.

Like putting a little traffic jam on one of the roads.

A perfect analogy.

And the second component is a multiplication cell.

Now the term is technical but I like to think of this as a very, very strict doorman at a club.

A doorman.

This doorman, let's call him cell X, has a simple rule.

He will only open the door and let a signal pass through if two guests arrive at his door at the exact same time.

If one arrives even a millisecond early and the other arrives late, the door stays shut.

He needs perfect coincidence.

Okay, I think I see where this is going.

So walk me through the bug flying past from left to right.

The bug flies from left to right.

It hits receptor A first.

Receptor A immediately sends a signal.

I saw it.

That signal from A hits the traffic jam.

It gets put into the delay unit.

Right.

So the signal from A is now taking the slow scenic route.

Meanwhile, the bug keeps flying.

A split second later, it hits receptor B.

And receptor B sends its own signal.

And because B doesn't have a delay, that signal shoots straight to the doorman cell X.

And if the bug is moving at just the right speed, the delayed signal from A, finally coming out of its traffic jam, and the fresh immediate signal from B arrive at the doorman's post at the exact same instant.

And the doorman says, two guests, perfect timing.

I'm opening the door.

The cell fires, and boom, the brain perceives motion.

That's the circuit.

And notice the genius of it.

What happens if the bug flies the other way from right to left?

So it hits B first.

B sends a signal straight to the doorman.

The doorman sees one guest arrive.

He waits.

Nothing.

The door stays shut.

Then a moment later, the bug hits A.

That signal gets sent.

But it goes into the delay unit, so it's even more delayed.

It arrives way, way too late.

The doorman never sees them together.

The circuit is completely silent.

So this one circuit is blind to leftward motion.

It is specifically a rightward motion detector.

You've got it.

It's inherently direction selective.

And what about speed?

Because some bugs fly fast, others fly slow.

That's the other beautiful part of the model.

The delay unit determines the speed tuning.

If you have a circuit with a very short delay, it will only fire if the signals from A and B arrive very close together in time.

So that catches fast-moving bugs.

Exactly.

And a different circuit with a long delay will catch slow-moving bugs.

Your brain is just packed with millions of these Reichardt detectors, all tuned to different directions.

Up, down, left, right, diagonal, and all tuned to different speeds.

It's basically a massive parallel bank of these little coincidence detectors.

That's what motion perception is at its most fundamental level.

Detecting specific spatiotemporal coincidences.
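The delay-and-multiply circuit described above can be sketched in a few lines of Python. This is a simplified, discrete-time version of only the half of the circuit the conversation walks through (delay A, multiply with B). The function names, the time steps, and the single-pulse receptor traces are all assumptions made for illustration, and a full Reichardt model would typically also subtract a mirror-image unit tuned to the opposite direction.

```python
def reichardt_response(signal_a, signal_b, delay):
    """Delay receptor A by `delay` steps, multiply with receptor B, sum over time."""
    total = 0.0
    for t in range(len(signal_b)):
        # Before the delay line has filled, the delayed A signal is silent.
        delayed_a = signal_a[t - delay] if t - delay >= 0 else 0.0
        total += delayed_a * signal_b[t]  # the "doorman": nonzero only on coincidence
    return total


def pulse(length, at):
    """A receptor trace that is 1.0 at one time step and 0.0 everywhere else."""
    return [1.0 if t == at else 0.0 for t in range(length)]


N = 20

# Rightward bug at the preferred speed: hits A at t=5, then B three steps later.
rightward = reichardt_response(pulse(N, 5), pulse(N, 8), delay=3)

# Leftward bug: hits B at t=5, then A at t=8.
leftward = reichardt_response(pulse(N, 8), pulse(N, 5), delay=3)

# Rightward bug that is too fast for this delay: A at t=5, B at t=6.
too_fast = reichardt_response(pulse(N, 5), pulse(N, 6), delay=3)

print(rightward, leftward, too_fast)  # 1.0 0.0 0.0
```

Changing `delay` changes which speed produces the coincidence, which is the speed tuning described above. Note that the function only ever receives the two receptor traces, so it has no access to anything that happens between A and B.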

Which is brilliant, but it also seems like a system that you could hack pretty easily.

Oh, it's incredibly easy to hack.

In fact, you pay money to have it hacked every time you go to the cinema or turn on your TV.

This moves us into apparent motion.

Because, going back to our mission statement, we need to explain why movies work.

They're not real motion.

Think about what we just described with the Reichardt detector.

Does that doorman, cell X, actually see the bug moving in the space between A and B?

No.

He just sits in his little booth.

He just knows that A rang the bell and then, at the right time, B rang the bell and they arrived together.

Exactly.

He doesn't care about the journey, only the arrival times.

So if I get rid of the bug entirely, and I just flash a light bulb at position A, wait a split second, and then flash another light bulb at position B.

The doorman can't tell the difference.

Receptor A fires, its signal gets delayed.

Receptor B fires.

If my timing is right, the signals arrive at cell X simultaneously and it fires.

Your brain says motion, even though absolutely nothing moved between those two points.

It's a loophole.

A feature, not a bug, you might say.

And that loophole is the entire basis of all animation, all television, all movies.

Daffy Duck isn't actually running across the screen.

We are just exploiting the delay units in our own visual cortex to create an illusion.

I love that.

We built a multi-billion dollar global entertainment industry based on a biological glitch in our low-level visual processing.

We really did.

And it's not a new discovery.

The text mentions it was studied way back in 1875 by a physiologist named Sigmund Exner.

He was using electrical sparks in the dark.

He showed that if you time two sparks just right, people didn't see two separate sparks.

They saw one single spark moving through space.

And it's not just visual, right?

The text mentions that sound can influence it.

Yes.

There's a quick note on that.

If you have a visual flash that's ambiguous, and at the same time you play a little beep sound that seems to move from left to right through headphones, it can actually cause you to perceive the visual flash as moving left to right.

It shows our senses are all talking to each other.

But you know, it's not always that simple.

Sometimes the brain gets really confused.

The text brings up a really headache-inducing concept called the correspondence problem.

Yeah, this is where the simple A then B model starts to run into trouble in a more complex world.

So figure 8.5 shows this really well.

Imagine I'm looking at a movie frame and on the screen there are two dots, one on top, one on the bottom.

Okay.

Frame one, two dots, vertically aligned.

Then frame two flashes.

And now there are two dots again, but they're on the left and right, horizontally aligned.

Now your brain has a serious problem, a correspondence problem.

Which dot from frame one corresponds to which dot in frame two?

Right.

Did the top dot move diagonally down to the right and the bottom dot move diagonally up to the right or did they both just move horizontally?

Or did the top dot move to the left and the bottom to the right?

There are multiple equally valid interpretations.

And all the little Reichardt detectors in your brain are firing for all these different possibilities.

You have detectors for diagonal motion shouting diagonal and detectors for horizontal motion shouting horizontal.

It creates ambiguity.

And this leads to an even bigger, more fundamental issue called the aperture problem.

The aperture problem.

The text uses the analogy of looking at the world through a straw and it's essential to understanding this.

So this goes back to what you said about V1 neurons having tiny receptive fields.

They're all looking through their own little keyhole.

Exactly.

Let's use that blind man and the elephant analogy you mentioned earlier.

It's perfect for this.

Every neuron in your primary visual cortex V1 is a blind man.

It can only feel one tiny piece of the elephant.

One neuron is touching a tusk and shouts, it's a spear.

Another is touching the tail and says, it's a rope.

None of them can see the whole elephant.

So let's look at figure 8.6 in the text.

We have a big sheet of paper with vertical black and white stripes on it.

Like a barcode.

And let's say the whole sheet is moving diagonally up and to the right.

Okay, that's the global motion, the true motion of the object.

But imagine you cover that sheet with a piece of cardboard that has one small circular window cut in it.

You can only see the stripes moving through that one little aperture.

Because the stripes are uniform, just black and white lines, if I'm looking through that tiny window, I can't actually tell that they're moving diagonally.

The edge of a stripe just looks like it's moving straight up.

Or maybe straight to the left.

The motion is ambiguous.

That's the aperture problem.

The local detector, our V1 neuron, is blind to the true global motion.

It only sees the component of motion that's perpendicular to the edge it's looking at.
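As a rough sketch of why the aperture view is ambiguous, the Python below projects a true 2D velocity onto the direction perpendicular to a straight edge, which is the only component a local detector is said to register. The function name `visible_through_aperture`, the angle convention, and the example velocities are assumptions made for illustration. The example uses a hypothetical horizontal edge to keep the numbers easy to read.

```python
import math


def visible_through_aperture(velocity, edge_angle_deg):
    """Return the part of `velocity` perpendicular to a straight edge.

    `edge_angle_deg` is the edge's orientation measured from the x-axis
    (0 = horizontal edge, 90 = vertical edge). Motion along the edge itself
    produces no change inside the aperture, so it is discarded.
    """
    vx, vy = velocity
    theta = math.radians(edge_angle_deg)
    # The edge runs along (cos, sin); its unit normal is (-sin, cos).
    nx, ny = -math.sin(theta), math.cos(theta)
    speed_along_normal = vx * nx + vy * ny
    return (speed_along_normal * nx, speed_along_normal * ny)


# Two very different true motions of an object with a horizontal edge...
up_and_right = visible_through_aperture((1.0, 1.0), edge_angle_deg=0)
up_and_hard_left = visible_through_aperture((-3.0, 1.0), edge_angle_deg=0)

# ...both reduce to the same upward drift when seen through the small window.
print(up_and_right, up_and_hard_left)
```

Because many true velocities collapse onto the same local reading, a single detector cannot recover the object's real direction on its own.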

So V1 is just this cacophony of blind men shouting confusing and often contradictory things.

It's moving up.

No, it's moving left.

I see it going down.

So who listens to all of them?

Who is the king in this story who talks to all the blind men and says, relax, idiots, it's an elephant moving diagonally?

That brings us right back to our friend Area MT, the global motion detector.

So V1 computes all this confusing local fragmented motion, sends all those signals up the chain to MT.

And MT's job is to integrate it all and say, OK, given all this conflicting input, what is the most likely single coherent motion of the entire object?

Precisely.

MT integrates the local motion signals to solve the aperture problem.

And we have incredible experimental evidence for this from William Newsome and his colleagues working with monkeys.

This is one of the most famous and important experiments in all of visual neuroscience.

This is the correlated dot motion study.

I want to make sure we get this right, because the implications are just wild.

OK, so they train monkeys to look at a screen full of moving dots.

Imagine old school TV static, but the dots are moving randomly.

Utter chaos.

But hidden in that chaos, they can program a certain percentage of the dots to all move in the same direction.

That's the correlated motion.

So for example, 100% correlation would mean all the dots are moving perfectly together to the right.

That would be super easy to see, like a whole sheet of paper moving.

Trivial.

But what if only 50% are moving right, and the other 50% are random, or 10%, or 2%?

They found that normal monkeys, and humans for that matter, are amazing at this.

They can reliably detect the overall direction of flow, even if only 2% or 3% of the dots are moving together.

That's an incredibly sensitive signal to noise detector.

It is.
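For a feel of how a weak correlated signal can survive pooling, here is a toy Python sketch of the stimulus and a very simple readout. It is not the analysis from the experiment and not a model of what MT actually computes. The vector-averaging readout, the dot count, the seed, and the names `make_dot_directions` and `pooled_direction` are all assumptions chosen for illustration.

```python
import math
import random


def make_dot_directions(n_dots, coherence, signal_angle=0.0, rng=None):
    """Directions (radians) for one frame: a `coherence` fraction of dots share
    `signal_angle` (0.0 = rightward); the rest move in uniformly random directions."""
    rng = rng or random.Random(0)
    n_signal = round(n_dots * coherence)
    directions = [signal_angle] * n_signal
    directions += [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_dots - n_signal)]
    return directions


def pooled_direction(directions):
    """Vector-average every local direction into one global estimate (radians)."""
    x = sum(math.cos(d) for d in directions)
    y = sum(math.sin(d) for d in directions)
    return math.atan2(y, x)


# 10% of 10,000 dots drift rightward; the other 90% are pure noise.
dots = make_dot_directions(10_000, coherence=0.10, rng=random.Random(42))
estimate = pooled_direction(dots)

# Close to 0 radians (rightward): the random directions largely cancel when summed.
print(round(math.degrees(estimate), 1))
```

The random dots point every which way, so their contributions mostly cancel, while the correlated dots all add in the same direction. Any single dot, taken alone, says almost nothing about the global flow.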

But then came the crucial step.

They created tiny targeted lesions.

They carefully destroyed area MT in these monkeys.

They took out the manager.

They got rid of the king.

And the ability to perceive this global motion just collapsed.

The monkeys suddenly needed 10 times as much correlation, maybe 20% or 30% of the dots, to see the same motion they could previously see at 2%.

But, and this is the key part, the text highlights, they could still see other things, right?

It wasn't like they went blind.

Not at all.

Their acuity for stationary patterns, for shape, was perfectly fine.

The blind men in V1 were still working.

They could still see the individual dots.

But the king was gone.

They couldn't integrate all that local motion into a single coherent picture.

And to really nail it down, to prove that MT isn't just involved, but is actually causal in motion perception, they did the reverse experiment.

They used microstimulation.

The poking the brain with an electrode method.

So they go into a healthy monkey's brain and they find a column of neurons in MT that specifically likes rightward motion.

That's their favorite thing.

Correct.

So they've got this tiny electrode sitting on a rightward neuron.

Then, on the screen, they show the monkey a display where the dots are moving slightly to the left.

So the eyes are seeing leftward motion.

The evidence from the retina is left.

Unambiguously.

But while the monkey is watching this leftward motion, they pass a tiny unnoticeable electric current through the electrode, stimulating the rightward neurons.

They're essentially telling those neurons to shout, right, right, right.

And what did the monkey report?

The monkey reported that the dots were moving right.

That is just, it's unbelievable.

The electrical stimulation completely overruled the visual reality coming from its own eyes.

It's like hacking the matrix directly at the source code.

Yeah.

It proves that our perception of motion isn't just a passive recording of what hits the retina.

It's a democratic vote happening in area MT.

And if you stuff the ballot box with a little bit of electricity, you can change the outcome of the election and change what the monkey perceives as real.

Speaking of when this system breaks down in a more catastrophic way, we have to talk about akinetopsia.

This is section four of the outline and it's a chilling condition.

The man who couldn't see motion.

Or in the most famous case, the 43-year-old woman known in the literature as LM.

The text focuses on a 47-year-old male patient, but the experience is the same.

Describe what the world looks like to someone with akinetopsia.

It's terrifying.

Imagine seeing the world not as a continuous movie, but as a series of frozen snapshots, like a bad slideshow.

There's no flow, no continuity.

The description of pouring a cup of tea really stuck with me.

Right.

He would try to pour tea into a cup.

To him, the stream of tea didn't flow.

It looked frozen, like an icicle or a glacier hanging in midair.

He couldn't perceive it rising in the cup.

And then suddenly...

Suddenly the cup is overflowing and there's hot tea all over the table.

He couldn't see the transition.

He just sees empty cup.

Then the next snapshot is mess on the table.

He described it as being in a room where the only light is from a strobe light.

Exactly.

People would appear here, then suddenly be over there, with no sense of the movement in between.

Crossing the street was a nightmare.

He'd look, and a car would be a block away in one frame of his vision.

Then in the next frame, the car is right on top of him.

He had no way to judge its speed or approach.

And this is caused by specific damage to V5, to area MT.

Yes.

It's a very rare condition, but it highlights just how absolutely critical this specific area of the brain is.

Without it, you lose the continuity of time itself.

Now, the chapter pivots here to something a bit more technical, but it's fascinating because it explains how we break camouflage.

The distinction between first-order and second-order motion.

Right.

This is a distinction that helps us understand what the visual system is actually tracking when it detects movement.

So first-order motion is what we've been talking about mostly.

A dark bug on a light leaf, a black dot on a white screen.

Exactly.

It's luminance-defined motion.

There is a clear difference in brightness or color between the object and its background.

The basic Reichardt detector we discussed absolutely loves this stuff.

But nature is tricky.

Things are camouflaged.

They try not to create a clear luminance edge.

Think of a flounder on a sandy sea floor.

It can change its skin to match the sand almost perfectly.

Same average brightness, same average color.

So if it moves, a first-order detector might not see it because there's no big change in brightness moving across the receptive fields.

Exactly.

The object is effectively invisible to a simple luminance-based system.

But something is changing.

The texture is changing.

As the fish moves, the pattern of sand grains it's made of shifts relative to the background sand grains.

This is second-order motion.

It's contrast-defined or texture-defined motion.

The text shows those diagrams, figures 8.10 and 8.1, where you have a background of random black and white dots and a bar of inverted dots moving across it.

Right.

If you were to measure the average brightness inside that moving bar and outside of it, it's exactly the same.

There's no dark object moving.

But the pattern, the texture is different.

That disturbance in the texture moves across the screen.

And the book argues we have a completely separate system for detecting this.

It seems we do.

The evidence for that is a classic neurological concept called double dissociation.

Which means what, exactly?

It means we can find two different types of patients.

Patient A has brain damage that makes them unable to see first-order motion.

They can't see the moving light bulb, but they can still see the moving texture pattern just fine.

Okay, so that suggests two different mechanisms.

But the double part is finding patient B with damage in a different brain area who has the opposite problem.

They can see first-order motion perfectly, but they are completely blind to second-order texture-based motion.

And finding both types of patients proves that the two abilities must be handled by separate independent neural systems.

Exactly.

It's like finding one person who can't read but can write, another who can't write but can read.

It proves those aren't the same process.

So Nature built a backup system, a camouflage breaker.

Eat or be eaten.

If you can't see the camouflaged predator moving through the grass, you're lunch.

Absolutely.

Which is a perfect segue to Section 5, using motion to navigate the world and survive in it.

Because motion isn't just about watching movies or spotting bugs, it's about not walking into walls and not getting hit by a bus.

Enter J.J. Gibson.

The book credits him with some foundational ideas here.

A giant in the field of perception.

He worked for the U.S. Army Air Forces during World War II, trying to figure out how to better train pilots.

And landing a plane is a massive perceptual motion problem.

How do you know if you're coming in too fast or too steep?

How do you know exactly where you're going to touch down?

And Gibson's huge insight was the concept of optic flow.

Okay, break that down.

Picture yourself in the cockpit of a plane, coming in for a landing.

As you fly forward, the entire visual world seems to flow around you.

The image of the ground moves down and rushes under you.

The sky moves up.

The trees on the side of the runway move outward, away from the center.

It's like a radial expansion.

Everything is expanding from a central point.

But, and this is the key, there is one point in that entire visual field, one single point, that does not move.

It has no flow.

The focus of expansion.

The F.O.E.

The singularity.

And Gibson realized that that stationary point tells you, with perfect accuracy, exactly where you're heading.

If the F.O.E. is on the runway numbers, you're going to land on the numbers.

If the F.O.E. is on the control tower, you're going to have a very bad day.

A very bad day.

Gibson's insight was that pilots don't need to do complex geometry in their heads.

They don't need to calculate angles or speeds.

Their visual system gives them a simple heuristic.

Just place the stationary dot where you want to go.

It's a beautiful shortcut.
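For readers who want to see the geometry behind the focus of expansion, here is a minimal sketch. It assumes a simple pinhole-camera model and an observer who moves in a straight line without rotating the eyes or head; the function names, the heading vector, and the sample points are illustrative assumptions, not anything taken from the textbook.

```python
# Sketch under stated assumptions: pinhole camera, pure translation,
# made-up example numbers. All names here are hypothetical.

def flow_at(x, y, depth, heading, focal=1.0):
    """Image velocity (u, v) at image point (x, y) for a scene point at the
    given depth, when the observer translates with velocity heading=(tx, ty, tz)."""
    tx, ty, tz = heading
    u = (x * tz - focal * tx) / depth
    v = (y * tz - focal * ty) / depth
    return u, v


def focus_of_expansion(heading, focal=1.0):
    """Image point where the flow is zero: the projection of the heading."""
    tx, ty, tz = heading
    return focal * tx / tz, focal * ty / tz


# Mostly forward motion with a slight sideways and vertical drift (assumed values).
heading = (0.1, -0.05, 1.0)
foe = focus_of_expansion(heading)

print("focus of expansion:", foe)
print("flow at the FOE:", flow_at(foe[0], foe[1], depth=20.0, heading=heading))
print("flow elsewhere:", flow_at(0.5, 0.3, depth=20.0, heading=heading))
```

In this model the flow at every other image point is directed straight away from the focus of expansion, and it gets faster for nearer surfaces, which is the radial expansion pattern described above.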

And we use similar shortcuts for judging collisions, right?

The time to collision problem.

Yes.

The text uses the example of a cricket ball.

But for our audience, let's say it's a baseball.

A ball is flying directly at your nose.

You need to know not just if it will hit you, but when to duck.

The math way, the way you'd do it in a physics class, would be to estimate the absolute distance to the ball, estimate its absolute velocity, then divide the distance by the velocity to get the time.

Which is hopeless.

Humans are notoriously terrible at judging absolute distance and speed.

If we relied on that calculation, we'd get hit in the face every single time.

So what does the brain do instead?

It uses a clever variable called tau.

Tau?

Tau is based entirely on the rate of retinal expansion.

Think about it.

As a ball gets closer to your face, the image of it on your retina gets bigger.

It looms.

And it gets bigger faster and faster the closer it gets?

Exactly.

Tau is simply the ratio of the image's current size on your retina to the rate at which that size is expanding.

And it turns out this value, tau, directly specifies the time remaining until collision.

So my brain doesn't need to know it's 10 feet away and moving at 50 feet per second.

It just needs to track this one ratio.

It just knows the rate of expansion just hit the critical limit. Duck, and duck now.

Precisely.

It's an incredibly elegant solution.

You don't need to know what the object is, how big it really is in the world, or how far away it is.

You just track the looming rate.
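A small numerical sketch can make the tau idea concrete. It uses the 10 feet and 50 feet per second from the conversation; the ball diameters, the 1 ms sampling interval, and the function names are assumptions chosen only for illustration.

```python
import math

# Sketch only: tau estimated from two successive retinal (angular) sizes.
# Diameters, time step, and function names are illustrative assumptions.

def angular_size(diameter, distance):
    """Visual angle (radians) subtended by an object of the given diameter."""
    return 2.0 * math.atan(diameter / (2.0 * distance))


def tau(theta_prev, theta_now, dt):
    """Current retinal size divided by its rate of expansion."""
    expansion_rate = (theta_now - theta_prev) / dt
    return theta_now / expansion_rate


speed = 50.0      # feet per second, from the example in the conversation
distance = 10.0   # feet, from the example in the conversation
dt = 0.001        # seconds between the two "snapshots" (assumed)

for diameter in (0.24, 1.0):  # a baseball-sized ball and a much larger one (assumed)
    theta_prev = angular_size(diameter, distance + speed * dt)
    theta_now = angular_size(diameter, distance)
    print(f"diameter {diameter} ft: tau is about {tau(theta_prev, theta_now, dt):.3f} s")

print(f"distance / speed = {distance / speed:.3f} s")
```

Both ball sizes should give a tau close to distance divided by speed, even though the calculation never uses the distance or the speed directly. That is the point of the shortcut: the ratio is available from the retinal image alone.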

The text notes that even pigeons and locusts use tau to avoid collisions.

It's ancient, powerful tech.

I love the idea that our brains are running these little subroutines that birds and insects perfected millions of years ago.

Now, what about identifying what is moving?

The chapter talks about biological motion.

This brings us to Gunnar Johansson's famous point-light walker displays.

If you've ever seen behind-the-scenes footage of actors in motion capture suits for movies, the black suits with the white ping pong balls on them, this is the origin of that entire technology.

This is Figure 8.42 in the text.

Imagine a person dressed entirely in black in a pitch-dark room.

You can't see them at all.

But you attach little lights to their major joints, their shoulders, elbows, hips, knees, ankles.

If that person's just standing still, what do I see?

You see a random-looking constellation of about a dozen dots, a meaningless clump of stars.

It doesn't look like anything.

But the instant they start to walk.

Boom.

You see a person.

Immediately.

The book says it takes less than 200 milliseconds.

Your brain connects the dots and perceives a coherent human figure walking.

And not just a person.

We can extract a surprising amount of detail from just these dots.

An incredible amount.

We can tell their gender with pretty high accuracy.

How on earth can you tell gender from a dozen moving lights?

It's about the center of motion.

Men tend to have broader shoulders, so the overall center of movement is higher up in the display.

Women tend to have a wider pelvic girdle, so you get more of a sway, and the center of motion is lower.

Our brains are exquisitely tuned to pick up on that subtle geometric difference.

We can also tell the mood, right?

Or the action they're performing, like happy walking versus sad walking, or dancing versus fighting.

And there's that great study mentioned about social cues and synchrony.

It takes two to tango.

If you show viewers two sets of these dot people, and they're meant to be, say, fighting, we recognize the action much faster if the two figures are moving in sync: one parries when the other strikes.

It shows that our motion system isn't just a physics engine.

It's socially tuned.

It's looking for interaction and meaning.

Absolutely.

We are social animals.

We need to know if those dots moving in the distance are friends, foes, potential mates, or a threat.

And then, just to mess with us after showing how powerful motion perception is, the chapter introduces motion-induced blindness, or MIB.

This is the dark side of the force.

This is a fun and slightly unnerving illusion to look up online.

If you fixate your gaze on a central target, and in the background there's a large moving grid pattern, stationary yellow dots placed in your peripheral vision will just vanish.

They just get erased from your perception.

Your brain literally edits them out of your reality.

It's linked to attention and another phenomenon called the Troxler effect.

The idea is that the motion signal from the grid is so strong and dynamic, and the signal from the stationary yellow dot is so constant and boring that the brain decides the stationary thing must be a glitch or noise and just deletes it.

That is deeply unsettling.

The brain is an aggressive editor of reality.

Which brings us to the final, and in my opinion, most mind-bending section of the chapter, Section 6, eye movement.

Because this is where the editor really goes to work.

This brings us full circle, back to the stability problem we started with.

The text suggests a really simple experiment with a pencil that perfectly demonstrates the problem.

Everyone should try this right now.

Case A.

Hold your head still.

Keep your eyes looking straight ahead.

Now move a pencil across your field of view from left to right.

Okay, I'm doing that.

The result is, I see the pencil move.

Obvious.

Now, case B.

Hold the pencil perfectly still.

Now move your eyes to track from the left end of the pencil to the right end.

The result is, I do not see the pencil move.

The world seems stable, and I just feel my eyes moving across a stationary object.

But here is the kicker, and this is what you have to really internalize.

In both case A and case B, the image of the pencil swept across your retina in the exact same way.

The raw video feed, the input to the system, is identical in both scenarios.

Identical.

A moving image on the retina.

But the perception is totally different.

One is object motion, the other is self-motion.

So the brain has this fundamental task.

It has to distinguish between the two.

Before we solve how it does that, let's quickly list the types of eye movements, because I didn't realize there were so many.

There's a whole menu.

First, you have microsaccades.

The jitters we talked about earlier.

Tiny involuntary movements.

They're absolutely crucial, because if your eyes were perfectly still, the world would fade to gray.

Your neurons adapt to constant stimulation and stop firing.

You need a constant jitter to refresh the screen.

The book notes they are also useful for fine tasks, like threading a needle.

Then you have reflexive movements.

Right, like the VOR, the vestibulo-ocular reflex.

You can test this.

Stare at your thumb and turn your head from side to side.

Your eyes automatically snap in the opposite direction to stay locked on your thumb.

Or OKN, optokinetic nystagmus, that's watching scenery from a moving train.

Your eyes track a tree, then snap back.

Track, snap, track, snap.

And finally, the voluntary ones we control.

There are three main types.

Vergence, that's crossing your eyes or uncrossing them to focus on something near or far.

Then there's smooth pursuit.

This is when you smoothly track a moving object, like your finger.

Interestingly, you can't make a smooth pursuit movement without a target.

If you just try to move your eyes smoothly across a blank wall, you can't do it.

Your eyes will make jerky little jumps.

And the last one, the big one, saccades.

The ballistic jumps.

This is what we do when we read or scan a room.

We do them constantly.

The book says three to four times a second. Do the math.

That's over 172,000 times a day.

That number is staggering.

You're basically re-aiming your eyes 172,000 times every day.

It is.

And here's the truly scary part.

During each saccade, during the jump itself, your visual system effectively goes blind for a moment.

This is saccadic suppression.

Yes.

Specifically, the magnocellular pathway, the pathway that carries most of the motion information, gets shut down.

The brain just pulls the plug for the 20 to 100 milliseconds of the jump.

Why would it do that?

To prevent catastrophic motion blur.

If you didn't shut it down, every time you moved your eyes, the entire world would smear across your vision.

It would be incredibly nauseating and disorienting.

So we are effectively blind for a huge chunk of our waking life, and we don't even know it.
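The numbers quoted in this stretch can be checked with a few lines of arithmetic. The sketch below takes the three to four saccades per second and the 20 to 100 milliseconds per jump from the conversation; the 16 waking hours per day is an assumption added here, since the transcript does not say how the daily figure was reached.

```python
# Back-of-the-envelope arithmetic only.
# WAKING_HOURS is an assumption; the rates and durations come from the conversation.

WAKING_HOURS = 16


def saccades_per_day(rate_per_second, waking_hours=WAKING_HOURS):
    """Number of saccades in one day of waking time."""
    return rate_per_second * waking_hours * 3600


def suppressed_minutes(saccade_count, duration_ms):
    """Total time spent in saccades, in minutes."""
    return saccade_count * duration_ms / 1000 / 60


for rate in (3, 4):
    count = saccades_per_day(rate)
    low = suppressed_minutes(count, 20)
    high = suppressed_minutes(count, 100)
    print(f"{rate}/s: {count:,} saccades, roughly {low:.0f} to {high:.0f} minutes in flight")
```

Under these assumptions, three saccades per second gives 172,800 per day, which matches the "over 172,000" figure, and the time spent mid-jump adds up to somewhere between about an hour and several hours.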

We are.

The book suggests the mirror test to prove this to yourself.

Stand in front of a mirror and look at your own eyes.

Now, look from your left eye to your right eye.

Go back and forth.

You will never, ever see your eyes in motion.

Never.

You see a static image of yourself looking left, then there's a cut.

And you see a static image of yourself looking right.

The travel time, the blur, is completely edited out of the movie of your life.

Okay, so that handles the blur problem.

But that still doesn't explain the stability.

How does the brain know that the world didn't jump when my eyes did?

This brings us to the comparator.

This is the grand theory.

The solution to the stability problem, proposed by Von Helmholtz.

It involves something called the efference copy.

Or corollary discharge.

I like thinking of it as the CC line on an email.

That's a perfect analogy.

When your motor cortex decides to move your eyes, it sends an efference signal to the eye muscles.

Move 40 degrees to the left.

That's the "To" line on the email.

But it also sends a carbon copy, an efference copy, to a different part of the brain.

To a theoretical box we call the comparator in the visual system.

So the comparator gets a memo.

Just a heads up, the eyes are about to move 40 degrees left.

Expect a corresponding retinal shift.

Precisely.

The comparator then runs a quick simulation.

It calculates, okay, if the eyes move left by 40 degrees.

The entire visual image should shift right across the retina by 40 degrees.

It predicts the sensory consequence of the action.

Then the eyes actually move.

The retina records the massive rightward shift.

It sends that information, the retinal image signal, to the comparator.

And the comparator performs a simple subtraction.

It takes the actual retinal movement and subtracts the expected movement from the efference copy.

So if the retinal image shifted right by 40 degrees and the memo said to expect a 40 degree shift, 40 minus 40 equals zero.

The result is zero.

The brain looks at that zero and concludes.

The object is stationary.

I was the one who moved.

Cancel the motion signal.

And what if something is actually moving?

Let's say I'm tracking a moving pencil.

Okay, so your eyes are moving left to track it.

The motor command is move left.

The comparator expects a rightward retinal shift.

But because the pencil is also moving, the image stays mostly still on your retina.

So the retinal movement is zero.

So the math is zero retinal movement minus the expected rightward movement.

That leaves a non-zero answer.

It leaves a leftover signal.

The brain looks at that and concludes.

The world didn't move as predicted.

Therefore, the object itself must be moving.
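The subtraction described here can be written out as a toy calculation. This is a sketch of the idea only, not a model of real neural circuitry: the sign convention (rightward positive, leftward negative, in degrees) and the function names are assumptions introduced for the example.

```python
# Toy comparator. Convention (assumed): rightward is positive, leftward is
# negative, all values in degrees. Function names are hypothetical.

def expected_retinal_shift(eye_command_deg):
    """Prediction from the efference copy: if the eyes move one way,
    the image of a stationary world should slide the other way."""
    return -eye_command_deg


def comparator(retinal_shift_deg, eye_command_deg):
    """Perceived object motion = actual retinal shift minus the expected shift."""
    return retinal_shift_deg - expected_retinal_shift(eye_command_deg)


# Eyes jump 40 degrees left across a stationary scene: image shifts 40 right.
print("stationary scene:", comparator(retinal_shift_deg=40, eye_command_deg=-40))

# Eyes track a pencil moving left: the image stays put on the retina.
print("tracked pencil:", comparator(retinal_shift_deg=0, eye_command_deg=-40))
```

The first case cancels to zero, so nothing is seen to move. The second leaves a leftward remainder equal to the eye movement, which is read as motion of the object itself.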

The text gives the ultimate undeniable proof of this.

The eye jiggle.

Yes.

Everyone, please be very careful.

But try this.

Close one eye.

Use your finger to gently, very gently push or jiggle your open eyelid from the side.

The whole world jumps around.

It looks like an earthquake.

It feels completely different from a normal eye movement.

So why?

Think about the comparator.

Okay.

My finger moved my eye.

My brain, my motor cortex didn't issue the command.

So did the motor cortex send an email?

Was there an efference copy?

No, there was no command to move.

The comparator got no memo.

So the comparator's prediction was zero movement.

But the retina, being physically jiggled, screamed massive movement.

So the math becomes massive retinal movement minus zero expected movement equals massive perceived movement.

Exactly.

The brain has no record of you initiating the movement.

So it has no choice but to conclude that the entire world is shaking.
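One way to see why the finger push is different is to separate what the eye was told to do from what the eye actually did. The sketch below is a toy forward model under assumed conventions (rightward positive, degrees); the `external_push_deg` term, the 5 degree push, and the function name are illustrative assumptions.

```python
# Toy forward model of the comparator with an externally imposed eye movement.
# Convention (assumed): rightward positive, degrees. Names are hypothetical.

def perceived_motion(object_motion_deg, commanded_eye_deg, external_push_deg=0.0):
    """Perceived object motion when the eye moves by command, by push, or both."""
    # The eye actually rotates by the commanded amount plus any outside push.
    actual_eye_deg = commanded_eye_deg + external_push_deg
    # The retinal image shifts by object motion relative to the moving eye.
    retinal_shift_deg = object_motion_deg - actual_eye_deg
    # The efference copy only knows about the commanded part.
    expected_shift_deg = -commanded_eye_deg
    return retinal_shift_deg - expected_shift_deg


# Normal saccade over a stationary scene: prediction and input cancel.
print("saccade:", perceived_motion(0.0, commanded_eye_deg=-40.0))

# Finger pushes the eye 5 degrees with no motor command: nothing cancels.
print("eye jiggle:", perceived_motion(0.0, commanded_eye_deg=0.0, external_push_deg=5.0))
```

In this toy version the commanded part of the eye movement always drops out, while any uncommanded displacement is attributed to the world, which is why a gentle push makes a stationary scene appear to jump.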

That is such a concrete perfect demonstration of this incredibly complex neural mechanism.

It's the entire difference between I moved and it moved.

It's the system that literally creates our sense of a stable reality.

We're constantly actively subtracting ourselves out of the perceptual equation.

The chapter wraps up with a quick look at the development of these systems and a really fun sidebar about insects.

All right, section seven, just to touch on it.

Infants aren't born with a fully mature motion perception system.

The reflexive stuff like OKN following a moving object is present at birth.

And V1 neurons have some direction sensitivity.

But that sensitivity to global motion, the democratic vote in area MT, doesn't really mature until they're three or four years old.

So toddlers are literally seeing a less cohesive, more fragmented moving world than we are.

It's likely, yes.

And things like biological motion perception take even longer to fully develop.

It's a slow build.

And finally, the praying mantis, the Scientists at Work box.

I love this study.

They asked a simple question.

How does a praying mantis, which sits perfectly still, so effectively catch fast moving prey?

And they put tiny 3D glasses on the mantises.

Basically, they glued tiny filters over their eyes, like the lenses in 3D glasses, and showed them computer screens with simulated bugs.

And what did they find?

It turns out their entire visual system is tuned specifically for that ambush predator lifestyle.

Humans are actually better at seeing motion when we are also moving.

Mantises are optimized to detect very high velocity motion while they're sitting perfectly still.

Their contrast sensitivity is tuned to a totally different range of spatiotemporal frequencies than ours.

It's specialized hardware for a specialized hunter.

Exactly.

Every creature has the motion system it needs to survive in its own particular niche.

So that brings us to the end of chapter eight.

It's been quite a ride from the fatigue in our retinal neurons to the doorman in the cortex to the pilot landing the plane using optic flow.

What's the big overarching takeaway from you?

For me, it's that motion is truly a primary sense.

It's not a calculation we do after the fact.

It's a raw input to the brain.

And it involves these incredibly complex, clever circuits, the Reichardt detectors.

And it requires solving massive computational problems like the aperture and correspondence problems.

And it's absolutely vital for survival.

Navigation, collision avoidance, reading social cues from just a few dots of light.

But for me, the most profound thing is how actively and aggressively our brain edits our reality.

Saccadic suppression, going blind 172,000 times a day.

So the world doesn't look like a blurry mess.

Right.

And that's the final provocative thought I want to leave the listener with.

Think about the comparator.

It really is something to mull over.

Our perception of a stable, solid, stationary reality isn't just a passive reflection of what is out there.

It is a constant millisecond by millisecond mathematical subtraction.

Your perceived reality is literally retinal image signal minus your own motor commands.

Well, we are subtracting ourselves out of the equation just to see the world clearly.

Beautifully put.

On that note, thank you for listening to this deep dive into visual motion perception.

It's always a pleasure.

And a warm thank you from the entire Last Minute Lecture team.

We'll see you next time.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Motion perception represents a sophisticated visual capacity that rivals the importance of color and form detection for survival and navigation. The visual system accomplishes this through hierarchical processing that begins with elementary neural mechanisms and progresses to complex integrative functions across multiple cortical regions. At the foundation, local motion detectors employ strategies like the Reichardt model, which uses temporal delays and multiplicative interactions between adjacent photoreceptors to extract directional and speed information from visual input. These basic circuits generate familiar perceptual phenomena including apparent motion, the convincing sense of continuous movement created by sequential static images in film and digital displays, and the motion aftereffect, whereby prolonged exposure to motion in one direction causes subsequently stationary stimuli to appear moving in the opposite direction. However, individual neurons in primary visual cortex face a critical computational limitation known as the aperture problem: their restricted receptive fields capture only local motion signals, creating fundamental ambiguity about true direction and velocity. Resolution of this ambiguity requires higher-level processing in specialized regions including Area MT and the medial superior temporal cortex, which pool and integrate local motion signals to construct coherent global motion representations. Clinical observations of akinetopsia, an uncommon disorder in which patients lose the ability to perceive continuous motion despite normal static vision, reveal the critical role these regions play in motion awareness. Beyond luminance-based first-order motion detection, the system also processes second-order motion defined by changes in contrast or texture patterns, suggesting parallel processing pathways with distinct neural substrates. 
For navigation and spatial orientation, the visual system exploits optic flow patterns and the focus of expansion to determine heading direction, while the tau hypothesis enables estimation of collision time without requiring explicit distance calculations. Remarkably, humans extract rich semantic information from biological motion, recognizing identity, gender, and action from sparse point-light displays that contain only joint positions. Finally, the visual system maintains perceptual stability during eye movements through mechanisms including the comparator process and efference copies of motor commands, with saccadic suppression preventing the perception of blur during rapid eye jumps.
