Chapter 13: Joint Action and 4E Cognition

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Every single day we do these incredibly complex things that we just, well, we take them completely for granted.

Oh, absolutely.

I mean, think about just getting a full mug of coffee from the kitchen counter to your desk without spilling it.

A minor miracle on most mornings.

Or even buttering a piece of toast without the whole thing just crumbling into a mess.

These are individual acts, but the coordination required is just, it's astonishing.

It really is.

And cognitive science, philosophy, they've spent centuries trying to unpack just that intricate dance of a single person acting.

Right.

But here's where it gets really interesting.

What happens when you add another person to that picture?

What happens when two people are trying to move a sofa or play a duet or even just have a conversation?

The complexity doesn't just double, it explodes.

We're moving from individual action to this huge topic of joint action.

Okay, so let's try and unpack that.

Today, we are diving deep into the absolute cutting edge of cognitive philosophy.

We're focusing on one chapter, "Joint Action and 4E Cognition," from the Oxford Handbook of 4E Cognition.

And our mission here, really, is to synthesize all of this for you, the learner.

We're going to bridge these huge philosophical debates about, you know, high level intentions with the really surprising low level mechanics of how we actually interact.

The nuts and bolts.

The nuts and bolts, the perceptual motor stuff.

And the central question of this entire deep dive is, well, does joint action require a shared mind?

Do we need some kind of dedicated internal shared mental plan?

Or can coordination just sort of emerge from bodies interacting with each other in the world?

And the authors, they show their hand right away.

They argue that joint action is just way too varied, way too complex for any single theory to cover it all.

So you can't just pick a side, the in the head camp or the in the world camp.

Nope, you can't.

They push for what they call an ecumenical approach, or maybe a better term is an e-cognitive account of joint action.

That sounds a little jargony.

What does that actually mean?

Well, the insight is pretty profound, actually.

They're saying we have to abandon that binary.

This whole idea that joint action is either mental or physical, it's not.

It exists on a spectrum.

And the cognitive account.

That sounds like it's about integration.

What exactly are we integrating?

Two very different levels of analysis.

On one hand, you've got the high level approaches from philosophy and psychology.

That's all about intentional states: shared goals, beliefs, commitments, mental stuff.

And on the other hand, you have the low level perceptual motor mechanisms.

This is what experimental psychology studies.

How our bodies actually align and synchronize in real time.

So this means joint action is a graded concept.

It's not a single thing.

Exactly.

It's not a single thing.

It can't have one single explanation.

It spans everything from, you know, a highly trained surgical team all the way down to basic animal coordination.

And this cognitive account has to be big enough to cover that whole spectrum.

That's the goal.

It aims to be fully ecological.

So taking the environment and physical factors seriously and fully cognitive.

So still accounting for the mental processes that guide our behavior.

And this is where the four E's come in.

This is exactly where they come in.

As we go, you'll see how embodied, embedded, extended and enactive cognition give us the tools we need to move beyond that traditional, purely mental framework.

The four E approaches are a fundamental challenge to the idea that the mind just stops at the skull.

All right.

So to get a handle on this huge debate, we have to start with the basics.

What are we even talking about?

And right away, the literature is split.

There are two competing definitions of joint action.

Yeah.

Let's start with the most intuitive one, the broad definition.

This is the one that's often favored by, say, experimental psychologists like Sebanz, Bekkering and Knoblich.

For them, it's defined really simply as coordination in space and time that results in a change in the environment.

Can you give me a simple everyday example of that broad view?

Sure.

Successfully walking past someone on a crowded street without doing that awkward little sidestep shuffle we all hate.

I know it well.

Or two drivers merging smoothly into one lane.

It's all about the observable physical outcome.

You change the environment, your position relative to the other person through coordination.

That seems pretty reasonable.

It's focused on what you can see, the behavior.

It is.

But then you get the other camp, the narrow or intentional definition.

This is pushed mainly by philosophers and developmental psychologists.

People like Tomasello, Searle, Bratman.

And for them, passing someone on the street isn't enough.

Not even close.

They would call that mere social interaction, maybe co-action at a stretch, but definitely not true joint action.

Why not?

What's the critical difference for them?

The crucial distinction is the internal mental state that's required.

For this narrow view, joint action demands that the individuals coordinate specifically in order to achieve a common goal.

They have to have a shared, explicit intention to do the action together.

So if we're moving a heavy table, our goal to get the table safely across the room is inherently shared.

Ah, so it's not just about what happens physically, but why it happens.

Is the action driven by a truly shared goal?

Or is it just a coincidence of individual goals?

Precisely.

And this isn't just some academic hair splitting.

Which definition you choose dictates everything that follows.

It determines the cognitive states you even think are worth studying.

Which leads us right to the central debate.

Let's frame the two sides.

Okay.

So if you take the narrow definition, you're committing to the high-level approach.

People often call this the intellectualist view.

Right.

And that means your research is going to focus almost exclusively on internal representational states: shared goals, explicit commitments, what you believe about your partner's beliefs.

And if you take the broad definition.

Then you're on the low-level approach, the perceptual motor view.

Your research is going to focus on things like alignment, mimicry, automatic coupling, and just the basic physics of how bodies move together.

So the ultimate framing question that drives everything is this.

Are those low-level processes like motor mimicry and visual alignment enough to count as joint action?

Or does it necessarily have to involve high-level cognitive states like belief and intention and shared goals?

The rest of this deep dive is all about navigating the space between that top-down control and that bottom-up emergence.

Okay.

Let's start by anchoring ourselves in that intellectualist paradigm.

This is the traditional home of philosophy.

And it all starts with a really foundational idea in the philosophy of mind.

The causal theory of action.

Yes.

We need to spend a moment on this because it really underpins their entire approach.

So what is it?

The causal theory of action is basically how philosophers draw a line between genuine actions and mere happenings.

Or just simple behaviors.

So if you sneeze or you trip and fall down the stairs, those are things that happen to you.

They're behaviors, but I don't feel responsible for them in the same way.

Exactly.

But if you deliberately walk to the fridge to get a beer, that's an action.

You're responsible for it.

Causal theorists argue that what makes it a true action is that it is caused by your mental states beforehand.

Specifically, your beliefs and most importantly, your intentions.

So my intention to get a beer causes my body to perform all the steps to get me to the fridge.

Correct.

The mental state is the director, the force that guides the physical behavior.

The body is the instrument and the intention is the driver, and it all lives inside the skull.

The classic mind in the head view.

It is.

And so when you try to extend this logic to joint action, the philosophical path is clear.

If an individual action is caused by an individual intention, then a joint action must be caused by a shared or joint intention.

Exactly.

The whole paradigm then becomes a search for that necessary internal shared representational state that's responsible for coordinating multiple bodies at the same time.

Which brings us to the philosophical heavy hitters who tried to pin down what that shared state looks like.

Right.

First up and maybe the most influential is Michael Bratman.

He developed this idea of shared agency.

Okay.

For Bratman, a joint action happens when each person involved intends that we J, where J is the joint activity, like moving the table.

And these intentions aren't just running in parallel.

They're interdependent.

But just being interdependent isn't enough, is it?

We both have to know that we're on the same page.

And this is the key.

Bratman introduces this incredibly demanding condition of mutual common knowledge.

Okay.

That's a term we absolutely have to unpack for the listener.

It is.

It's not just that I know you intend to move the table and you know I intend to move the table.

It's a nested, potentially infinite series of knowledge states.

So to really spell this out, I don't just need to believe you know the goal.

I need to believe that you know that I know that you know the goal and on and on.

The cognitive load is immense.

You need these highly sophisticated, explicit meta-representational capacities, a real mind-reading ability, to keep that shared intention afloat.
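To make the regress concrete, here is a small Python sketch that spells out the nested clause a common-knowledge condition asks for at a given depth. The function name and the sentence template are illustrative assumptions, not anything from Bratman or the chapter; the only point is that each extra level wraps the previous one, so no finite depth ever finishes the series.

```python
def nested_knowledge(depth: int, proposition: str = "we intend to move the table") -> str:
    """Spell out one finite level of a mutual-common-knowledge condition.

    Hypothetical helper for illustration: builds the clause from the inside out,
    alternating between "you" and "I" so the outermost level is always "you".
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    agents = ["you", "I"]
    statement = proposition
    for level in range(depth):
        # The outermost wrapper (level == depth - 1) uses agents[0], i.e. "you"
        agent = agents[(depth - 1 - level) % 2]
        statement = f"{agent} know that {statement}"
    return "I believe that " + statement
```

For example, nested_knowledge(3, "P") returns "I believe that you know that I know that you know that P". Each further level adds one more clause, and the condition as stated needs all of them.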

And this is usually the first point of attack from the 4E side.

How could something so complex be the foundation for all joint action?

Who else is in this camp?

Well, you have Raimo Tuomela, who developed a similar idea around intentions held in the we-mode, or we-attitudes.

Again, it insists on interdependence and requires that common knowledge.

Then John Searle pushed it even further.

He argued for collective intentions, or we-intentions.

For him, the intention itself had to be collective from the start, not just a bunch of individual intentions pointing at the same thing.

Margaret Gilbert.

She really emphasized the role of joint commitments between people, the social normative agreements that bind us together in an action.

What's so striking about all of these accounts is that they all start with the individual mind.

Even when they're talking about a joint outcome, the focus is always internal, on defining that specific high-level representational state that has to exist before the action even starts.

That's the core of the intellectualist view.

And Searle's famous example really makes it crystal clear.

He says, compare two situations where the physical movements are identical.

Scenario one: a professional dance troupe performs a synchronized run across the stage.

Scenario two: a random group of strangers are in a park.

It starts to pour rain and they all run for the same shelter.

The physical movements, the timing, the pace could be exactly the same.

But for Searle, only the dancers are performing a joint action.

Precisely.

Because their movement is guided by a we-intention, that explicit shared representation that they are doing this together as part of a commitment.

The strangers are just engaging in collective behavior.

Their actions are all individually motivated.

I intend to stay dry, you intend to stay dry, and we're just adjusting for each other's presence.

Exactly.

The intellectualist firewall is the absence of that shared representational, goal-based intent.

Okay, so these philosophical accounts are rich, they define the what.

But empirical science and 4E cognition come back with four really devastating objections about the how.

Let's start with the problem of spontaneity.

This is often called the jazz trio objection.

Right.

The intellectualist accounts, especially Bratman's, are almost entirely focused on prior intentions.

Things that are pre-planned, sometimes even contractually agreed upon.

The critique is simple.

This just completely fails to explain any action that's improvised.

Like a jazz trio performing.

The perfect example.

The trio might have a high-level prior intention to, say, play a blues standard in C minor, but that general goal gives them absolutely no guidance for the moment-to-moment coordination.

The split-second decisions.

The split-second decision by the bassist to change the rhythm, or the pianist to throw in a new harmony, which the drummer just instantly picks up on.

They are coordinating on the fly.

And that general intention, let's play together, just doesn't explain the rapid, fine-grained motor synchronization that's actually happening.

The philosophical theory is too...

Skeletal.

It only explains the high level goal, not the low level execution.

Which leads to the second big objection.

Over-intellectualization and exclusion.

This is about that mutual common knowledge thing again, isn't it?

It is.

If you demand that as a prerequisite, you're basically putting up a massive cognitive barrier at the entrance to joint action.

And this high demand for sophisticated mind-reading means you effectively have to rule out huge groups of agents.

Young children, for instance, who are still developing those kinds of meta-representational skills.

And almost all non-human animals.

And that just seems deeply counterintuitive.

Are we really going to say that a mother and her toddler building a tower of blocks together, or two dogs perfectly coordinating to chase a squirrel, aren't engaged in any form of joint action, just because they can't form an explicit we-intention with mutual common knowledge?

It feels like the framework just artificially shrinks the very thing it's trying to study.

It does.

Then there's the third objection, which we've already touched on.

The execution problem.

This is all about the mechanism gap.

Even if we grant that these complex we intentions exist, the philosophical theories are silent on how they actually translate into coordinated movement.

Tollefsen and Dale called them skeletal for a reason.

How does an abstract concept like joint commitment connect to the neural machinery that controls my muscles and plans their trajectory?

It's one thing to say we intend to move this couch.

It's another thing entirely to explain how that single thought translates into two different bodies, synchronizing their muscle groups, timing their pulls perfectly, and avoiding bumping into each other.

That gap between the mind in the head and the body in the world is where the intellectualist model really starts to creak.

And the final objection.

The operationalization challenge.

Over time, this philosophical approach can kind of devolve into these increasingly complex internal debates.

You get counterexamples, which force revisions to the theory, which leads to another counterexample.

And you end up with theories that are so complex they're almost impossible to actually test in a lab.

Exactly.

To make real progress we need to move beyond just refining definitions.

We need empirical results that can help us decide between these ideas.

We need to find and manipulate the underlying mechanisms, both cognitive and motor.

We need experiments that can show us when a high level representation is really necessary and when those low level alignment mechanisms are actually enough to get the job done.

And that necessity is what drives the next move from pure philosophy to empirical psychology.

So the intellectualists gave us the goal, the intention, but they couldn't provide the mechanism.

So cognitive psychologists step in to try and bridge that gap and they propose this sort of intermediate solution, the idea of shared task representations.

Right.

This is a psychological response, and it's heavily linked with researchers like Sebanz and Knoblich.

They basically said, okay, shared intention might be important, but for the action to actually happen, the participants also need a concrete shared task representation.

What does that look like in practice?

What is a shared task representation?

It means I don't just represent my part of the action.

I represent the whole task, including your part, and how my actions and your actions fit together.

Okay.

So it's for prediction.

It's all about prediction and coordination.

It lets me anticipate your next move and fit my own actions seamlessly into our common goal.

Now, it's still an internal private representation in my head, but its content is about the shared dynamic structure of the task.

That already sounds a lot more plausible than mutual common knowledge.

It feels more functional.

It is, and the psychological evidence for it, for the idea that we automatically form these representations, is actually pretty compelling.

Let's start with the experiment on shared affordances by Richardson and his team back in 2007.

This is a great one, the board lifting experiment, because it connects my internal planning to my partner's actual physical body.

Yeah, it's very clever.

They had pairs of people lifting wooden planks of different lengths off a conveyor belt.

The rule was simple.

You can only lift them by touching the ends.

So for short planks, one person could grab both ends and lift it alone.

But for long planks, they'd be forced to lift it together.

Exactly.

And the critical measurement was the transition point.

At what length did the pair spontaneously switch from acting individually to coordinating a joint lift?

And what determined that switch?

It wasn't the absolute length of the board.

The transition point changed depending on the pair's mean arm span.

So a pair with longer arms would switch to joint lifting at a longer board length than a pair with shorter arms.

Precisely.

And here's the big insight.

My individual decision-making, my internal plan about whether to act or not, was automatically taking into account the physical capacities, the affordances, of my partner.

I was representing our collective ability to act on the world, not just my own.
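The scaling result can be written as a simple decision rule. The Python sketch below assumes, purely for illustration, that a pair switches to a joint lift once a board exceeds a fixed multiple of the pair's mean arm span. The CRITICAL_RATIO value of 1.0 and both function names are hypothetical, not figures or methods reported by Richardson and colleagues.

```python
from statistics import mean

# Hypothetical body-scaled threshold: boards longer than this multiple of the
# pair's mean arm span are treated as too long for one person. The value 1.0
# is an illustrative assumption, not a figure reported in the study.
CRITICAL_RATIO = 1.0


def transition_length(arm_spans_cm: list[float], critical_ratio: float = CRITICAL_RATIO) -> float:
    """Board length (cm) at which this pair is predicted to switch to joint lifting."""
    return critical_ratio * mean(arm_spans_cm)


def lift_mode(board_cm: float, arm_spans_cm: list[float],
              critical_ratio: float = CRITICAL_RATIO) -> str:
    """Classify one board as an individual or a joint lift for this pair."""
    threshold = transition_length(arm_spans_cm, critical_ratio)
    return "joint" if board_cm > threshold else "individual"
```

With these made-up numbers, a 170 cm board is a joint lift for a pair with 150 cm and 160 cm arm spans (transition at 155 cm) but an individual lift for a pair with 180 cm and 190 cm spans (transition at 185 cm): the same object, a different decision, driven by the partner's body.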

That's powerful evidence for co-representation.

But even more famous is the work on the joint Simon effect.

Yes, by Sebanz and her colleagues.

This really seemed to lock in the idea that co-representation is automatic.

OK, let's walk through the setup for the listener.

The classic individual Simon effect shows that if I have to press a left key for a blue stimulus, I'll be slower to respond if there's some irrelevant detail, like an arrow on the screen that's pointing to the right.

There's a conflict.

Right.

So in the joint version, they split the task.

Person A controls the left key for blue.

Person B controls the right key for red.

Now, what happens to Person A if they see their blue stimulus, but on Person B's side of the screen there's an irrelevant arrow pointing right?

Well, if I'm only representing my own task, that arrow should be totally irrelevant.

It's not in my space.

It's not my problem.

And yet, the experiments consistently showed that Person A still slowed down.

They still experienced a joint Simon effect as if they were processing the conflict from their partner's stimulus.

And the interpretation was that they must be automatically co-representing the entire task, including their partner's role and their partner's potential conflicts.
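The effect itself is just a difference of means: average response time when the task-irrelevant cue conflicts with your own response, minus average response time when it does not. A minimal Python sketch, assuming each trial is recorded as a response time plus a compatibility flag. The Trial type and the example numbers are invented for illustration, not data from the studies.

```python
from statistics import mean
from typing import NamedTuple


class Trial(NamedTuple):
    rt_ms: float      # response time in milliseconds
    compatible: bool  # True if the irrelevant cue points toward the responder's own key


def simon_effect(trials: list[Trial]) -> float:
    """Mean RT on incompatible trials minus mean RT on compatible trials, in ms.

    A positive value means the task-irrelevant spatial cue slowed responses.
    """
    compatible = [t.rt_ms for t in trials if t.compatible]
    incompatible = [t.rt_ms for t in trials if not t.compatible]
    if not compatible or not incompatible:
        raise ValueError("need at least one compatible and one incompatible trial")
    return mean(incompatible) - mean(compatible)
```

For the toy trials [Trial(400, True), Trial(410, True), Trial(430, False), Trial(440, False)], simon_effect returns 30 ms. The score only measures the slowdown; on its own it cannot tell you whether the conflict comes from representing a partner or from something else in the room.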

It seems like the perfect empirical bridge between intention and mechanism.

But, as often happens in science, the critical counterevidence was not far behind.

Recent work, especially by Dolk and his team, has really thrown a wrench in this, questioning whether the effect is genuinely social at all.

So they argue it might be driven by something else.

They argue it might be driven by general, non-social attentional saliency effects.

How do you even de-socialize an experiment when there's another person sitting right next to you?

It's a great question.

They did it systematically.

First, they showed the effect still happens, even if the person next to you is just a passive observer not doing anything.

Then they showed it still happens even if that observer leaves the room.

Wow.

The key factor seemed to be just the presence of something attention-grabbing in the spot where the partner's response would have been.

And the most conclusive evidence, the one the chapter really highlights, was using inanimate objects.

That's the one.

They managed to induce a significant joint Simon effect by placing entirely non-social, attention-grabbing things, specifically a Japanese waving cat and a ticking metronome, next to the response buttons.

Hold on.

So the presence of a little moving cat statue could create the same effect that was previously attributed to me representing my partner's entire mental state?

Exactly.

Which suggests the conflict might just be an artifact of environmental distraction.

The really big implication here is that we might not need to posit this special, automatic social co-representation process after all.

The effect can be explained more simply by appealing to general principles of perception.

And that's a huge turning point.

The failure of that purely internal representational explanation for the joint Simon effect makes the pendulum swing hard in the other direction towards the external and the environmental.

And that brings us right to the heart of 4E cognition.

So the 4E movement (embodied, embedded, extended, and enactive cognition) really came about as a challenge to that traditional mind-in-the-head view.

It's the ultimate bottom-up counter-argument to the intellectualist paradigm.

It is.

The core insight of the 4E approach, for our purposes here, is that high-level cognition isn't some separate thing that just directs the body from on high.

Instead, it's something that emerges from our basic processes of perception and action.

And those processes are tied to our interaction with the environment.

Deeply and inextricably tied.

Let's be clear on the terms.

Embodied means the actual shape and capabilities of my physical body structure how and what I can think.

And embedded.

Embedded means my cognition critically relies on the surrounding environment.

I offload cognitive tasks onto it.

I use it to manage information.

Think of a navigator using charts and instruments.

So when you apply this to joint action, the suggestion is that these basic perceptual and motor processes might be totally sufficient for many kinds of coordination without ever needing those explicit high-level mental states.

So the intellectualists start with an internal representation and ask how it causes movement.

The 4E approach starts with the movement itself and asks how complex thought emerges from it.

It's a complete philosophical reversal.

Researchers like Richardson and Dale argue that the old view assumes the brain's main job is to form these internal guides.

The embodied approach flips that.

It says the environment, including other people's bodies, plays the primary causal role in shaping our behavior.

And some of the most radical 4E positions take this even further, rejecting the need for any complex internal representations at all.

This is the dynamical systems perspective.

Right.

This approach borrows its tools from physics and math, and it describes groups not as collections of individual minds but as dynamical systems.

So the elements, which are people in this case, interact according to certain rules.

And that leads to these predictable aggregate behaviors that you can see at a higher level of organization.

Like a flock of birds or a school of fish.

I mean, the flock's behavior isn't the result of one lead bird calculating the intention of every single other bird.

It's an emergent pattern.

That's the perfect analogy.

And from this perspective, joint action is just a human example of that general pattern.

It's achieved through continuous mutual adjustments or what we call interpersonal synchrony.
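Here is a minimal Python sketch of that idea, assuming a toy flock in which each agent only nudges its heading toward the average heading of the others. This is a simplified alignment rule in the spirit of flocking models, not a model taken from the chapter, and the function names and rate are hypothetical. Nothing in the code represents a shared plan or a leader, yet the group converges on a common direction.

```python
import math


def circular_mean(angles: list[float]) -> float:
    """Mean direction (radians) of a list of headings."""
    return math.atan2(sum(math.sin(a) for a in angles),
                      sum(math.cos(a) for a in angles))


def adjust(headings: list[float], rate: float = 0.5) -> list[float]:
    """One round of mutual adjustment.

    Every agent turns part of the way (rate) toward the mean heading of all
    the *other* agents. All agents update simultaneously from the old
    headings, and no agent is designated as a leader.
    """
    updated = []
    for i, h in enumerate(headings):
        target = circular_mean(headings[:i] + headings[i + 1:])
        # Smallest signed angle from h to target, in (-pi, pi]
        diff = math.atan2(math.sin(target - h), math.cos(target - h))
        updated.append(h + rate * diff)
    return updated


def misalignment(headings: list[float]) -> float:
    """1 minus the mean resultant length: 0 means everyone faces the same way."""
    n = len(headings)
    r = math.hypot(sum(math.cos(a) for a in headings),
                   sum(math.sin(a) for a in headings)) / n
    return 1.0 - r
```

Starting from headings [0.0, 0.5, 1.0, 1.5] and applying adjust fifty times, misalignment drops to effectively zero and every heading converges on 0.75 rad, the group's average direction, even though no agent ever computed or stored a group plan.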

Synchrony.

That's a huge concept here.

How effortless is this process of entrainment?

It's often completely unconscious.

It's almost unavoidable.

Our behaviors just become dynamically matched or entrained over time.

The classic finding that really grounds this idea is the experiment with two people in rocking chairs.

You put two people in rocking chairs near each other, and within minutes, their rocking will spontaneously synchronize.

And they often don't even notice it's happening.

The same thing happens with postural sway, the tiny unconscious shifts we make to keep our balance.

It synchronizes when people are interacting, especially if they're trying to solve a problem together.
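A standard textbook way to formalize this kind of entrainment is a pair of coupled phase oscillators (a Kuramoto-style model). The Python sketch below is that generic model, swapped in for illustration; it is not claimed to be the model used in the rocking-chair or postural-sway studies, and the frequencies and coupling strength are arbitrary. With coupling K and natural frequencies ω1 and ω2, the phase difference φ obeys dφ/dt = (ω2 - ω1) - 2K sin φ, so the pair locks whenever |ω2 - ω1| ≤ 2K, settling where sin φ = (ω2 - ω1)/(2K).

```python
import math
from typing import Optional


def simulate_pair(omega1: float, omega2: float, coupling: float,
                  dt: float = 0.01, steps: int = 20_000) -> list[float]:
    """Euler-integrate two phase oscillators with symmetric sine coupling.

    d(theta1)/dt = omega1 + K * sin(theta2 - theta1)
    d(theta2)/dt = omega2 + K * sin(theta1 - theta2)

    Returns the phase difference theta2 - theta1, wrapped to (-pi, pi],
    recorded after every step.
    """
    theta1, theta2 = 0.0, math.pi / 2  # start a quarter cycle apart
    diffs = []
    for _ in range(steps):
        # Both rates use the old phases, so the update is simultaneous
        rate1 = omega1 + coupling * math.sin(theta2 - theta1)
        rate2 = omega2 + coupling * math.sin(theta1 - theta2)
        theta1 += rate1 * dt
        theta2 += rate2 * dt
        diff = theta2 - theta1
        diffs.append(math.atan2(math.sin(diff), math.cos(diff)))
    return diffs


def predicted_lock(omega1: float, omega2: float, coupling: float) -> Optional[float]:
    """Stable locked phase difference, or None if |omega2 - omega1| > 2K."""
    ratio = (omega2 - omega1) / (2 * coupling)
    if abs(ratio) > 1:
        return None
    return math.asin(ratio)
```

For example, with ω1 = 1.0, ω2 = 1.1 and K = 0.2 the phase difference settles near asin(0.25), about 0.25 rad; with ω2 = 1.5 and K = 0.1 the locking condition fails and the phase difference keeps slipping around the circle. Neither oscillator represents the other; the locking falls out of the coupling term.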

So this suggests that our behavior is subject to this sort of automatic physical social pull.

If joint agency can just arise from the physical interaction of bodies in an environment, then this is the extreme anti-intellectualist position.

No shared intentions required.

But now we have to apply that ecumenical rule and critique the pure synchrony view.

Because as powerful as it is, it's also too limited to explain complex human joint action.

Yeah, I can see the problem.

If our coordination is only based on synchrony and entrainment, then we can't do specialized complementary actions.

Exactly.

If we're trying to get a couch through a narrow doorway, I need to push the top while you guide the bottom.

We have to do different things.

And that requires a functional self-other distinction.

That is the core critique.

Simple alignment, like two rocking chairs, can't explain things like role specialization or adapting when a plan fails.

Joint action often requires understanding the other person's role in relation to the goal.

And research on language seems to back this up.

Work by Fusaroli on linguistic alignment shows that we don't just indiscriminately mirror each other.

Our alignment is goal-based.

We align our language just enough to achieve whatever communicative goal we have at that moment.

Which brings us back to this need for integration.

And the chapter uses conversation as the perfect example of how these high-level goals and low-level alignment systems have to work together.

It's perfect because conversation obviously has a high-level shared goal.

We want to understand each other.

But the moment-to-moment success of it is handled by a low-level alignment system.

So if I use a certain sentence structure, you might automatically mimic that structure in your reply, which makes the whole exchange cognitively easier for both of us.

Precisely.

And it goes beyond just words and grammar.

It extends to non-verbal cues.

Richardson and Dale did this amazing study in 2005 where they showed a tight, measurable coupling of visual attention during conversation.

People's eyes move together.

Their eye movements synchronize in time when they're talking about something visual like a piece of art.

And crucially, this isn't just some weird byproduct.

It's functional.

It is.

The better their visual alignment, the better they understood each other.

They achieved their shared goal more effectively.
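One way to quantify that coupling is a lag profile: for each time offset, how often are both people looking at the same thing? The Python sketch below is a simplified version of that idea (a categorical cross-recurrence profile), assuming gaze has already been coded as one region label per sample, with None where the eyes are off the display. It is not the exact analysis from the 2005 study, the function name is hypothetical, and the example sequences are invented.

```python
from typing import Optional, Sequence


def cross_recurrence(speaker: Sequence[Optional[str]],
                     listener: Sequence[Optional[str]],
                     max_lag: int) -> dict[int, float]:
    """Proportion of samples where both people fixate the same region, per lag.

    A positive lag compares speaker[t] with listener[t + lag], i.e. the
    listener looking at the same region later than the speaker. Samples where
    either person has no fixation (None) are skipped.
    """
    profile = {}
    for lag in range(-max_lag, max_lag + 1):
        matches = total = 0
        for t in range(len(speaker)):
            u = t + lag
            if 0 <= u < len(listener):
                a, b = speaker[t], listener[u]
                if a is not None and b is not None:
                    total += 1
                    matches += a == b
        profile[lag] = matches / total if total else 0.0
    return profile
```

With speaker = ["A", "A", "B", "B", "C", "C", "A", "A", "B", "B"] and a listener who echoes the speaker two samples late (listener = [None, None] + speaker[:-2]), cross_recurrence(speaker, listener, max_lag=3) peaks at lag 2 with a value of 1.0 and is 0.0 at lag 0.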

So the big conclusion is that complex joint action has to involve both top-down intentional goals and bottom-up alignment systems working together.

OK, so we've established the body is crucial.

The social environment is crucial.

But the most radical part of 4E, the extended and enactive bits, forces us to look even further, to consider that the relevant variables might include non-living things like the materials and institutions around us.

And this is where we bring in material engagement theory, or MET.

This is a really deep extension of the 4E idea, developed by Lambros Malafouris and often used in archaeology.

And what's the core idea?

MET argues that the human mind isn't just embedded in its environment.

It's actively constituted by its dynamic interaction with artifacts.

So artifacts, tools, technology, writing, social media, they don't just help us think.

They actually restructure the cognitive task itself and enable entirely new ways of thinking.

The mind and the material world are a single coupled system.

Which leads to a very strong, almost provocative claim about agency.

MET argues that agency isn't just a property of humans.

It's an emergent property of the whole process that involves both humans and things.

This connects back to actor-network theory, or ANT, from Bruno Latour.

ANT suggests that actions are carried out by networks or assemblages that include both people and non-human actors.

This must be a really tough pill for an intellectualist to swallow.

It is, because it completely de-centers human intention.

The classic example the chapter gives is the speed bump.

When a car slows down for a speed bump, the intellectualist says, the driver intended to slow down.

But the MET or ANT approach says, no, the speed bump itself is part of the network that acts.

It's the whole network, the driver, the car, the regulations, and the physical bump that acts to slow the vehicle.

Agency is distributed.

That's a fundamental rethinking of cause and effect.

And when this gets applied to joint action, it's used to directly challenge the need for those internal task representations we were talking about earlier.

Exactly.

Think back to that board lifting task.

The representational view says, the agents had to form internal beliefs about the board's length and their partner's arm span.

The MET approach says that's an unnecessary step.

It argues that the boards themselves, the material reality of the object, dynamically and externally modulate the interaction, completely bypassing the need for you to form an explicit belief about it.

And this can scale up to huge collective actions where the artifacts are things like institutions and technology.

That's right.

The complex coordinated action of a group like the Occupy Movement can't be explained by saying millions of individuals all formed a shared we intention with mutual common knowledge.

It's just not plausible.

So what's the alternative explanation?

Their coordinated action was made possible by the complex dynamic interaction of individuals with social media platforms, with technology, with the geography of the protest sites.

The entire system of people and artifacts is what created the joint agency.

If we really accept this radical anti-representational view, it starts to completely blur that line between Searle's dancers and the strangers running from the rain.

It does, because if joint agency emerges from this complex interaction between individuals and their environment, both social and material, then those strict intellectualist criteria about shared intention become secondary.

The system is already guiding the behavior, which raises the big question: how far can we really push the boundaries of cognition and agency before the concepts just break?

That blurring of boundaries leads us right into probably the most famous philosophical idea from 4E cognition, the extended mind thesis.

So now we're moving from distributed action to distributed cognition itself.

Yeah, the groundwork for this was really laid by Edwin Hutchins back in 1995 with his work on naval navigation, Cognition in the Wild.

He showed that these complex navigational tasks weren't being done in one person's head.

The cognition was functionally distributed across artifacts like charts and instruments and multiple people.

Which led Clark and Chalmers in 1998 to formally propose the extended mind thesis.

Right, and their whole argument hangs on what they call the parity principle.

We should definitely make sure we get this definition right.

Okay, what is it?

The parity principle is deceptively simple.

It says: if a part of the world functions as a process which, were it done in the head, we would have no hesitation in recognizing as part of the cognitive process, then that part of the world is part of the cognitive process.

So the key is functional equivalence.

If an external thing plays the same functional role as an internal mental process, then we should say it's actually part of the mind.

And they illustrated this with their famous thought experiment about Otto.

Otto has memory problems, so he carries a notebook everywhere and writes down all the important information he needs.

Okay.

When Otto wants to go to a museum, he consults his notebook to find the address.

That process, they argue, functions identically to how a healthy person, whom they call Inga, consults her biological memory.

And therefore, Otto's notebook, even though it's external, is a functional extension of his memory.

It's literally part of his mind.

Now that's a single agent with an artifact.

But our focus is joint action.

We have to go from the individual to the collective.

And that's exactly what Tollefsen did in 2006 in her paper, From Extended Mind to Collective Mind.

She used a variation on the Otto and Inga story.

She did.

Her thought experiment introduces Olaf, another absent-minded professor, and he relies heavily on his partner Inga to remember things for him.

His meeting schedule, directions to places, facts about his students.

He just defers to her memory completely.

So the argument is, if Otto's notebook is an extension of his mind, then Inga's memory, when it's functioning as this reliable, trusted external hard drive for Olaf, is part of a collective system that supports Olaf's own cognition.

This collective reading is what led to the social parity principle, articulated by Ludwig in 2015.

It says, if a group collectively performs a task that we would all agree is cognitive if one person did it in their head, then the group itself is performing that cognitive process.

The group becomes the cognitive agent.

That sounds neat in theory.

But what's the prerequisite for a group to actually count as a collective mind and not just a bunch of separate minds working together?

The key is cognitive integration.

There has to be substantive, two-way interaction between the individuals.

There has to be a genuine interdependence of their cognitive functions.

It can't just be me pulling information from you.

It has to be a dynamic, coupled process where our minds are influencing each other in real time.

And the strongest empirical evidence for this kind of group-level integration comes from the study of something called Transactive Memory Systems, or TMS.

Right.

TMS was pioneered by Daniel Wegner, and it describes a socially extended memory system where individual memories work together to produce a result that's often way better than what any one person could do on their own.

And importantly, this isn't just like a shared database.

It's an interactive, dialogical process of storing and retrieving information.

Let's use the chapter's example, a university subcommittee that's revising a policy.

Perfect.

So you have three members, A, B, and C.

Member A knows all the history of the policy.

Member B knows all the legal stuff.

Member C knows what the faculty consensus is.

When they meet, the collective system knows who knows what.

And retrieval happens through conversation.

A question from A prompts B to recall a legal point, which C then uses to add context.

The retrieval is fundamentally conversational.

So the system has emergent properties.

The group as a whole remembers more and functions better than just the sum of its parts.

Exactly.

And just like an individual memory, TMS goes through the same stages.

There's encoding, where the group figures out who's going to remember what.

There's storage, where the info lives in individuals' heads.

And then there's retrieval, where they use dialogue to access it.

Wegner argues this is a true emergent group-level property.

This is a powerful case for the collective mind, but it runs straight into the system objection, which is raised by critics like Robert Rupert.

Rupert is a major critic of the extended mind thesis.

He argues that for something to be a genuine cognitive system, it has to meet two strict criteria.

It has to be persistent over time, and it has to be realized in a physically bounded organism.

He uses this to dismiss things like cell phones, right?

Right.

He says a cell phone is intermittent.

It's not integrated into the functional flow of my mind in the same way my own vision or language is.

And if a temporary artifact, like a phone, fails this test, then a temporary context-dependent group, like a project team, has to fail even harder.

For Rupert, group cognition just isn't persistent enough to be called a mind.

That's the critique.

But the 4E response is that small task groups can meet these integration conditions, especially if they work together over time, and its proponents look for quantitative, measurable evidence of that integration.

How on earth do you quantitatively prove that two people have become a single integrated system?

Well, Tollefsen, Dale, and Paxton did it using advanced statistical techniques like recurrence analysis on pairs of people who were solving physical puzzles together.

They found that the motor systems of the two people became so intricately coupled in time, so causally linked, that they really did form a single dynamic unit.
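To make the idea of recurrence analysis concrete, here is a minimal Python sketch on made-up data. It is not the pipeline Tollefsen, Dale, and Paxton used; the signals, the 0.2 radius, and the helper name `diagonal_recurrence_profile` are illustrative assumptions. It builds a cross-recurrence matrix (marking moments where one person's state is close to the other's) and then reads off the recurrence rate along each diagonal, that is, at each time lag. A coupled pair should show a clear peak at the lag where one person follows the other, while an independent pair stays low at every lag.

```python
import numpy as np


def diagonal_recurrence_profile(x, y, radius, max_lag):
    """Cross-recurrence rate of x and y at each lag k in [-max_lag, max_lag].

    For lag k, this is the fraction of time steps i at which x[i] and
    y[i + k] lie within `radius` of each other.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Cross-recurrence matrix: entry (i, j) is True when x[i] is close to y[j].
    crp = np.abs(x[:, None] - y[None, :]) <= radius
    # The diagonal at offset k collects the pairs (x[i], y[i + k]).
    return {k: float(crp.diagonal(offset=k).mean())
            for k in range(-max_lag, max_lag + 1)}


rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)  # 500 samples over 10 seconds

# Hypothetical movement trace for person A: a slow, noisy sway.
a = np.sin(t) + 0.1 * rng.standard_normal(t.size)
# Coupled partner B: the same sway, trailing A by about 0.2 seconds.
b = np.sin(t - 0.2) + 0.1 * rng.standard_normal(t.size)
# Uncoupled partner C: movement unrelated to A.
c = 0.7 * rng.standard_normal(t.size)

coupled = diagonal_recurrence_profile(a, b, radius=0.2, max_lag=50)
uncoupled = diagonal_recurrence_profile(a, c, radius=0.2, max_lag=50)

coupled_peak = max(coupled.values())
uncoupled_peak = max(uncoupled.values())
best_lag = max(coupled, key=coupled.get)

print(f"coupled pair:   peak recurrence {coupled_peak:.2f} at lag {best_lag}")
print(f"uncoupled pair: peak recurrence {uncoupled_peak:.2f}")
```

Reading recurrence lag by lag, rather than averaging over the whole matrix, is a deliberate choice in this sketch: two signals can overlap in value fairly often by chance, but only coupled signals line up consistently at one particular lag.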

And there was other work using principal component analysis.

Yes.

Ramenzoni and Riley showed that when two people were engaged in a true joint action, their combined behavior could be described using fewer dimensions than when they were acting alone.

What does that mean, fewer dimensions?

It implies they weren't just two separate systems acting in parallel.

They had merged functionally.

They had formed a synergy.

They shed their individual degrees of freedom and took on specialized integrated roles.

And this kind of quantitative evidence suggests it is empirically possible to describe two people as a genuine integrated system, at least for a time.
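The "fewer dimensions" claim can be illustrated with principal component analysis. The sketch below is a toy model under stated assumptions, not a reconstruction of Ramenzoni and Riley's study: each hypothetical person contributes three movement channels, the loadings and noise level are invented, and 90% explained variance is an arbitrary cutoff. When the two people act alone, each is driven by an independent source; when they act jointly, a single shared source drives both, which is one simple way to model a synergy.

```python
import numpy as np


def components_needed(data, threshold=0.9):
    """Number of principal components needed to explain `threshold` of the variance.

    `data` has shape (n_samples, n_channels).
    """
    centered = data - data.mean(axis=0)
    # Squared singular values of the centered data are proportional to the
    # variance along each principal axis.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    # Smallest k whose cumulative explained variance reaches the threshold.
    return int(np.searchsorted(np.cumsum(explained), threshold) + 1)


rng = np.random.default_rng(1)
n_samples = 1000
# Hypothetical fixed loadings: how strongly each of a person's 3 channels follows its driver.
weights = np.array([1.0, 0.8, 1.2])


def person(driver):
    """Three noisy movement channels of one hypothetical person, all driven by `driver`."""
    noise = 0.1 * rng.standard_normal((driver.size, weights.size))
    return driver[:, None] * weights + noise


# Acting alone: each person's movement has its own independent driver.
alone = np.hstack([person(rng.standard_normal(n_samples)),
                   person(rng.standard_normal(n_samples))])

# Acting jointly: one shared driver couples both people (a simple synergy model).
shared = rng.standard_normal(n_samples)
joint = np.hstack([person(shared), person(shared)])

print("components needed, acting alone:  ", components_needed(alone))
print("components needed, acting jointly:", components_needed(joint))
```

With these settings the independent pair needs two components and the coupled pair needs one, mirroring the reduction in dimensionality described above; real movement data would be far noisier and higher-dimensional.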

Okay, let's shift focus now to the developmental challenge.

We talked earlier about how intellectualism, with its high demands for shared intention, risks ruling out young children, even though they clearly engage in really sophisticated social interactions.

And Michael Tomasello, a leading developmental psychologist, tackles this problem head-on, but he does it while remaining firmly in the intellectualist camp.

So he wants to keep the high bar.

He does.

He argues that joint action, the kind that requires true joint goals, is a uniquely human thing.

And his work is all about showing that even very young children have this capacity.

To make that case, though, he first has to draw this line between human joint action and what he calls co-action in animals.

Right.

He uses the great example of chimpanzees hunting monkeys.

The chimps work together.

They surround the monkey.

They block its escape.

On the surface, it looks like a perfectly coordinated joint action.

But Tomasello says we should use a leaner interpretation of what's going on in their heads.

That's right.

He insists that each chimpanzee is ultimately just hunting for itself.

It's taking the behavior of the other chimps into account, sure, but only to maximize its own individual chance of getting the monkey.

So they're coordinating movements, but they don't have a truly joint goal.

For Tomasello, no, they don't.

That, for him, is co-action.

It's the same distinction Searle made between the dancers and the strangers in the rain.

Tomasello is just extending that intellectualist firewall to all non-human animals.

But he argues that young human children do cross that firewall.

He thinks they can form genuine joint goals from as early as 14 months old.

How do you prove a 14-month-old has a joint goal?

That seems incredibly difficult.

The key evidence comes from these groundbreaking experiments by Warneken and Tomasello.

They'd get a young child to play a collaborative game with an adult, something that required coordination, like both of them having to pull a lever to get a toy.

And the crucial part was when the adult just suddenly stopped playing.

Exactly.

The adult would just arbitrarily stop participating.

And the child's reaction was what was so revealing.

What did they do?

They showed clear re-engagement behavior.

They didn't just wander off and find a new toy.

They got frustrated.

They would pull on the adult's sleeve, point at the game, try to physically drag the adult back into the task.

That strong reaction suggests they had an expectation, a kind of rudimentary shared commitment to the activity.

Yes.

They understood that an implicit social contract had been broken.

And this commitment gets much stronger around age three.

Older children in similar experiments would often prioritize keeping the joint action going over getting an immediate personal reward.

And when the task was done, they'd divide the spoils equally, showing they understood the obligations and entitlements that come from a joint commitment.

So Tomasello manages to include children, but he does it by arguing that this very sophisticated goal-based cognition is there much earlier than we thought.

He maintains that narrow intellectualist definition.

He does.

But that approach still risks over-intellectualizing the process.

It presupposes these sophisticated cognitive abilities in toddlers when maybe more practical embodied skills are all that's needed.

And that's where the enactivist approach offers a really important alternative.

The enactivist view says that social understanding is practical and embodied.

It avoids the need for complex mentalistic mind-reading, like attributing beliefs and desires to other people.

Enactivists talk about intersubjectivity to explain development.

They argue our ability to understand others doesn't come from some abstract inference.

It comes from second-person embodied practices.

We directly perceive people's intentions in their movements and expressions.

And Fiebich and Gallagher organize this developmentally, starting with what they call primary intersubjectivity.

Primary intersubjectivity just refers to those really basic early sensorimotor skills.

Things like immediate imitation, automatic gaze following, gesture recognition, the things that give an infant a practical, hands-on understanding of what other people are doing.

It's an embodied and embedded skill, not an intellectual one.

And from that basic, practical foundation, the more complex coordination emerges.

Exactly.

From primary intersubjectivity, you get the emergence of secondary intersubjectivity, which allows for more complex shared interactions, like intentional joint attention, where you have a shared goal to look at something together.

And the benefit of this approach?

The huge benefit is that it gives you this continuous spectrum of social interaction from the very basic to the very complex.

It avoids drawing that sharp exclusionary line between humans and animals based only on whether they have some abstract intentional capacity.

Okay, so we have surveyed this entire battleground.

We have the intellectualists on one side demanding internal intentions.

We have the psychologists looking for shared representations.

And then the 4E theorists arguing that coordination is a dynamic system of bodies and artifacts that doesn't need either one.

We've gone from the very top down to the radically bottom up.

And the authors of the chapter, true to their ecumenical approach, conclude that both sides are absolutely critical.

You simply cannot explain the full, rich range of human joint action without integrating our internal cognitive capacities, like our unique ability to plan and commit, with all the external constraints on coordination that come from our social and material environment.

As they say, there's a reason two people can spontaneously learn a complicated dance, but two squirrels can't.

That ability to resonate with others, with artifacts, requires specific high-level cognitive structures to even make the low-level entrainment possible.

Which means the future agenda for research in this area has to be completely integrative.

We need to get away from arguing about whether high-level or low-level processes are sufficient and start studying how they actually interact dynamically.

This means we have to integrate the individual and group levels of description, moving smoothly between those scales.

And it involves exploring how predictive planning processes, which we know are vital for complex activities like conversation, work alongside those automatic low-level alignment mechanisms we see in synchrony.

And one of the most exciting areas for this kind of integration is at the neural level, as long as it's done through a 4E lens.

That's right.

Researchers are now looking at the brain mechanisms involved in joint processes, but they're doing it in explicitly interactive, more naturalistic contexts.

This is the field of brain-to-brain coupling research, using things like hyperscanning to see how the brain activity of two interacting people synchronizes.

And the finding is that, yes, our underlying neural mechanisms can become tightly linked when we're actively coordinating.

But the chapter gives a really important warning here.

We can't repeat the intellectualist error.

This neural research must be done in an appropriately embodied and embedded context.

If you just study brain coupling with two people lying isolated in fMRI machines, you're stripping away the very interactive and environmental reality that makes joint action possible in the first place.

You are.

So ultimately, the chapter's big message is that the study of joint action has to be integrative and driven by the data.

We have to stop trying to draw these sharp exclusionary lines between what counts as real joint action and what's just collective behavior.

Because that distinction might be obscuring the truth.

It might be.

Joint action is fundamentally a graded phenomenon.

It's a spectrum of complexity.

The most sophisticated forms, like playing in an orchestra or writing a paper together, they arise organically from less sophisticated, purely embodied forms of interaction.

The goal should be to understand the full social world as it actually works, not to predefine it based on our theoretical biases.

Wow.

This has been a monumental synthesis.

We've gone from the intellectualist demands for shared intention through the psychological evidence for shared representations to the radical bottom-up challenge from 4E cognition and the mind-bending implications of the extended collective mind.

The really critical takeaway for you, the learner, is that we have to look beyond the individual skull.

The study of joint action proves that our behavior is fundamentally shaped by this dynamic system created by our bodies, our goals, the environment, and the artifacts we use, all working together over time.

The boundaries of the self and the boundaries of cognition are so much more porous than we ever thought.

And if we really accept the premise of something like material engagement theory, that agency is distributed across people and things, that the speed bump is part of the network that slows the car, that Olaf and Inga can form a functionally integrated collective mind, then the implications go way beyond philosophy.

Which leads us to our final provocative thought.

If agency and cognition are becoming more and more distributed across these increasingly complex, integrated human-technology networks (think about autonomous cars, AI-driven medical diagnostics, collaborative robots on an assembly line), what is the future of moral responsibility?

How do you assign blame or praise when the agent doing the acting is no longer a single bounded human being, but a dynamic socio-material system?

A fascinating question for future research and maybe for your own reflection on the nature of control and causality in our world.

Thank you so much for joining us on this deep dive into the true nature of acting together.

Until next time, keep exploring that synthesis of high -level intentions and low -level alignment processes.

This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary: What this audio overview covers

Joint action emerges from the integration of high-level intentional processes and low-level sensorimotor dynamics, a tension that becomes especially visible when examined through 4E cognition frameworks that emphasize how thinking is embedded in environmental contexts, embodied through physical interaction, extended across social and material networks, and enactive in its reliance on lived engagement rather than passive representation. Traditional philosophical accounts, particularly those developed by Bratman, Searle, and Tuomela, ground joint action in representational mental states such as shared goals, we-intentions, and mutual knowledge, yet these internalist approaches face substantial challenges including the execution problem of translating abstract intentions into coordinated motor behavior, the exclusion of animals and young children from meaningful joint action, and the general charge that they over-intellectualize what are fundamentally embodied social processes. The evidence from phenomena like the Joint Simon Effect demonstrates that co-acting individuals automatically represent their partner's actions, though debate persists about whether this reflects genuinely social representation or instead emerges from non-social attentional mechanisms. Embodied and ecological perspectives propose instead that coordinated action arises bottom-up through basic mechanisms of perceptuomotor synchronization and dynamic systems that couple individuals without requiring explicit shared mental states. Material Engagement Theory and Actor-Network Theory extend this logic by distributing agency across networks of human and non-human actors, suggesting that cognition itself becomes genuinely distributed when material artifacts actively modulate interaction patterns. 
The extended mind thesis in social contexts raises the possibility that couples or teams form integrated cognitive systems satisfying the social parity principle, where group-level knowledge emerges as irreducibly collective rather than merely aggregate. Developmental research highlights the contrast between accounts emphasizing shared goals as distinctively human versus enactivist approaches grounded in embodied forms of primary and secondary intersubjectivity that unfold through direct perceptual and motor coupling. An ecumenical synthesis would honor both intentional and sensorimotor levels of analysis, potentially using brain-to-brain coupling measurements and other neuroscientific methods to reveal how individual minds and bodies genuinely coordinate without reducing either level to the other.
