Chapter 10: Language Processing & Cognitive Structure

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome to the Deep Dive.

Today, we are immersing ourselves in, well, in the cognitive ability that really defines us as a species, but it's maybe the one we take for granted the most.

Language.

It really is.

I mean, right now, as you're listening, your brain is performing this incredible feat of computation just to decode what we're saying effortlessly.

It's this amazing paradox, isn't it?

We use it all day, every day, reading, speaking, writing, and we barely notice.

And at the same time, you have researchers building these, you know, incredibly sophisticated AI systems, and they find that just replicating the language ability of a four-year-old is this monumentally difficult goal.

Okay, so let's unpack that complexity.

Our mission for this Deep Dive is to get into the source material, a really foundational chapter on language from cognitive psychology, and just figure out how this whole system works.

From the smallest sound all the way up to the grandest idea.

And we're going to see how language is tied into, well, pretty much every other cognitive process, perception, memory, attention, how we think and reason.

Exactly.

And the system itself is so dynamic, sometimes the processing is what we call bottom up.

So it's driven purely by the data coming in, the sounds hitting your ears.

And other times it's top down.

Right.

It's influenced by your expectations, your knowledge, the context of the conversation.

And the amazing thing is most of the time, this all just happens automatically.

You don't even know what's happening.

So before we get into the structure, we have to draw a really critical line in the sand.

It's a distinction that cognitive scientists are very particular about.

And that's the difference between simple communication and true language.

They are not the same thing.

Not at all.

And the source material lays out two characteristics that are absolutely necessary for any system to be called a natural language, like English or Spanish or any other.

The first one is regularity.

Meaning it has rules.

It has to be governed by a system of rules, a grammar.

If it's not systematic, it's not a language.

Simple as that.

And the second, which I think is just mind-blowing, is productivity.

Productivity is just.

It's this incredible ability to create and understand an infinite number of new sentences.

You can say something that has never ever been said before in human history.

And as long as it follows the rules, another speaker will understand you instantly.

It's this infinite expression from a finite set of tools, words, and sounds.

That's what really sets human language apart.

And beyond those two features, there are a couple of others that really help define it.

First one is arbitrariness.

Right.

Arbitrariness just means there's no inherent connection between a word and what it stands for.

The sounds D-O-G.

There's nothing dog-like about them.

In Spanish, it's perro.

In French, it's chien.

The connection is purely symbolic.

A social contract.

Exactly.

And the other feature is discreteness.

Which means you can break it down.

You can subdivide it into smaller parts.

A sentence breaks into phrases, phrases into words, words into sounds.

And you can recombine those discrete parts in new ways.

Which, you know, gets you right back to productivity.

A great way to really get a handle on this is to compare it to animal communication.

Let's take the classic example from the textbook.

The honeybee dance.

Sure.

The waggle dance is, I mean, it's a brilliant form of communication.

A bee finds nectar, flies back to the hive, and does this little dance.

And that dance tells the other bees everything.

The direction, the distance, even the quality of the food source.

It does.

It conveys very specific information.

So why isn't it language?

Because it fails those key tests.

First, it fails on arbitrariness.

The dance isn't symbolic, it's iconic.

The angle of the dance literally points in the direction of the food relative to the sun.

So there's a direct physical resemblance between the sign, the dance, and the thing it's referring to.

Precisely.

And it also fails the productivity test.

Because it can only communicate one thing.

One thing.

Here's where the food is.

A bee can't come back and say, watch out for that predator by the big rock.

Or, I'm feeling a bit down today.

The system is completely closed.

It can't generate new ideas.

Okay.

So human language is this uniquely structured, rule-governed, endlessly creative system.

And it operates on four different levels at once, which is what we're going to walk through.

Right.

We have phonology, which is the sounds.

Syntax, the structure.

Semantics, which is meaning.

And pragmatics, the social use.

And the incredible thing is, to say one coherent sentence, your brain is running checks on all four of those levels simultaneously.

It's an astonishing cognitive juggling act.

Let's start our first big section then on the rule-governed structure of human language.

And we have to start with the word grammar.

We do.

And we have to be really clear about what we mean by grammar here.

Because the way linguists and cognitive psychologists use the term is very different from what you learned in English class.

Right.

This isn't about not ending a sentence with a preposition.

Exactly.

That's what we call a prescriptive rule.

A social convention.

Like saying you shouldn't use ain't.

In this context, grammar refers to descriptive rules.

The rules we actually use.

The implicit, internalized rules that native speakers follow to produce sentences that are intelligible.

It's about what's legal in the language system, not what's considered proper or formal.

And the proof that these rules exist is that we all have this deep, implicit knowledge of them.

You do.

You can't write them all down, but you know them.

If I say, ran the dog street down, you know instantly, without thinking, that it's wrong.

It violates the rules.

I can't tell you why it's wrong.

Not in technical terms, but I feel it.

It's a violation.

And that feeling is your implicit knowledge of syntax at work.

Which brings us to another key distinction in the field.

Linguistic competence versus linguistic performance.

Right.

Competence is your underlying, idealized knowledge of the language.

The perfect set of rules stored in your mind.

And performance is what actually comes out of your mouth.

Exactly.

And performance is messy.

It's affected by everything.

If you're tired or nervous or distracted by a car alarm, you might stumble over a sentence.

You might make a grammatical error.

But that doesn't mean my underlying competence is gone.

It's just a performance error.

Precisely.

So when we study language, we're trying to get at that underlying competence by looking at performance while trying to account for all those real-world factors that can interfere.

Okay, let's start at the very bottom of the structure.

Level one.

Phonology.

The sound system.

So the base is phonetics, which is the physical study of speech sounds, how we make them with our lips, tongue, vocal cords.

But the cognitive part is the phonology.

Yes.

That's the systematic way those sounds are organized and combined in a language.

And the fundamental building block of phonology is the phoneme.

A phoneme is the smallest unit of sound that can change the meaning of a word.

Perfect definition.

If you change the B sound in bat to a P sound, you get pat.

A different word.

So B and P are distinct phonemes in English.

And this is crucial.

The set of phonemes is specific to each language.

Very much so.

The book gives the example of Cantonese, where the sounds L and R are not distinct phonemes.

They're considered variations of the same sound.

Which is why a native Cantonese speaker learning English might have trouble hearing or producing the difference between, say, rice and lice.

Exactly.

Their phonological system, their brain, has learned to treat that acoustic difference as irrelevant noise.

It's not a meaningful distinction.

So how do we categorize all these sounds we can make?

Well, the first big split is between vowels and consonants.

Vowels are pretty simple.

The airflow from your lungs is unobstructed.

The sound is just shaped by where your tongue is and the shape of your lips.

Consonants are where it gets complicated.

They all involve obstructing the airflow in some way.

Right.

And we classify them along three dimensions.

The first is place of articulation.

So where in your mouth the obstruction happens.

Exactly.

For B and P, you close your lips.

That's a bilabial place.

For S and Z, your tongue gets close to the roof of your mouth, the hard palate.

Different place, different sound.

Okay.

Second dimension.

Manner of articulation.

So how is the air being blocked?

Is it stopped completely, like in a T or D?

Does it go through your nose, like in an M?

Or is it forced through a small gap to make a hissing sound, like F?

And the third feature is voicing.

This is just whether your vocal cords are vibrating or not.

If you put your hand on your throat and say sss, you feel nothing.

That's unvoiced.

But if I said zzz.

You feel a buzz, that's voiced.

So S and Z are made in the exact same place with the exact same manner.

The only difference is that voicing feature.

The really amazing part of this, though, is that we have these complex, unconscious phonological rules for combining these phonemes.

The textbook example of English plurals is perfect.

It's brilliant because we all know this rule, but nobody ever taught it to us.

We just absorbed it.

So you spell the plural with an S, but how you pronounce it depends entirely on the last sound of the word.

Okay, walk us through the three parts of the rule.

Okay, scenario one.

If a word ends in a hissing or buzzing sound, like S in bus, or Z in buzz, or J in judge, you add a whole new syllable.

You say iz or ez: buses, judges.

Because you can't just stick two hissing sounds together.

It wouldn't work.

Your mouth can't do it.

So the rule forces a change.

Scenario two.

If the word ends in any other unvoiced consonant, like P, T, or K, the plural ending is also unvoiced.

It's an S sound, so lip becomes lips.

And scenario three.

If the word ends in a voiced sound, like G in dog or any vowel,

the plural ending is also voiced.

It's a Z sound, so we say dogs.

And this is all automatic.

If I invent a new word, like a wug, you know the plural is wugs.

Instantly.

You apply the rule without thinking.
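As a rough illustration, the three-part plural rule just described can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a full phonological analysis: the sound labels are informal ASCII stand-ins rather than phonetic symbols, the sets `SIBILANTS` and `VOICELESS` are incomplete and chosen only for this example, and the function name `plural_ending` is hypothetical.

```python
# Toy sketch of the English plural rule described above.
# Assumption: each word is represented only by a rough label for its final sound.
SIBILANTS = {"s", "z", "sh", "zh", "ch", "j"}   # hissing or buzzing sounds
VOICELESS = {"p", "t", "k", "f", "th"}          # other unvoiced consonants


def plural_ending(final_sound):
    """Return the pronounced plural ending for a noun ending in final_sound."""
    if final_sound in SIBILANTS:
        return "iz"   # scenario one: add a whole new syllable (buses, judges)
    if final_sound in VOICELESS:
        return "s"    # scenario two: unvoiced ending (lips)
    return "z"        # scenario three: voiced consonants and vowels (dogs)


# The invented word "wug" ends in a voiced G, so it gets the voiced ending.
for word, last in [("bus", "s"), ("judge", "j"), ("lip", "p"), ("dog", "g"), ("wug", "g")]:
    print(word, "->", plural_ending(last))
```

The point of the sketch is that the rule looks only at the final sound, which is why it extends automatically to words nobody has heard before.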

This is why languages sound so different.

It's not just that they use different phonemes, it's that they have entirely different rule books for how to legally combine them.

All right.

Moving up a level from sounds to words, we get to syntax.

This is the grammar of sentences, the rules for arranging words.

And the job of a syntactic theory is huge.

It has to describe every possible legal sentence in a language.

And at the same time, it has to rule out every single illegal one.

Which is an infinite set in both cases, a tall order.

The way linguists visualize the sentence structure is with tree diagrams.

They show how words don't just form a line, they group together in these functional units called constituents.

Right.

So in a sentence like, the poodle will chase the red ball.

The words the red ball all stick together.

That's a constituent.

A noun phrase or NP.

And will chase the red ball is another group, a verb phrase or VP.

And the tree diagram shows this hierarchy.

These aren't just arbitrary labels either.

They represent real psychological categories.

We know this because you can swap out one whole constituent for another of the same type.

So I can take out the NP, the poodle, and slot in another NP like my first tooth.

And you get, my first tooth will chase the red ball.

Which is semantically bizarre.

It makes no sense.

But it's syntactically perfect.

The grammar is fine.

Which is more proof that syntax and semantics are, at some level, separate systems.

One of the coolest ways to prove that these constituent structures are real is by using a linguistic operation called preposing.

This is when you move a phrase to the front of the sentence to add emphasis.

Exactly.

So if I have a sentence, I'm mad at my naughty dog, I can legally move that whole phrase and say, my naughty dog, I'm mad at.

That works because my naughty dog is a complete constituent, an NP.

But you couldn't say, naughty dog, my, I'm mad at.

The whole system just collapses.

You can only move whole complete phrases.

And this reveals hidden differences in sentences that look the same on the surface.

The book gives this great paradox with the word up.

Yes.

Okay.

Compare these two sentences.

Susan rang up Jodi, and Aristophanes ran up the mountain.

On the surface, they look similar.

Both have verb up noun.

But something is very different structurally.

It is.

Because you can legally say, up the mountain, Aristophanes ran.

It sounds a bit poetic, but it's grammatical.

But you absolutely cannot say, up Jodi, Susan rang.

That's just word salad. Why?

It's because of the deep structure.

In ran up the mountain, the phrase up the mountain is a complete constituent.

It's a prepositional phrase that describes direction.

So you can move it.

But in rang up Jodi, the words rang up function as a single unit.

The phrasal verb meaning to call.

The word up is what we call a V particle.

It's part of the verb itself.

So you can't rip a piece of the verb away from the rest of the verb and move it.

You can't.

So preposing acts like a chemical test.

It reveals the invisible underlying structure and proves that these labels, like NP and VP and V particle, are cognitively real.

It's just amazing that we manage all this with different types of rules working at once.

I mean, we have phrase structure rules, which are the basic templates.

Like, a sentence can be a noun phrase, followed by a verb phrase.

Then, lexical insertion rules for plugging in the words.

And transformational rules for moving things around, like we just saw with preposing.

And it all happens in a fraction of a second.
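To make the first two rule types concrete, here is a minimal Python sketch of phrase structure rules plus lexical insertion. Everything in it is an assumption made for illustration: the tiny grammar in `PHRASE_RULES`, the vocabulary in `LEXICON`, and the helper name `expand` are hypothetical, and transformational rules such as preposing are left out.

```python
import random

# Hypothetical toy grammar: a sentence is a noun phrase followed by a verb phrase.
PHRASE_RULES = {
    "S": ["NP", "VP"],
    "NP": ["Det", "N"],
    "VP": ["Aux", "V", "NP"],
}

# Hypothetical vocabulary for lexical insertion.
LEXICON = {
    "Det": ["the"],
    "N": ["poodle", "ball", "tooth"],
    "Aux": ["will"],
    "V": ["chase"],
}


def expand(symbol, rng):
    """Expand a category into words: phrase rules first, then lexical insertion."""
    if symbol in PHRASE_RULES:
        words = []
        for child in PHRASE_RULES[symbol]:
            words.extend(expand(child, rng))
        return words
    return [rng.choice(LEXICON[symbol])]


print(" ".join(expand("S", random.Random(0))))
```

Because the rules only check categories, the sketch can happily produce a sentence like "the tooth will chase the ball": syntactically legal, semantically odd, which matches the separation between syntax and semantics discussed earlier.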

OK, so we have structure.

But the whole point of language is to convey meaning.

Let's move to semantics.

The study of meaning.

And for any theory of semantics to be considered adequate, it has to be able to explain some pretty tricky things about language.

Like what?

What's on the checklist?

Well, first, it has to explain anomaly.

Why a sentence like coffee ice cream can take dictation is grammatically fine, but semantically meaningless.

Right.

It violates our knowledge of what coffee ice cream is.

It also has to handle self-contradiction.

Like, my dog is not an animal.

The sentence just logically refutes itself.

Then there's ambiguity.

A huge one.

I need to go to the bank.

The financial one or the river one.

A semantic theory needs to account for how one string of words can have multiple distinct meanings.

And synonymy, how two different sentences can mean the exact same thing, like the rabbit is too young, is the same as the rabbit is not old enough.

And finally, entailment.

The idea that understanding one sentence means you automatically know something else is true.

If I say Pat is my uncle, you know, without me telling you that Pat is male, that fact is entailed.

And this gets much harder when you move from single words to whole sentences.

The textbook uses the verb exchange.

Sarah exchanged a dress for a suit.

The meaning of exchange is more than just Sarah gave a dress and got a suit.

There's a deeper semantic component.

There's an element.

Yes, that's the key.

Miller and Johnson-Laird's definition includes this element of obligation.

Sarah giving the dress obligates the other person to give her the suit.

That's what makes it an exchange and not just two people giving each other gifts.

Which, again, shows how much semantics depends on syntax.

The structure, the professor failed the student, assigns roles that are completely reversed in the student failed the professor.

The meaning hinges on the grammar.

And at the deepest level, understanding the meaning of a sentence means you understand its truth conditions.

And what does that mean exactly?

It means you know what the world would have to look like for that sentence to be true.

For Sarah exchanged a dress for a suit to be true.

Sarah has to be the one doing the action.

A dress has to be what she gave and a suit has to be what she received.

If any of those conditions aren't met, the sentence is false.
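One loose way to picture truth conditions is as a check against a description of the world. The Python sketch below is only an illustration under assumptions: the `Event` record, its field names, and the function `sarah_exchanged_dress_for_suit` are hypothetical, and the sketch ignores the obligation component of exchange discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical record of one exchange in a toy world model."""
    agent: str
    gave: str
    received: str


def sarah_exchanged_dress_for_suit(world):
    """True only if some event meets all three conditions at once."""
    return any(
        e.agent == "Sarah" and e.gave == "dress" and e.received == "suit"
        for e in world
    )


matching_world = [Event("Sarah", "dress", "suit")]
reversed_world = [Event("Sarah", "suit", "dress")]
print(sarah_exchanged_dress_for_suit(matching_world))
print(sarah_exchanged_dress_for_suit(reversed_world))
```

Changing any one field, the agent, the thing given, or the thing received, is enough to make the check fail, which is the sense in which each condition is necessary.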

So understanding meaning is really about connecting language to a verifiable state of the world.

Exactly.

It's the bridge between language and logic.

So we have structure and we have meaning.

But language doesn't happen in a vacuum.

It happens between people.

And that brings us to the top level.

Pragmatics.

The social rules of language.

All the unwritten conventions and etiquette we use to communicate successfully.

Everything from how you greet someone to how you phrase a request.

The philosopher John Searle came up with speech act theory to formalize this.

He said that when we speak, we're not just saying words, we're performing acts.

And the listener's job is to figure out what kind of act is being performed.

Searle identified five basic types.

The first is assertives.

Where you're just stating a belief, I think it's going to rain.

Right.

Then there are directives.

These are attempts to get the listener to do something.

Close the door, please.

Third are commissives.

Where you commit yourself to a future action.

I promise I'll be there.

Fourth, expressives, which convey your psychological state.

I'm so sorry or thank you so much.

And finally, declarations.

These are the really interesting ones where the words themselves are the action.

The classic example being, I now pronounce you husband and wife.

The act of saying it makes it so.

Or a referee yelling, you're out.

And which of these acts we choose and how we phrase it is pure pragmatics.

It depends entirely on the context and the person we're talking to.

Absolutely.

The research by Gibbs in 1986 showed this beautifully.

If you want someone to close a window, you have options.

You could say, close the window, a direct order.

Or you could say, could you close the window?

A polite request.

Or you could even just say, wow, it's really cold in here.

An indirect hint.

Gibbs found that people are incredibly skilled at picking the right framing for the situation.

They choose the phrasing that best addresses the likely obstacle.

Like the example of asking a stranger for a pen in the library.

Right.

The biggest obstacle there is imposing on a stranger.

So people don't say, give me a pen.

They choose the phrasing that acknowledges and mitigates that imposition.

Excuse me, would you mind lending me a pen?

It's this incredibly sophisticated social calculation happening in an instant.

And this is the territory where advertisers play.

They exploit our tendency to use pragmatics to draw inferences.

The Eradicold pills example from the book is perfect.

It is.

The ad says, get through a whole winter without colds.

Take Eradicold pills as directed.

Notice what they don't say.

They don't say the pills cause you to not get colds.

They never make a direct causal claim.

But they put those two sentences next to each other.

Knowing that your brain, trying to be a cooperative listener, will draw the pragmatic inference that they're causally linked.

You fill in the gap for them.

And research by Harris showed that people are actually terrible at remembering later whether that link was stated explicitly or just implied.

It's a huge cognitive loophole.

We're built to assume coherence and make connections.

And that can be exploited.

It shows that comprehension isn't truly complete until you've processed the social context and all the implications.

That's the whole massive structure of language.

Now let's pivot to how this all happens in real time.

Section two, language comprehension and production.

And we have to start with the first and maybe biggest hurdle, speech perception.

It feels completely effortless.

But cognitively, it is a nightmare of a problem.

George Miller identified the two fundamental challenges the brain has to solve.

The first one is the continuity problem.

Spoken language is a continuous stream of sound.

There are rarely clean, silent gaps between words the way there are white spaces on a page.

So when I look at a spectrogram, a visual picture of sound.

You see this continuous smear of energy.

The pauses you think you hear are often an illusion created by your brain.

Your cognitive system is actively carving up that continuous stream into what it thinks are words.

And the second problem is context dependence.

This is the idea that the same phoneme sounds acoustically different depending on the sounds that come before and after it.

The B in baby is not physically the same as the B in boondoggle.

Because your mouth is already getting ready to make the next sound.

It is.

Add to that all the variation between different speakers.

Pitch, accent, speed, and the raw acoustic signal is an absolute mess.

It seems impossible that you could reliably map that chaos onto a set of neat, discrete phonemes.

But we do.

So what's the solution?

How does the brain do it?

The solution is a phenomenon called categorical perception.

Okay, what's that?

It means our brain doesn't perceive a smooth continuum of sound.

Instead, it automatically and unconsciously forces the incoming sounds into discrete categories.

We learn to ignore the variations that don't matter in our language and amplify the ones that do.

The classic experiment that proved this was the voice onset time study by Lisker and Abramson.

This is a beautiful piece of research.

So voice onset time, or VOT, is just a tiny delay between when you release a consonant, like opening your lips for a B or P, and when your vocal cords start vibrating.

A short delay gives you a B sound.

A longer delay gives you a P sound.

Exactly.

So they used a synthesizer to create a series of sounds where the VOT varied smoothly, continuously, from a clear ba to a clear pa.

So physically it was a smooth gradient, like a color ramp from red to orange.

Correct.

But that is not what people heard.

They didn't hear a smooth change at all.

They heard ba, ba, ba, ba, ba.

And then suddenly, at a very specific point, around 30 milliseconds, it flipped, and they heard pa, pa, pa, pa.

There was no in-between sound.

None whatsoever.

They perceived a sharp, categorical boundary.

The brain took this ambiguous, continuous input and forced it into one of two boxes.

It's incredibly efficient.
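The contrast between a smooth physical continuum and a two-way percept can be sketched in Python as a simple threshold. This is a deliberately crude illustration, not the procedure from the original study: the function name `perceived_syllable` is hypothetical, the boundary uses the roughly 30 millisecond figure mentioned above, and real listeners' boundaries are not perfectly sharp.

```python
# Assumption for this sketch: a single fixed boundary at about 30 ms of VOT.
BOUNDARY_MS = 30


def perceived_syllable(vot_ms):
    """Map a continuous voice onset time onto one of two discrete categories."""
    return "ba" if vot_ms < BOUNDARY_MS else "pa"


# The input changes in equal 10 ms steps, but the output has only two values.
continuum = list(range(0, 61, 10))
print([(vot, perceived_syllable(vot)) for vot in continuum])
```

The input steps are evenly spaced, yet the output never reports anything between the two categories, which is the pattern the listeners showed.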

And this perception isn't just about the sounds themselves.

It's influenced by other things, like what we see.

Oh, massively.

The study by Massaro and Cohen shows this.

They had people listen to an ambiguous sound somewhere between ba and da.

At the same time, they watched a silent video of a person's mouth clearly saying either ba or da.

And what people saw completely changed what they heard.

If they saw lips making a da shape, they heard da, even if the sound was ambiguous.

It proves that speech perception isn't just an auditory process.

It's multimodal.

And the linguistic context is even more powerful.

This brings us to the phoneme restoration effect.

A fantastic finding from Warren.

He took a sentence like, the state governors met with their respective legislatures.

And he cut out the S in legislatures and replaced it with a cough.

So there was physically no s sound in the recording.

None.

But when he asked people what they heard, almost everyone reported hearing the complete word legislatures.

They perceived the missing s.

The brain just filled it in.

It used the top-down knowledge from the rest of the sentence to restore the missing phoneme.

It heard what it expected to hear.

And a follow-up study showed that the context can actually determine what phoneme gets restored.

Yes.

Warren and Warren used an ambiguous sound fragment, eel, in different sentences.

When the sentence was, it was found that the eel was on the axle, people heard wheel.

The eel was on the shoe, they heard heel.

The eel was on the orange, they heard peel.

The end of the sentence retroactively determines what you perceive at the beginning.

It's a stunning example of top-down processing.

So now let's flip around and look at production and sentence comprehension.

One of the best ways we learn about how we produce language is by studying what happens when it goes wrong, by analyzing speech errors.

Slips of the tongue.

Exactly.

And the researcher Garrett did a deep analysis of these errors and found something really insightful.

He discovered that when people substitute one word for another, the errors fall into two very distinct categories.

What were they?

The first type was based on meaning relations.

You say finger when you meant to say toe, or hot when you meant cold.

The words are semantically related.

Okay, and the second type?

Based on form relations.

The sounds are similar.

You say guest when you meant goat.

The meanings are totally unrelated, but the words sound alike.

And the critical finding was that these two types of errors almost never mix.

Right.

You rarely get a substitution that is similar in both meaning and sound.

And for Garrett, this was a huge clue about how this system is organized.

What did it suggest?

It suggests a two-stage process.

First, your brain selects the meaning, the abstract concept you want to express.

Then, in a separate later stage, it selects the phonological form, the actual word that matches that concept.

If both were happening at the same time, you'd expect to see a lot more mixed errors.

So it's like a production line, meaning first, sound second.

That's the model it points to.

Now, shifting back to comprehension, when we hear a sentence, we don't just process a string of words.

We're very sensitive to those syntactic chunks, the constituents.

Jarvella's experiment in 1971 proved this beautifully.

He had people listen to a story, then stopped them and asked them to recall the last few clauses they had heard.

And he compared their memory for the clause they were currently in the middle of processing versus a clause from the sentence they had just finished.

The difference was massive.

People recalled the words from the currently active clause with 54% accuracy.

For the just completed clause, it plummeted to 20%.

So as soon as the brain figures out the meaning of a chunk, it throws away the exact wording.

It does.

It keeps the gist, the semantic meaning, but the surface form, the exact syntactic structure, is discarded from working memory.

It's used for parsing, and then it's gone.

We're usually so good at this parsing that we don't even notice ambiguities.

But sometimes we get led astray by what are called garden path sentences.

These are so fun.

They're sentences that are grammatically correct, but are constructed to trick your parser.

The classic one is, the horse raced past the barn fell.

Ugh, yeah.

My brain just hits a wall.

You read, the horse raced past the barn.

And you assume raced is the main verb.

Your parser makes the simplest, most common assumption.

But then you hit the word fell, and you realize that can't be right.

You have to go back and reparse the sentence.

You realize it means, the horse that was raced past the barn fell.

Exactly.

That little moment of confusion.

That's the feeling of your brain having to backtrack and rebuild the syntactic structure.

It reveals the step-by-step nature of parsing.

This speed of processing brings us to a truly landmark experiment by Swinney in 1979 on lexical ambiguity resolution.

This is where we see the brain dealing with words that have multiple meanings.

This study is so elegant.

He had people listen to a passage that contained an ambiguous word, like bug.

And the context of the sentence strongly suggested one meaning.

For example, they were talking about spiders and roaches, so it meant insect.

And while they were listening, they were also doing a second task on a computer screen.

A lexical decision task.

A word would flash on the screen, and they had to press a button as fast as they could if it was a real English word.

The idea is, if your brain has just activated a concept, you'll be faster to recognize a related word.

This is called priming.

So if you hear bug in the context of insects, you should be faster to recognize the word ant.

Exactly.

And he also tested the other inappropriate meaning with the word spy.

And what did he find?

This is the amazing part.

He tested them at two different time points.

The first was immediately after they heard the word bug.

And at that instant, he found that both meanings were primed.

People were faster to recognize ant, and they were faster to recognize spy.

Even though the context made the spy meaning totally irrelevant.

It didn't matter.

For a split second, the brain activates all possible meanings of a word.

It's an automatic bottom-up process.

But that can't last, or we'd be constantly confused.

It doesn't.

At the second time point, just a few syllables later, about 750 milliseconds, he tested again.

And by then, only the contextually appropriate meaning ant was still active.

The irrelevant spy meaning had been completely suppressed.

So it's this two -step dance.

A massive automatic activation of everything, followed by an incredibly fast context-driven selection process.

It's one of the strongest pieces of evidence we have for how language processing is both automatic and very quickly controlled.

Okay, let's zoom out from single sentences to how we understand connected text.

Reading.

The work of Just and Carpenter using eye tracking really laid the foundation here.

It did.

They came up with two core assumptions that still drive most models of reading.

The first is the immediacy assumption.

Which says that as soon as your eye lands on a word, you try to interpret it and fit it into the sentence you're building.

You don't wait until the end of the sentence.

No, it's highly incremental.

And the second assumption is the eye-mind hypothesis.

The idea that where your eye is looking is where your mind is processing.

And the longer your eye stays fixed on a word, the more mental work you're doing on it.

Precisely.

The average fixation is only about 250 milliseconds, a quarter of a second.

And they found, reliably, that we spend more time looking at content words, nouns, and verbs than little function words like the and of.

The eye gaze duration is a direct window into cognitive load.

And that cognitive load isn't just about the words themselves, but about the density of ideas in a sentence.

This is what Kintsch and Keenan called propositional complexity.

A proposition is just a basic idea unit.

And what they showed is that you can have two sentences that are the exact same length in words.

But if one of them contains more propositions, it will take significantly longer to read and understand.

And this affects memory, too.

It does.

They found that people were much better at remembering the propositions that were central to the story's meaning

compared to the more peripheral, less important ones.

Which tells us that our mental model of a text is hierarchical.

It's not a flat list of sentences.

The main ideas are at the top.

Absolutely.

And to build that coherent model, we have to constantly link new sentences to what we've already read.

This is where the given-new strategy comes in.

The idea from Haviland and Clark that we mentally split sentences into two parts.

The given part, which is information the listener already knows, and the new part.

When you hear a sentence, your first job is to search your memory for the antecedent, the thing that the given information refers back to.

And reading gets slower when you have to work to find that connection.

When you have to make a bridging inference.

The textbook example is great.

Compare two mini passages.

First one, we got some beer out of the car.

The beer was warm.

Easy.

The antecedent for the beer is right there in the previous sentence.

Right.

Now the second one, we checked the picnic supplies.

The beer was warm.

Reading time for that second sentence is measurably slower.

Why?

Because your brain has to perform an extra step.

It has to make the bridging inference that picnic supplies can include beer.

It has to build that logical bridge to connect the new sentence to the old one.

And that inference takes time.

And this whole process of building a mental model relies hugely on our existing background knowledge, which we call schemata.

The power of a schema was shown so clearly in that famous Bransford and Johnson experiment with the ambiguous story.

Oh, the one about the balloons and the serenading.

That's the one.

They gave people this bizarre abstract passage, and because nobody had a schema for it, they could barely remember any of it.

They recalled about 3.6 ideas on average.

But for another group, they showed them a simple line drawing that explained the whole situation before they read the passage.

And giving them that schema, that mental framework beforehand,

it more than doubled their recall, up to eight ideas.

And crucially, if they showed the picture after the participants read the story, it didn't help at all.

Which proves that schemata aren't just for retrieval.

You need them upfront, during encoding, to help you make sense of the information and build that coherent mental model in the first place.

This applies to everything.

If you know how to braid hair, you'll understand and remember a passage about braiding hair better than someone who doesn't.

The schema is already there.

For longer texts, like stories, that schema is called a story grammar.

It's our internalized template for how a narrative is supposed to be structured.

Right.

Thorndyke's model, where a story equals setting plus theme plus plot plus resolution.

And just like with propositions, the elements that are higher up in that structure, like the main theme, are remembered much better than the lower level details.

And this explains those classic findings from Bartlett back in the 1930s with the War of the Ghosts story.

It does.

His British participants didn't just forget parts of this weird Native American folktale.

They actively distorted it in their recall.

They changed things to make it fit their own Western story grammar, their own schema for how a story should work.

We've covered an immense amount of ground,

from sounds to stories.

So for our final section, let's tackle the big question, language and cognition.

How does this system we've described actually relate to thought itself?

This is one of the oldest and deepest debates in the field.

And it really spans a continuum between two big ideas.

On one end, you have the idea that language and thought are basically separate.

On the other, the idea that language fundamentally shapes our thoughts.

We should probably just get the most extreme view out of the way first.

Watson, the behaviorist, who claimed that thought was just silent subvocal speech.

Right, an idea that was pretty definitively debunked by that experiment with Smith, who was fully paralyzed by curare but remained conscious.

He couldn't move a muscle, couldn't speak at all.

But when he recovered, he reported that he had been thinking, problem solving the whole time.

So thought is clearly not the same thing as inner speech.

Okay, so let's look at the more sophisticated independence argument.

Jerry Fodor's modularity hypothesis.

Fodor's idea was that the mind is not one single general purpose computer.

Instead, it's made up of a number of specialized modules.

And that language processing is one of these modules.

And a module has two key properties.

It's domain specific.

Meaning it only does one job.

The syntax module only does syntax.

It can't help you with facial recognition.

And more importantly, it's informationally encapsulated.

This is the critical part.

It means the module works automatically, like a reflex.

And it's sealed off from your general knowledge and beliefs.

Fodor's analogy was the blink reflex.

You can know with 100% certainty that I'm not going to poke you in the eye.

But if I jab my finger towards it, you'll still blink.

Your high-level knowledge can't stop the low-level encapsulated reflex from firing.

Exactly.

And Fodor would say that early language processing works the same way.

And the evidence for this would be Swinney's experiment.

Precisely.

The fact that your brain activates both meanings of bug, even in a context that makes one of them ridiculous,

that's the module firing automatically, encapsulated from the contextual information.

Okay, so that's the independence camp.

Now let's go to the other side of the debate.

The idea that language constrains thought.

The Whorfian hypothesis of linguistic relativity.

Benjamin Whorf was a linguist who argued that the language you speak shapes and maybe even determines how you perceive and think about the world.

We dissect nature, as he put it, along the lines laid down by our language.

His evidence was things like, famously, Eskimo languages having many words for snow, or the Hopi having a different concept of time.

Right.

And the most direct way to test this idea came from studying color perception.

The argument was, if your language doesn't have a word for a color, maybe you can't perceive or remember it as well.

The key study here involved speakers of Dani, a language from Indonesia.

The Dani language only has two basic color terms.

Mili for dark, cool colors, and Mola for light, warm colors.

So if the strong Whorfian hypothesis is true, a Dani speaker should have a really hard time remembering the difference between, say, a bright red and a bright green.

They should, but that's not what Eleanor Rosch found.

She tested their memory for different color chips, including what are called focal colors.

The very best, most typical examples of a color, like a perfect fire engine red.

And what was the result?

The result was a huge blow to the Whorfian hypothesis.

The Dani speakers, just like the English speakers, had much better memory for the focal colors than for the non-focal ones.

Even though they had no words for them?

It didn't matter.

It suggests that our concept of color is based on something more universal, probably rooted in our visual system and not constrained by our language's labels.

So the strong version of the hypothesis, that language determines thought, pretty much fell apart.

But what about a weaker version?

That language just influences thought or makes certain kinds of thinking easier.

That was tested by Bloom, who looked at counterfactuals.

You know, if X had happened, then Y would have happened, thinking about things that are contrary to fact.

He argued that Chinese lacks a simple grammatical structure for this, unlike the "would have been" construction in English.

And he initially found that Chinese speakers were much worse at this kind of reasoning.

He did.

He reported a massive difference.

But later research showed that his Chinese test materials were just badly translated.

They were awkward and unidiomatic.

When other researchers fixed the passages and made them sound natural, the performance gap between Chinese and English speakers basically disappeared.

So at the end of the day, there's just not a lot of strong evidence for the Whorfian view.

It seems more likely that language reflects our thinking.

A wine expert develops a rich vocabulary for wine because they can perceive subtle differences, not the other way around.

Okay, for our final topic, let's map all of this onto the brain itself, the neuropsychological views.

We just have to pause again to appreciate the speed of this system.

We can recognize a word in 125 milliseconds.

And produce about three words per second from a vocabulary of tens of thousands of words.

It's just staggering.

For a long time, the main approach to understanding the brain basis of this was localization, trying to find the language spot in the brain.

It started in the 1860s with Paul Broca.

He studied patients who had suffered strokes and found that damage to a specific area in the left frontal lobe led to problems with speech production.

They could understand language, but their speech was slow, labored, and ungrammatical.

That became known as Broca's area and the condition expressive aphasia.

A decade later, Carl Wernicke found another area in the left temporal lobe.

Damage there resulted in a different problem.

Patients could speak fluently, but their speech was often nonsensical, and they had severe trouble comprehending language.

Wernicke's area and receptive aphasia.

And these two discoveries established the principle of lateralization, the idea that, for most people, language functions are housed predominantly in the left hemisphere of the brain.

Which is confirmed clinically with things like the Wada test.

Right, where they temporarily anesthetize one hemisphere. Put the left hemisphere to sleep in a right-handed person, and their language ability vanishes for a few minutes.

So when neuroimaging like PET scans came along, the hope was that we'd see these two areas just light up.

And to some extent, we did.

Petersen and his colleagues found that different language tasks did activate different areas.

Just passively viewing words lit up the visual cortex.

Listening to words activated the auditory cortex.

But the interesting finding was when they gave people a more complex task, like generating a verb related to a noun they saw.

Then you saw a big jump in activation in Broca's area.

This suggested Broca's area wasn't just for moving the mouth muscles, but for higher level processes of selecting and planning words.

But the neat picture of Broca's area for production, Wernicke's for comprehension, started to get a lot cloudier.

It did, because you find patients with damage to Broca's area who don't have Broca's aphasia, and vice versa.

The mapping isn't one-to-one.

So a newer view, proposed by people like Caplan, is that language processes are likely distributed across large neural networks.

So it's not two simple spots on the brain.

It's more like a network of interconnected regions, spanning the frontal, temporal, and parietal lobes, all working together.

A function like comprehension is likely supported by a whole network of areas, which makes the system more robust to damage.

A much more complex, and probably more accurate, picture.

Wow, okay.

We have covered a huge amount of territory here, from the tiny rules of phonology and syntax, up through the social conventions of pragmatics.

We talked about the core requirements for any system to be called a language.

Yeah.

Regularity and productivity.

And we dove into some of the key experiments that reveal how the system works.

Right, like the sharp boundaries in categorical perception, and Swinney's elegant study showing that initial lexical access is automatic and bottom-up.

We also tackled that huge philosophical debate between the modularity hypothesis, the idea that language is a separate, encapsulated system, and the Whorfian hypothesis, which argues language shapes thought.

And we saw that the evidence tends to lean much more towards modularity.

And we finished by looking at the brain, seeing the classic models of Broca and Wernicke give way to a more modern, complex view of language being distributed across wide neural networks.

It's an incredible journey from a single sound to a full thought.

So to wrap up, here is a final provocative thought for you to take away.

We've seen this fascinating tension in the research.

On one hand, language processes seem to be functionally modular.

They're specialized, automatic, encapsulated.

But on the other hand, they seem to be physically distributed across vast interconnected networks in the brain.

So what does that tension between specialized function and distributed hardware imply for our ongoing quest to build truly human-like artificial intelligence?

It seems the more we learn about the sheer complexity of language, the more we realize what a miracle it is that we can do it at all.

Thank you for joining us for this deep dive into the cognitive psychology of language.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Language emerges as a system fundamentally distinct from animal communication through its grammatical regularity and capacity for infinite productivity, enabling humans to express novel thoughts and meanings. Understanding how minds process language requires examining four interconnected linguistic domains that structure communication at different levels. Phonology concerns the sound system, identifying phonemes as the smallest units that create meaning distinctions, while syntax reveals how words combine hierarchically into larger constituents that form grammatical sentences. Semantics addresses the mechanisms through which people extract and assign meaning to utterances, navigating situations where single words carry multiple definitions or where broader context shifts interpretation. Pragmatics examines the unwritten social rules that govern how language functions in real interaction, moving beyond grammar to consider what speakers actually intend to communicate. Speech perception involves remarkable cognitive feats, as listeners transform continuous auditory streams into discrete categories through categorical perception, and mentally reconstruct sounds obscured by noise or interruption through phoneme restoration. Comprehension of sentences and longer texts relies on strategic mental operations, including the coordination of visual attention with linguistic processing and the integration of new information with already-established context. Narrative understanding depends partly on schematic knowledge about typical story structures that help readers anticipate upcoming events and organize information coherently. Social interaction succeeds when participants follow implicit principles of cooperative discourse, maintaining informativeness, honesty, relevance, and clarity in their exchanges. 
Theoretical controversies about language's nature persist, particularly regarding whether language processing depends on specialized neural machinery separate from general cognition versus how linguistic categories might shape thought itself. Neuropsychological research clarifies these questions by showing how brain damage produces distinct language impairments, with injuries to specific regions disrupting either the production or comprehension of speech and with language functions predominantly concentrated in the left hemisphere across most individuals.
