Chapter 8: Experimental Designs: Between-Subjects Design

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Okay, let's unpack this.

We're diving deep into research methods today, really focusing on a fundamental way scientists compare different groups of people.

We've got a stack of material here guiding us, all centered around understanding the between-subjects design.

Our mission: to pull out the essential knowledge, what it is, why researchers use it, its biggest headaches, the challenges, exactly, and how they try to solve them, and what you do with the data afterwards.

That's right.

And to get a feel for what this looks like in action, maybe we should start with a compelling study example.

Good idea.

The material brings up the work by Ackerman and Goldsmith back in 2011.

They compared reading on screen versus printed hard copy.

Yes, the one with college students and reading comprehension.

What was the setup there again?

So they took a sample of college students and instead of having everyone do both types of reading, they split them up.

Participants were randomly assigned to one of two groups.

One studied text on a computer screen.

The other studied the same text but as, you know, printed hard copy.

Afterwards, they tested their comprehension.

And what they found was pretty interesting, wasn't it?

Something about self-pacing.

Yeah, exactly.

When students were in control of their own study time, performance was significantly worse for those who studied on the screen.

Worse on the screen, okay.

Precisely.

And because the researchers were, you know, really careful about controlling other potential influences, making sure the only systematic difference was the reading media, they felt pretty confident concluding that the media actually caused that difference in learning performance.

So that study is a classic experiment then.

And the key thing for our discussion today is that it compared scores from different groups of participants.

That's the core idea we're digging into.

It is.

And that immediately makes it different from research where you might get scores from the same group under different conditions.

Right, like the pain and swearing study the material mentioned earlier.

Yeah.

Where everyone did both conditions.

Right.

Exactly.

Stephens and colleagues.

Well, that's a totally different approach for getting comparison data.

We'll talk about those later.

That's within-subjects stuff.

Got it.

So today's deep dive is all about the between-subjects design, defined by using a separate, distinct group of individuals for each different treatment condition you're comparing.

And because each participant only gives you one score, essentially, and that score isn't influenced by what happened to someone in another group.

It's independent.

Right.

It's also commonly called an independent measures design.

Same thing, different name.

Perfect.

So today we're going to unpack its defining features.

The major strengths, the big weaknesses.

Especially the challenges.

Yeah.

Like those threats to validity, dealing with messy data from individual differences.

And then the techniques researchers use to try and grapple with all that.

And finally, where you see these designs used and the stats you need to make sense of it all.

Right.

Let's really dig into the foundation of this whole approach.

Okay.

So maybe we should quickly review the basic logic of the experimental strategy itself.

Like the material lays out.

Good idea.

You're essentially manipulating one variable, right?

To create different treatment conditions.

Then you measure a second variable, the outcome, to get scores in each of those conditions.

And then compare the scores across the treatments.

Crucially, while controlling everything else.

Trying to rule out other explanations.

You know, preventing confounding.

Right.

And the material points out when you need comparison groups for this, there are two main ways to go.

One way, which is chapter nine stuff, is the within subjects design.

Same people, all treatments.

Like that pain swearing example again.

And the other path, our focus now, is the between subjects design.

Using entirely different groups of people for each treatment.

The defining feature, like we said, separate group for each condition.

The material kind of visualizes it like you take a sample, you assign people into these separate groups.

Hopefully making them equivalent at the start.

We'll get to how.

Exactly.

Then you give each group a different treatment, and then you measure your outcome, your dependent variable.

Because each person is only in one group, you only get one score per participant, essentially.

Right.

So if you're comparing, say, three different teaching methods, and you want 30 scores for each method.

You need 90 different students.

30 for A, 30 for B, 30 for C.

Yep.

90 participants total just to get those 30 scores per group.

Wow.

Okay.

And even if you take, like, multiple measurements from one person within their condition, maybe several reaction times, the material notes, you usually average those.

To get one representative score for that person.

Right.

End result, still one score linked to one unique individual.

Okay.

Now, what are the big advantages then?

Why go to all that trouble?

Well, the main one comes directly from those independent scores we just talked about.

Since a participant only does one condition, their score isn't messed up by what they did before.

Exactly.

It's not contaminated by things like practice effects, or getting tired or bored from doing multiple tasks, or those weird contrast effects the material mentions.

Like that room temperature example, judging a 60 degree room.

Yeah.

It might feel cold if you just came from 70, but warm if you just came from 50.

Subjective contrast.

But in a between subjects design, each person only feels one temperature.

So that prior comparison isn't there.

Exactly.

Their judgment is cleaner, more independent.

Makes sense.

What else?

Another advantage is just versatility.

For pretty much any experiment comparing treatments, you can usually structure it as a between subjects design.

It's almost always an option.

Okay.

But there have to be downsides.

Oh, absolutely.

The immediate one you often hit is needing a lot of participants, sometimes a really large number.

Because it's one score per person.

Right.

And that can be a huge hurdle if you're studying, say, special populations.

The material mentions kids with rare learning disabilities or patients with a specific condition.

Finding enough people can be incredibly tough.

Okay.

Participant numbers.

What else?

And here's the primary disadvantage.

The one that's like the central challenge for this design.

Individual differences.

Ah.

People are just different.

Fundamentally.

The material uses that simple John and Mary example.

Different ages, genders, IQs, backgrounds, personalities.

Even how much sleep they got.

Or if they skipped breakfast.

Right.

Those differences exist before your experiment even starts.

And those pre-existing differences can cause people to get different scores on your outcome measure.

Even if your treatment has absolutely zero effect.

Precisely.

Just because they're different people.

Okay.

That sounds like a major problem.

Which I guess brings us squarely to individual differences as confounding variables.

Exactly.

The dream scenario in a between-subjects design is that your groups, group A, group B, whatever, start out as similar as possible.

In every way that matters for your study.

Right.

Except for the independent variable you're actually manipulating.

But because of individual differences, the specific people who land in group A might just happen to be different fundamentally from the people in group B.

Yes.

And here's where it gets really tricky.

If those individual differences systematically vary between the groups.

Meaning like one group accidentally ends up on average smarter.

Or older.

Or more anxious.

Yeah.

If there's a systematic imbalance like that, then those individual differences become a confounding variable.

Oh, the classic confound definition.

Some extraneous variable that systematically differentiates the groups besides your independent variable.

It gives you an alternative explanation for your results.

Totally.

Imagine, like the material sketches out, you compare an older group getting treatment one with a younger group getting treatment two.

Okay.

If you find a difference in scores, you're stuck.

Is it the treatment?

Or is it just that older people do differently on this task than younger people?

You can't tell.

Because age is confounded with the treatment condition.

Exactly.

And this specific problem, individual differences being unevenly spread between groups is a unique vulnerability of these between subjects designs.

And it's not just about the people themselves, right?

No.

The environment can play a role too.

Good point.

The material also flags environmental variables as potential confounds if they differ systematically between groups.

Like if group A gets tested in a nice quiet lab in the morning.

And group B gets tested in a noisy classroom in the late afternoon.

Those environmental differences could explain any performance difference totally separate from your actual treatment.

Absolutely.

So we can kind of boil down the two main sources of confounding here.

One, confounding from individual differences between the groups.

And two, confounding from environmental variables between the groups.

Which really highlights the researcher's crucial job creating equivalent groups right from the start.

The material gives three criteria for that, right?

Groups need to be...

Created equally: same process for getting and assigning people.

Treated equally: same experiences except for the IV.

And...

Composed of equivalent individuals.

The characteristics of the people should be as similar as possible across the groups.

That last one is the trickiest, obviously.

So how do researchers actually try to achieve that?

Making the initial groups composed of equivalent individuals?

Okay, this brings us to the specific techniques for limiting confounding.

These are methods aimed at making those initial groups as comparable as they can be.

And the first big one is random assignment.

Randomization.

Yep, that's the most common workhorse.

How does it actually work?

Like, practically?

You use a truly random process.

Could be flipping a coin if you only have two groups.

Yeah.

Drawing numbers out of a hat.

Using a computer's random number generator.

Okay.

The point is to decide which group each participant goes into purely by chance.

Every person has an equal probability of ending up in any condition.

And the hope or the goal is that by doing this...

All those individual characteristics, age, IQ, gender, motivation, whatever, will get distributed randomly and, you know, roughly equally across the different groups.

So no systematic differences pop up just because of who ended up where.

It's unbiased.

Theoretically, yes.

It's unbiased.
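To make the coin-flip idea concrete, here is a minimal sketch of simple random assignment in Python. The function name, participant labels, and the two condition names are made up for illustration, and the seed is only there so the example is repeatable.

```python
import random

def random_assignment(participants, conditions, seed=None):
    """Assign each participant to a condition purely by chance.

    Every participant has the same probability of landing in any
    condition. Group sizes are NOT guaranteed to come out equal.
    """
    rng = random.Random(seed)
    return {person: rng.choice(conditions) for person in participants}

# Hypothetical participant labels and conditions
participants = [f"P{i:02d}" for i in range(1, 13)]
assignment = random_assignment(participants, ["screen", "paper"], seed=1)

for person, condition in assignment.items():
    print(person, condition)
```

Because each person is assigned independently, the two groups can end up with unequal sizes, which is one reason researchers sometimes restrict the process.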

The material does mention a tweak called restricted random assignment.

Oh yeah, what's that?

It's just a way to make sure you end up with exactly the same number of participants in each group.

Like, if you need 20 per group, you might draw 20 numbers assigned to group A, then 20 for group B. That ensures equal group sizes, which is often desirable for stats.

Okay.
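One common way to get restricted random assignment in practice is to shuffle the whole participant list and then deal it out in equal slices. The sketch below assumes that approach; the function name and the 40 numbered participants are illustrative, not something the material specifies.

```python
import random

def restricted_random_assignment(participants, conditions, seed=None):
    """Shuffle participants, then split them into equal-sized groups."""
    participants = list(participants)
    if len(participants) % len(conditions) != 0:
        raise ValueError("participants must divide evenly among conditions")
    rng = random.Random(seed)
    rng.shuffle(participants)  # order is now random
    size = len(participants) // len(conditions)
    return {
        condition: participants[i * size:(i + 1) * size]
        for i, condition in enumerate(conditions)
    }

# 40 hypothetical participants, 20 per group
groups = restricted_random_assignment(range(1, 41), ["A", "B"], seed=7)
print({name: len(members) for name, members in groups.items()})
# prints {'A': 20, 'B': 20}
```

Who lands in which group is still decided by chance; only the group sizes are fixed in advance.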

But random assignment isn't foolproof, is it?

The material points out a limitation.

Right.

Here's the rub.

Pure chance doesn't guarantee perfectly balanced groups, especially, and this is key, with small sample sizes.

You could just get unlucky.

Exactly.

Just by the luck of the draw, you could still accidentally end up with, say, more motivated people in one group than the other.

It reduces the likelihood of systematic differences, but doesn't eliminate the possibility entirely.

It's still a bit of a gamble.
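The small-sample gamble can be illustrated with a quick simulation. This is only a sketch under assumed numbers: motivation scores are drawn from a normal distribution with mean 50 and standard deviation 10, and a "badly unbalanced" split is arbitrarily defined as group means more than 5 points apart. None of those values come from the material.

```python
import random
import statistics

def imbalance_rate(n_per_group, threshold=5.0, trials=2000, seed=0):
    """Fraction of random splits whose group means differ by > threshold."""
    rng = random.Random(seed)
    unbalanced = 0
    for _ in range(trials):
        # Hypothetical pre-existing motivation scores for one sample
        scores = [rng.gauss(50, 10) for _ in range(2 * n_per_group)]
        rng.shuffle(scores)  # random assignment into two equal groups
        group_a = scores[:n_per_group]
        group_b = scores[n_per_group:]
        gap = abs(statistics.mean(group_a) - statistics.mean(group_b))
        if gap > threshold:
            unbalanced += 1
    return unbalanced / trials

small = imbalance_rate(5)    # 5 people per group
large = imbalance_rate(50)   # 50 people per group
print(f"n=5 per group:  {small:.2%} of splits unbalanced")
print(f"n=50 per group: {large:.2%} of splits unbalanced")
```

Under these assumptions the rate for the small groups should come out well above the rate for the large groups, which is the point being made: random assignment lowers the odds of a lopsided split, and it does so more reliably as the groups get bigger.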

Okay.

So it's good, but maybe not perfect, especially if your groups aren't huge.

What else can researchers do?

Well, sometimes they use a more deliberate approach, matching groups or matched assignment.

Matching.

So instead of leaving it all to chance.

You proactively identify specific variables that you think are really likely to influence your outcome measure and potentially confound things.

Like maybe intelligence in a learning study, or pre-existing anxiety levels in a study on stress reduction.

Perfect examples.

So the material outlines three steps.

First, you identify the variable you want to match on.

Second, you measure that variable for all your potential participants before you assign them to groups.

Get that baseline measurement.

Right.

Third, you use that measurement to assign participants to groups, often using restricted random assignment within levels of that variable to make sure it's balanced.

So you might ensure each group gets an equal number of high IQ, medium IQ, and low IQ participants.

Or equal proportions of older and younger adults.

Precisely.

You engineer the groups so they are equivalent or matched on that specific variable or variables.

And the advantage is clear.

You've guaranteed that those specific things you matched on cannot be confounding variables.

Exactly.

They're deliberately neutralized because they're spread evenly.
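The three matching steps can be sketched roughly like this, assuming the matching variable has already been measured and sorted into levels. The IQ levels, participant labels, and function name are all hypothetical; the sketch shuffles within each level and deals people out in turn, which is one way to do restricted random assignment inside a level.

```python
import random
from collections import defaultdict

def matched_assignment(levels, conditions, seed=None):
    """Balance a pre-measured variable across groups.

    levels: dict mapping participant -> level of the matching variable.
    Within each level, participants are shuffled and dealt out in turn,
    so every condition gets an equal share of that level.
    """
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for person, level in levels.items():
        by_level[level].append(person)

    groups = {condition: [] for condition in conditions}
    for members in by_level.values():
        rng.shuffle(members)
        for i, person in enumerate(members):
            groups[conditions[i % len(conditions)]].append(person)
    return groups

# Steps 1 and 2 (assumed done): IQ measured and binned for 12 people
iq_levels = {
    f"P{i + 1:02d}": ["high", "medium", "low"][i % 3] for i in range(12)
}
# Step 3: assign within each IQ level
matched_groups = matched_assignment(iq_levels, ["A", "B"], seed=3)

for condition, members in matched_groups.items():
    print(condition, sorted(iq_levels[p] for p in members))
```

Each group ends up with two high, two medium, and two low IQ participants, so IQ cannot differ systematically between the groups. Any variable that was not matched on is still left to chance.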

But I'm guessing there are downsides here too.

Of course.

Measuring that matching variable for everyone beforehand.

That can take extra time, effort, maybe money for specific tests.

And can you match on everything?

No way.

It's really hard, sometimes practically impossible, to match effectively on more than one or two variables at the same time.

And you certainly can't identify and match on every single potential difference between people.

So you have to pick the most likely culprits.

Usually, yeah.

You typically only match on variables where there's a very strong reason to believe they could seriously confound your results if left uncontrolled.

And the material clarifies this is different from matched subjects designs, which we'll get to in another chapter.

Okay.

So random assignment, matching.

What's the third strategy?

Holding variables constant or restricting the range.

This sounds more straightforward.

It is, in a way.

It's the most direct method to stop a variable from being a confound.

You just eliminate it as a source of difference between your groups entirely.

If you're worried age might be an issue, you simply decide to only recruit participants who are, say, 18 to 20 years old.

You hold age relatively constant.

Or if gender differences could affect the outcome, you might study only females, holding gender constant.

Exactly.

Or you might restrict the range, maybe only include people with IQs between 100 and 110.

You're making the groups more homogenous on that specific dimension by limiting who gets into the study in the first place.

And the benefit is?

It absolutely guarantees that the variable you held constant or restricted cannot confound your results because it doesn't vary significantly across your groups.

Okay, that sounds powerful.

But what's the catch?

There's always a catch.

A big one.

The material highlights a serious drawback.

It severely limits external validity.

Generalizability.

Right.

If you only study 19-year-old females from one university, can you confidently say your findings apply to 40-year-old males?

Or even 19-year-olds from a different background?

Probably not.

So you gain internal control, but you lose the ability to generalize your findings broadly.

Precisely.

It really throws that classic tension between internal and external validity into sharp focus.

What you gain in certainty within your study, you often lose in relevance outside your study.

Okay, so let's quickly recap those three main ways to limit confounding from individual differences.

Random assignment.

Easy, unbiased, but a gamble with small n.

Matching.

More precise for specific variables, but needs pre-measurement and can't cover everything.

Holding constant or restricting range.

Maximum control for that variable, but kills generalizability.

It's always a balancing act, choosing the best approach for your specific research question and constraints.

A constant balancing act.

Okay.

Now, individual differences.

Yeah.

They don't just cause problems between groups, right?

The material says they also create issues within groups.

Yeah, that's the other major headache they cause.

Beyond potentially making the groups non-equivalent, individual differences also contribute to the variability of scores within a single treatment group.

Meaning, even if everyone in group A gets the exact same treatment.

They won't all get the exact same score because they're still different people.

Different abilities, moods, attention spans, backgrounds, all that stuff still causes their scores to vary around the group average.

Okay, so that brings in the idea of variability within treatments.

Variance.

Right.

Variance is just the statistical term for how spread out the scores are within a group.

If people within the group have scores that are all over the map, you have high variance.

If their scores are tightly clustered together, low variance.

And why is high variance a problem if the average score is still different between groups?

Because high variance acts like statistical noise.

The material uses that great analogy.

It obscures the signal of any potential treatment effect.

Like static on a radio.

Kind of.

If there's a ton of random noise or variability within each group because of individual differences, then a real difference between the group averages, say a 10-point difference that could be meaningful, might get completely drowned out.

It becomes invisible against the background noise.

Okay, I can see that.

The material has those figures, right?

Showing the same mean difference.

Exactly.

Figures 8.3 and 8.4 show it visually.

A 10-point mean difference looks huge and obvious if the scores within each group are tightly packed, low variance.

But that same 10-point difference looks like nothing.

Maybe just random fluctuation.

If the scores within each group are super spread out, high variance.

Precisely.

And this has direct statistical implications.

High variance makes it much harder to find a statistically significant difference between your groups.

Meaning, you can't confidently rule out that the difference you saw was just due to that random noise, that chance variability.

Exactly.

You can't be sure it's a real treatment effect.
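Here is a small numeric sketch of that signal-versus-noise idea. The scores are invented: both pairs of groups have means of 50 and 60, so the 10-point difference is identical, and only the spread within the groups changes. The "signal-to-noise" function is just the mean difference divided by the pooled within-group standard deviation, a common standardized way of expressing a difference.

```python
import statistics

def signal_to_noise(group_a, group_b):
    """Mean difference divided by the pooled within-group standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    difference = statistics.mean(group_b) - statistics.mean(group_a)
    return difference / pooled_variance ** 0.5

# Hypothetical scores: same means (50 vs. 60), different spread
tight_a, tight_b = [48, 49, 50, 51, 52], [58, 59, 60, 61, 62]
spread_a, spread_b = [20, 35, 50, 65, 80], [30, 45, 60, 75, 90]

low_variance_ratio = signal_to_noise(tight_a, tight_b)
high_variance_ratio = signal_to_noise(spread_a, spread_b)
print(f"low variance:  {low_variance_ratio:.2f} standard deviations apart")
print(f"high variance: {high_variance_ratio:.2f} standard deviations apart")
```

With the tightly packed scores the groups sit several standard deviations apart; with the spread-out scores the same 10 points amounts to less than half a standard deviation, which is much harder to tell apart from chance.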

So the researcher's goal here is kind of two-fold.

You want a big difference between your treatment groups.

To show your independent variable actually did something.

And you want small variance within each of those treatment groups.

To make that between-group difference clear and statistically detectable, turn down the noise so the signal comes through.

Okay.

And in between-subjects designs, that variance within treatments is just the score variability within each separate group.

Yep.

Variance within groups.

So how do researchers try to turn down that noise?

Minimize that within group variance.

The material suggests a few key strategies.

First, and this is fundamental for good research anyway, standardize procedures and the treatment setting.

Meaning?

Make absolutely sure that every single participant within a given treatment group has an identical experience.

Same instructions, read the same way,

same room conditions, same timing if possible, same experimenter behavior.

Why does that help reduce variance within the group?

Because any inconsistency in how the treatment is delivered or how the data is collected can introduce extra differences between participants within that same group, artificially inflating the variability.

Keeping everything identical minimizes that extra source of noise.

So it's essential for replication too, right?

Absolutely.

If procedures aren't standardized, no one can replicate your study reliably.

Okay.

Standardize everything.

What else?

Second strategy, limit individual differences.

And this brings us back to...

Holding variables constant or restricting their range.

Exactly.

The material points out this technique has a dual benefit.

We already said it helps create equivalent groups, reducing confounding between groups, but it also helps reduce variance within groups by creating a more homogeneous group.

Because if everyone in the group is, say, the same age and gender, then age and gender differences simply cannot contribute to the spread of scores within that group.

Less variety in the people means less random variability in their scores, ideally.

Okay.

So that technique tackles both confounding between groups and variance within groups.

Seems powerful.

It is, but remember the trade-off.

External validity.

Generalizability.

Right.

Always the trade-off.

What about random assignment and matching?

Do they help reduce the noise within the groups?

That's a common misconception.

The material is clear.

No.

Random assignment and matching are tools designed specifically to balance individual differences across the different groups.

To reduce confounding between groups.

Right.

They don't actually change the inherent mix of individuals within any single group.

You still have all that original variability from different people inside each condition.

They just help ensure that variability is, hopefully, distributed evenly between the conditions.

So they address the between-group comparison issue, not the within-group noise issue.

Whatever.

Okay.

What about just using a really big sample size?

Doesn't that help overcome noise?

Well, statistically, yes, a larger sample size can make it easier to detect a significant difference even when variance is high, but it's often the least efficient way to go about it.

The material notes that the statistical power benefit is related to the square root of the sample size.

Meaning?

Meaning, to cut the negative impact of variance in half, you don't just double your sample size, you have to quadruple it.

To reduce the variance effect by a factor of four, you'd need 16 times the participants.
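A quick way to see the square-root rule is through the standard error of a mean, which is the standard deviation divided by the square root of the sample size. The sketch below assumes that is the quantity the rule refers to; the standard deviation of 20 and the starting sample of 25 are arbitrary illustration values.

```python
import math

def standard_error(sd, n):
    """Standard error of a mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

sd = 20.0  # hypothetical within-group standard deviation

baseline = standard_error(sd, 25)        # 4.0
quadrupled = standard_error(sd, 100)     # 2.0, half the baseline
sixteen_fold = standard_error(sd, 400)   # 1.0, a quarter of the baseline

print(baseline, quadrupled, sixteen_fold)
```

Going from 25 to 100 participants only halves the noise in the mean, and it takes 400 to cut it to a quarter.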

Wow.

That gets impractical fast.

Very fast.

So while large N helps, directly controlling variance by standardizing procedures or limiting individual differences, if you can tolerate the generalizability hit, is usually a much more efficient use of research resources.

Okay.

So best bets for minimizing within-group variance are standardizing everything meticulously and limiting individual differences via holding constant or restricting range.

Keeping that external validity trade-off firmly in mind.

Got it.

Okay, moving on.

Are there other threats to internal validity we need to worry about with between-subjects designs beyond just those initial individual differences and environmental confounds?

Yes.

The material points out a couple more that are particularly relevant or sometimes even unique to this type of design.

Like what?

One significant one is differential attrition.

Attrition.

That's just participants dropping out, right?

Attrition is when participants withdraw from the study before it's completed.

Differential attrition is the real problem.

That's when the rate of dropout is significantly different between your experimental groups.

Or maybe the type of person dropping out is different across groups.

Exactly.

That's the key.

Why is that such a threat?

Well, you started with equivalent groups, hopefully.

Assuming you used random assignment or matching effectively, yes?

But if, say, in a difficult weight loss study, only the most highly motivated people stick with the experimental diet program while less motivated people drop out.

But in the control group, maybe dropout is lower or random across motivation levels.

Then the groups aren't equivalent anymore at the end of the study.

The experimental group is now systematically higher in motivation.

Precisely.

So if you find that group lost more weight, was it the diet?

Or was it just because the remaining participants were the super motivated ones?

The differential dropout pattern has confounded your results.

It warped your initially equivalent groups.
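The weight-loss scenario can be mimicked with a toy simulation. Everything here is assumed for illustration: motivation is a made-up score with mean 50 and standard deviation 10, everyone below 50 quits the experimental diet, and about 20 percent of the control group drops out at random.

```python
import random
import statistics

rng = random.Random(42)

# Two groups that start out equivalent on (hypothetical) motivation scores
experimental = [rng.gauss(50, 10) for _ in range(200)]
control = [rng.gauss(50, 10) for _ in range(200)]

# Hypothetical dropout rules, for illustration only:
# the demanding diet loses everyone below 50 in motivation,
# while the control group loses roughly 20% of people at random.
stayed_experimental = [m for m in experimental if m >= 50]
stayed_control = [m for m in control if rng.random() > 0.2]

print("start:", round(statistics.mean(experimental), 1),
      "vs", round(statistics.mean(control), 1))
print("end:  ", round(statistics.mean(stayed_experimental), 1),
      "vs", round(statistics.mean(stayed_control), 1))
```

The groups begin with similar average motivation, but by the end the experimental group's average is clearly higher, purely because of who left. Any outcome difference is now confounded with motivation.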

That makes sense.

A subtle but powerful confound.

What else?

Another threat, and this one is pretty much unique to between subjects designs because you have physically separate groups of people, is communication between groups.

Ah, participants talking to each other.

Exactly.

The material highlights this can cause several kinds of trouble.

First, there's diffusion.

Diffusion, like spreading?

Yeah, diffusion of treatment.

It's when information about the treatment, or even elements of the treatment itself, spread from the experimental group over to the control group.

So the control group finds out about the cool new teaching method the other group is getting, and they start trying parts of it themselves.

Could be.

Or they hear about the specific strategies being taught.

What happens then?

It blurs the lines between the conditions.

The control group isn't a true no treatment control anymore.

Right.

It reduces the actual difference between your groups, making it harder to see if the treatment really had an effect.

It masks the effect.

Okay, diffusion.

What's next?

Compensatory equalization.

Equalization.

It sounds like making things equal.

Yeah.

This happens when the untreated group finds out the other group is getting something perceived as desirable or beneficial, and they demand the same treatment, or something equally good.

Ah, hey, it's not fair they get the new computers.

The material mentions that Feshbach and Singer study.

Right.

1971.

Where the boys in the group not watching violent TV demanded to watch the violent show Batman like the other group.

What happens if the researchers give in?

Well, you've just lost your control group, or at least seriously contaminated it.

Again, it masks or eliminates the true treatment effect because the groups become more similar.

Common issue in clinical or educational settings where fairness is a big concern.

Okay.

Then there's compensatory rivalry.

Rivalry.

So the untreated group hears about the special treatment the other group is getting, and instead of demanding it, they get competitive.

They think, we'll show them.

We can do just as well without that fancy treatment.

Exactly.

They work extra hard, try to perform better than they normally would, specifically to compete with the treated group.

How does that threaten validity?

It artificially improves the performance of the control group, making the apparent difference between the groups smaller than the true treatment effect actually is.

They're closing the gap through extra effort, not because the treatment doesn't work.

Makes the treatment look less effective than it really is.

Right.

And finally, the flip side of rivalry, resentful demoralization.

Ooh, that sounds negative.

It is.

Here, the untreated group finds out about the special treatment, feels resentful or that it's unfair, and they basically just give up.

They become less motivated, less productive.

Think, why bother trying?

They're getting all the help.

Exactly.

They get demoralized because they feel disadvantaged.

And the consequence?

This makes the treated group look much better by comparison.

The difference between the groups appears larger than the treatment effect alone would justify, because the control group is performing worse than they normally would.

So it artificially inflates the perceived treatment effect.

Correct.

These communication issues, diffusion, equalization, rivalry, demoralization are tricky because they arise from very natural human social comparisons and reactions.

So how do you stop participants from talking or reacting?

Well, you often can't completely.

But the best defense, as the material suggests, is to try and keep the groups physically separated as much as possible during the study.

And if feasible and ethical, keep participants unaware that other conditions even exist, or at least unaware of the specific nature of those other conditions.

Minimize the opportunity and the motivation for comparison.

Easier said than done sometimes, I imagine.

Definitely.

Especially in field settings.

Okay.

So let's shift gears slightly.

How are these between-subjects designs actually used?

And how do researchers analyze the data they get?

Right.

The final section covers applications and statistical analyses.

The most basic application seems to be the two-group mean difference.

Yep.

The simplest version.

You have just one independent variable, and it only has two levels.

Maybe a treatment group and a control group, or two different treatments.

A single-factor, two-group design.

And if your outcome, your dependent variable, is measured using numerical scores, like reaction time, test scores, ratings on a scale.

Then you calculate the average score, the mean for each of the two groups.

And the stats test to see if those two means are reliably, statistically different from each other is...

The independent-measures t test.

The material points ahead to Chapter 15 for the computational details, but that's the tool.
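Ahead of those computational details, here is a rough sketch of what an independent-measures t test calculates, using the standard pooled-variance formula. The comprehension scores are invented for illustration and are not data from the reading study discussed earlier; the function name is also hypothetical.

```python
import math
import statistics

def independent_measures_t(group_1, group_2):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_1, n_2 = len(group_1), len(group_2)
    df = n_1 + n_2 - 2
    pooled_variance = (
        (n_1 - 1) * statistics.variance(group_1)
        + (n_2 - 1) * statistics.variance(group_2)
    ) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_1 + 1 / n_2))
    t = (statistics.mean(group_1) - statistics.mean(group_2)) / standard_error
    return t, df

# Hypothetical comprehension scores, one score per participant
screen_scores = [70, 72, 68, 75, 65]
paper_scores = [78, 80, 74, 82, 76]

t_value, df = independent_measures_t(paper_scores, screen_scores)
print(f"t({df}) = {t_value:.2f}")
```

The t value is the mean difference divided by an estimate of how much difference chance alone would produce. For 8 degrees of freedom, the tabled two-tailed critical value at the .05 level is about 2.306, so a t beyond that would count as significant. In practice a statistics package would also report the exact p value.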

What's the big plus of this simple two-group design?

Its simplicity and clarity.

It's relatively straightforward to set up, and the interpretation is pretty clear-cut.

Are the groups different on average, or aren't they?

Plus, with only two levels, you have the best chance to maximize the difference between your conditions.

How so?

You can pick levels representing opposite extremes of your independent variable, like no drug versus a high dose, or very easy task versus very difficult task.

That gives the treatment the best shot at showing an effect.

Makes sense.

But what's the main drawback of only having two groups?

It gives you very limited information about the overall relationship between your variables.

The material uses that graph idea, Figure 8.5.

Right, showing how two points can be misleading.

Exactly.

Two points might suggest a simple straight line, maybe increasing, but if you had picked two different points on the true curve, you might have seen a decrease, or maybe a curve.

You don't get the full picture, the functional relationship, with just two data points.
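A toy example shows how much the choice of two levels matters. The curve below is an invented inverted-U relationship, chosen only because it makes the arithmetic easy; it is not taken from the material's figure.

```python
def performance(arousal):
    """Hypothetical inverted-U relationship, peaking at arousal level 5."""
    return 100 - (arousal - 5) ** 2

def two_group_difference(low_level, high_level):
    """Difference a two-group study would see between two chosen levels."""
    return performance(high_level) - performance(low_level)

rising = two_group_difference(1, 5)    # 16: looks like an increase
falling = two_group_difference(5, 9)   # -16: looks like a decrease
flat = two_group_difference(1, 9)      # 0: looks like no effect at all

print(rising, falling, flat)
```

The underlying relationship never changes, yet three different two-group studies would report an increase, a decrease, and no effect, depending only on which two levels were picked.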

And sometimes you need more than just one control group.

Yeah, definitely.

In clinical research, for instance, you might want a no-treatment group and a placebo group to compare against your active treatment.

That immediately requires more than two groups.

The material also notes ethical issues sometimes mean using a standard therapy as control, not nothing.

So that naturally leads to comparing means for more than two groups, the single factor multiple group design.

Right.

You need this approach when you want to map out that full functional relationship, see the curve or pattern across several levels.

Like testing multiple drug dosages to find the optimal one.

Perfect example.

Or maybe comparing driving performance under three different phone conditions.

Handheld, hands-free, and no phone.

That's three groups.

So if you have numerical scores again, you calculate the mean for each of your three or four or five groups.

And the statistical analysis now is the single-factor analysis of variance, ANOVA.

Specifically, the independent-measures ANOVA, also detailed later.

And what does ANOVA tell you?

The initial ANOVA test tells you if there is a significant difference somewhere among all those group means.

It doesn't tell you exactly which groups differ from which other groups.

So if the ANOVA is significant, you need to do more.

Yes.

You then follow up with what are called post hoc tests.

These are specific comparisons between pairs of groups or combinations to pinpoint exactly where the significant differences lie.

Like is group A different from B?

Is B different from C?

Is A different from C?
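
As a rough illustration of what the omnibus test computes, here is a minimal Python sketch of the independent-measures F-ratio worked out by hand. The scores for the three phone conditions are made up, and the function and variable names are assumptions for this example only. The pairwise mean differences at the end are simply the raw gaps a post hoc test would go on to evaluate; the sketch does not perform any particular post hoc procedure.

```python
from itertools import combinations
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for separate groups of scores."""
    all_scores = [x for g in groups.values() for x in g]
    grand_mean = mean(all_scores)
    # Between-groups variability: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-groups variability: spread of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical driving-error scores, three participants per condition.
scores = {
    "handheld": [9, 10, 11],
    "hands_free": [6, 7, 8],
    "no_phone": [4, 5, 6],
}

f_ratio, df_between, df_within = one_way_anova(scores)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")

# Raw mean differences for every pair of groups (A vs B, A vs C, B vs C).
pairwise_gaps = {
    (a, b): mean(scores[a]) - mean(scores[b])
    for a, b in combinations(scores, 2)
}
for pair, gap in pairwise_gaps.items():
    print(pair, gap)
```

With these invented numbers the ratio comes out to 19 on 2 and 6 degrees of freedom, well above the tabled .05 critical value of roughly 5.14, so the omnibus test would be significant. The F-ratio alone still does not say which of the three pairwise gaps is responsible, which is why the follow-up comparisons are needed.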

Okay.

And the advantage here is getting that richer picture.

Definitely.

You can actually see the shape of the relationship between your independent and dependent variables.

It provides much stronger evidence for a cause and effect link if you see a systematic pattern across multiple levels of your treatment.

Is there a downside to having lots of groups?

There can be.

The material offers a word of caution if you have too many groups, especially if they represent levels that are very close together on your independent variable.

Like testing drug dosages of five milligrams, six milligrams, seven milligrams, eight milligrams.

Yeah.

The actual difference in effect between adjacent levels, like six milligrams, seven milligrams, might be tiny.

That makes it harder to find a significant difference overall.

Potentially, yes.

Having many small, non-significant steps can sometimes dilute the overall effect size and make the ANOVA less likely to come out significant.

So while multiple groups are good, you still need to choose levels that are sufficiently distinct from each other to likely show meaningful differences.

Good point.

Okay.

Last scenario.

What if your outcome measure isn't a numerical score?

What if you can't calculate means?

Right.

What if your dependent variable is measured on a nominal scale, categories with names, like did someone choose option A, B, or C, or political affiliation? Or maybe an ordinal scale, categories with an order, like ranking performances as low, medium, or high?

So you don't have scores to average.

You have counts, frequencies.

Exactly.

Your data consists of frequency counts, how many people fall into each category within each of your treatment groups.

In this case, you can't use a t-test or ANOVA.

Nope.

Those rely on means and variances of scores.

So what do you use?

The appropriate statistical tool here is the chi-square test for independence.

Again, chapter 15 stuff.

Chi-square.

And how does that work conceptually?

It compares the pattern of frequencies, or the proportions of participants falling into each category, across your different treatment groups. It asks: is the distribution of people across categories significantly different depending on which group they were in?

Like that classic Loftus and Palmer eyewitness study the material used as an example.

Perfect illustration.

They had different groups hear different verbs, smashed, hit, etc., after seeing a film of a car accident.

And the dependent variable wasn't a speed estimate in this part. It was...

It was a yes-or-no question asked later.

Did you see any broken glass? Even though there wasn't any in the film.

So the data was just the number, the frequency of people in each verb group who said yes versus no.

Right.

The material shows it as a table: for the smashed group, maybe 16 out of 50 said yes, 32 percent.

While for the hit group, only seven out of 50 said yes, 14 percent.

So the chi-square test would compare those proportions.

Exactly.

It would tell you if that difference in proportions, 32 percent versus 14 percent, is statistically significant, that is, if the likelihood of saying you saw broken glass depended significantly on which verb you heard earlier.

And they found it did, right?

Conclusion, the language used influenced memory.

Yeah.

A classic demonstration using frequency data and, implicitly, a chi-square-type analysis. It shows you can definitely compare groups even when your outcome is categorical.
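
For a sense of how the comparison of proportions works numerically, here is a minimal Python sketch of a chi-square test for independence on the two groups quoted above. The "yes" counts (16 and 7 out of 50) come from the discussion; the "no" counts (34 and 43) are simply derived as 50 minus the "yes" counts. The function name is an assumption for this example, no continuity correction is applied, and the p-value shortcut used here is valid only for one degree of freedom.

```python
import math


def chi_square_independence(table):
    """Return (chi-square, df) for a table of observed frequency counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group membership and response were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Rows are verb groups; columns are [said yes, said no] to "broken glass".
observed = [
    [16, 34],  # "smashed" group, 16 of 50 said yes
    [7, 43],   # "hit" group, 7 of 50 said yes
]

chi2, df = chi_square_independence(observed)
# For df = 1 only, the upper-tail probability can be written with erfc.
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square({df}) = {chi2:.2f}, p = {p_value:.3f}")
```

On these counts the statistic is about 4.57 with one degree of freedom, which is above the .05 critical value of 3.84, consistent with the conclusion that the response pattern depended on the verb.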

That's really useful to know.

It's not all about averages.

Absolutely.

Got to use the right tool for the type of data you have.

Okay, wow.

We have covered a lot of ground here.

A really deep dive into the between-subjects design, all based on the source material.

We definitely have.

We started with its core definition, separate groups for each treatment and its key feature, those independent scores.

We hit the main advantage, getting those clean measurements free from carryover effects like practice or fatigue.

But then we spent a lot of time on the primary overwhelming disadvantage, individual differences.

Yeah.

How they're not just a minor nuisance, but the root cause of major headaches.

They can create confounding between the groups.

Providing alternative explanations for results.

And they also increase that unwanted variability, that statistical noise within the groups.

Making it harder to see the real treatment effect.

We explored the main techniques researchers use to try and manage those individual differences, random assignment.

Good, but imperfect, especially with small n.

Matching.

Precise for specific variables, but requires effort and can't cover everything.

And holding variables constant or restricting the range.

Great for control, but at the cost of generalizability, that constant tension.

The trade-off between internal control and external relevance.

We also talked about why minimizing that within-group variance is so important, turning down the noise.

And how standardizing procedures meticulously is key.

Plus how limiting individual differences helps here too, again with that generalizability trade-off.

And we clarified that random assignment and matching don't reduce within-group variance; they only balance differences between groups.

Then we covered those other tricky threats, like differential attrition.

Where uneven dropout rates mess up your initially equivalent groups.

And all those ways communication between groups can sabotage a study.

Diffusion.

Compensatory equalization.

Compensatory rivalry.

Resentful demoralization.

Yeah, all those potential social dynamics that can undermine the comparison you're trying to make really highlights the human element you have to consider.

And finally, we looked at how these designs get used.

Simple two-group comparisons with t-tests.

Multiple group designs using ANOVA to see the bigger picture.

And even comparing frequencies using chi-square when your outcome is categorical.

Covers a lot of ground.

It really does.

The core challenge running through all of this, it seems, is that constant struggle with individual differences.

Trying to create equivalent groups.

Trying to minimize the noise within them.

Trying to demonstrate a clear, unambiguous treatment effect.

All while navigating the practical limitations and ethical considerations.

And those unavoidable trade-offs researchers constantly face in the real world.

Which, thinking about all those potential pitfalls, the randomness, the dropouts, the communication issues, it really raises an important question for you, the listener, to think about.

Given how messy reality can be with all these potential influences swirling around, what level of control do you think is really necessary for a researcher to be truly confident?

Confident that the difference they observed is only because of the treatment they applied, and not something else.

And, maybe relatedly, how much are researchers, or maybe how much should they be, willing to sacrifice?

How much realism or generalizability should they give up just to achieve that high level of internal control and certainty in their findings?

That's a deep question.

Something for you to ponder.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Between-subjects designs distribute participants into separate groups, each experiencing a distinct experimental condition, creating a research structure fundamentally dependent on comparing outcomes across independent samples. The core strength of this approach rests on the independence of observations, meaning that one participant's responses remain isolated from another's, preserving the integrity of group comparisons. Yet this very independence introduces a substantial challenge: naturally occurring differences among individual participants accumulate within each group, inflating error variance and potentially masking the actual impact of the experimental manipulation. Researchers counteract this problem through deliberate control mechanisms including random assignment, which scatters idiosyncratic participant characteristics evenly across all conditions, matched assignment strategies that systematically pair individuals with equivalent baseline characteristics across different groups, and experimental protocols that constrain or eliminate specific variables unrelated to the treatment of interest. Because between-subjects designs position groups in isolation from one another, particular threats to validity emerge: differential dropout where the rates and patterns of participant attrition vary across conditions can distort group compositions and compromise comparison validity, and unintended group interactions where participants communicate about their experiences can contaminate the independence assumption and blur treatment boundaries. Researchers strengthen their ability to detect authentic treatment effects by reducing noise through procedure standardization, recruiting relatively homogeneous participant pools, and calculating sufficient sample sizes before data collection begins. 
Statistical inference in between-subjects studies follows different paths depending on research structure: independent-measures t-tests address two-group comparisons involving continuous dependent variables, analysis of variance handles designs with three or more groups, and chi-square procedures evaluate categorical outcome associations across separate groups. Mastering between-subjects design implementation requires understanding both its practical advantages in avoiding carryover effects and its inherent limitations regarding participant heterogeneity, along with the specific validity concerns and analytical techniques that accompany this methodological choice.

