Chapter 12: The Correlational Research Strategy

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

You know, you see it all the time, like, uh, working more hours often seems linked to getting less sleep.

Yeah, or that classic one about SAT scores maybe predicting college success.

Exactly.

These aren't just random thoughts, they're, well, they're everyday examples of how things seem connected, right?

They move together somehow.

Mm-hmm.

Patterns that exist out there.

And that basic idea, just seeing if things are connected and, you know, describing that connection, that's a fundamental tool for researchers.

It really is.

It's all about looking at patterns that are already there without jumping in to change anything.

Just noticing if, say, more of A tends to show up with more or maybe less of B.

Precisely.

Just observing.

And that's pretty much our mission in this deep dive.

We're really going to get into one specific way researchers study these connections.

Yeah, we're using a chapter on the correlational research strategy.

It's from the textbook Research Methods for the Behavioral Sciences.

Think of this as your guide, really.

We want you to understand what correlation actually means in research, how it's done, what it's good for.

And crucially, what it can't tell you.

Right.

We're pulling out the key stuff from this chapter just for you.

Okay.

So the core idea, what is correlational research?

At its heart, it's not about changing things.

Not intervention.

No.

The goal isn't to figure out why something's happening.

It's simply to examine and describe the relationships that are already there, the associations between variables as they naturally occur.

So just identifying that a relationship exists and getting a handle on its nature.

What kind is it?

How strong?

Exactly that.

And because the aim is observation, the researcher doesn't manipulate anything.

No controlling variables, no interfering.

Nope.

Just measuring them as they are.

Okay, that makes sense.

You just sort of collect data on two things or maybe more for a bunch of people or animals or whatever.

Right.

You measure variable X and variable Y for each participant or, as you said, maybe each unit like a family.

You end up with pairs of scores.

The source mentioned examples like looking at kids' on-task behavior and their grades.

Each kid gives you two scores.

Or measuring food intake and activity level for lab rats.

Same idea, pairs of scores for each rat.

And wasn't there that study about GPA and Facebook time?

Yes.

Junco's study from 2015 measured students' GPA and their time on Facebook.

Two numbers for each student.

And then looked for a pattern in those pairs.

They found one.

Generally, more Facebook time went along with lower GPAs, but, and this is vital, finding that link doesn't prove Facebook caused the lower grades.

Ah, right.

Just showed they were related.

We'll definitely circle back to that point.

It's fundamental.

You also mentioned families as a unit.

Yeah, the source points that out.

The individual isn't always one person.

Think about studies on family income and kids' school performance.

Like Elstad and Bakken's work.

Exactly.

They might get income from parents, academic score from the child, but those two scores are paired for that family.

Then they look at the pattern across many families.

Okay, so measure two things as they are.

See if they go together.

Now, how does this compare to other research strategies?

We've talked about experiments before.

Good question.

It helps to place it.

Let's think about experimental research.

The goal there is totally different.

Right.

Experiments aim for cause and effect.

That's the big difference.

Experiments want to show why A affects B.

Correlational studies just show that A and B are related.

And the method's different too.

Completely.

Experiments manipulate one variable, the independent variable, to create different conditions.

Control everything else.

And then compare the outcomes, the dependent variable, between groups of participants in those different conditions.

Correlational research, remember, just measures two variables within a single group and looks at the pattern between those paired scores for each individual.

No manipulation, no separate groups getting different things.

Got it.

Clear difference.

Cause and effect versus describing a relationship.

Manipulating groups versus measuring pairs in one group.

What about differential research?

The source calls it non-experimental, like correlational, but different.

They do share something.

Neither proves causation on its own.

But the key difference is how they handle the data.

Differential research uses one variable, often something people already have, like high versus low self-esteem, or gender, to define groups.

Ah, so you start by sorting people into groups based on something.

Exactly.

Pre-existing groups.

Then you compare those groups on a second variable, like comparing the average academic performance of the high self-esteem group versus the low self-esteem group.

So it's still comparing groups, but the groups weren't created by the researcher.

Right.

Correlational, again, takes two scores from one sample, maybe a self-esteem score and an academic score for everyone, and looks for the relationship between those two scores across all individuals.

Less about group averages, more about whether individual scores track together.

You've got it.

Differential defines groups first, compares variable 2 between them.

Correlational measures two variables in one group, looks at the pairs.

Okay, so you've got these pairs of scores.

What next?

How do you actually look at the relationship?

Well, when you have numerical scores, things like height, weight, test scores, what we call interval or ratio scale data, you can just list the pairs.

But the best way to see it is usually a scatter plot.

Ah, the graph with all the dots.

I know those.

Exactly.

Each dot is one person or one unit.

Its position shows their score on variable X along the bottom and variable Y up the side.

And just looking at how the dots spread out gives you a feel for the relationship.

It really does.

It's a great visual tool.

But researchers also want a number to summarize that pattern.

And that's the correlation coefficient.

Usually R.

That's the one.

And that little R packs a punch.

It tells you three key things about the relationship.

Okay, what are they?

Direction, form, and strength.

Let's start with direction.

What does that mean?

Direction is shown by the sign.

Positive, plus, or negative.

A positive correlation means the variables tend to move in the same direction.

As X goes up, Y tends to go up.

Or if X goes down, Y tends to go down.

Like height and weight?

Generally, taller people weigh more.

Good example.

On the scatter plot, the dots would tend to cluster around a line sloping up from bottom left to top right.

And negative is just the opposite pattern.

Exactly.

A negative correlation means they tend to move in opposite directions.

As X increases, Y tends to decrease.

Hmm.

Like maybe speed and accuracy.

If you try to do something faster, you might make more mistakes.

That's a common one.

The scatter plot points would cluster around a line sloping down from top left to bottom right.

Okay, direction.

Same way, plus.

Or opposite ways, minus.

What's form?

Form describes the basic shape the relationship takes.

The most common one we look for is linear.

Meaning the dots roughly follow a straight line.

Yes.

The Pearson correlation that we mentioned is specifically designed to measure linear relationships when you have numerical data.
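
To make that concrete, here is a minimal Python sketch of how a Pearson r could be computed from paired scores. The helper name pearson_r and the five pairs of hours-on-task and grade scores are invented for illustration; they are not from the textbook.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation for paired scores (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Made-up data: one pair of scores per student.
hours = [1, 2, 3, 4, 5]
grades = [52, 60, 61, 75, 80]

r = pearson_r(hours, grades)
direction = "positive" if r > 0 else "negative"  # the sign gives direction
strength = abs(r)                                # the magnitude gives strength
print(f"r = {r:+.2f}, direction = {direction}, strength = {strength:.2f}")
```

Reversing the sign of every grade would flip the sign of r but leave its absolute value unchanged, which is the sense in which direction and strength are separate pieces of information.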

But life isn't always straight lines, right?

What if the pattern's consistent but curved?

Good point.

The source mentions monotonic relationships.

These are relationships that are consistently one-directional, always increasing or always decreasing, but maybe not at a constant rate.

Like learning something new.

Big jumps in skill at first, then smaller improvements later.

Perfect example.

That's a curve, not a straight line.

For those, especially if you're using rank data, you might use a Spearman correlation, but the main idea is still describing a consistent pattern.

Okay, so form is about the shape, usually linear, sometimes monotonic.

Pearson for linear, Spearman often for monotonic or ranks.
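
One way to see the difference, sketched in Python under some assumptions: Spearman is computed here as a Pearson correlation on the ranks, the ranks helper assumes there are no tied scores, and the practice-and-skill numbers are made up to mimic a learning curve with big early gains.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation for paired scores (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

def ranks(values):
    """Rank 1 for the smallest value; this sketch assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_r(xs, ys):
    """Spearman correlation computed as Pearson on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Made-up learning curve: skill always rises, but by less each week.
weeks = [1, 2, 3, 4, 5, 6]
skill = [10, 40, 55, 62, 66, 68]

linear_fit = pearson_r(weeks, skill)      # below 1: the points bend away from a line
monotonic_fit = spearman_r(weeks, skill)  # 1: the ordering is perfectly consistent
print(f"Pearson = {linear_fit:.2f}, Spearman = {monotonic_fit:.2f}")
```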

What's the third thing?

Strength.

Strength or consistency.

This is about how closely the dots hug that line or curve, how consistent is the pattern.

How tightly packed are the dots?

Exactly.

And this is measured by the numerical value of r, ignoring the sign.

It goes from 0.00 up to 1.00.

So 1.00 is perfect.

All dots right on the line.

Yep.

Or a negative 1.00 is also perfect, just sloping downwards.

A value of 0.00 means absolutely no consistent relationship.

The dots are just a random cloud.

And numbers in between show different degrees, like 0.80 would be pretty strong, dots are close.

Very strong.

Well, maybe 0.20 would be weak, the dots are much more scattered, though still showing a slight trend.

The source probably has pictures of this.

It does.

Figure 12.3 shows scatter plots for different strengths.

And it's really important to remember the sign, plus or minus, tells you direction.

The number 0 to 1 tells you strength.

They're independent.

So a correlation of negative 0.80 is just as strong as plus 0.80.

Absolutely.

Just in the opposite direction.

Okay.

And here we go again.

Even if you get a perfect 1.00 or a negative 1.00.

It still does not mean one variable caused the other, cannot stress that enough.

Got it loud and clear.

Okay.

But what if your data isn't neat numbers?

What if one variable is categorical, like male-female or pass-fail?

Right.

The source covers that too.

If one variable is numerical, like IQ, and the other is non-numerical but has just two categories.

Like graduate and non-graduate.

Yes.

You could split the people into those two groups and compare their IQ scores using, say, a t-test.

That actually pushes it towards differential research comparing groups.

Or you can still calculate a correlation.

You assign numbers to the categories, maybe graduate one, non-graduate zero.

Then you run the Pearson correlation formula using those 0s and 1s and the actual IQ scores.

And that gives you a correlation value.

It does.

It's called a point-biserial correlation.

The value tells you the strength of the association.

The sign, plus or minus, isn't really meaningful though, because assigning 0 or 1 was arbitrary.
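
A small Python sketch of that point, assuming six made-up people with invented IQ scores and a hypothetical pearson_r helper. Swapping which category gets the 1 flips the sign of the point-biserial correlation but not its size.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation for paired scores (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Made-up data: IQ score and graduation status for six people.
iq = [95, 100, 105, 110, 115, 120]
graduate_is_1 = [0, 0, 1, 0, 1, 1]            # graduate = 1, non-graduate = 0
graduate_is_0 = [1 - g for g in graduate_is_1]  # the opposite, equally arbitrary coding

r_first_coding = pearson_r(graduate_is_1, iq)
r_second_coding = pearson_r(graduate_is_0, iq)
print(f"graduate = 1: r = {r_first_coding:+.2f}")
print(f"graduate = 0: r = {r_second_coding:+.2f}")
```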

Hmm.

Okay.

What if both variables are categories?

If both are non-numerical, especially if they have multiple categories each, like say different majors and preferred study methods, you usually organize the data differently.

Often in a table, a matrix, showing how many people fall into each combination of categories.

The source has an example with college experience and problem-solving success in Figure 12.4.

And how do you analyze that table?

Typically with a statistical test called a chi-square test.

It looks for patterns or associations in those frequencies.

Okay.

Is there a correlation coefficient for two categories?

If both variables have exactly two categories each, like yes or no on two questions, you can code both as 0 and 1.

Then applying the Pearson formula gives you what's called the phi coefficient.

Another special name?

Yeah.

But it's essentially a Pearson R calculated on two dichotomous variables.

Again, the value tells you strength, but the sign doesn't mean much.
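
Here is a rough Python sketch of both ideas on one 2x2 frequency table. The counts are invented, not taken from Figure 12.4, and the chi-square statistic is the plain Pearson version with no continuity correction, which is an assumption on our part.

```python
import math

# Hypothetical 2x2 frequency table (made-up counts):
#                  solved   not solved
# college            30         10
# no college         15         25
table = [[30, 10], [15, 25]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Chi-square: compare each observed count with the count expected
# if the two variables were unrelated.
chi_square = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (observed - expected) ** 2 / expected

# Phi: the Pearson r you would get from coding both variables 0/1,
# written here with the usual 2x2 shortcut formula.
(a, b), (c, d) = table
phi = (a * d - b * c) / math.sqrt(
    row_totals[0] * row_totals[1] * col_totals[0] * col_totals[1]
)

print(f"chi-square = {chi_square:.2f} (df = 1), phi = {phi:.2f}")
```

For a 2x2 table the two numbers are tied together: chi-square equals n times phi squared, so they are two views of the same association.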

So even with categories, there are ways to measure association, though the math might change.

The goal is still seeing if there's a link.

Exactly.

Still looking for that pattern.

Now, beyond just the R value, there are two more really important concepts for interpreting what a correlation means.

Okay.

What are they?

The coefficient of determination and statistical significance.

Coefficient of determination.

Wait, that sounds familiar.

Is that R squared?

That's the one.

You just square the correlation coefficient R.

Okay.

Easy math.

But what does that squared number actually tell me?

This is where you get a sense of the sort of practical impact or the shared overlap between the variables.

R squared tells you the proportion or percentage of the variability in one variable that is associated with or predictable from the variability in the other variable.

Whoa.

Okay.

Unpack that.

Let's use the IQ and GPA example.

If R equals 0.80.

Okay.

So R equals 0.80.

Then R squared is 0.80 times 0.80, which equals 0.64.

So 0.64.

What does that mean?

It means that 64% of the differences we see in students' GPAs can be statistically accounted for or predicted by knowing their IQ scores.

64%.

So the other 36% of why GPAs differ is due to other stuff.

Exactly.

Study habits, motivation, luck, quality of teaching, everything else not captured by IQ in this simple relationship.

That's actually really useful.

So if the correlation was weaker, say R equals 0.30.

Then R squared would be 0.30 times 0.30, which is 0.09.

Only 9%.

Right.

In that case, only 9% of the variability in GPA is associated with IQ.

Much less overlap, much less predictive power from IQ alone.
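
The arithmetic from those two examples, written out as a tiny Python sketch. The function name is our own; the r values are the ones used in the discussion.

```python
def coefficient_of_determination(r):
    """Proportion of variability in one variable associated with the other."""
    return r ** 2

for r in (0.80, 0.30):
    r_squared = coefficient_of_determination(r)
    shared = round(r_squared * 100)  # percentage associated with the other variable
    print(f"r = {r:.2f} -> r squared = {r_squared:.2f} "
          f"({shared}% shared, {100 - shared}% tied to other factors)")
```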

So R squared gives a clearer picture of the relationship's size or importance.

In a way, yes.

The source mentions Cohen's general guidelines people sometimes use in behavioral science.

R around 0.10 is small, that's R squared around 0.01 or 1%; R around 0.30 is medium, R squared around 0.09 or 9%; and R around 0.50 is large, R squared around 0.25 or 25%.

But those are just rough guides.

Very rough.

What counts as large or important really depends on the specific research area in question.

For some things, like checking if a test is reliable, you'd want much higher correlations.

Okay, so R gives strength, R squared gives shared variability.

What was the other thing?

Statistical significance.

Right.

Statistical significance is about confidence. It asks:

Is the correlation we found in our sample of people likely just a fluke, just random chance for this particular group?

Or does it probably reflect a real relationship that exists in the larger population we drew the sample from?

Exactly that.

And the source really emphasizes the role of sample size here.

Why is sample size so important?

Well, think about it.

If you only measure two things for, say, two people, you'll always get a perfect correlation, plus 1.00 or negative 1.00, just by definition.

It doesn't mean much.

With a really small sample, you can get strong-looking correlations just by luck.

As your sample gets larger, it becomes much more likely that the correlation you find accurately reflects what's going on in the broader population.
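
A quick Python check of the two-person case, using invented scores and a hypothetical pearson_r helper. As long as the two people differ on both variables, two points always define a straight line, so r comes out perfect.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation for paired scores (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Made-up scores for just two people.
r_up = pearson_r([3, 7], [4, 10])    # second person is higher on both variables
r_down = pearson_r([3, 7], [10, 4])  # second person is higher on X, lower on Y
print(f"r_up = {r_up:+.2f}, r_down = {r_down:+.2f}")
```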

So, statistical significance tests take both the correlation value and the sample size into account.

Yes.

They calculate the probability that you'd get a correlation that strong or stronger in your sample if there were actually no relationship in the population.

If that probability is very low, usually less than 5%, or p less than .05, we call the result statistically significant.

Meaning, it's unlikely this happened just by random chance in our sample.

Pretty much.

But, and this is a huge but, the source stresses statistical significance does not automatically mean the correlation is strong or practically important.

Wait, how can it be significant but not strong?

If you have a massive sample size, like thousands of people, even a tiny, tiny correlation, say R equals 0.10, can be statistically significant.

Really?

Yes.

The significance test tells you that this very weak relationship, R equals 0.10, which only accounts for 1% of the variance, remember, R squared is .01, is probably real in the population, not just a sample fluke. But it's still a very weak relationship in practical terms.
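
A hedged Python sketch of why sample size matters so much. It assumes the usual t test for a Pearson correlation, with n minus 2 degrees of freedom, which the chapter summary does not spell out, and it hard-codes approximate two-tailed .05 critical values from a standard t table rather than computing them. The sample sizes 30 and 1,000 are our own picks.

```python
import math

def t_for_r(r, n):
    """t statistic for testing whether a sample Pearson r differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Approximate two-tailed critical t values at the .05 level (df = n - 2),
# copied from a standard t table; keyed by sample size n.
CRITICAL_T_05 = {30: 2.048, 1000: 1.962}

r = 0.10  # same weak correlation in both samples
significant = {}
for n, critical in CRITICAL_T_05.items():
    t = t_for_r(r, n)
    significant[n] = abs(t) > critical
    verdict = "significant" if significant[n] else "not significant"
    print(f"n = {n}: t = {t:.2f}, critical = {critical} -> {verdict}")

print(f"Either way, r squared is only {r ** 2:.2f}")
```

The correlation and its r squared are identical in both rows; only the sample size, and therefore the verdict, changes.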

Wow.

Okay.

That's a critical distinction.

So, you need to look at significance and the actual size of the correlation, or R-squared.

Absolutely essential.

You need both pieces of information to really understand the finding.

Okay.

So, we know what correlation is, how we measure it, how we interpret R-squared and significance.

What do researchers actually do with this correlational strategy?

What are its main uses?

The chapter highlights several key applications.

One big one is prediction.

Makes sense.

If two things are related, knowing one helps you guess the other.

Precisely.

If variables X and Y are correlated, you can use someone's score on X to predict their likely score on Y.

The classic example mentioned in the source is using SAT scores, predictor variable, to predict college GPA, criterion variable.

Because they tend to be positively correlated.

So, a higher SAT score predicts a likely higher GPA.

This is super useful when the criterion variable is hard to measure directly.

Maybe it happens in the future, or it's complex.

Like predicting Alzheimer's risk, maybe?

That could be an example.

The source mentions using positive affect to predict problem-solving ability in older adults as another instance.

You use the known variable to predict the one you're more interested in understanding or anticipating.

The statistical technique often used here is called regression.
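
As a rough illustration of prediction with regression, here is a Python sketch that fits a least-squares line to five invented SAT and GPA pairs and uses it to predict a GPA for a new applicant. The data, the fit_line helper, and the SAT score of 1250 are all made up for the example.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: SAT is the predictor variable, college GPA the criterion variable.
sat = [1000, 1100, 1200, 1300, 1400]
gpa = [2.6, 2.9, 3.1, 3.4, 3.5]

slope, intercept = fit_line(sat, gpa)
predicted_gpa = intercept + slope * 1250  # prediction for a new SAT score
print(f"predicted GPA for SAT 1250: {predicted_gpa:.2f}")
```

The prediction is only a best guess based on the pattern in the sample; the weaker the correlation, the more individual GPAs will scatter around the line.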

Okay, prediction is one use.

What else?

Another fundamental application is evaluating our measurement tools, assessing their reliability and validity.

How well our tests or surveys actually work.

Exactly.

Reliability is about consistency.

Does a test give stable results?

We measure test-retest reliability by giving the same test to people twice, maybe a few weeks apart, and correlating their scores.

High correlation means it's reliable, consistent.

Yep.

And validity is about accuracy.

Does the test measure what it claims to measure?

One type is concurrent validity, where you correlate scores on your new test with scores on an existing established test that measures the same thing.

Like the example of validating a quick seven-minute Alzheimer's screen?

Right.

Ijuin and colleagues correlated scores from their new short test with scores from traditional longer tests.

They found strong correlations around .70.

Which suggested the new short test was indeed measuring something similar to the established ones.

It showed validity.

Precisely.

It gives confidence in the new measure.

Okay.

Prediction, reliability, validity. Any other big applications?

A third major one is evaluating theories.

Many scientific theories make predictions about how variables should be related.

Correlational studies are a great way to test these predictions.

How so?

Well, take the nature versus nurture debate regarding intelligence.

Theories make different claims about the role of genetics versus environment.

And studying twins helps test this.

Exactly.

The source mentions studying identical twins who are raised in different environments.

This is a natural correlational setup.

You measure their IQs and calculate the correlation.

A high correlation, despite different environments, would suggest a strong genetic influence, supporting certain theories.

Precisely.

It provides evidence relevant to the theory.

The chapter even touches on the historical context with Cyril Burt's work, though that had its own controversies later.

So prediction, checking our tools, testing theories.

Correlational research definitely has value.

But we keep mentioning its limits.

What are the main weaknesses we need to keep in mind?

Okay.

So strengths first.

It's great for describing relationships simply.

It's often non-intrusive, meaning it studies things as they naturally happen, which can give it good external validity.

The findings might apply well to the real world.

And you can study things you can't or shouldn't manipulate in an experiment, right?

Like effects of poverty or personality traits.

Absolutely.

Things that already exist.

And it's fantastic for initial exploratory research, finding relationships that might be worth investigating further.

But.

Here comes the but.

The huge weakness is low internal validity.

It simply cannot definitively determine cause and effect.

Why not?

What stops it?

Two major intertwined problems.

The first is the third variable problem.

Okay, what's that?

Just because you find a correlation between A and B, it doesn't mean A causes B or B causes A.

There might be some other unmeasured variable, let's call it C, that's actually causing both A and B to change.

Creating a kind of fake relationship between A and B?

Exactly.

An illusion of a direct link.

The classic example the source uses is perfect.

Ice cream and crime.

That's the one.

There's a positive correlation.

When ice cream sales go up, crime rates tend to go up too, as in Figure 12.5.

But surely ice cream doesn't cause crime?

Of course not.

The third variable here is temperature or hot weather.

Hot weather makes people buy more ice cream.

And hot weather also makes people go outside more, interact more, maybe get more irritable, leading to more opportunities for crime.

Exactly.

Temperature, C, is causing changes in both ice cream sales, A, and crime rates, B, making A and B look directly related when they aren't.
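
A toy Python illustration of the third-variable problem. Everything here is constructed for the demonstration: the temperatures, the two sets of "other influences," and the rule that ice cream sales and crime each depend on temperature but never on each other. The partial correlation used at the end is a standard statistic, though it is our addition and not something the chapter summary names.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation for paired scores (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# C: daily temperature. The two "other influences" lists were chosen by hand
# to be unrelated to temperature and to each other.
temperature = [60, 65, 70, 75, 80, 85, 90, 95]
other_sales_factors = [3, -3, -3, 3, 3, -3, -3, 3]
other_crime_factors = [2, 2, -2, -2, -2, -2, 2, 2]

# A and B each depend on temperature, never on each other.
ice_cream = [2.0 * t + e for t, e in zip(temperature, other_sales_factors)]
crime = [0.5 * t + e for t, e in zip(temperature, other_crime_factors)]

r_ab = pearson_r(ice_cream, crime)
r_ac = pearson_r(ice_cream, temperature)
r_bc = pearson_r(crime, temperature)

# Partial correlation of A and B with C held constant.
partial_ab = (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

print(f"ice cream vs crime: r = {r_ab:.2f}")
print(f"same, holding temperature constant: r = {abs(partial_ab):.2f}")
```

In this constructed case the strong A-B correlation disappears once temperature is held constant, which is exactly the pattern a third variable produces.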

That makes the problem really clear.

What's the second big issue?

The directionality problem.

Even if there is a direct link between A and B, and no third variable C is involved, the correlation itself doesn't tell you which way the causal arrow points.

Does A cause B, or does B cause A?

You just don't know from the correlation alone.

The source uses the example of the link between watching violent TV and children's aggressive behavior.

Okay, so studies find kids who watch more violent TV tend to be more aggressive.

Yes.

But does watching the violence cause the aggression, or is it that kids who are already naturally more aggressive choose to watch more violent TV?

The correlation could go either way.

Exactly.

Figure 12.6.

The correlation shows they're related, but not the direction of influence.

And these two problems, third variables and directionality, are why we absolutely cannot say correlation proves causation.

Nailed it.

That is the single most important takeaway.

So when you hear media reports… "Study links coffee to longer life," or whatever.

You immediately need to be skeptical about any causal claim.

Think, could something else be involved?

Which way might the influence really go?

Remember the source's analogy, football season starting is correlated with leaves changing color.

But the games don't cause autumn.

Precisely.

Association is not causation.

Okay, so mainly focused on two variables.

Does the source mention looking at more complex situations?

It does, briefly.

It acknowledges that in the real world, things are rarely influenced by just one other thing.

Academic performance isn't just about IQ.

No, it's motivation, study habits, support, lots of stuff.

So researchers often need to look at relationships involving more than two variables.

The main statistical technique for this is called multiple regression.

Multiple regression.

What does that do?

It lets researchers examine how a set of predictor variables, all together, relate to or predict one criterion variable.

So instead of just using IQ to predict GPA, you could use IQ and study hours and motivation scores all at once.

Exactly.

It often gives you a much better, more complete prediction than relying on just one predictor.

The source mentions using multiple factors to predict smoking behavior as an example.

Can it also help with that third variable problem?

It can, to some extent.

That's a clever application.

Multiple regression allows you to statistically control for the influence of potential third variables while you examine the relationship between your main two variables.

How does that work?

You essentially ask the statistics, OK, let's account for the influence of, say, age.

After we've statistically removed the effect of age, is there still a relationship between watching violent TV and aggression?

Ah, trying to see if the original link holds up even after considering a specific third variable.

Yes.

Collins and colleagues did this in their study on TV content, controlling for age.

It adds a layer of sophistication.
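
To give a feel for what "statistically controlling" means, here is a small Python sketch of a regression with two predictors. The data are constructed, not real: aggression is built from age plus a smaller contribution from TV hours, and the extra influences are chosen by hand to be unrelated to both. The variable names and the closed-form two-predictor solution are our own illustration, not the method used by Collins and colleagues.

```python
def centered_sum(xs, ys):
    """Sum of products of deviations from the two means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

# Constructed data for eight children.
age = [6, 7, 8, 9, 10, 11, 12, 13]
tv_extra = [1, -1, -1, 1, 1, -1, -1, 1]                        # unrelated to age
aggression_extra = [0.5, 0.5, -0.5, -0.5, -0.5, -0.5, 0.5, 0.5]  # unrelated to age and TV

tv = [a + e for a, e in zip(age, tv_extra)]  # older kids watch more violent TV
aggression = [1.0 * a + 0.5 * t + e
              for a, t, e in zip(age, tv, aggression_extra)]

# One predictor: TV alone, ignoring age.
simple_slope = centered_sum(tv, aggression) / centered_sum(tv, tv)

# Two predictors: least-squares weights for TV and age together.
s_tt, s_aa = centered_sum(tv, tv), centered_sum(age, age)
s_ta = centered_sum(tv, age)
s_ty, s_ay = centered_sum(tv, aggression), centered_sum(age, aggression)
det = s_tt * s_aa - s_ta ** 2
b_tv = (s_ty * s_aa - s_ay * s_ta) / det   # TV weight with age controlled
b_age = (s_ay * s_tt - s_ty * s_ta) / det  # age weight with TV controlled

print(f"TV slope ignoring age: {simple_slope:.2f}")
print(f"TV slope controlling for age: {b_tv:.2f} (age slope: {b_age:.2f})")
```

In this made-up case the TV link survives the control but shrinks, because part of the original relationship was really age showing up in both variables. Any third variable that was never measured stays uncontrolled.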

Why do I sense another but coming?

You're right.

The source makes one final critical point: even multiple regression, which helps with prediction and allows you to statistically control for measured third variables…

Still doesn't prove cause and effect.

Correct.

Statistical prediction and statistical control are not the same as establishing a causal explanation.

There could always be other, unmeasured third variables.

For true causal claims, you really need experimental methods involving manipulation and control over variables.

OK, that really rounds out the picture.

So let's try and sum up this whole chapter.

We've seen that the correlational strategy is fundamentally about describing relationships between variables as they naturally exist.

Right, observing and measuring, not manipulating.

We talked about how it's different from experimental research, which seeks causation, and differential research, which compares pre-existing groups.

We dug into the data pairs of scores, visualized with scatter plots, and summarized by the correlation coefficient r.

And that r tells us direction, positive or negative, form, usually linear with Pearson, maybe monotonic with Spearman, and strength, from 0 to 1.

We also covered handling non-numerical data, using techniques that lead to things like the point-biserial correlation or the phi coefficient, or using chi-square tests.

Then we went deeper into interpretation, looking at r-squared, the coefficient of determination.

Which tells us the percentage of shared variability, a measure of practical importance.

And statistical significance, which tells us if the finding in our sample is likely real in the population, but remembering that significance doesn't automatically mean strong.

Absolutely.

And we reviewed the valuable applications, prediction using predictor and criterion variables, assessing the reliability and validity of our measurements.

And providing evidence to evaluate scientific theories, like the nature-nurture debate.

But the big caution sign, the fundamental weaknesses: the third variable problem, that hidden C causing both A and B.

And the directionality problem, not knowing if A causes B or B causes A.

Which together mean correlation does not equal causation.

Even with multiple regression, which helps predict using several variables and can statistically control for some third variables, it's still not proving cause.

Exactly.

It's a vital tool for exploring the world non-intrusively, especially when experiments aren't feasible or ethical.

It describes what is, but it leaves the why questions largely unanswered.

So it's essential, but we have to be really aware of its limits when we interpret findings.

Couldn't say it better.

Okay, so here's a final thought for you listening right now.

Knowing everything we've just discussed, especially that correlation can't prove cause and effect.

How might that change the way you look at claims you see every day?

Yeah, when you read a headline or hear someone say studies show X is linked to Y, maybe health or success or behavior.

What questions should pop into your head immediately?

What about those potential third variables?

What about the direction of influence?

Thinking critically about those connections, those correlations, that's really key to navigating the information that comes at us constantly.

And that wraps up our deep dive into the correlational research strategy, drawing entirely on the insights from this chapter in Research Methods for the Behavioral Sciences.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers
Examining associations between variables without experimental manipulation forms the foundation of correlational research, a methodological approach essential when ethical constraints, practical limitations, or the nature of naturally occurring phenomena preclude controlled intervention. Rather than establishing causation through deliberate manipulation, correlational studies quantify the strength, direction, and form of relationships using statistical coefficients tailored to different variable types—Pearson correlation for continuous measurements, Spearman correlation for ranked ordinal data, point-biserial correlation when pairing continuous with dichotomous variables, and phi coefficients for categorical associations. Visual representation through scatter plots enables researchers and consumers to identify trend patterns, assess linearity or monotonic progression, and detect outliers that may distort statistical summaries. The practical utility of correlational research extends considerably beyond descriptive association; regression analysis permits prediction of criterion variables from known predictor variables, reliability and validity coefficients quantify consistency and accuracy in psychological measurement instruments, and theoretical models can be evaluated against empirical patterns in real-world data. However, two fundamental obstacles prevent correlational findings from establishing causal conclusions. The third-variable problem emerges when unmeasured confounding factors simultaneously influence both variables, producing spurious correlations that suggest relationships where none actually exist. The directionality problem reflects the inherent ambiguity of correlational data regarding which variable influences the other, since associations remain symmetric regardless of causal direction. 
Multiple regression techniques address complexity by simultaneously examining relationships among numerous variables, while the coefficient of determination quantifies precisely what proportion of variation in one variable can be accounted for by another. Competent interpretation of correlational research requires recognizing both its strengths—particularly its external validity and applicability to real-world phenomena—and its constraints as a fundamentally descriptive rather than explanatory tool for understanding human behavior and natural processes.
