Chapter 9: Covariance and Correlation

Audio Overview

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Let me just paint a picture for you real quick.

Oh, please do.

Okay, so you're a college student.

The glow of your desk lamp is illuminating your open textbook, and you are staring down chapter 9 of John R. Taylor's Introduction to Error Analysis.

A classic, but yeah, an intimidating one.

Exactly.

I mean, the title of the chapter alone, Covariance and Correlation, sounds imposing, and the pages, they are just swimming with Greek letters, summation symbols, and these incredibly dense formulas.

It feels entirely overwhelming.

It absolutely can be.

When you first encounter this level of statistical analysis, it looks like a completely foreign language, but there is a very elegant, like, very tangible logic beneath all of those symbols.

We just have to translate it from math into reality.

And that is our mission for this deep dive.

Consider this your personal tutoring session.

I love that.

We are going to rescue you from that sea of equations.

We're going to decode exactly how uncertainties and physical measurements interact with each other in the real world.

Because sometimes those uncertainties compound.

They make your errors dramatically worse.

Yeah, but surprisingly, sometimes they actually work together to cancel each other out entirely.

Which is a beautiful concept to wrap your head around.

In physics and, honestly, in life, variables rarely exist in total isolation.

True.

Understanding how they influence each other isn't just about passing a test or getting a good grade on a lab report.

It's about knowing whether an experiment actually proves what it claims to prove.

Before we jump into the heavy stuff like covariance, we need to establish a baseline.

We have to understand what happens when you combine measurements in a simple scenario.

Let's lay the groundwork.

Section 9.1 stuff.

Yeah, exactly.

Let's use an everyday analogy.

Think of budgeting for a road trip.

You're trying to calculate your total cost, which we'll call your final function, q.

Okay, q is the total budget.

Right.

And you have two measured quantities you're estimating.

Your gas cost, let's call that x, and your food cost, y.

A perfect practical setup.

So what goes wrong?

Well, let's say you overestimate your gas cost by 50 bucks, and you also happen to overestimate your food cost by 50 bucks.

Your total budget error is just the sum of both of those mistakes.

You're off by a hundred bucks.

Right.

In error analysis, this represents equation 9.1, the absolute sum.

It gives you the upper bound of uncertainty, where errors just directly, brutally add up.

That's the absolute worst case scenario.

Mathematically, it assumes that everything that could go wrong went wrong in the exact same direction.

Yeah.

The formula for this essentially looks at how sensitive your total budget is to changes in gas or food using partial derivatives, multiplies that sensitivity by your specific uncertainties, and adds the absolute values together.

But reality is usually a little more forgiving, right?

Because what are the odds that both your gas and your food estimates are terribly wrong in the exact same direction by the maximum possible amount?

Highly unlikely.

I mean, if your estimates for gas and food are completely independent of each other, meaning a mistake in the gas budget has absolutely no physical connection to a mistake in the food budget, we use a different approach.

Assuming the errors are random, of course.

Exactly.

For independent random errors, we use equation 9.2, the quadratic sum.

That's where you take the square root of the sum of the squares, right?

I always wondered why we go through that extra mathematical gymnastics instead of just adding the numbers.

We do it because random errors are just as likely to be positive as they are negative.

Like, if you do this trip 100 times, sometimes you'll overestimate gas but underestimate food.

Oh, I see.

Over time, those random fluctuations tend to partially cancel each other out.

Squaring the individual errors makes all the values positive so they don't zero out completely.

Right, because a negative 50 squared is still a positive number.

Exactly.

And taking the square root at the end scales it all back down.

The quadratic sum gives you a smaller, much more realistic estimate of your true uncertainty.
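To put numbers on the road-trip budget, here is a minimal sketch. It assumes the total is simply q = x + y, so both partial derivatives equal 1, and that each estimate is uncertain by 50 dollars. The function names are invented for this illustration.

```python
import math

def absolute_sum(partials, uncertainties):
    """Worst-case bound: add up |dq/dx_i| * delta_x_i."""
    return sum(abs(p) * u for p, u in zip(partials, uncertainties))

def quadratic_sum(partials, uncertainties):
    """Independent random errors: square root of the sum of squares."""
    return math.sqrt(sum((p * u) ** 2 for p, u in zip(partials, uncertainties)))

# Assumed setup: q = x + y, so dq/dx = dq/dy = 1.
partials = [1.0, 1.0]
deltas = [50.0, 50.0]  # gas and food uncertainties, in dollars

worst_case = absolute_sum(partials, deltas)
independent = quadratic_sum(partials, deltas)

print(worst_case, round(independent, 1))  # 100.0 70.7
```

The quadratic sum comes out near 71 dollars rather than 100, which is the "smaller, more realistic" estimate for errors that really are independent and random.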

So the quadratic sum is our standard go-to for error propagation.

But here's the catch, and this is the transition into the real meat of the chapter, section 9.2.

The plot twist.

The entire premise of that quadratic sum rests on a massive assumption.

It completely assumes that your variables are totally independent.

And in the real world of experimental physics, assuming two things are entirely independent is a very dangerous game to play.

What happens if your variables aren't independent?

What if they are inextricably linked?

That brings us to the introduction of the hero of this material, or maybe the villain, depending on how much you like math.

Covariance.

Defined in equation 9.8.

Right.

Covariance is the mathematical tool we use to measure the relationship between the errors of two variables.

It asks a really simple question.

If you overestimate your first variable, does that mean you also typically overestimate your second variable?

And if they do?

If they move together like that, we call it a positive covariance.

Alternatively, if you overestimate the first variable, does that mean you typically underestimate the second one?

Ah, moving in opposite directions.

Exactly.

That would be a negative covariance.

Okay, I have to push back here.

How can errors perfectly cancel each other out in a lab setting?

That sounds like, I don't know, getting away with a sloppy mistake.

It does sound counterintuitive.

Yeah, like how does an overestimate in one thing guarantee an underestimate in another?

The textbook provides a brilliant real world example to illustrate this exact phenomenon.

It's in table 9.1.

Imagine you and four other students are looking at a drawing on a piece of paper.

Okay, I'm visualizing it.

On this drawing, there are two adjacent angles sitting right next to each other.

We'll call them angle alpha and angle beta.

Together they form one larger total angle.

A single large angle sliced into two smaller pieces by a line drawn down the middle.

Exactly.

Your assignment is to measure alpha, then measure beta, and add them together to find the total angle.

Now, the main source of error here isn't the outer edges of the big angle.

What is it then?

The main source of error is figuring out exactly where that middle dividing line is.

Maybe because the line was drawn a bit wide or it's slightly blurry.

Oh, I've been there, staring at a protractor, trying to guess the center of a thick pencil mark.

Right.

If you look at that blurry line and accidentally judge it to be a bit too far to the left, you are going to overestimate angle alpha.

Because you've essentially stolen some space from the other side.

Which means, because of that exact same mistake, because you moved the shared dividing line to the left, you physically must underestimate angle beta.

Oh, wow.

That completely clicks.

The error isn't random.

The error is the dividing line itself.

Exactly.

The two measurements are inextricably linked.

You cannot make a mistake on one without making the exact opposite mistake on the other.

And when you look at the measurements from all five students in the textbook's example, this relationship shows up clearly in the math, doesn't it?

It really does.

If a student overestimates alpha by two degrees, they inevitably underestimate beta by two degrees.

To find the covariance, equation 9.8 has us multiply those individual deviations together.

So a positive two overestimate multiplied by a negative two underestimate gives you a negative four.

Precisely.

And since every single student is fighting that same blurry middle line, the product of their deviations is always going to be a negative number or zero.

It's never going to be positive.

Because the products are consistently negative, when you average them all out to find your overall covariance, you end up with a negative value.

Yes.

We have a definitive negative covariance.
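The averaging of paired deviations can be sketched in a few lines. The five readings below are assumed for illustration and should be checked against table 9.1 rather than trusted as the textbook's values. The covariance divides by N, which is also an assumption about the convention in use.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Average product of paired deviations from the means (divides by N)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)

# Illustrative readings in degrees, one pair per student (assumed values).
alpha = [35, 31, 33, 32, 34]
beta = [50, 55, 51, 53, 51]

a_bar, b_bar = mean(alpha), mean(beta)
products = [(a - a_bar) * (b - b_bar) for a, b in zip(alpha, beta)]
sigma_ab = covariance(alpha, beta)

print(products)  # every product is negative or zero
print(sigma_ab)  # so the average is negative
```

With these readings the first student is 2 degrees high on alpha and 2 degrees low on beta, giving the product of negative four mentioned above.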

So we have a negative number.

How does finding that negative number actually help us calculate our total error for the big angle?

This is where we look at the ultimate equation of the chapter, equation 9.9.

This is the exact formula for the standard deviation of our final calculation.

The massive one.

When variables are correlated, you can't just use the simple quadratic sum anymore.

You start with that familiar quadratic sum, the squares and the square root, but you bolt on a vital third piece at the end.

Let me guess.

The third piece includes our covariance.

You got it.

It's plus two times the partial derivatives, the sensitivities we mentioned earlier, multiplied by the covariance.

And here is the magic, right?

Because the covariance in our angle example is negative, that entire third piece of the equation becomes negative.

It physically subtracts from the total uncertainty.
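Written out, the formula being described has roughly this shape. This is a sketch that assumes q depends on two measured quantities x and y, with standard deviations \sigma_x and \sigma_y and covariance \sigma_{xy}. The symbols are conventional choices, not quoted from the text.

```latex
\sigma_q^2 = \left(\frac{\partial q}{\partial x}\right)^2 \sigma_x^2
           + \left(\frac{\partial q}{\partial y}\right)^2 \sigma_y^2
           + 2\,\frac{\partial q}{\partial x}\,\frac{\partial q}{\partial y}\,\sigma_{xy}
```

For a plain sum q = x + y both partial derivatives are 1, so \sigma_q^2 = \sigma_x^2 + \sigma_y^2 + 2\sigma_{xy}, and a negative \sigma_{xy} lowers the total.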

Let's look at the actual numbers in Taylor's text.

If we ignored the fact that the angles share a line.

Like if we treated them as totally independent.

Right.

If we just use the basic quadratic sum, our calculated uncertainty would be 2.3 degrees.

But by using equation 9.9 and factoring in the negative covariance, those linked errors cancel out mathematically, just like they do physically.

What does the uncertainty drop to?

Our true uncertainty drops down to a mere 0.6 degrees.

Wait, really?

From 2.3 down to 0.6?

Yes.

It proves mathematically how correlated errors can literally save your experiment.
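A short sketch of that comparison, for the total angle q = alpha + beta. The five readings are assumed for illustration (check them against table 9.1), and variances and covariance are computed with a divide-by-N convention, which is also an assumption.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Average product of paired deviations; covariance(x, x) is the variance."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)

# Illustrative readings in degrees (assumed values).
alpha = [35, 31, 33, 32, 34]
beta = [50, 55, 51, 53, 51]

var_a = covariance(alpha, alpha)
var_b = covariance(beta, beta)
cov_ab = covariance(alpha, beta)

# q = alpha + beta, so both partial derivatives are 1.
sigma_independent = math.sqrt(var_a + var_b)               # covariance ignored
sigma_correlated = math.sqrt(var_a + var_b + 2 * cov_ab)   # covariance included

print(round(sigma_independent, 1), round(sigma_correlated, 1))  # 2.3 0.6
```

The only difference between the two results is the 2 * cov_ab term, which is negative here and pulls the uncertainty down.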

That is incredibly satisfying to see the math perfectly mirror the physical reality.

But I can imagine a student panicking at this point.

Why is that?

Well, covariance requires a lot of data points to find those averages, right?

What if you're in an introductory lab, the clock is ticking, and you just don't have enough data to calculate the covariance explicitly?

That's a very real fear.

Are you just doomed to have the wrong error bounds?

Not at all.

The textbook provides a mathematical safety net for exactly this situation, known as the Schwarz Inequality.

That's equation 9.11 and 9.1.

The Schwarz Inequality.

It sounds like a spy thriller.

It does carry some dramatic weight, but what it essentially proves is a hard mathematical limit.

It states that the absolute value of your covariance will never, ever exceed the product of your individual standard deviations.

Translate that into a testing strategy for me.

What does that actually mean for my lab report?

It means that when you merge that rule into our big, complicated formula, equation 9.9, it creates a foolproof boundary.

It definitively proves that your total uncertainty will never be worse than that absolute worst case scenario we talked about at the very beginning.

Equation 9.1.

The simple upper bound where you just add the absolute errors together.

Exactly.

So if you're totally stuck and you can't figure out the linked errors, you can just fall back on the direct addition of your absolute errors.

You can just do that.

You can.

The Schwarz Inequality mathematically guarantees that even with all this underlying complexity, you're always safe assuming the worst.

Wow.

Okay.

It won't give you the most precise, refined answer.

You won't get that beautiful 0.6 degrees, but it will never be mathematically invalid.

You are always safe using the upper bound.
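The chain of reasoning behind that guarantee can be sketched as follows, assuming a function q of two variables x and y and writing a = \partial q/\partial x, b = \partial q/\partial y as shorthand introduced here.

```latex
|\sigma_{xy}| \le \sigma_x \sigma_y
\quad\Longrightarrow\quad
\sigma_q^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,\sigma_{xy}
\;\le\; a^2\sigma_x^2 + b^2\sigma_y^2 + 2|a||b|\,\sigma_x\sigma_y
\;=\; \bigl(|a|\,\sigma_x + |b|\,\sigma_y\bigr)^2
```

Taking square roots gives \sigma_q \le |a|\,\sigma_x + |b|\,\sigma_y, which is the direct sum of absolute errors, whatever the covariance turns out to be.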

That is a massive relief.

Okay, so we spent all this time in sections 9.1 and 9.2 figuring out how to track errors and uncertainties, but the text does a fascinating pivot in section 9.3.

It really does.

We take these exact same mathematical concepts and shift from tracking lab mistakes to tracking actual relationships in the real world.

This is a major conceptual leap.

We are moving from error analysis into testing hypotheses.

We want to know if two totally different variables support the theory that they are linearly related.

Meaning if you plot them on a graph, they form a straight line.

Precisely.

And to do that, we have to talk about the coefficient of linear correlation, which is denoted as a lowercase r.

Equation 9.15, right.

Let's break down r.

So the coefficient r takes our old friend covariance and normalizes it.

It literally just divides the covariance by the standard deviations of your two variables, sigma x and sigma y.

Why do we need to divide it?

We do this because covariance on its own is highly dependent on the units you're using.

Like whether you're measuring in centimeters or miles or kilograms, the raw number changes wildly.

By dividing out the standard deviations, we strip away the units entirely.

Leaving us with a pure dimensionless number.
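In symbols, the normalization looks like this. It is a sketch using \sigma_{xy} for the covariance and \sigma_x, \sigma_y for the standard deviations, with c standing for an arbitrary positive unit-conversion factor introduced here for illustration.

```latex
r = \frac{\sigma_{xy}}{\sigma_x \sigma_y},
\qquad
x \to c\,x \;\Rightarrow\; \sigma_{xy} \to c\,\sigma_{xy},\;\; \sigma_x \to c\,\sigma_x
\;\Rightarrow\; r \to \frac{c\,\sigma_{xy}}{c\,\sigma_x\,\sigma_y} = r
```

Switching x from centimeters to meters, say, rescales the numerator and the denominator by the same factor, so r does not change.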

I love this part.

I like to think of r as a volume dial on a stereo, but instead of going from zero to 10, this dial goes from negative one to positive one.

That's a great analogy.

If the dial is right in the middle at zero, that's just static.

The variables have no relationship whatsoever.

They're totally uncorrelated.

And if you crank that dial all the way up to a positive one, you are blasting a perfect positive signal.

If you plot those data points on a scatter plot, every single dot will fall exactly on an upward sloping straight line.

As one variable increases, the other increases in lockstep.

Exactly.

And cranking it all the way down to negative one?

Now, that's a perfect negative signal.

The data points form a perfect downward sloping straight line.

High values in one variable correlate perfectly with low values in the other.
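The two ends of the dial, plus one setting in between, can be checked with a small sketch. All of the data below is made up for illustration, and the function name is hypothetical.

```python
import math

def correlation(xs, ys):
    """Linear correlation coefficient: covariance over the product of standard deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    # The 1/N factors cancel between numerator and denominator.
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [2 * v + 1 for v in x]         # exactly on an upward line
falling = [10 - 3 * v for v in x]       # exactly on a downward line
scattered = [2.0, 1.0, 4.0, 3.0, 5.0]   # trends upward, but not a perfect line

r_up = correlation(x, rising)
r_down = correlation(x, falling)
r_mid = correlation(x, scattered)

print(r_up, r_down, round(r_mid, 2))  # 1.0 -1.0 0.8
```

Only points lying exactly on a straight line reach plus or minus one. Any scatter moves r toward the middle of the dial.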

Let's put this volume dial to the test with one of my favorite examples from the text.

Figure 9.1 and table 9.3.

The case of the anxious professor.

A scenario playing out in lecture halls everywhere, I'm sure.

Oh, absolutely.

So we have a professor who desperately wants to convince his class that doing their homework will actually help them do well on the final exams.

He runs a little experiment.

He tracks 10 students.

Okay, 10 students.

He plots their homework scores horizontally on the x-axis and their exam scores vertically on the y-axis.

When you look at the resulting scatter plot, the dots don't form a perfect rigid line, but they definitely seem to trend upward.

Right.

There's a visible pattern.

He takes all that data, plugs it into equation 9.15 for r, and calculates a correlation coefficient of 0.8.

And the professor is thrilled.

I mean, an r of 0.8 is reasonably close to positive one.

He looks at this finding and declares victory.

High homework scores clearly correlate with high exam scores.

I was right.

Okay, I'm going to step in here and play the skeptic.

Because yes, 0.8 is relatively close to one, but is "close enough" actual science?

A very fair question.

How do we know this professor didn't just get lucky?

He only tracked 10 students.

What if this specific group of 10 students just happened to be naturally great test takers and it has absolutely nothing to do with the homework?

You are hitting on the exact mechanism that sections 9.4 and 9.5 address.

Quantitative significance.

It's the ultimate so-what test.

Yes.

How do we objectively decide if an r value of 0.8 is actually meaningful?

To answer your skepticism, we have to bridge the gap between looking at a scatter plot and proving it with rigorous statistical probability.

So how do we prove it?

Is there a master answer key somewhere?

There is, actually, in the form of table 9.4 provided in the text.

This table looks like a terrifying wall of numbers at first glance, but it's an incredibly powerful reality check.

What does it actually calculate?

It calculates the exact probability that a set of completely uncorrelated, totally random variables would miraculously produce a correlation coefficient at least as big as the one you found, purely by sheer luck.

Oh, wow.

So it tells you the odds that your discovery is just a fluke.

Yes.

And the magic ingredient here, the thing that dictates everything, is your sample size, denoted as n.

The number of data points you have completely dictates how much you can trust your volume dial.

Let's walk through how this probability check works, starting incredibly small.

Let's say you do a totally different experiment.

You take just three measurements.

So your sample size n is 3.

Okay, n equals 3.

And you calculate an r value of 0.7.

Now, out of context, 0.7 sounds pretty strong, right?

It sounds great.

Until you consult table 9.4. If you look at the intersection for a sample size of 3 and an r of 0.7, the math tells you there is a 51% chance of getting a correlation that high purely by random chance.

51%.

It's literally a coin flip.

With only three measurements, your data is virtually worthless.

Yeah.

It is entirely possible, even probable, that your variables are completely unconnected.

Okay, let's double our effort.

We go back to the lab.

We take six measurements, n is 6.

And we get that exact same r value of 0.7.

We consult the table again.

For a sample size of 6 and an r of 0.7, the probability of it being a random fluke drops to 12%.

12% is definitely better than 51%.

It is better.

But in the scientific community, it's still not good enough.

A 12% chance of your results being a random accident is simply too high to confidently publish a scientific paper or claim a definitive relationship.

All right, let's get serious.

We put in the real work.

We take 20 measurements, n is 20.

And again, we calculate that exact same r value of 0.7.

Now, the situation completely changes.

According to the probability calculations, with 20 data points, the chance of uncorrelated variables randomly producing an r of 0.7 is a microscopic 0.1%.

Wow.

So the math is finally saying it is highly improbable that this happened by accident.

Exactly.
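For readers who want to see where numbers like these can come from, here is a hedged sketch. It assumes the table tabulates the standard result for N pairs of uncorrelated, normally distributed variables, whose r has probability density proportional to (1 - r^2)^((N - 4)/2). The function name and the midpoint-rule integration are choices made for this illustration, not the textbook's procedure.

```python
import math

def prob_chance_correlation(n, r0, steps=20000):
    """Estimate P(|r| >= r0) for n pairs of uncorrelated normal variables.

    Uses the substitution r = sin(theta), which turns the integrand
    (1 - r^2)^((n - 4) / 2) dr into cos(theta)^(n - 3) d(theta) and avoids
    the endpoint singularity at r = 1 when n = 3. Requires n >= 3.
    """
    coeff = 2 * math.gamma((n - 1) / 2) / (math.sqrt(math.pi) * math.gamma((n - 2) / 2))
    lo, hi = math.asin(abs(r0)), math.pi / 2
    width = (hi - lo) / steps
    # Midpoint rule over [asin(r0), pi/2].
    total = sum(math.cos(lo + (i + 0.5) * width) ** (n - 3) for i in range(steps))
    return coeff * total * width

for n, r0 in [(3, 0.7), (6, 0.7), (20, 0.7)]:
    print(f"N={n}, r={r0}: {100 * prob_chance_correlation(n, r0):.2f}%")
```

Under those assumptions the three cases come out close to the 51%, 12%, and 0.1% quoted above, with the probability falling quickly as N grows for the same r.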

And this brings us to the formal definitions of significance.

The definitions every college student needs to have locked in for their exams.

Let's hear them.

By standard convention, if the probability of your result occurring by random chance is less than 5%, the correlation is officially called significant.

Less than 5% is significant.

Got it.

And if that probability drops below 1%, it is elevated to highly significant.
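As a study aid, the two thresholds can be written as a tiny decision rule. The probabilities fed in are the ones quoted earlier for r = 0.7. The function name and the "not significant" label for everything at or above 5% are assumptions of this sketch.

```python
def classify(probability):
    """Label a chance probability (as a fraction) using the 5% and 1% conventions."""
    if probability < 0.01:
        return "highly significant"
    if probability < 0.05:
        return "significant"
    return "not significant"  # label assumed for this sketch

# Chance probabilities quoted above for r = 0.7 at different sample sizes.
cases = {3: 0.51, 6: 0.12, 20: 0.001}
labels = {n: classify(p) for n, p in cases.items()}

for n, label in labels.items():
    print(f"N={n}: {label}")
```

Only the 20-measurement case clears a threshold, and it clears the stricter one.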

Okay, let's close the loop on our anxious professor and his homework study.

He had a sample size of 10 students, and he found a correlation coefficient of 0.8.

Let's subject his findings to the ultimate reality check using table 9.4.

We look at the probability thresholds for an n of 10 and an r of 0.8.

The math shows that the probability of 10 uncorrelated data points randomly producing a signal that strong is only 0.5%.

0.5%.

That is well under the 1% threshold.

Which means the professor's correlation is officially highly significant.

He is vindicated by the statistics.

It is incredibly likely that the scores on homework and examinations really are linked.

Good for him, and good for his students.

Man, this has been quite the journey through chapter 9.

Let's quickly summarize the logical path we just walked down so you have the entire narrative locked in for your study session.

A structural review is the best way to solidify the concepts.

We started with the baseline: how independent random errors compound using the quadratic sum, equation 9.2.

But then we hit the plot twist.

We discovered that if variables are linked, their connected errors can actually cancel each other out, which we track using covariance and equation 9.9.

We also learned that if we are ever short on data, the Schwarz inequality ensures we are always mathematically safe falling back on our simple upper bounds.

Right.

Then we took the exact same underlying mechanics of covariance, stripped away the units, and used it to find the correlation coefficient r to test for actual linear relationships in the real world.

And finally, we learned how to prove those relationships aren't just random luck by using sample sizes and probability thresholds in table 9.4 to determine quantitative significance.

It is a beautifully constructed sequence of logic that builds entirely on itself.

It really is.

Once you get past the intimidation of the Greek letters.

Now, as we wrap up this deep dive, we want to leave you with something to chew on that goes a bit beyond the standard textbook examples.

Think about how these hidden correlations affect absolutely everything around us.

If you look at the problem sets at the very end of chapter 9, Taylor shows us just how incredibly broad these applications are.

Oh yeah, the problem sets are wild.

In one problem, problem 9.6, we see this exact math being used to track the momentum of subatomic particles like pions and kaons breaking apart in a bubble chamber photograph.

The direction of one particle perfectly correlates with the other.

It's the exact same logic as the adjacent angle dividing line, but applied at the quantum level.

Exactly.

And then in problem 9.13, the context completely shifts to psychology, tracking the intelligence quotients of fathers and sons to see if there's a highly significant correlation between generations.

That's a huge leap from measuring blurry pencil lines.

It is.

What's fascinating here is that the math you just learned isn't just about passing a test or balancing a budget.

It is the fundamental mathematical lens used by humanity to prove whether the universe is driven by hidden rules or just sheer coincidence.

That's a powerful way to look at it.

It makes you wonder, how many things in your own life do you just assume are perfectly correlated when really, maybe your sample size is just too small?

Something to think about the next time you jump to a conclusion.

An excellent point.

Well, we've covered the text, we've decoded the equations, and hopefully your study session is looking a lot brighter.

On behalf of us here, a warm thank you from the Last Minute Lecture team.

Good luck on your exams and keep questioning the data.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Analyzing relationships between measured variables requires accounting for how systematic errors in one quantity influence errors in another, a consideration that standard independence-based uncertainty propagation often overlooks. When two variables exhibit correlation—such that deviations from their respective means tend to occur together in consistent patterns—the traditional quadratic sum formula for combining uncertainties becomes invalid. Covariance provides the mathematical framework for capturing this joint variability by quantifying the average product of paired deviations from their means. Incorporating covariance into error propagation yields additional terms that modify the final uncertainty estimate, either increasing it when variables are positively correlated or decreasing it when correlations are negative, thereby reflecting how systematic errors interact. The Schwarz inequality establishes a mathematical guarantee that using absolute values in uncertainty formulas produces conservative estimates that remain valid across any correlation structure. Beyond error propagation, understanding the strength and direction of linear relationships requires the correlation coefficient, a standardized metric derived by normalizing covariance by the product of individual standard deviations. This dimensionless measure ranges from negative one to positive one, where intermediate values reveal whether data points cluster tightly around a best-fit line or scatter widely, indicating weak association. However, observed correlation coefficients require rigorous statistical evaluation because random noise in finite datasets can occasionally produce apparent relationships by chance alone. Statistical significance testing quantifies this possibility by computing the probability that uncorrelated population parameters would generate the observed sample correlation purely through random variation. 
This probability depends critically on sample size—larger datasets render spurious correlations increasingly improbable—making sample magnitude a fundamental consideration when interpreting correlation strength. Standard probability thresholds of five percent and one percent mark the boundaries between statistically significant results, highly significant results, and findings likely attributable to random variation, enabling researchers to distinguish genuine linear relationships from sampling artifacts.
