Chapter 5: The Normal Distribution


Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Imagine you're standing in the physics lab and your entire grade for the week depends on finding the exact focal length of a lens.

Oh yeah, the classic lens experiment.

It's a rite of passage.

Right.

So you measure it once and you get 26 centimeters.

And okay, cool.

But you want to be thorough, so you measure it again, but this time you get 24.

And then you do it again and get 28, then 23.

You do it 10 times and you get 10 completely different numbers.

It is incredibly frustrating.

It really is.

You're just staring at this chaotic jungle of numbers, wondering, you know, which one is the actual true answer?

Are any of them right?

Well, it's a universal wall that pretty much every science student hits.

We inherently crave order and a single right answer.

Yeah, exactly.

But raw experimental data just hands us noise.

And if you were to stay in that lab and record a thousand measurements, that jungle of numbers would just get denser and more impenetrable.

And untangling that chaos is exactly what we are doing for you today.

Welcome to this custom-tailored deep dive.

We are so glad to have you here.

We know you are a college student tackling error analysis for the very first time.

So consider this your personal one-on-one tutoring session.

Yep, just us working through the material.

We are diving exclusively into Chapter 5 of Introduction to Error Analysis, which is all about the normal distribution.

But our goal isn't just to give you formulas to memorize.

Far from it.

We really want you to understand the theoretical why behind the statistical tools you use.

Right, because just plugging numbers into a formula doesn't actually teach you physics.

Exactly.

So we are going to start with those raw data piles, figure out how to build smooth theoretical curves out of them, and translate the intimidating math into plain logic.

And finally understand how you actually know if your experimental result is acceptable or if it's a total failure.

Which is the most important part.

So let's jump right into that lab scenario.

You have this messy list of ten different measurements for your lens.

How do we start making sense of it?

I mean, writing them in a list conveys almost zero information.

Well, the very first step is organizing them into a distribution.

Now, suppose your measurements are tidy, discrete integer values.

Wait, discrete integers?

Yeah, let's step away from the lens for a second and imagine scores on a 50-point exam.

You can use a simple bar histogram for that.

Oh sure, like a standard bar chart.

Right.

You plot the different possible test scores on the horizontal axis.

And on the vertical axis, you plot the fraction of times that specific score occurred.

Okay, so if the score 42 showed up three times out of ten total students, the fraction is three-tenths, or 0.3.

Exactly.

The text calls this fraction F sub k.

F sub k, got it.

And logically, because these fractions represent parts of the whole class, if you add up all those fractions for every score, they have to equal exactly one.

Spot on.

The text refers to that as the normalization condition.

The total fraction must equal one.

And organizing the data this way gives us a much more powerful way to calculate the mean.

Oh really?

How so?

Well, instead of adding up a massive list of individual numbers and dividing by the total, you can do a weighted sum.

Ah, I think I remember this.

Yeah, you just multiply each unique value by its specific fraction, its F sub k, and sum those up.

It's mathematically identical to a standard average, but it's far more efficient.
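To make the bookkeeping concrete, here is a minimal Python sketch using ten made-up exam scores (the data are hypothetical, chosen so that 42 appears three times as in the example). It tallies each distinct score's fraction F_k, checks the normalization condition, and confirms that the weighted sum of each score times its F_k equals the ordinary average. Exact fractions are used so the equality check is not muddied by rounding.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical scores on a 50-point exam (illustrative data, not from the text)
scores = [42, 42, 42, 45, 38, 45, 50, 38, 44, 47]

def fractions_by_score(values):
    """Return {s_k: F_k}: the fraction of results equal to each distinct value s_k."""
    counts = Counter(values)
    n = len(values)
    return {s: Fraction(c, n) for s, c in sorted(counts.items())}

F = fractions_by_score(scores)

# Normalization condition: the fractions F_k add up to exactly 1
print(sum(F.values()))  # 1

# Weighted mean: sum over distinct scores of s_k * F_k
weighted_mean = sum(s * f for s, f in F.items())
plain_mean = Fraction(sum(scores), len(scores))

print(F[42])                        # 3/10
print(weighted_mean == plain_mean)  # True
print(float(weighted_mean))         # 43.3
```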

Okay, that makes sense for test scores, where you can only get like a 42 or a 43, but going back to measuring the focal length of a lens down to the millimeter, we hit a wall.

We do.

Here is an analogy I always use.

Measuring continuous physical quantities is like trying to catch exact drops of water falling from a ceiling.

Oh, that's a good way to put it.

You are rarely, if ever, going to get the exact same decimal twice.

So plotting a single vertical bar for every exact decimal is kind of useless.

Totally useless.

You just get a flat line of a million tiny bars that are all the exact same height.

Right, which forces us to change our visual approach.

Exactly.

Instead of plotting exact decimal values, we divide the horizontal axis into convenient intervals, or bins.

Bins, okay.

For instance, any measurement between 23 and 24 centimeters goes into one bin.

Any measurement between 24 and 25 goes into the next.

Okay, that groups things nicely.

It does.

But this requires a crucial shift in how we interpret the graph.

In a bin histogram, the fraction of measurements falling into a specific bin isn't just the height of the rectangle you draw.

It is the area of that rectangle.

Wait, let me make sure I have this straight.

The area is the fraction.

So the width of my bin multiplied by the height of the bar tells me what percentage of my data landed there.

Exactly.

The total area of all the rectangles combined still equals one.
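A minimal sketch of the bin histogram, assuming bins that start at 22 cm. Only the first four readings (26, 24, 28, 23) come from the lens example; the other six are invented for illustration. Each bar's height is stored as a density (fraction divided by bin width), so height times width recovers the fraction, and the areas total one whether the bins are 1 cm or 0.5 cm wide.

```python
# Readings in cm: the first four are the ones quoted above, the rest are hypothetical
readings = [26.0, 24.0, 28.0, 23.0, 25.4, 24.7, 26.3, 25.1, 23.8, 27.2]

def bin_histogram(values, start, width, nbins):
    """Return a list of (left_edge, height) with height * width = fraction in that bin."""
    counts = [0] * nbins
    for v in values:
        k = int((v - start) // width)  # index of the bin that contains v
        if 0 <= k < nbins:
            counts[k] += 1
    n = len(values)
    # The height is a density: the bin's fraction divided by the bin's width
    return [(start + k * width, counts[k] / (n * width)) for k in range(nbins)]

def total_area(hist, width):
    """Sum of rectangle areas (height times width) over all bins."""
    return sum(height * width for _, height in hist)

coarse = bin_histogram(readings, start=22.0, width=1.0, nbins=7)
fine = bin_histogram(readings, start=22.0, width=0.5, nbins=14)
print(total_area(coarse, 1.0), total_area(fine, 0.5))  # both 1, up to rounding
```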

But by using area, we set ourselves up for a massive conceptual leap.

I'm ready.

What's the leap?

Well, what happens if you don't take ten measurements, but a thousand?

Or a million?

Or theoretically, an infinite number of measurements?

I assume my bins can get narrower.

Like, if I have a million data points, I don't need my bins to be a whole centimeter wide.

I can make them a millimeter wide and they'd still be full of data.

Precisely.

And as you take the number of measurements toward infinity, those bins become infinitesimally narrow.

The blocky, jagged steps of your bar chart transform into a perfectly smooth continuous curve.

The text calls this the limiting distribution, denoted as a continuous function, f of x.

So, because we already established that the area represents a fraction of measurements, this smooth curve seamlessly translates to calculus.

Yes.

You've got it.

If I want to know the probability of a measurement falling between point A and point B, I don't add up blocky rectangles anymore.

I just find the area under that smooth curve between A and B using a definite integral.

And just like the bar chart, the total area under this entire smooth curve integrated from negative infinity to positive infinity must equal exactly one.

Because a measurement has to fall somewhere, so the total probability is 100%.

Exactly.
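Written as equations, the two properties of the limiting distribution f(x) just described are:

```latex
P(a \le x \le b) = \int_a^b f(x)\,dx,
\qquad
\int_{-\infty}^{\infty} f(x)\,dx = 1
```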

Okay, so we've morphed our blocky bins into a perfectly smooth mathematical curve by imagining infinite data.

But does that curve always look the same?

What do you mean?

Do my messy lab results magically form a perfect symmetrical bell shape every single time?

Because knowing my lab partners, things usually skew pretty weirdly.

Yeah, that's a great point.

They absolutely do not always form a bell shape.

That perfect symmetry comes with one strict condition.

Okay, what's the condition?

Your measurements will only form that classic bell curve if they are subject to many small random errors and your systematic errors are negligible.

Let's define that difference for the listener, because it's vital.

Please do.

A systematic error is like using a fabric tape measure that has been stretched out over years of use.

Oh, I hate when that happens.

Right.

Every single time you measure, it's going to be off in the exact same direction.

It pushes all your values off center and skews the whole curve.

Meanwhile, random errors are the tiny fluctuations you just can't control.

A slight change in reading an analog dial, a tiny vibration in the table, things like that.

It's just background noise, basically.

Yeah, and they are equally likely to push your reading slightly above the true value as they are to push it slightly below.

Okay, so they balance out.

Exactly.

When you have many of these small random errors pushing equally in both directions, the results naturally build up in the middle and trail off evenly on the sides.

Which gives us that perfect bell shape.

Right.

We call this specific symmetrical limiting distribution the normal distribution or the Gauss distribution.

So the text actually drops the full mathematical formula for this Gauss curve.

And if you are just staring at it on the page, it looks like a nightmare.

It really is intimidating at first glance.

It's got a lowercase e raised to a massive negative fraction, multiplied by another fraction with the square root of 2 pi in it.

It's a lot of symbols.
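For reference, the formula being described is usually written as follows, with X the true value and σ the width parameter (the label G_{X,σ} is just a convenient name for the function):

```latex
G_{X,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - X)^2 / 2\sigma^2}
```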

I mean, instead of reading a terrifying string of letters, let's look at what the pieces of this equation actually do to the shape of the curve.

That's the best way to tackle it.

The most important piece is sitting up in the exponent.

You have the term x minus capital X.

Okay, so capital X represents the actual perfect true value we are trying to find, right?

Yes.

And lowercase x is our specific measurement.

So x minus capital X is just the physical distance our measurement landed away from the true answer.

And in the formula, that distance is squared.

Right.

That is the secret to the symmetry.

If you guess two units above the true value or two units below the true value, squaring it gives you a positive four either way.

Exactly.

The function is forced to fall off equally on the left and the right.

And there's a negative sign in front of that squared term, which ensures that as your measurement gets further and further away from the center, the probability exponentially decays down towards zero.

You nailed it.

We also have to look at the width parameter denoted by the Greek letter sigma.

It sits in the denominator of that exponent and it defines the physical spread of our bell curve.

Sigma, the standard deviation.

So what happens if sigma is really small?

If sigma is a small number, it means our measurements are highly precise and tightly clustered.

The curve will look like a tall, skinny spike.

And if sigma is large?

Then our precision is terrible and the curve looks like a short, wide, spread-out hill.

Okay, that makes visualizing it so much easier, but what about that big, messy fraction at the very front of the equation?

The one with the standard deviation and the square root of 2 pi.

It just looks so out of place.

Oh, that is just the normalization factor.

Normalization, like making the area equal 1.

Exactly.

Remember our rule that the total area under the entire curve must equal exactly 1.

If you integrate that exponential part all by itself, the math spits out a very specific number: sigma times the square root of 2 pi.

Oh, so that fraction at the front just cancels it out.

Precisely.

It is perfectly calibrated to divide out that weird number.

It just guarantees the final integrated area is 1.
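The role of each piece can be checked numerically. This Python sketch, assuming a hypothetical true value of 25 cm and a simple midpoint-rule integral, shows that the area stays at 1 for several widths while the peak gets taller as σ shrinks, and that the bare exponential without its prefactor integrates to σ√(2π), the number the prefactor divides out.

```python
import math

def gauss(x, X, sigma):
    """Gauss (normal) function centered on the true value X with width sigma."""
    prefactor = 1 / (sigma * math.sqrt(2 * math.pi))  # the normalization factor
    return prefactor * math.exp(-((x - X) ** 2) / (2 * sigma ** 2))

def integrate(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

X = 25.0  # hypothetical true focal length in cm
for sigma in (0.5, 1.0, 2.0):
    # +/- 10 sigma captures essentially all of the area
    area = integrate(lambda x: gauss(x, X, sigma), X - 10 * sigma, X + 10 * sigma)
    print(f"sigma = {sigma}: area = {area:.6f}, peak height = {gauss(X, X, sigma):.3f}")

# The bare exponential (no prefactor), here with sigma = 1, integrates to sigma * sqrt(2 pi)
bare = integrate(lambda x: math.exp(-((x - X) ** 2) / 2), X - 10, X + 10)
print(f"{bare:.6f} vs {math.sqrt(2 * math.pi):.6f}")
```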

Okay, so we have this beautiful symmetrical curve with a center and a width defined by sigma.

But what does sigma actually mean for you when you are standing in the lab?

This brings us to what I call the magic 68%.

The magic 68%.

I love that.

So how does that work?

Well, if we take the definite integral of our Gauss function, meaning we calculate the area under the curve, starting from exactly one standard deviation below the true value and going to exactly one standard deviation above, the area is approximately 0.68, or 68%.

Which means if your experiment is normally distributed, there is a 68% probability that any single random reading you take will fall within one sigma of the true answer.

Right.

You can be 68% confident.

And it scales up predictably, right?

It does.

If you expand your integration limits to two standard deviations, the probability jumps to about 95.4%.

At three standard deviations, you cover 99.7% of the area.

Wow, 99.7%.

So almost everything.

Exactly.

It is exceptionally rare: there is only a 0.3% chance that a valid measurement subject only to random errors will fall more than three standard deviations away from the true value.
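The 68, 95 and 99.7 percent figures come from exactly that integral. For the Gauss function it has a closed form in terms of the error function, erf(t/√2), which Python's standard library provides as math.erf; this sketch just evaluates it for t = 1, 2, 3. The helper name prob_within is ours, not the text's.

```python
import math

def prob_within(t):
    """Probability that a normally distributed reading lands within t sigma of X.

    Integrating the Gauss function from X - t*sigma to X + t*sigma gives erf(t / sqrt(2)).
    """
    return math.erf(t / math.sqrt(2))

for t in (1, 2, 3):
    inside = prob_within(t)
    print(f"within {t} sigma: {inside:.4f}   outside: {1 - inside:.4f}")
# within 1 sigma: 0.6827   outside: 0.3173
# within 2 sigma: 0.9545   outside: 0.0455
# within 3 sigma: 0.9973   outside: 0.0027
```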

Okay, but wait, I have to stop you here.

Uh oh.

What's up?

This is the part of the chapter that always drove me crazy.

This entire bell curve, all this math, it relies completely on knowing the true value.

Capital X.

Yes, it does.

The distance from the center is based on X.

But if I knew the true value of the lens's focal length, I wouldn't be stuck in the lab doing the experiment in the first place.

That is entirely fair.

How can we use any of this if we don't know the most important variable?

That is the ultimate paradox of experimental physics.

And it introduces a profound statistical concept to solve it.

It's called the principle of maximum likelihood.

The principle of maximum likelihood.

Okay, lay it on me.

The premise is actually quite elegant.

Since we don't know the true value or the true width, we have to guess.

And the best estimates for those values are the ones that make our specific messy set of observed measurements the most statistically likely to have occurred.

I always picture it like this.

Imagine you walk up to a wooden fence and see five arrows stuck in it.

Okay, I'm picturing it.

They are clustered around a specific spot, but there is no target painted on the wood.

The arrows have already landed.

The principle of maximum likelihood is basically you taking a bucket of paint and drawing the bullseye exactly where it makes those scattered arrow hits make the most sense.

You center the bullseye right in the middle of the cluster to maximize the odds that the archer was actually aiming there.

That is a brilliant visual.

And the textbook actually proves this mathematically.

Oh, the math backs up the paint bucket.

It sure does.

The probability of getting your exact set of 10 measurements is the product of their individual probabilities on the Gauss curve.

To maximize that total probability, we have to look back at the exponent in our formula.

To make the probability as large as possible, we have to make the sum sitting in that exponent as small as possible.

Right, because it's a negative exponent.

Exactly.

So we must minimize the sum of the squared distances between our raw measurements and our guessed bullseye.

And to find a minimum in calculus, you take the derivative and set it to zero, right?

You are basically looking for the flat bottom of a valley.

Precisely.

You take the derivative of that sum with respect to your guessed center, capital X, and set it to zero.

And when you do that, the algebra beautifully simplifies.

What does it simplify to?

The math undeniably proves that the single value of x that maximizes the likelihood of your data is exactly the average.

The mean of your measurements.

Wait, really?

Just the normal average?

Yep.

Every time you've ever averaged your data in high school chemistry, you were actually applying the principle of maximum likelihood without even knowing it.

That is incredibly satisfying, just mathematically proving that averaging is the right thing to do.

It is nice when things work out.
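Setting the derivative of the sum of squares Σ(x_i − X)² to zero gives −2Σ(x_i − X) = 0, so X = Σx_i / N. As a brute-force cross-check, this sketch scans candidate centers for the four lens readings quoted earlier and keeps the one with the largest log-likelihood (maximizing the log is equivalent to maximizing the product). The width σ = 2 cm is an assumption needed only to evaluate the likelihood; it does not move the location of the maximum.

```python
import math

readings = [26.0, 24.0, 28.0, 23.0]  # the four lens readings quoted earlier, cm
sigma = 2.0  # hypothetical width, only needed to evaluate the likelihood

def log_likelihood(X, data, sigma):
    """Log of the product of Gauss probabilities of every reading, given center X."""
    return sum(
        -math.log(sigma * math.sqrt(2 * math.pi)) - (x - X) ** 2 / (2 * sigma ** 2)
        for x in data
    )

# Scan candidate bullseyes from 20 cm to 30 cm in 0.01 cm steps
candidates = [20 + 0.01 * i for i in range(1001)]
best_X = max(candidates, key=lambda X: log_likelihood(X, readings, sigma))

mean = sum(readings) / len(readings)
print(round(best_X, 2), mean)  # 25.25 25.25
```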

But the text points out a really sneaky subtlety here when we try to estimate the width, the standard deviation.

Ah yes, the dreaded n minus one.

You know it.

When we do the exact same calculus to find the best estimate for the standard deviation, we run into a bias.

We are forced to calculate the spread of the data using our newly calculated mean because we still don't have the true perfect center.

Right, and because the mean is literally calculated from our data points, it's inherently drawn to be closer to them than the actual true value might be.

Exactly.

The mean sits perfectly snug inside your data.

Because of this, using the mean mathematically underestimates the true spread of the errors.

Because it's too close to the data points.

Right.

So to correct this underestimation, the formula for the best estimate of the standard deviation divides the sum of the squares by n minus one instead of just n.
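A sketch comparing the two divisors on the four lens readings, plus a small simulation assuming a true σ of 1 and N = 3 readings per experiment. Averaged over many simulated experiments, the squared spread computed with N comes out near (N − 1)/N of the truth, while dividing by N − 1 lands near the true value. Strictly, the N − 1 correction removes the bias in the variance σ²; the square root keeps a small residual bias, which is why the simulation averages squared spreads. Python's statistics.stdev and statistics.pstdev use the N − 1 and N divisors respectively.

```python
import random
import statistics

def sd_divide_by_n(data):
    """Spread about the sample mean, dividing by N (tends to run low)."""
    m = sum(data) / len(data)
    return (sum((x - m) ** 2 for x in data) / len(data)) ** 0.5

def sd_divide_by_n_minus_1(data):
    """Spread about the sample mean, dividing by N - 1 (the corrected estimate)."""
    m = sum(data) / len(data)
    return (sum((x - m) ** 2 for x in data) / (len(data) - 1)) ** 0.5

readings = [26.0, 24.0, 28.0, 23.0]  # the four lens readings quoted earlier, cm
print(sd_divide_by_n(readings), statistics.pstdev(readings))
print(sd_divide_by_n_minus_1(readings), statistics.stdev(readings))

# Simulation: many experiments of N = 3 readings with true sigma = 1 (true variance = 1)
random.seed(1)
N, trials = 3, 20_000
avg_var_n = 0.0
avg_var_n1 = 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]
    avg_var_n += sd_divide_by_n(sample) ** 2 / trials
    avg_var_n1 += sd_divide_by_n_minus_1(sample) ** 2 / trials

# Expect avg_var_n near (N - 1) / N = 0.67 and avg_var_n1 near 1.0
print(avg_var_n, avg_var_n1)
```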

Dividing by n minus one is one of those things you easily forget on a midterm, but it proves you are actively correcting for the fact that you used a guessed mean.

It's a small change, but conceptually huge.

And the text also highlights that if you only take three or four measurements, your standard deviation is wildly unreliable anyway.

It gives a formula for the fractional uncertainty of the standard deviation itself.

Yes, the uncertainty of the uncertainty.

Yeah, if you only take three measurements, your standard deviation is mathematically 50% uncertain.

Which is why we always push for more data.
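The formula usually quoted for this uncertainty of the uncertainty is 1/√(2(N − 1)), which reproduces the 50% figure for N = 3. A quick sketch of how slowly it shrinks (the function name is ours):

```python
import math

def fractional_uncertainty_of_sigma(N):
    """Fractional uncertainty in a standard deviation estimated from N readings."""
    return 1 / math.sqrt(2 * (N - 1))

for N in (3, 10, 50):
    pct = 100 * fractional_uncertainty_of_sigma(N)
    print(f"N = {N:2d}: sigma is uncertain by about {pct:.0f}%")
# N =  3: sigma is uncertain by about 50%
# N = 10: sigma is uncertain by about 24%
# N = 50: sigma is uncertain by about 10%
```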

Now we've figured out how to evaluate a single measured quantity.

But science rarely stops there.

True.

What happens when we combine different measured quantities together to calculate something totally new?

Let me push back on this on behalf of our listener, because the math here feels counterintuitive.

Okay, let's hear it.

Say I'm measuring a rolling cart.

I measure the distance it traveled with an uncertainty of 2 percent.

I measure the time it took with an uncertainty of 2 percent.

When I combine them to find velocity, isn't my total uncertainty just 4 percent?

It feels like it should be.

Why did chapter three tell us to use that weird Pythagorean theorem formula where we square the uncertainties, add them, and take the square root?

It does feel like you should just add them up.

But the normal distribution justifies this specific method, which is called addition in quadrature.

Addition in quadrature, okay.

The text walks through four steps to prove why this happens.

First, if you just add a fixed exact number to a measured quantity, the bell curve just shifts along the axis.

The center moves, but the width, the uncertainty stays exactly the same.

Okay, that makes sense.

If I add five to everything, the spread doesn't change.

Right.

Step two, if you multiply a measured quantity by a fixed constant, say you multiply by three, the center gets multiplied by three, but the width also gets stretched out by three.

That still feels pretty logical.

But step three is where the magic happens.

What if we add two independent measured quantities, x and y?

Okay, so two totally different variables.

Because they are entirely independent, the probability of getting a specific combination is the product of their individual probability curves.

When you multiply two exponential functions together, you add their exponents.

And when you do the algebra on those combined exponents, the new distribution naturally reveals its width.

Yes.

And that width isn't simply the uncertainty of x plus the uncertainty of y.

It mathematically falls out as the square root of their squares added together.

The physical reason for this goes back to the fact that random errors are independent.

If a random error pushes one quantity too high, there is an even chance that the independent error in the other quantity pushes the result back the other way.

Oh, so they partially cancel each other out.

Exactly.

It is highly improbable that every single variable in your calculation will be randomly skewed in the exact same direction to the maximum degree.

That makes so much sense.

Addition in quadrature mathematically accounts for this natural partial cancellation.
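Two small checks of the quadrature rule, in a hedged Python sketch. For the cart, velocity is a quotient, so it is the fractional (percent) uncertainties that combine in quadrature, giving about 2.8% rather than 4%. The Monte Carlo part, with hypothetical widths σ_x = 3 and σ_y = 4, reproduces steps one through three: a shift leaves the spread alone, a factor of 3 triples it, and the sum of two independent quantities has a spread near √(3² + 4²) = 5 rather than 3 + 4 = 7.

```python
import math
import random

def quadrature(*uncertainties):
    """Combine independent random uncertainties: square each, add, take the square root."""
    return math.sqrt(sum(u * u for u in uncertainties))

def spread(values):
    """Sample standard deviation (N - 1 in the denominator)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

# Cart example: 2% fractional uncertainty in distance and in time
print(round(quadrature(2.0, 2.0), 2))  # 2.83 (percent), not 4

# Monte Carlo check of steps 1-3 with hypothetical widths sigma_x = 3 and sigma_y = 4
random.seed(0)
n = 50_000
xs = [random.gauss(10.0, 3.0) for _ in range(n)]
ys = [random.gauss(20.0, 4.0) for _ in range(n)]

shifted = [x + 5 for x in xs]             # step 1: add a fixed number
scaled = [3 * x for x in xs]              # step 2: multiply by a fixed constant
summed = [x + y for x, y in zip(xs, ys)]  # step 3: add two independent quantities

print(spread(shifted), spread(scaled), spread(summed))  # near 3, 9 and 5 (not 7)
```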

And step four takes this and builds the ultimate tool.

For any complex function that relies on multiple variables, we can approximate how small errors ripple through the math using partial derivatives.

Yes, partial derivatives, the real heavy lifters.

Think of a partial derivative as a measure of sensitivity.

Like how much does the final answer care if this one specific variable changes?

That's a great way to think of it.

You take the partial derivative with respect to x, multiply it by the uncertainty of x, and square it; you do the same for y, add the results, and take the square root.

This yields the grand error propagation formula.

And the normal distribution proves it works.

It's brilliant.
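The general formula can be sketched with numerical partial derivatives, assuming independent random errors. The helper propagate below is hypothetical: it estimates each partial derivative with a central difference, multiplies by that variable's uncertainty, and combines the terms in quadrature. The cart numbers (d = 100 ± 2 cm, t = 5.0 ± 0.1 s) are invented; both inputs carry a 2% uncertainty, and the result comes out near 2.8% of the velocity rather than 4%.

```python
import math

def propagate(q, values, uncertainties, h=1e-6):
    """Uncertainty in q(*values) from independent random uncertainties.

    Each partial derivative is estimated with a central difference, and the
    terms (dq/dx_i * delta_x_i) are added in quadrature.
    """
    total = 0.0
    for i, dx in enumerate(uncertainties):
        up = list(values)
        down = list(values)
        up[i] += h
        down[i] -= h
        partial = (q(*up) - q(*down)) / (2 * h)  # sensitivity of q to variable i
        total += (partial * dx) ** 2
    return math.sqrt(total)

def velocity(d, t):
    return d / t

# Hypothetical cart run: d = 100 +/- 2 cm, t = 5.0 +/- 0.1 s
v = velocity(100.0, 5.0)
dv = propagate(velocity, [100.0, 5.0], [2.0, 0.1])
print(f"v = {v:.1f} +/- {dv:.2f} cm/s")  # v = 20.0 +/- 0.57 cm/s
```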

Now, we just proved how to combine completely different variables.

What if we combine multiple measurements of the same variable?

Specifically, what is the uncertainty of the mean itself?

We call this the standard deviation of the mean, or SDOM.

And the derivation here is one of the most satisfying things in the chapter.

I agree completely.

Because the mean is just a function of your measurements.

You add up x sub 1 plus x sub 2 all the way to x sub N and divide the whole thing by N.

So we just apply the general error propagation formula we just talked about.

The partial derivative of the mean with respect to any single measurement is simply one over n.

When you plug one over n into the propagation formula, square it, and sum it up n times, the math collapses beautifully.

The final result is simply the original standard deviation of your measurements divided by the square root of n.

Let me repeat that because it is a massive takeaway for you listening.

The uncertainty of your average is your original standard deviation divided by the square root of n.

That's such an elegant result.
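Written out, with every measurement x_i sharing the same width σ_x, the derivation just described is:

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_N}{N},
\qquad
\frac{\partial \bar{x}}{\partial x_i} = \frac{1}{N}

\sigma_{\bar{x}}
= \sqrt{\sum_{i=1}^{N} \left( \frac{1}{N}\,\sigma_x \right)^2}
= \sqrt{N \cdot \frac{\sigma_x^2}{N^2}}
= \frac{\sigma_x}{\sqrt{N}}
```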

This is the mathematical reason why taking more measurements makes your final answer more precise.

But it's also a tragedy for anyone working in a lab.

A tragedy?

How so?

Because of the square root.

Your single measurement uncertainty doesn't magically get smaller just because you take more readings.

Your ruler is only so precise.

Very true.

But your confidence in the average improves by the square root of your trials.

So if you want to make your answer twice as precise, you can't just do twice the work.

You have to take four times as many measurements.

Oh yeah, that is painful.

If you want it to be ten times more precise, you have to sit there and take a hundred times as many measurements.

It is a cruel reality of statistics, but a necessary one to understand.
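A quick numerical illustration of that square-root law, assuming a hypothetical single-reading uncertainty of 2 cm and a starting run of 10 readings:

```python
import math

def sdom(sigma_x, N):
    """Standard deviation of the mean for N readings that each have width sigma_x."""
    return sigma_x / math.sqrt(N)

sigma_x = 2.0  # hypothetical single-reading uncertainty in cm; more readings don't change it
baseline = sdom(sigma_x, 10)

for factor in (1, 4, 100):
    N = 10 * factor
    improvement = baseline / sdom(sigma_x, N)
    # 4x the readings gives a 2x improvement; 100x gives 10x
    print(f"N = {N:4d}: SDOM = {sdom(sigma_x, N):.3f} cm, improvement = {improvement:.1f}x")
```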

Now you've done all the grueling work.

You've found your mean, applied the n minus one correction, propagated your errors, and calculated your final uncertainty.

The finish line is in sight.

You finally have your best answer.

Now you compare it to the expected true theoretical value.

How do you actually know if you totally failed the experiment?

We determine this by calculating what is usually called a t-score.

You find the absolute discrepancy, just the raw distance between your best answer and the expected answer.

By just simple subtraction.

Then you divide that distance by your standard deviation.

This simply tells you how many standard deviations away your answer is from the theoretical target.

So if your answer is twelve, the expected is ten, and your standard deviation is one, your discrepancy is two.

You divide two by one, and your t-score is two.

You are exactly two standard deviations away.

Pretty straightforward.

But this immediately raises the most common question asked in every introductory physics lab.

Where is the pass-fail line?

What t-score means my data is garbage?

Right, because if I'm one standard deviation away, we already established that happens 32% of the time purely by chance.

Being one sigma away is totally fine.

It's well within normal statistical noise.

But if you get a discrepancy of three standard deviations, that only happens by pure chance 0.3% of the time.

So extremely rare.

Highly suspicious.

It means either you got monumentally unlucky, or, much more likely, you have an undetected systematic error ruining your data, or the theoretical expected value is actually wrong.

Wow.

Because of this, scientists have to draw a boundary of acceptability.

And the frustrating but honest truth is that there is no universal law for this boundary.

It relies entirely on a scientist's judgment.

Yeah, context matters.

Many fields use a 5% probability boundary, which corresponds to a t-score of 1.96.

Anything outside that is considered unreasonably improbable and potentially unacceptable.

Others use a stricter 1% boundary, which is a t-score of 2.58, labeling anything beyond it highly significant.

So if you fall outside these boundaries, the math has done its job.

It has raised the red flag.

Exactly.

But you have to go back to the physical world, check your calculations, check your equipment, look for that stretched tape measure.

Because the math can tell you something is wrong, but it takes a human scientist to figure out what is wrong.
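The significance test can be sketched in a few lines of Python. Here sigma stands for the uncertainty attached to the best answer, and the two-sided probability of a discrepancy of t standard deviations or more is erfc(t/√2), available as math.erfc. The function names are illustrative. The first print reproduces the 12-versus-10 example; the loop confirms that 1.96 and 2.58 correspond to roughly 5% and 1%.

```python
import math

def t_score(best, expected, sigma):
    """Discrepancy between the best and expected answers, in units of sigma."""
    return abs(best - expected) / sigma

def prob_at_least(t):
    """Two-sided probability that random error alone gives a discrepancy of t sigma or more."""
    return math.erfc(t / math.sqrt(2))

t = t_score(12.0, 10.0, 1.0)
print(t, round(prob_at_least(t), 4))  # 2.0 0.0455

for boundary in (1.96, 2.58):
    # roughly 5% and 1%
    print(boundary, round(prob_at_least(boundary), 3))
```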

The text leaves us with a truly fascinating caveat at the very end.

We've built this entire mathematical cathedral on the assumption of perfectly random errors forming a perfect bell curve.

But what if your real world measurements aren't perfectly normally distributed?

Yeah, what if there's a slight skew?

The beauty is that even then, the core ideas in this chapter (the mean as the best estimate, the standard deviation as the spread, and adding independent errors in quadrature) almost always work as excellent approximations.

Oh, really?

Yeah, they are incredibly robust tools, even when the real world is a little messy.

It really leaves you wondering just how deeply this mathematical bell curve is woven into the very fabric of physical reality.

We started this tutoring session talking about a chaotic, frustrating jungle of numbers.

We did.

But when you apply these theoretical tools, that chaos organizes itself into a predictable curve that reveals the hidden truth of what you are measuring.

It is the ultimate triumph of order over chaos.

And we hope walking through the theory today has helped you see that underlying order.

A huge thank you on behalf of the Last Minute Lecture team for joining us today.

You've got the theory.

You've unpacked the math.

And most importantly, you know the why.

Now, the next time you are staring at a scatter of lab data, ask yourself, where will you paint your bullseye?

This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

A histogram organizing repeated measurements into bins approaches a smooth probability density function as sample size increases and bin width decreases, establishing the mathematical foundation for understanding measurement uncertainty. The normal distribution, characterized by its bell-shaped Gaussian curve, naturally emerges when measurements are affected by many small random errors without substantial systematic bias. This distribution is completely specified by two parameters: the center value representing the true measurement and the standard deviation quantifying how tightly data cluster around that center, with narrower distributions indicating greater precision. Integration of the Gaussian function yields the empirical rule that approximately 68 percent of observations fall within one standard deviation of the mean, 95 percent within two standard deviations, and 99.7 percent within three standard deviations, providing quantifiable confidence limits for experimental accuracy. Maximum likelihood principles justify using the arithmetic mean as the optimal estimate of the true value and validate the sample standard deviation as the best measure of measurement precision. A central mathematical result derives that when combining independent measurements with random uncertainties, the combined uncertainty follows quadrature addition, meaning the total error equals the square root of the sum of squared individual errors rather than simple addition of errors. The standard deviation of the mean formula demonstrates that the uncertainty of the mean decreases in proportion to the inverse square root of the number of observations, quantifying how repeated measurements systematically reduce random error. Practical significance testing involves calculating how many standard deviations separate an experimental result from a theoretical prediction, then determining whether such a discrepancy could reasonably occur by random chance alone.
Conventional thresholds establish statistical significance at 5 percent or 1 percent probability levels, allowing researchers to distinguish between measured values that genuinely disagree with predictions versus those consistent with expected random variation. These tools collectively enable rigorous evaluation of whether experimental data support or contradict theoretical expectations.
