Chapter 6: Normal Probability Distributions


Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome to the Deep Dive, where we sift through the sources and uncover those essential nuggets of insight.

Today, we're plunging into a concept that is, well, truly foundational to understanding so much of our world.

Think about height requirements for U.S. Air Force pilots, or even for Disney's Tinker Bell character, or the really serious safety implications of maximum load limits for elevators, like the one that famously overloaded at the San Francisco airport, or that tragic tour boat capsizing on Lake George.

Yeah, those aren't just random numbers or isolated incidents.

These are critical design parameters, safety considerations, questions of fit, and they all hinge on understanding how data is distributed, how it spreads out.

And that's exactly our mission today.

We're taking a deep dive into what is arguably the most important distribution in statistics.

The normal distribution.

The bell curve.

The bell curve, exactly.

We'll unpack the core ideas, the formulas that act as our statistical tools, and the procedures from Chapter 6 of Triola's Elementary Statistics.

It's a great resource.

Our goal is really to make these concepts crystal clear, connect them to your everyday life, your decision-making, maybe even your research.

Get ready for some genuine aha moments, because this really does change how you see data.

It absolutely does.

Okay, so let's start with the very bedrock, the foundation: probability and the standard normal distribution.

What exactly are we talking about here?

Okay, so picture a perfectly symmetrical bell-shaped curve.

That's the visual.

Right.

Now, what makes it standard are its precise properties.

Its mean, its average, is exactly zero.

Its standard deviation, how spread out it is, is exactly one.

Okay, mean zero, standard deviation one.

And crucially, the total area under that entire curve is exactly one.

That's the key part.

This area directly corresponds to probability.

So area equals probability.

Hmm, that sounds a bit abstract for a curvy shape.

Can we maybe ground that idea in something a bit simpler first to get a feel for it?

Yeah, absolutely.

Let's think about a uniform distribution.

Imagine security waiting times at JFK airport.

Let's say they're distributed uniformly between zero and five minutes.

Okay, so any time between zero and five is equally likely.

Exactly.

If you graph this, it's just a simple rectangle.

The height of this rectangle would be 0.2, because the total range is five minutes, and one divided by that range, five, gives us 0.2.

Right, because the total area has to be one.

Five times 0.2 is one.

Precisely.

So if you want to find the probability of waiting at least two minutes, you'd calculate the area of the rectangle from two minutes up to five minutes.

Okay, so that's a width of three minutes.

Right, a width of three times the height of 0.2 gives you 0.6.

So the probability is 0.6, or 60%.

Simple area calculation.
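
To make the rectangle arithmetic concrete, here is a minimal Python sketch that treats probability as width times height for a uniform distribution. The helper name `uniform_prob` is our own illustration, not something from the text; the 0-to-5-minute range is the JFK example above.

```python
def uniform_prob(a: float, b: float, low: float = 0.0, high: float = 5.0) -> float:
    """P(a <= X <= b) for X uniform on [low, high]: width times height."""
    height = 1.0 / (high - low)          # 1/5 = 0.2, so the total area is 1
    lo, hi = max(a, low), min(b, high)   # clip the interval to where X can fall
    width = max(0.0, hi - lo)
    return width * height

p = uniform_prob(2.0, 5.0)   # probability of waiting at least two minutes
print(round(p, 4))  # 0.6
```

The same function returns 1 for the whole range and 0 for an interval outside it, which is the "total area is one" property in code.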

That's a great way to visualize area equals probability for a rectangle.

Much clearer.

But like you said, for our curvy bell-shaped normal curve, it's definitely not as simple as length times width.

How do we find those probabilities, those areas, for the standard normal distribution?

Yeah, you can't just multiply.

For these bell-shaped curves, we generally rely on statistical technology.

You know, software like Statdisk, Minitab, Excel, or even a graphing calculator.

Or, the old-school way, you can use a specialized reference table, like Table A-2 from the Triola text.

The tables.

Yeah, the tables.

The important thing to remember when using a table like A-2 is that it typically gives you the cumulative area from the far left end of the curve up to a specific point, a specific z-score.

Always from the left.

Okay, let's run through a few examples, maybe, using bone density scores.

You mentioned they often follow a standard normal distribution.

Perfect example.

They're actually standardized.

So the mean is zero and standard deviation is one.

Okay, so if we want to find the probability that someone has a bone density score less than 1.27, that's a z-score, right?

Yep, z equals 1.27.

So using our tools, or looking up 1.27 in the table, we find the cumulative area to the left is 0.8980.

So almost 90%.

What does that mean in plain English?

It means there's an 89.80% probability that a randomly selected person will have a bone density score less than 1.27.

Or, you could say 89.80% of people have scores below that level.

Okay, what if we're interested in a score above negative 1.00?

That's generally considered the normal range, right?

Yeah, a score of negative 1.00 is pretty typical.

Now, remember, our tools usually give the area to the left.

So for z equals negative 1.00, the area to its left is 0.1587.

But we want the area above it, to the right, and the total area is one.

Ah, you just subtract.

Exactly.

You do 1 minus 0.1587, which gives us 0.8413.

So about 84% of people have bone density levels above negative 1.00.

Makes sense.

And what if we want the probability between two values, say, between negative 1.00 and negative 2.50?

I think the text mentions that's a range that might indicate osteopenia.

Yes, that's the range often associated with osteopenia or lower bone density.

So here we find the cumulative area, the area to the left, for each z-score.

For z equals negative 1.00, we already know it's 0.1587.

For z equals negative 2.50, looking it up, we get 0.0062.

Very small area to the left of negative 2.50.

Right.

So the area between them is just the difference between those two areas.

0.1587 minus 0.0062 equals 0.1525.

So about 15 and a quarter percent of people fall into that specific osteopenia range.

Exactly.

That's how you find probabilities between two values.
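
The three bone density calculations can be reproduced with Python's standard-library `statistics.NormalDist` (Python 3.8 or later), whose `cdf` method returns the cumulative area to the left, just like the table. This is a sketch, not the book's procedure; exact software values can differ from four-decimal table values in the last digit.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, standard deviation 1

p_below_127 = std.cdf(1.27)                      # area to the left of z = 1.27
p_above_neg1 = 1 - std.cdf(-1.00)                # area to the right of z = -1.00
p_osteopenia = std.cdf(-1.00) - std.cdf(-2.50)   # area between z = -2.50 and z = -1.00

print(f"{p_below_127:.4f}")   # 0.8980
print(f"{p_above_neg1:.4f}")  # 0.8413
print(f"{p_osteopenia:.4f}")  # 0.1524 (rounded table values give 0.1525)
```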

Okay, so we can go from a z-score to a probability or an area.

What about the reverse?

If we know a probability, or maybe a percentile, how do we find the corresponding z-score, like finding the 95th percentile, P95?

Right, this is the inverse problem.

You've got the area or probability, and you need the z-score that marks that boundary.

Okay.

So for P95, we're looking for the z-score that has 95%, or 0.950, of the area to its left.

So we look inside the table, or use the inverse function in our software.

Precisely.

You're looking for the area of 0.95.

Consulting our tools or the table, we find this corresponds very closely to a z-score of 1.645.

1.645.

So that means 95% of scores are less than or equal to 1.645 standard deviations above the mean.

Yep.

That's the 95th percentile.
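
Going the other way, from an area back to a z-score, is what `NormalDist.inv_cdf` does. A minimal sketch, again assuming Python 3.8 or later:

```python
from statistics import NormalDist

std = NormalDist()           # standard normal distribution
z95 = std.inv_cdf(0.95)      # z-score with 0.95 of the area to its left (P95)
print(f"{z95:.3f}")  # 1.645
```

Feeding the result back into `std.cdf` recovers 0.95, which is a handy sanity check.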

But what about finding the z-scores that cut off, say, the bottom 2.5% and the top 2.5%?

Okay, good question.

For the bottom 2.5%, we need the z-score with an area of 0.025 to its left.

Looking that up, we get z equals negative 1.96.

Minus 1.96.

And for the top 2.5%, that means 97.5%, or 0.975, is to the left.

Looking up 0.975 gives us z equals positive 1.96.

Symmetrical.

Minus 1.96 and plus 1.96.

Exactly.

So this tells us that 95% of all bone density scores fall between z equals negative 1.96 and z equals positive 1.96.

And these specific z-scores, by the way, these boundary markers, they're often referred to as critical values.

Critical values.

Okay.

So these are like statistical tripwires, almost.

They mark the points where results start to become significantly unusual, maybe.

That's a great way to put it, tripwires, yeah.

And we use the notation z sub alpha, written z_α, to denote the z-score that has the specific area alpha to its right.

To its right.

So, for example, take the z equals 1.96 we just found.

That's z sub 0.025, because it has 0.025, or 2.5%, of the area to its right.

Got it.

These values are absolutely fundamental for making decisions based on data, helping us figure out if something we observe is just random chance or if it's actually meaningful, statistically significant.
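
The critical values and the z sub alpha notation can be sketched the same way. The helper `z_alpha` is a hypothetical name we introduce; it asks for the z-score whose area to the left is 1 minus alpha, which is the same as having area alpha to its right.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal distribution

def z_alpha(alpha: float) -> float:
    """Critical value z_alpha: the z-score with area alpha to its RIGHT."""
    return std.inv_cdf(1 - alpha)

lower = std.inv_cdf(0.025)   # cuts off the bottom 2.5%
upper = std.inv_cdf(0.975)   # cuts off the top 2.5%

print(f"{lower:.2f} {upper:.2f}")                    # -1.96 1.96
print(f"{std.cdf(upper) - std.cdf(lower):.2f}")      # 0.95
print(f"{z_alpha(0.025):.2f}")                       # 1.96
```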

Okay, that makes sense.

So we've mapped out this perfect standardized bell curve with mean zero, standard deviation one.

But let's face it, the real world is messy, right?

Right.

How do we apply all this to the, well, non-standard distributions we see every day, like human heights or elevator loads or things like that?

Right.

It's rare for real data to magically have a mean of zero and a standard deviation of one.

This is where we introduce our universal translator formula.

It's really important.

z equals x minus mu, over sigma: z = (x - μ) / σ.

I remember that one.

Yeah.

This lets you take any raw data point x from any normal distribution, doesn't matter what the mean or standard deviation are, and translate it into that standard z-score.

So it converts everything back to that standard scale we just talked about.

Exactly.

Once you have that z-score, you can use all the probability-finding methods, the tables, the software, everything we just discussed.

It standardizes the data, basically letting you compare diverse measurements (heights, weights, test scores) on a single scale of "how unusual is this?"

Okay.

Let's try an example.

Men's heights.

The book says they're normally distributed, with mean 68.6 inches and standard deviation 2.8 inches.

What percentage of men are taller than a standard 72 inch showerhead?

Okay.

Good practical one.

First, we convert that 72 inches into a z-score using our formula.

So z equals 72 minus 68.6, divided by 2.8.

Okay.

72 minus 68.6 is 3.4.

Divided by 2.8 gives about 1.21.

So z is 1.21.

Now we want the probability of being taller than this, so we need the area to the right of z equals 1.21.

Using our tools or table, the area to the left of 1.21 is 0.8869.

So the area to the right is 1 minus 0.8869, which is 0.1131.

So about 11.3%.

Yep.

Roughly 11.3% of men are taller than 72 inches.

So if you're designing showerheads at 72 inches, well, you're potentially making it uncomfortable for over a tenth of your male users.

That has real implications for, you know, user experience, maybe even safety.
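
Here is a sketch of the showerhead calculation, standardizing by hand and also letting `NormalDist(mu, sigma)` do the standardizing internally. Because the software does not round z to 1.21 first, it gives roughly 0.112 rather than the table's 0.1131; both round to about 11%.

```python
from statistics import NormalDist

mu, sigma = 68.6, 2.8   # men's heights in inches (from the example)
x = 72.0                # showerhead height in inches

z = (x - mu) / sigma                    # about 1.21
p_taller = 1 - NormalDist().cdf(z)      # area to the right of z
# Equivalent: let NormalDist standardize x internally.
p_direct = 1 - NormalDist(mu, sigma).cdf(x)

print(f"z = {z:.2f}, P(taller than 72 in) = {p_taller:.4f}")
```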

Wow.

Yeah.

11% is definitely not trivial for a design consideration.

Okay.

Let's try another one from the chapter problems.

The historical U.S. Air Force pilot height requirement for women.

It used to be between 64 and 77 inches.

Women's heights, the book says, are normal, with mean 63.7 inches and standard deviation 2.9 inches.

What percentage of women would meet this requirement?

Okay.

So here we need the probability between two values, 64 and 77 inches.

So we convert both boundary heights to z-scores.

Right.

Two z scores needed.

For 77 inches, z equals 77 minus 63.7, over 2.9. That's 13.3 divided by 2.9, which gives a z-score of about 4.59.

That's way out there.

Very tall.

Okay.

Extremely tall.

And for 64 inches, z equals 64 minus 63.7, over 2.9.

That's 0.3 divided by 2.9, which is about 0.10.

Okay.

So z equals 0.10 and z equals 4.59.

Now we need the area between them.

Yep.

Find the area to left of each.

The area to the left of z equals 4.59 is basically one.

It's almost the entire distribution.

The area to the left of z equals 0.10 is 0.5398.

So the area between them is approximately 1 minus 0.5398, which is 0.4602.

So just over 46%.

Yeah.

Roughly 46% of women would meet that historical height requirement, which, if you flip it, means that until recently, over half of all women were ineligible based solely on height, regardless of any other qualification.

That's really powerful to see how a simple distribution can reveal such broad exclusion.
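
The pilot example works the same way with two boundaries: subtract the cumulative area at 64 inches from the cumulative area at 77 inches. In this sketch the unrounded z-score for 64 inches (about 0.103) gives roughly 0.459, a little below the table-based 0.4602.

```python
from statistics import NormalDist

women = NormalDist(mu=63.7, sigma=2.9)   # women's heights in inches (from the example)

# Area between 64 and 77 inches = area left of 77 minus area left of 64.
p_eligible = women.cdf(77) - women.cdf(64)
print(f"{p_eligible:.4f}")
```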

Wow.

Okay.

So we've gone from an actual value x, converted it to z and found the probability.

What about reversing that process for non-standard distributions?

Given a probability, how do we find the original x value, like the actual height or weight?

Good question.

We just need to rearrange our universal translator formula.

If z equals x minus mu, over sigma, then solving for x gives us x equals mu plus z times sigma: x = μ + z·σ.

Okay.

x equals the mean plus the z-score times the standard deviation.

Exactly.

So let's say you're designing a front door and you want it tall enough to accommodate, say, 95 % of adults.

Adult heights, combining men and women maybe, are normally distributed with a mean of 66.2 inches and a standard deviation of 3.8 inches.

Okay.

So we're basically looking for P95 again, the 95th percentile, but for this specific non-standard height distribution.

Precisely.

We already figured out earlier that the z-score for P95, the score with 95% of the area to the left, is 1.645.

Right.

z equals 1.645.

Now we plug that z-score, along with the mean and standard deviation for adult heights, into our rearranged formula.

x equals 66.2 plus 1.645 times 3.8.

Okay.

1.645 times 3.8 is about 6.25; add 66.2.

That gives you x equals 72.451 inches.

So rounded up, maybe 72.5 inches.

That's the height needed for 95% of adults.

72.5 inches.
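
The rearranged formula is a one-liner in code. This sketch also shows the equivalent direct call `NormalDist(mu, sigma).inv_cdf(0.95)` and, as an illustration we add, the share of adults shorter than an 80-inch door under the same normal model.

```python
from statistics import NormalDist

mu, sigma = 66.2, 3.8              # adult heights in inches (from the example)
z95 = NormalDist().inv_cdf(0.95)   # z-score with 95% of the area to its left, about 1.645
x = mu + z95 * sigma               # rearranged formula: x = mu + z * sigma
x_direct = NormalDist(mu, sigma).inv_cdf(0.95)   # same value in one call

share_under_80 = NormalDist(mu, sigma).cdf(80)   # fraction of adults shorter than 80 inches

print(f"door height for 95% of adults: {x:.1f} in")  # 72.5 in
print(f"share shorter than 80 in: {share_under_80:.4f}")
```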

Now compare that to standard doors.

They seem taller?

They usually are.

The International Residential Code, for instance, often requires an 80-inch door height.

So you can see that code is actually incredibly generous, accommodating well over 95% of adults.

It's a great example of applying statistics to real-world design and safety.

That insight into design choices, yeah, it's really compelling.

This also brings us back to defining significance in real data.

How do we know if a value, like a height or a weight, is truly significantly high or low?

Not just random noise or variation.

Yeah.

This is a critical concept in statistics.

We generally say a value is significantly high if the probability of getting that value or greater is very small; usually the cutoff is 0.05 or less.

That's 5%.

Okay.

P(x ≥ value) ≤ 0.05.

And similarly, it's significantly low if the probability of getting that value or less is 0.05 or less.

P(x ≤ value) ≤ 0.05.

Okay.

Exactly.

This 0.05 threshold, or significance level, might seem a bit arbitrary, but it's become the standard convention in many fields.

It's the bedrock of how we decide if a research finding is real or if it could have just happened by chance.

It's how we validate medical treatments, assess safety risks.

Fundamentally, how we decide what evidence we can trust.
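
The 0.05 convention can be written as a small decision rule. The function `classify` is a hypothetical helper of our own, and it assumes the values are normally distributed with a known mean and standard deviation, as in the chapter's examples.

```python
from statistics import NormalDist

def classify(x: float, mu: float, sigma: float, alpha: float = 0.05) -> str:
    """Label x as significantly high/low using the 0.05 convention.

    Assumes values follow a normal distribution with mean mu and
    standard deviation sigma.
    """
    dist = NormalDist(mu, sigma)
    p_low = dist.cdf(x)     # P(X <= x)
    p_high = 1 - p_low      # P(X >= x); continuous, so P(X = x) is zero
    if p_high <= alpha:
        return "significantly high"
    if p_low <= alpha:
        return "significantly low"
    return "not significant"

# A 72-inch man (mean 68.6, sd 2.8): about 11% are taller, so not significant.
print(classify(72, 68.6, 2.8))  # not significant
```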

Can you give us a practical application of this?

Maybe for significantly low birth weights.

That seems important.

Definitely.

Let's look at male birth weights.

The mean is 3272.8 grams.

Standard deviation, 660.2 grams, assuming a normal distribution.

We want to find the birth weight that separates the lowest 5%, the significantly low ones, by that 0.05 criterion.

We need the x value corresponding to the 5th percentile, P5.

Right.

First, we find the z-score for the 5th percentile, area 0.05 to the left.

That z-score is negative 1.645.

Okay.

Same magnitude as P95, just negative.

Yep.

Because of symmetry.

Now, we use our rearranged formula.

x equals mu plus z times sigma.

So x equals 3272.8 plus negative 1.645 times 660.2.

Okay.

Negative 1.645 times 660.2 is about negative 1086.0.

So, 3272.8 minus 1086.0.

Comes out to approximately 2186.8 grams.

This value marks the statistical borderline for significantly low birth weights using that 5% cutoff.

2186.8 grams.
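
The birth weight cutoff is the same rearranged formula with a negative z-score. Using the unrounded z of about -1.6449, this sketch gives about 2186.9 grams, a tenth of a gram away from the hand calculation with 1.645.

```python
from statistics import NormalDist

mu, sigma = 3272.8, 660.2          # male birth weights in grams (from the example)
z05 = NormalDist().inv_cdf(0.05)   # z-score with 5% of the area to its left, about -1.645
cutoff = mu + z05 * sigma          # x = mu + z * sigma
print(f"{cutoff:.1f} g")
```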

And how does that compare to medical guidelines?

Well, it's interesting.

The World Health Organization often uses a cutoff of 2500 grams for classifying low birth weight.

Our calculated value is lower, but you can see how these statistical insights directly inform and relate to those crucial medical guidelines and interventions for newborns.

Right.

It provides a data-driven basis for those cutoffs.

Okay.

This is great.

Now, let's shift gears a bit.

We've been talking mostly about individual data points.

What happens when we start taking many samples from a population and look at the distribution of their statistics, like sample means or sample proportions?

This gets us into sampling distributions and estimators.

What's the core idea here?

Yeah.

This is a really important shift in perspective.

The big idea is that when you repeatedly take samples from a population, same-size samples each time, and you calculate a statistic for each sample, like its mean or the proportion of yes answers or maybe its variance, those calculated statistics themselves form their own distribution.

Okay.

A distribution of statistics, not of original data points.

Exactly.

The Triola text uses this neat anecdote about self-driving cars.

Imagine, say, 70% of adults nationally don't feel comfortable in them.

That's the true population proportion.

Now, picture 50,000 slightly clueless newbie surveyors, as the book calls them.

Each taking their own random sample of, say, 1,000 adults.

Each surveyor calculates the proportion in their sample who are uncomfortable.

So you'd get a whole bunch of different sample proportions, right?

One might get 68%, another 72%, maybe one hit 70%.

Exactly.

They'd vary due to random chance and who they picked.

Exactly.

But here's the amazing part.

If you then took all those 50,000 different sample proportions and plotted them in a histogram, what would you see?

I guess they'd cluster around the true value, around 70%.

Exactly.

You'd see a distribution that tends to look normal, like a bell shape, and it would be centered right around the true population proportion of 0.70.

This tendency for sample statistics to cluster around the true population value and to do so in a predictable way is absolutely key to understanding how we can make inferences about entire populations based on just one sample.
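
A quick simulation makes the surveyor story concrete. This sketch assumes each adult independently answers "uncomfortable" with probability 0.70, and it uses 2,000 surveyors instead of 50,000 only to keep the run fast in pure Python; the seed is fixed so the run is repeatable.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

P_TRUE = 0.70      # true population proportion (from the example)
N = 1000           # adults per survey (from the example)
SURVEYORS = 2000   # assumption: far fewer than the book's 50,000, for speed

def one_survey() -> float:
    """Return the sample proportion of 'uncomfortable' answers in one survey."""
    hits = sum(random.random() < P_TRUE for _ in range(N))
    return hits / N

p_hats = [one_survey() for _ in range(SURVEYORS)]
print(f"mean of sample proportions: {mean(p_hats):.3f}")
print(f"spread (std. dev.):         {stdev(p_hats):.4f}")
```

Plotting `p_hats` as a histogram would show the bell shape centered near 0.70 that the discussion describes.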

That's actually quite a revelation.

So the sampling distribution of a statistic, like the sample proportion, is essentially the distribution of all the values that statistic could take from all possible samples of the same size.

You got it.

That's the formal definition.

Okay.

What about the behavior of other key statistics?

We mentioned sample proportions tend to be normal and centered on true proportion.

What about sample means?

If we took lots of samples and calculated the mean of each one?

Same idea holds.

Sample means also tend to form a normal distribution, and the mean of that distribution, the average of all possible sample means, is equal to the true population mean.

Even if you take relatively small samples, the means of those samples will cluster around the population mean in a predictable, normal-like way.

Okay.

So sample proportions and sample means both give us these nice, centered, normal-ish sampling distributions.

What about sample variances?

If we calculate the variance for each sample, do they also behave normally?

Interestingly, no.

Sample variances tend to have a skewed distribution, often skewed to the right.

It's not symmetrical like the normal curve.

Oh, okay.

So that's different.

It is different.

However, the mean of the sampling distribution of the variance does equal the true population variance.

So even though the shape isn't normal, it still targets the right value on average, which makes it very usable.

This idea of targeting the right value brings us to a really critical distinction.

Unbiased versus biased estimators.

What exactly is an estimator, and why does this unbiased-versus-biased thing matter so much for researchers or, well, anyone using data?

Okay.

So an estimator is simply a statistic like a sample mean or sample variance that we use to estimate or approximate a population parameter like the true population mean or variance.

Makes sense.

We use the sample info to guess the population info.

Right.

Now, an unbiased estimator is a special kind of estimator.

It's one whose sampling distribution has a mean that is exactly equal to the actual population parameter it's trying to estimate.

Think of it like an archer who, on average, hits the bullseye.

They might not hit it every time, but their shots aren't systematically high or low.

Okay, so unbiased estimators target the true value correctly on average.

Which statistics are unbiased?

The good news is that three key statistics we use all the time are unbiased estimators.

The sample proportion, the sample mean, and importantly, the sample variance.

They are fantastic at hitting the bullseye on average when estimating their corresponding population parameters.

So the sample mean, x-bar, targets the population mean, mu; the sample proportion, p-hat, targets the population proportion, p; and the sample variance, s squared, targets the population variance, sigma squared.

That's good to know.

It's very good to know.

So why does this matter so much?

If an estimator is biased, does that mean we just can't use it?

We just throw it out?

Not necessarily.

It just means we need to be aware of its limitations.

A biased estimator is one whose sampling distribution mean is not equal to the population parameter.

It systematically overestimates or underestimates the true value on average.

Like a crooked archer who always shoots a bit to the left.

Kind of, yeah.

Examples of biased estimators include the sample median, the sample range, and perhaps surprisingly, the sample standard deviation.

Wait, the sample standard deviation is biased, but the sample variance is unbiased.

How does that work?

It seems counterintuitive, doesn't it?

But yeah.

While s squared unbiasedly estimates sigma squared, taking the square root introduces a slight bias.

So s doesn't perfectly target sigma on average.

However, the bias in the sample standard deviation is usually very small, especially for larger samples.

Oh, okay.

So it's technically biased, but often practically okay for large samples.

What about the sample range?

Is that badly biased?

It can be, yeah.

Let's take that small population example from the book, four, five, nine.

The true population range is 9 minus 4, which is 5.

Right.

Now, if you list all possible samples of size n equals 2, like (4, 4), (4, 5), (4, 9), (5, 4), and so on, calculate the range for each sample, and then find the average of all those sample ranges, it turns out the mean of the sample ranges is only 2.2.

2 .2.

That's way off from the true range of five.

Exactly.

It clearly shows the sample range is a biased estimator.

It tends to underestimate the true population range.

So while it's easy to calculate, we have to be careful using it to infer the population spread.

Understanding which estimators are biased and unbiased is crucial for choosing the right statistical tools and interpreting results correctly.
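
Because the book's population {4, 5, 9} is tiny, every sample of size 2 can be enumerated exactly. This sketch assumes sampling with replacement (3 times 3 = 9 ordered samples), which reproduces the 2.2 figure, and uses `fractions.Fraction` so averages are exact. `statistics.variance` divides by n - 1 (sample variance) and `statistics.pvariance` divides by N (population variance).

```python
from fractions import Fraction
from itertools import product
from math import sqrt
from statistics import pvariance, variance

population = [Fraction(4), Fraction(5), Fraction(9)]   # the book's small population

# Every ordered sample of size n = 2, drawn with replacement: 9 samples.
samples = list(product(population, repeat=2))

def average(values):
    """Plain arithmetic mean of an iterable."""
    values = list(values)
    return sum(values) / len(values)

mean_range = average(max(s) - min(s) for s in samples)   # mean of sample ranges
mean_var = average(variance(s) for s in samples)         # mean of sample variances
mean_sd = average(sqrt(variance(s)) for s in samples)    # mean of sample std devs

pop_range = max(population) - min(population)   # 5
pop_var = pvariance(population)                 # 14/3
pop_sd = sqrt(pop_var)

print(f"range:    samples average {float(mean_range):.2f}, population {float(pop_range)}")
print(f"variance: samples average {float(mean_var):.4f}, population {float(pop_var):.4f}")
print(f"std dev:  samples average {mean_sd:.4f}, population {pop_sd:.4f}")
```

The range and standard deviation rows fall short of their population values (biased), while the variance row matches exactly (unbiased).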

Got it.

Know your tools, know their limitations.

Okay.

Now, for what you call the statistical superpower, the central limit theorem, this sounds incredibly important.

Can you break down what makes it such a, well, game changer?

It truly is remarkable.

The central limit theorem, or CLT, is one of the most fundamental concepts in statistics, and it underpins so much of what we do later on, especially with inference.

Here's the gist.

Even if the original population data is not normally distributed.

Okay, so it could be skewed or uniform or just weird.

Exactly.

Think of something like, say, commute times in a city, often heavily skewed with a long tail of really long commutes, or maybe income data, or even just rolling a die, which is a uniform distribution.

The CLT says that if you take samples from such a population and you calculate the mean for each sample, the distribution of those sample means will tend to become normal, or at least approximately normal.

Wait, really?

The distribution of the averages becomes normal, even if the original data wasn't?

Yes.

Provided the sample size n is sufficiently large.

Sufficiently large is a bit vague, but a common rule of thumb is that if your sample size n is greater than 30, the sampling distribution of the mean will be approximately normal, regardless of the original population shape.

Wow.

So even if individual Boston commute times are all over the place and skewed, if I take many, many samples of, say, 50 commute times each and I calculate the average commute time for each sample, those averages will form a nice, predictable bell-shaped normal distribution.

That is genuinely incredible.

It is.

It's like order emerging from chaos in a statistical sense.

And what are the specific parameters for this sampling distribution of the mean that emerges?

Does it have the same mean and standard deviation as the original population?

Good question.

The mean of the sample means will be the same as the original population mean.

So the sample means cluster around the true population mean, which makes sense.

Okay, the center is in the right place.

What about the spread, the standard deviation?

Ah, the standard deviation is different, and this is crucial.

The standard deviation of the sample means, denoted sigma sub x-bar, is smaller than the population standard deviation sigma.

We calculate it as sigma over root n.

Sigma divided by the square root of n.

Okay, sigma sub x-bar has a special name, doesn't it?

Yes, it's called the standard error of the mean, or SEM, the standard error.

That root n in the denominator is key.

It tells us that as the sample size n gets bigger, the standard error gets smaller.

Ah, so bigger samples mean the sample means cluster even more tightly around the true population mean, with less variability in the sample averages.

Exactly, which means our sample mean becomes a more precise estimate of the population mean as our sample size increases.

That's the power of large samples combined with the CLT.
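
To see both claims at once (centered on mu, spread equal to sigma over root n), here is a simulation sketch. The population is an assumption chosen for illustration, not data from the book: an exponential distribution with mean 10 and standard deviation 10, which is strongly right-skewed like the commute-time example.

```python
import random
from math import fsum, sqrt
from statistics import mean, stdev

random.seed(2)  # fixed seed so the run is repeatable

# Assumption (illustration only): exponential population, mean 10, sd 10.
MU = SIGMA = 10.0
REPS = 4000  # number of samples drawn for each sample size

def sample_mean(n: int) -> float:
    """Draw one sample of size n from the population and return its mean."""
    return fsum(random.expovariate(1 / MU) for _ in range(n)) / n

results = {}
for n in (4, 16, 64):
    means = [sample_mean(n) for _ in range(REPS)]
    # (mean of the sample means, SD of the sample means, predicted standard error)
    results[n] = (mean(means), stdev(means), SIGMA / sqrt(n))
    m, sd, se = results[n]
    print(f"n={n:3d}: mean of means {m:6.2f}, SD of means {sd:5.2f}, sigma/sqrt(n) {se:5.2f}")
```

As n grows, the mean of the sample means stays near 10 while their spread shrinks in step with sigma over root n.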

This raises a really important practical point then.

When we're calculating z-scores, when do we use the original population standard deviation versus this new standard error?

What's the critical difference in how we apply the formulas?

I can see this tripping people up.

It's a super common point of confusion, absolutely.

Here's the rule.

If you're dealing with a single observation x drawn from a normally distributed population, you use the standard formula we learned earlier: z = (x - μ) / σ.

Okay, individual value, use sigma.

But if you're dealing with the mean of a sample, x bar, you must use the standard error in the denominator.

The formula becomes z equals x-bar minus mu, over sigma divided by root n: z = (x̄ - μ) / (σ / √n).

Okay, sample mean, use sigma over root n, the standard error.

Got it.

The denominator changes depending on whether it's an individual or a sample average.

Precisely.

And notice that the denominator, sigma over root n, is smaller than sigma whenever n is greater than 1, which means the z-score for a sample mean will often be much larger in magnitude than for an individual value the same distance from the mean.

This reflects the fact that sample means are less variable than individual scores.
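
The rule above (sigma for one value, sigma over root n for a sample mean) fits in two small functions. The names `z_individual` and `z_sample_mean` and the demo numbers (mean 10, standard deviation 1, n = 25) are hypothetical; they only show that the same one-sigma distance becomes z = 5 for a mean of 25 observations.

```python
from math import sqrt

def z_individual(x: float, mu: float, sigma: float) -> float:
    """z-score for a single observation: (x - mu) / sigma."""
    return (x - mu) / sigma

def z_sample_mean(xbar: float, mu: float, sigma: float, n: int) -> float:
    """z-score for a sample mean: (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / sqrt(n))

# Same distance from the mean (one sigma), very different z-scores.
print(z_individual(11, 10, 1))        # 1.0
print(z_sample_mean(11, 10, 1, 25))   # 5.0
```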

Let's make this concrete with that Boeing 737 airline seat example from the book.

Adult male hip widths are normally distributed, with mean 14.3 inches and standard deviation 0.9 inches.

An engineer is thinking about a seat width of 16.0 inches.

Part A, what's the probability an individual male passenger has a hip width greater than 16.0 inches?

Okay, this is an individual value, x equals 16.0, so we use sigma: z equals 16.0 minus 14.3, over 0.9. That's 1.7 divided by 0.9.

About 1.89.

Yeah, z equals 1.89.

We want the area to the right, greater than.

The area to the left of 1.89 is 0.9706, so the area to the right is 1 minus 0.9706, which equals 0.0294.

So just under a 3% probability that a single randomly selected male won't fit comfortably width-wise.

Correct, about a 3% chance for any given individual male passenger.

Okay, now part B: same airline, same seats. What's the probability that the mean hip width of 126 males, roughly a full flight, is greater than 16.0 inches?

Uh-huh, now we're dealing with a sample mean, x-bar equals 16.0, and a large sample size, n equals 126, so we need the central limit theorem and the standard error.

Right, use sigma over root n.

The mean of the sample means is still the population mean, 14.3, but the standard deviation of the sample means, the standard error, is 0.9 over root 126.

The square root of 126 is about 11.2, so 0.9 divided by 11.2.

This is about 0.0802, a much smaller standard deviation now.

Wow, much smaller spread for the sample mean.

So now we calculate the z-score for the sample mean: 16.0 minus 14.3, over 0.0802. That's 1.7 divided by 0.0802.

That's huge, like 21.2.

Exactly, a z-score of 21.2.

What's the probability of getting a z-score greater than 21.2?

Basically zero, right? Off the charts.

Effectively zero, an unimaginably tiny number.

It's virtually impossible for the mean hip width of 126 males to be that high.

Okay, so we have two probabilities.

Almost 3% for an individual and basically 0% for the mean of 126 individuals.

So what does this tell the seat designer?

Which result is actually more relevant here and why?

This is such a critical point.

Part A, the probability for the individual, is far more relevant for designing that seat.

Why?

Because individual seats are occupied by individuals.

A 3% chance that any given male passenger won't fit comfortably is a significant issue.

That's potentially several passengers on every flight facing discomfort, maybe even safety issues if they can't buckle properly.

Right, you don't average out comfort across 126 people.

Exactly.

The near zero probability in Part B just tells us it's virtually impossible for the average hip width of the whole plane to exceed 16 inches.

That doesn't help the individuals who do exceed it.

Designing based on the average of a large group rather than considering the variation and needs of individuals within that group can lead to serious design flaws, discomfort, and safety problems.

It really highlights the importance of understanding which probability, individual or sample mean, applies to your specific question.
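
Both parts of the seat example can be checked in a few lines. In this sketch, Part B's probability evaluates to exactly 0.0 in floating point: the true value is far smaller than the precision of `1 - cdf`, which matches "effectively zero".

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 14.3, 0.9   # adult male hip widths in inches (from the example)
seat = 16.0             # proposed seat width in inches
n = 126                 # number of male passengers on the flight

std = NormalDist()      # standard normal distribution

# Part A: a single passenger, so the denominator is sigma.
z_a = (seat - mu) / sigma
p_a = 1 - std.cdf(z_a)          # area to the right of z_a

# Part B: the mean of n passengers, so the denominator is sigma / sqrt(n).
se = sigma / sqrt(n)            # standard error of the mean
z_b = (seat - mu) / se
p_b = 1 - std.cdf(z_b)          # evaluates to 0.0 in floating point here

print(f"Part A: z = {z_a:.2f}, P = {p_a:.4f}")
print(f"Part B: z = {z_b:.1f}, P = {p_b}")
```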

That's a fantastic illustration.

And you mentioned that the central limit theorem is also foundational for hypothesis testing and this idea called the rare event rule.

Can you explain how that connection works?

How does the CLT let us test assumptions?

Right.

The rare event rule is a really intuitive concept.

It basically says that if, under a given assumption, like assuming a coin is fair or assuming a certain population mean is correct, the probability of observing a particular event, like getting nine heads in 10 flips or getting a specific sample mean, is extremely small, then we should probably conclude that the assumption was incorrect.

Okay.

If what we see is super unlikely based on our assumption, maybe the assumption is wrong.
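
The coin version of the rare event rule is easy to quantify. Assuming a fair coin, this sketch computes the probability of nine or more heads in 10 flips (counting "that result or more extreme", as the significance definitions do) from the binomial formula with `math.comb` (Python 3.8 or later).

```python
from math import comb

n, p = 10, 0.5   # 10 flips of a coin assumed to be fair
# P(at least 9 heads) = P(exactly 9) + P(exactly 10)
p_at_least_9 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in (9, 10))
print(f"{p_at_least_9:.4f}")  # 0.0107
```

About 1%, well under the 0.05 cutoff, so seeing it would cast doubt on the fair-coin assumption.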

Exactly.

Let's use that classic example.

The common belief that the mean human body temperature is 98.6 degrees Fahrenheit.

Let's assume that's true.

So mu equals 98.6.

And we know from data the standard deviation is about 0.62 degrees Fahrenheit.

Okay.

Assumption: mean is 98.6.

Now imagine we take a sample of 106 healthy adults, so n equals 106, and we find their mean body temperature is 98.2 degrees Fahrenheit.

That seems a bit lower than 98.6, but is it low enough to doubt the 98.6 number?

That's the question we ask.

What's the probability of getting a sample mean of 98.2 degrees Fahrenheit or lower if the true population mean really is 98.6 degrees Fahrenheit?

This is where the CLT comes in.

Because we're dealing with a sample mean.

Right.

We need the sampling distribution of the mean for n equals 106.

Its mean is assumed to be 98.6.

Its standard error is sigma over root n, 0.62 over the square root of 106.

The square root of 106 is about 10.3.

So 0.62 divided by 10.3 is about 0.0602.

That's our standard error.

Now we find the z-score for our observed sample mean.

z equals 98.2 minus 98.6, over 0.0602.

That's negative 0.4 divided by 0.0602, which is about negative 6.64.

Whoa.

A z-score of negative 6.64.

That's really far out in the left tail.

What's the probability of getting a z-score of negative 6.64 or lower?

It must be incredibly tiny.

Like the airline seat example.

Practically zero again.

It's extremely, profoundly small.

Using software, it comes out to a number with ten zeros after the decimal point before the first nonzero digit.

Basically zero for all practical purposes.

Let's just say roughly 0.00000000001, based on the book's approximation.
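For readers who want to reproduce "using software" without a statistics package, here is a sketch using only Python's standard library. It relies on the identity P(Z <= z) = 0.5 * erfc(-z / sqrt(2)); using `math.erfc` directly is a choice made here because it stays accurate far out in the lower tail, whereas computing 1 minus a number extremely close to 1 would lose almost all precision.

```python
import math

def lower_tail_probability(z: float) -> float:
    """P(Z <= z) for a standard normal random variable Z.

    Uses the complementary error function, which remains accurate
    in the far tail instead of subtracting two nearly equal numbers.
    """
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Numbers from the body-temperature example.
mu, sigma, n, x_bar = 98.6, 0.62, 106, 98.2
z = (x_bar - mu) / (sigma / math.sqrt(n))

p = lower_tail_probability(z)
print(f"z = {z:.2f}, P(sample mean <= 98.2) = {p:.1e}")
```

The result is on the order of 1e-11, consistent with the rough figure of 0.00000000001 quoted above.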

Okay.

An incredibly rare event if 98 .6 was the true mean.

So what does that tell us? Is 98.2 degrees Fahrenheit significantly low if the true mean were actually 98.6 degrees Fahrenheit?

It's so incredibly unlikely to happen just by random chance under that assumption that it provides very strong evidence against the assumption.

It strongly suggests that the common belief that the average human body temperature is 98.6 degrees Fahrenheit is actually incorrect.

And this is precisely how statistics works.

It allows us to use sample data to rigorously challenge long-held assumptions, drive new scientific understanding, and make evidence-based conclusions.

The CLT makes this kind of powerful inference possible.

That's a perfect example of statistics in action.

Okay.

Finally, we need to address a really crucial practical question.

Assessing normality.

We've talked a lot about normal distributions, but how do we actually know if our own data follows one?

And why does it even matter if our data looks normal?

Yeah.

This is super important.

It matters immensely because many of the most common and powerful statistical methods, especially the inferential ones we use to compare groups or build models, like t-tests or ANOVA, which we'll likely cover in future deep dives, actually require or assume that your data comes from a population that is at least approximately normally distributed.

So the math behind those tests relies on that bell shape.

Pretty much.

Yes.

If your data wildly violates that assumption, if it's heavily skewed or has multiple peaks or something, then applying those standard methods might give you incorrect p-values, misleading confidence intervals, and basically wrong conclusions.

So you need to check first.

Okay.

So how do we check?

What are the key methods for assessing if our data looks reasonably normal?

There are a few common ways.

The first quick and dirty check is often just to construct a histogram of your data.

Just look at the picture.

Yeah.

Just visually inspect it.

Does it look roughly bell-shaped?

Is it reasonably symmetrical?

Or is it obviously skewed way off to one side?

Does it have big gaps or multiple humps?

If the histogram looks drastically non-normal, that's your first clue.

Okay.

So histogram first.

But visual inspection can be subjective, right?

What if it looks kind of symmetrical, but we need something a bit more rigorous for our assessment?

Absolutely.

For a more formal check, you typically use a normal quantile plot, which is sometimes also called a normal probability plot or Q-Q plot.

Normal quantile plot.

Okay.

How does that work?

It's a clever idea.

It's basically a scatter plot.

On one axis, you plot your actual sorted data values.

On the other axis, you plot the z-scores you would expect to get for those ranks if the data came from a perfect standard normal distribution.
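Here is a minimal Python sketch of how those expected z-scores can be computed. It assumes the i-th smallest of n values is assigned the cumulative area (2i - 1)/(2n), which is then converted to a z-score with `statistics.NormalDist().inv_cdf` (Python 3.8+); other software uses slightly different plotting-position formulas. The sample is the five Dallas commute times the book mentions (20, 16, 25, 10, 30 minutes).

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (value, expected z-score) pairs for a normal quantile plot.

    Assumption: the i-th smallest of n values gets cumulative area
    (2i - 1) / (2n); other plotting-position formulas exist.
    """
    values = sorted(data)
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (x, std_normal.inv_cdf((2 * i - 1) / (2 * n)))
        for i, x in enumerate(values, start=1)
    ]

commute_minutes = [20, 16, 25, 10, 30]
for value, z in normal_quantile_points(commute_minutes):
    print(f"{value:>4}  z = {z:+.2f}")
```

Plotting these pairs as a scatter plot gives the normal quantile plot; the expected z-scores are symmetric around 0 because the cumulative areas are.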

Okay.

Plotting actual data against expected normal z-scores.

How do we interpret that visually?

What are we looking for?

You're looking for a straight line.

If the data is indeed normally distributed, the points on the normal quantile plot should fall reasonably close to a straight diagonal line.

There might be a little wiggle, especially at the tails, but there shouldn't be any obvious systematic curve or pattern away from that line.

Okay.

Straight line equals normal.

What if it's not normal?

What would we see?

If the data is not normal, the points will deviate systematically from a straight line.

For example, if your data is skewed to the right, the plot will typically curve upwards.

If it's skewed left, it curves downwards.

If it's uniform, you might see an S-shape.

Any clear, systematic, nonlinear pattern is evidence against normality.

Okay.

That seems like a more objective visual test than just the histogram.

Let's think about those Dallas commute times again.

The book mentions if you take a really small sample, maybe just five commute times, like 20, 16, 25, 10, 30 minutes, what might the plot look like then?

Well, with only five points, a normal quantile plot might happen to look reasonably straight just by chance.

Small samples can be really deceiving when assessing normality.

You don't have enough data to see the true underlying shape clearly.

Ah, that's a really important warning.

Small samples can fool you.

Definitely.

But now, contrast that with the full data set.

The book uses 1,000 Dallas commute times.

If you plot those 1,000 points on a normal quantile plot.

I bet it doesn't look straight then.

Not at all.

The histogram for the full 1,000 times is clearly skewed to the right.

Lots of relatively short commutes, but a long tail of very long ones.

And the normal quantile plot for that large data set shows a very clear pronounced curve away from a straight line.

In that case, with the large sample, we can be much more confident in concluding that Dallas commute times are not normally distributed.

So sample size really matters for making a good assessment.

Are there any other practical things to keep in mind when we're checking for normality?

Yeah, a couple of things.

First, always be aware of outliers, extreme values that are far away from the rest of your data.

Outliers can heavily distort both histograms and normal quantile plots, making data look non-normal when the bulk of it might be fine, or vice versa.

You need to investigate outliers.

Are they errors or are they real interesting data points?

Good point.

Check the outliers.

Anything else?

Well, sometimes if your data isn't normal, you might be able to use a data transformation like taking the logarithm or the square root of each data value.

Sometimes transforming the data can make it look more normally distributed.

And then you can use those statistical methods that assume normality on the transformed data.
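As a rough illustration of why a logarithm can help, this sketch measures skewness before and after taking logs. The data are simulated, not from the book: lognormal values are an assumption chosen because the log of lognormal data is exactly normal, and the parameters and seed are arbitrary. The skewness formula is the simple moment-based version.

```python
import math
import random

def sample_skewness(data):
    """Moment-based sample skewness: near 0 for symmetric data,
    positive when the data have a long right tail."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

rng = random.Random(7)  # fixed seed for repeatability
# Hypothetical right-skewed data: lognormal values.
raw = [rng.lognormvariate(3.0, 0.6) for _ in range(1000)]
logged = [math.log(x) for x in raw]

print(f"skewness before: {sample_skewness(raw):.2f}")
print(f"skewness after log: {sample_skewness(logged):.2f}")
```

Methods that assume normality would then be applied to `logged`, and any conclusions interpreted on the log scale.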

It's a bit more advanced, but it's a useful trick to have in your back pocket.

Okay.

Transformations as a potential fix.

Got it.

Wow.

We have covered an immense amount of ground today.

We've really navigated the foundational concepts of the standard normal distribution, that perfect bell curve, the bedrock.

Then we saw how to apply it to real-world non-standard data using that z-score formula, everything from designing doors and airplane seats to analyzing birth weights.

That translation is key.

We dug into how sample statistics behave with sampling distributions, distinguishing between useful unbiased estimators like the mean and variance and biased ones like the range.

Knowing your estimators.

And then the big one, the central limit theorem. Understanding how normality emerges from sample means, even from non-normal data, and how that powers hypothesis testing, like challenging the 98.6 body temperature assumption.

The CLT really is the superpower.

And finally, we learned practical ways to actually check if our own data fits that normal pattern using histograms and those normal quantile plots.

Crucial checks before applying many tests.

You know, what's really fascinating here, stepping back, is the surprising power of the normal distribution.

It's not just that it shows up a lot in nature, heights, IQ scores, things like that.

Right, that's common knowledge.

What's striking is its almost, I don't know, magical ability to emerge from messy, chaotic, definitely non-normal data, but only when you start looking at the averages of samples.

That's the CLT effect.
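A small simulation makes the CLT effect tangible. This sketch is not from the book: it assumes a strongly right-skewed exponential population with mean 30 (for which the standard deviation also equals 30), draws many samples of size 40, and compares the spread of the sample means with the CLT prediction sigma / sqrt(n). The seed, sample size, and number of samples are arbitrary choices.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(2024)  # fixed seed so the run is repeatable
population_mean = 30.0     # exponential population: strongly right-skewed
n = 40                     # size of each sample
num_samples = 5000         # how many sample means to collect

# Draw many samples and keep only each sample's mean.
sample_means = [
    sum(rng.expovariate(1 / population_mean) for _ in range(n)) / n
    for _ in range(num_samples)
]

# For an exponential population the standard deviation equals the mean,
# so the CLT predicts a standard error of about 30 / sqrt(40).
predicted_se = population_mean / math.sqrt(n)

print(f"mean of sample means: {mean(sample_means):.2f} (population mean {population_mean})")
print(f"SD of sample means:   {stdev(sample_means):.2f} (CLT predicts {predicted_se:.2f})")
```

A histogram of `sample_means` would look far more bell-shaped than the skewed population the values were drawn from.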

It's truly profound.

It's like this underlying order that statistics helps us uncover.

Taking seemingly random information and finding these predictable patterns within.

Patterns we can then use for everything from manufacturing quality control to public health and safety assessments.

It is a powerful insight, isn't it?

Helping us make sense of a really complex world, one data point and maybe one deep dive at a time.

Thank you so much for joining us on this exploration today.

This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Normal distributions form the mathematical backbone of statistical inference, characterized by a symmetric bell-shaped curve where the mean, median, and mode converge at a single point. Two parameters completely define any normal distribution: the mean determines its horizontal position while the standard deviation quantifies the horizontal spread of observations around that central location. The empirical rule offers a straightforward mechanism for predicting data concentration, revealing that roughly 68 percent of values cluster within one standard deviation of the mean, 95 percent within two standard deviations, and 99.7 percent within three standard deviations. Converting individual measurements into z-scores rescales raw data to a standardized form, indicating precisely how many standard deviation units a particular observation sits above or below the mean and facilitating meaningful comparisons between measurements collected on entirely different scales. Determining areas beneath the normal curve and calculating tail probabilities requires either standard normal distribution tables or computational tools, skills that students practice in both directions: finding probabilities for given data values and locating data values corresponding to specified probability thresholds. Real-world applications permeate quality assurance programs that monitor manufacturing outputs, educational testing organizations that interpret assessment results, and natural sciences that model physical measurements throughout populations. The central limit theorem represents perhaps the most consequential principle in this chapter, establishing that sample means drawn repeatedly from virtually any population distribution themselves form an approximately normal distribution when sample sizes are sufficiently large, even when the underlying population deviates substantially from normality. 
This profound result explains the extraordinary utility of the normal model across disciplines and validates its use in practical inference problems. Mastery of these relationships between parameters, areas, probabilities, and real data values equips students to recognize outliers, construct probability-based forecasts, and access the theoretical foundations required for constructing confidence intervals and executing hypothesis tests.
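The empirical rule and the two directions of normal-curve problems described above can be sketched with Python's `statistics.NormalDist` (Python 3.8+). The mean-100, standard-deviation-15 scale below is a hypothetical example chosen for illustration, not a dataset from the chapter.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Empirical rule: area within k standard deviations of the mean.
for k in (1, 2, 3):
    area = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {area:.4f}")

# Hypothetical scale for illustration: mean 100, standard deviation 15.
scores = NormalDist(mu=100, sigma=15)

# Direction 1: data value -> z-score -> probability.
x = 130
z = (x - scores.mean) / scores.stdev
print(f"z for {x}: {z:.2f}, P(X < {x}) = {scores.cdf(x):.4f}")

# Direction 2: probability -> data value (here, the 95th percentile).
cutoff = scores.inv_cdf(0.95)
print(f"95th percentile: {cutoff:.1f}")
```

The three areas printed by the loop are the exact values behind the rounded 68, 95, and 99.7 percent of the empirical rule.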

Using this chapter to study? Last Minute Lecture is free and student-run. If it helped, consider supporting the project.


