Chapter 7: Sampling Distributions

Audio Overview

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Have you ever found yourself trying to understand, say, an entire city just by chatting with a few people?

Or maybe predicting a huge election based on a really small poll.

Yeah, it definitely leaves you wondering, right, how much can you actually trust what you've found from such a tiny slice of the whole picture?

It's a fundamental challenge, making sense of big situations from limited data.

How do we draw reliable conclusions about a whole population using just one sample?

And importantly, what are the limits?

Well, that's exactly our mission today on The Deep Dive.

We're taking a journey into Chapter 7, Sampling Distributions, from The Practice of Statistics, 6th edition.

And our goal for you, our listener, is to really pull back the curtain on how we make these big inferences from small samples.

We'll unpack the core ideas, the vocab, and the formulas, using real examples and pointing out some common trip-ups, especially in exam scenarios.

We want you to really visualize these concepts.

We'll kick things off by nailing down the difference between what we measure in our sample and what we want to know about the whole population.

Then we'll see how just the act of taking samples creates these things called sampling distributions.

Understanding their characteristics, their shape, their center, their spread, is absolutely key for reliable statistics.

And we'll focus particularly on two really common statistics.

Sample proportions, which we often call p-hat.

And sample means, or x-bar.

So yeah, let's dive in.

OK, first things first.

The absolute bedrock concept here is the difference between a parameter and a statistic.

Right, so parameter.

Think of that as the true value for the entire population.

It's usually what we're interested in, but often we don't actually know it.

It's fixed.

Exactly.

And the statistic, that's what you calculate from your sample data.

You know this number because you just figured it out from the data you collected.

And we use that statistic as our best guess, right?

Our estimate for the unknown parameter.

Precisely.

Think about the Current Population Survey.

They survey, what, 60,000 households?

Something like that, yeah.

The proportion of unemployed people in that specific sample, that's their statistic.

They call it p-hat.

But they use that number to estimate the true national unemployment rate for everyone, the population parameter, p.

Perfect example.

Or imagine a battery company.

They test maybe 50 batteries from a production run.

OK.

The average lifetime of those 50 batteries is their sample mean, x-bar.

That's their statistic.

And they use that to estimate the true average lifetime for all the batteries made that hour.

You got it.

And a handy little mnemonic, statistics come from samples.

And parameters come from populations, S and P, simple but effective.

And a quick tip, especially for exams,

when you define a parameter, make sure you use words like all or true.

Good point.

To make it super clear, you mean the whole population, not just the sample you looked at.

Exactly.

So avoid that ambiguity.

OK, so statistics come from samples.

But what if I took a different random sample from that same population?

I probably wouldn't get the exact same statistic value, would I?

Almost certainly not.

And that leads us straight into sampling variability.

It sounds kind of like a problem, but it's not really an error, is it?

Not at all.

It's totally expected.

It's just the natural variation you get because, well, different samples contain different individuals.

It's inherent in random sampling.

Right.

It's like grabbing a handful of M&Ms.

You don't expect every handful to have the exact same number of reds.

Perfect analogy.

Understanding this variability is crucial so we don't overreact to the results of just one sample.
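To make sampling variability concrete, here is a minimal simulation sketch. The bag of 1,000 candies with 200 reds, the handful size of 50, and the seed are all illustrative assumptions, not values from the chapter.

```python
import random

# Hypothetical bag: 1,000 candies, 200 of them red (true proportion p = 0.20).
population = ["red"] * 200 + ["other"] * 800

rng = random.Random(7)  # fixed seed so the run is repeatable


def handful_proportion(size=50):
    """Draw one simple random sample (without replacement) and return p-hat."""
    handful = rng.sample(population, size)
    return handful.count("red") / size


# Five handfuls from the same bag: same population, same sample size.
p_hats = [handful_proportion() for _ in range(5)]
for i, p_hat in enumerate(p_hats, start=1):
    print(f"handful {i}: p-hat = {p_hat:.2f}")
```

Each handful comes from the same bag, so any differences between the printed proportions are due to sampling variability alone.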

So this natural variation brings us to the main event of this chapter,

the sampling distribution of a statistic.

Yes.

This is a bit more abstract, but hang with us.

It's the distribution of values your statistic would take if you looked at all possible samples of the same size from the same population.

OK, all possible samples.

That sounds like a lot.

How do we visualize that?

Let's use a small example from the book.

John and Carol's sons have heights, 71, 75, 72, and 68 inches, a tiny population of four.

Right.

Now, if we take simple random samples of size two, how many different pairs of sons could we pick?

Oh, let's see, four choose two.

That's six possible pairs.

Exactly, six possible samples.

Now, for each of those six samples, calculate the average height, the sample mean.

OK, so for 71 and 75, the mean is 73.

For 71 and 72, it's 71.5, and so on for all six pairs.

Right.

You'd end up with six different sample means.

Now, imagine plotting those six mean values on a simple dot plot.

Ah, OK.

So that collection of dots showing where all the possible sample means landed, that's the sampling distribution.

For the sample mean with n = 2 from this population.

You nailed it.

That dot plot is the sampling distribution of the sample mean in this specific scenario.

It shows every possible outcome for x-bar and how often each occurs.
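The enumeration described here is small enough to do by brute force. The sketch below lists all six samples of size 2 from the four heights and prints a rough text version of the dot plot; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

heights = [71, 75, 72, 68]  # the population of four sons, in inches

# Every possible simple random sample of size 2 (4 choose 2 = 6 pairs).
samples = list(combinations(heights, 2))
sample_means = [mean(pair) for pair in samples]

for pair, xbar in zip(samples, sample_means):
    print(pair, "->", xbar)

# Text version of the dot plot: each value of x-bar and how often it occurs.
for value, count in sorted(Counter(sample_means).items()):
    print(f"{value:5.1f} {'*' * count}")
```

The only value that appears twice is 71.5, which comes from the pairs (71, 72) and (75, 68).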

And we could do the same thing for a different statistic, maybe like the sample range.

Calculate the range for each of the six samples and plot those.

Precisely.

Each statistic (mean, proportion, range, standard deviation) has its own sampling distribution.

Got it.

It's a theoretical map of all the possibilities.

And this is where it's super important, especially for clear communication, to distinguish between three types of distributions.

This trips people up.

OK, what are they?

First, the population distribution.

That describes the variable for all individuals in the population.

Like, for Mrs. Lin's chips, it's the colors of all 200 chips in the bag.

OK, the whole enchilada.

Second, the distribution of sample data.

This is just the data from your one specific sample.

So if Jenna draws 20 chips and gets seven red, the distribution of those 20 specific colors is her sample data distribution.

Right, what I actually observe.

And third, the sampling distribution of a statistic.

That's the one we just built: the distribution of the statistic's values, like p-hat or x-bar, from all possible samples of the same size.

Like the dot plot of the six sample means, or maybe a simulation showing hundreds of sample proportions.

Exactly.

So the key takeaway, never just say the distribution.

Always specify which distribution you're talking about, population, sample data, or the sampling distribution of a specific statistic.

Crystal clear.

OK, now let's talk about the properties of these sampling distributions.

What about their center?

Do these statistics, on average, hit the right target?

Great question.

That brings us to bias.

We call a statistic an unbiased estimator if the mean of its sampling distribution is equal to the true value of the population parameter it's estimating.

So it doesn't consistently aim too high or too low.

Exactly.

Think back to John and Carol's sons.

The true mean height of all four sons is 71.5 inches.

Right.

And if you calculated the average of those six possible sample means we found earlier.

Let's see, mumbling calculations.

It also comes out to 71.5.

Bingo.

So the sample mean, x-bar, is an unbiased estimator of the population mean, mu.

On average, it gets it right.

OK, that makes sense.

But what about the sample range we mentioned?

Ah, good counterexample.

The true range of the sons' heights is 7 inches, 75 minus 68.

But if you average the ranges from all six possible samples, you get only 3.67 inches.

So the sample range consistently underestimates the true population range.

It does in this case.

That makes the sample range a biased estimator of the population range.
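Both claims about bias can be checked directly on the four heights. This sketch compares the center of each sampling distribution with the parameter it is meant to estimate.

```python
from itertools import combinations
from statistics import mean

heights = [71, 75, 72, 68]
samples = list(combinations(heights, 2))  # all six samples of size 2

# Center of each sampling distribution versus the parameter it estimates.
mean_of_sample_means = mean([mean(s) for s in samples])
mean_of_sample_ranges = mean([max(s) - min(s) for s in samples])

print("population mean:              ", mean(heights))
print("mean of the six sample means: ", mean_of_sample_means)
print("population range:             ", max(heights) - min(heights))
print("mean of the six sample ranges:", round(mean_of_sample_ranges, 2))
```

The sample means average out to the population mean of 71.5, while the sample ranges average about 3.67, well below the population range of 7.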

Interesting.

Does this relate to why we divide by n - 1 for sample standard deviation?

I've always wondered about that.

It absolutely does.

It turns out that the sample variance, s squared, calculated with n - 1 in the denominator, is an unbiased estimator of the population variance, sigma squared.

If we just divided by n, it would be biased.

It would tend to underestimate the true population variance.

So the n - 1 is a correction factor to remove that bias.

That finally makes sense.

Precisely.

It adjusts the sample variance just enough so that, on average, it hits the population variance target.
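The n - 1 claim can be checked exactly on a tiny population. One assumption to flag: this sketch draws with replacement, so the observations are independent, which differs from the without-replacement samples used for the sons earlier. The four heights are reused only as convenient numbers.

```python
from itertools import product
from statistics import mean, pvariance, variance

heights = [71, 75, 72, 68]
sigma_squared = pvariance(heights)  # population variance (divide by N)

# Assumption: draws are independent (sampling WITH replacement),
# so all 4 * 4 = 16 ordered samples of size 2 are equally likely.
samples = list(product(heights, repeat=2))

# variance() divides by n - 1; pvariance() applied to a sample divides by n.
avg_divide_by_n_minus_1 = mean([variance(s) for s in samples])
avg_divide_by_n = mean([pvariance(s) for s in samples])

print("population variance:        ", sigma_squared)
print("average s^2 using n - 1:    ", avg_divide_by_n_minus_1)
print("average variance using n:   ", avg_divide_by_n)
```

Under that assumption, the n - 1 version averages to the population variance, and the divide-by-n version averages to half of it for samples of size 2.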

OK, so unbiasedness is good.

What about the spread, the variability of the sampling distribution?

Also crucial, we want low variability.

A sampling distribution with less spread means our statistic's value is likely to be closer to the true parameter value, regardless of which sample we happen to get.

It's more precise.

So tighter clustering around the true value is better.

How do we achieve that?

The biggest factor is sample size.

Look at the Survivor example in the book.

If the true proportion watching is, say, 0.37, samples of size n = 1,000 will produce p-hat values that are much more tightly clustered around 0.37 than samples of size n = 100.

The spread, the standard deviation of the sampling distribution, shrinks as n gets bigger.

So bigger sample size means less sampling variability, more precision.
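As a rough numerical check, the sketch below uses the standard deviation formula for p-hat that is stated later in this chapter, the square root of p(1 - p) / n, and compares the two sample sizes. The function name is illustrative.

```python
from math import sqrt


def sd_of_p_hat(p, n):
    """Standard deviation of the sampling distribution of p-hat."""
    return sqrt(p * (1 - p) / n)


p = 0.37
for n in (100, 1000):
    print(f"n = {n:>5}: sd of p-hat = {sd_of_p_hat(p, n):.4f}")

# Ten times the sample size shrinks the spread by a factor of sqrt(10), not 10.
ratio = sd_of_p_hat(p, 100) / sd_of_p_hat(p, 1000)
print(f"ratio of spreads: {ratio:.2f}")
```

Because n sits under a square root, cutting the spread in half takes four times as many observations.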

Makes sense.

And here's something that surprises many people.

The variability of a statistic depends mainly on the sample size, not the population size, as long as the population is much larger than the sample.

Wait, really?

So taking a sample of 1,000 people gives roughly the same precision whether you're sampling from New York City or Cheyenne, Wyoming, assuming the population is big enough.

Pretty much, yeah.

The textbook uses the analogy of scooping corn kernels.

The variability in your scoop depends on the size of your scoop, sample size, not really on whether the container holds a ton or just 50 pounds, population size, as long as the container is much bigger than your scoop.

Wow, that's counterintuitive but important.

What's that rule of thumb for much bigger?

That's the 10% condition we'll keep coming back to.

Your sample size, n, should be no more than 10% of the population size, capital N.

If that holds, you can mostly ignore the population size when thinking about the spread of the sampling distribution.
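One way to see why population size barely matters is the finite population correction mentioned in the chapter summary: the usual standard deviation multiplied by the square root of (N - n) / (N - 1). In the sketch below the population sizes are hypothetical round numbers, not actual city populations, and p = 0.37 is reused from the Survivor example.

```python
from math import sqrt


def sd_p_hat(p, n):
    """Usual formula, which ignores the population size."""
    return sqrt(p * (1 - p) / n)


def sd_p_hat_fpc(p, n, N):
    """Same formula times the finite population correction sqrt((N - n) / (N - 1))."""
    return sd_p_hat(p, n) * sqrt((N - n) / (N - 1))


p, n = 0.37, 1000
# Hypothetical population sizes: very large, mid-sized, and too small for n = 1,000.
for N in (8_000_000, 65_000, 5_000):
    meets_10_percent = n <= 0.10 * N
    print(
        f"N = {N:>9,}: 10% condition met = {meets_10_percent}, "
        f"sd without correction = {sd_p_hat(p, n):.5f}, "
        f"sd with correction = {sd_p_hat_fpc(p, n, N):.5f}"
    )
```

When the 10% condition holds, the corrected and uncorrected values are nearly identical; the gap only becomes noticeable when the sample is a large fraction of the population.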

Got it.

So summarizing desirable properties.

We want estimators that are unbiased (accurate on average) and have low variability (precise).

Exactly, think of a target.

High bias means your shots are centered off the bullseye.

High variability means your shots are scattered all over.

Low bias, low variability means your shots are tightly clustered right on the bullseye.

That's the goal.

Perfect, and just to clarify terminology, sometimes people mix up accuracy and precision.

Right, bias relates to accuracy: are you centered on the target?

Variability relates to precision: how spread out are your shots?

And increasing sample size mainly improves precision; it reduces variability.

It doesn't fix bias if your sampling method itself is flawed.

Good distinction.

Okay, let's get specific.

First up, sample proportions, p-hat.

When do we use these?

We use p-hat when our variable of interest is categorical.

We wanna estimate the proportion of a population that falls into a certain category.

Like the proportion of voters favoring a candidate or the proportion of defective items

or the proportion of orange M&Ms.

Exactly.

Now the sampling distribution of p-hat has specific rules.

First, its mean.

The mean of all possible p-hat values, written mu sub p-hat, is equal to the true population proportion, p.

So p-hat is an unbiased estimator of p, check.

Check.

Now the standard deviation of p-hat, sigma sub p-hat.

The formula is the square root of p times one minus p, divided by n.

Okay, the square root of p(1 - p) / n. What does that number tell us?

It measures the typical distance, or the standard amount of error, between a sample proportion p-hat and the true population proportion p due to sampling variability.

Gotcha.

And does that formula always work?

Almost.

You need that 10% condition again.

Your sample size, n, must be no more than 10% of the population size, capital N.

This ensures observations are independent enough for the formula to be accurate.

Right, the corn scoop analogy again.

Don't sample too large a fraction of the population.

Okay, mean and standard deviation.

What about the shape of the sampling distribution of p-hat?

Ah, the shape depends on the sample size and the population proportion.

But the good news is if the sample size is large enough, the sampling distribution of p-hat becomes approximately normal.

Approximately normal.

Like the bell curve, how large is large enough?

That's where the large counts condition comes in.

You need to check two things.

Is n times p greater than or equal to 10?

And is n times one minus p also greater than or equal to 10?

Okay, np ≥ 10 and n(1 - p) ≥ 10.

So you need at least 10 expected successes and at least 10 expected failures in your sample for the normal shape to kick in.

You got it.

If p is really close to zero or one, you'll need a larger n to satisfy this condition compared to when p is near 0.5.
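The large counts check is simple enough to write as a small helper. The function name and the example values below are illustrative assumptions.

```python
def large_counts_ok(n, p, threshold=10):
    """Return True when both expected successes and expected failures reach the threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold


# p near 0.5: a modest sample is enough.
print(large_counts_ok(100, 0.5))    # 50 expected successes, 50 expected failures
# p near 0: the same n gives only 2 expected successes.
print(large_counts_ok(100, 0.02))
# A larger n rescues the condition for the same small p.
print(large_counts_ok(1000, 0.02))  # 20 expected successes, 980 expected failures
```

This mirrors the point above: the closer p is to 0 or 1, the larger n has to be before the normal approximation is reasonable.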

Makes sense.

So if the 10% condition holds for the standard deviation formula and the large counts condition holds for the shape, then we can model the sampling distribution of p-hat using a normal distribution.

Precisely, which is incredibly useful for calculating probabilities.

Let's walk through an example.

Say 35% of all first-year college students attend a college within 50 miles of home.

So p = 0.35.

We take a random sample of n = 1,500 students.

What's the probability our sample proportion p-hat falls between 0.33 and 0.37?

So within two percentage points.

Okay, step-by-step.

First, the parameter is p = 0.35, and the sample size is n = 1,500.

Check the conditions.

The 10% condition.

Is 1,500 less than 10% of all first-year college students?

Yeah, definitely safe to assume that.

Now calculate the mean and standard deviation of the sampling distribution of p-hat.

The mean, mu sub p-hat, is just p, so 0.35.

Okay, and the standard deviation, sigma sub p-hat, is the square root of 0.35 times 0.65, divided by 1,500.

Let me calculate that.

It comes out to about 0.0123.

Right, about 0.0123.

Now check the large counts condition for shape.

Is np at least 10? 1,500 times 0.35 equals 525.

Yep, way bigger than 10.

And n(1 - p) is 1,500 times 0.65, which equals 975.

Also much bigger than 10.

Great, so both conditions met.

We can say the sampling distribution of p-hat is approximately normal with mean 0.35 and standard deviation 0.0123.

So now we just need to find the probability of being between 0.33 and 0.37 on that normal curve.

We'd find the z-scores for 0.33 and 0.37.

Exactly, z for 0.33 is (0.33 - 0.35) / 0.0123,

which is about negative 1.63.

And z for 0.37 is (0.37 - 0.35) / 0.0123, which is positive 1.63.

So we need the area under the standard normal curve between z = -1.63 and z = +1.63, using a table or calculator.

That comes out to roughly 0.896, or about 90%.

Right, so there's about a 90% chance that our sample proportion from 1,500 students will be within two percentage points of the true population proportion of 35%.
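The same calculation can be sketched with Python's standard library in place of a table. `NormalDist` comes from the `statistics` module; the variable names are illustrative. Because the code keeps the unrounded standard deviation, the result can differ slightly in the third decimal place from a table lookup at z = 1.63.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.35, 1500

# Large counts condition: both expected counts are far above 10.
assert n * p >= 10 and n * (1 - p) >= 10

sd = sqrt(p * (1 - p) / n)                  # about 0.0123
sampling_dist = NormalDist(mu=p, sigma=sd)  # normal model for p-hat

z_low = (0.33 - p) / sd
z_high = (0.37 - p) / sd
prob = sampling_dist.cdf(0.37) - sampling_dist.cdf(0.33)

print(f"sd of p-hat: {sd:.4f}")
print(f"z-scores: {z_low:.2f} and {z_high:.2f}")
print(f"P(0.33 <= p-hat <= 0.37) is about {prob:.3f}")
```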

That shows how larger samples give us results that are likely quite close to the truth and understanding this is the foundation for things like confidence intervals, right?

Absolutely, this whole framework is what makes inference possible in chapter eight and beyond.

Cool, okay, that covers sample proportions, p-hat, for categorical data.

Let's switch gears to sample means, x-bar, for quantitative data.

Right, things like average income, mean cholesterol level, average reaction time, anything where the underlying data is numerical and we're interested in the average.

So what about the sampling distribution of x-bar?

What are its rules?

Very similar structure, actually.

First, the mean of the sampling distribution of x-bar, mu sub x-bar, is equal to the true population mean, mu.

Okay, so x-bar is also an unbiased estimator, this time for the population mean, mu.

Good.

Second, the standard deviation of the sampling distribution of x-bar, sigma sub x-bar, is the population standard deviation sigma divided by the square root of the sample size n, so sigma over root n.

Sigma over root n, and I bet the 10% condition applies here too.

You bet. We need n ≤ 0.10N for this formula to be accurate, ensuring independence.

And it's super important to remember that division by root n. It shows that averages are less variable than individual measurements.

Crucial point, averages tend to cancel out the highs and lows, making them cluster more tightly than the original population data.

Don't forget that square root of n.

Okay, the mean is mu and the standard deviation is sigma over root n, if the 10% condition is met.

Now the big one: what about the shape of the sampling distribution of x-bar? Is it always normal?

Ah, here it gets interesting.

There are two main scenarios.

Scenario one,

if the population distribution itself is normal, then the sampling distribution of x-bar will also be normal, no matter what the sample size is, even for n = 2.

So if individual heights are normally distributed,

then the distribution of average heights from samples of say 10 people will also be perfectly normal.

Perfectly normal. It will just be narrower, more peaked, because its standard deviation is sigma over root n, which is smaller than the population standard deviation.

Okay, that's scenario one.

Normal population implies normal sampling distribution for x-bar. What's scenario two?

Scenario two is where the magic happens.

This is the central limit theorem, or CLT.

It's one of the most amazing results in statistics.

Sounds important, what does it say?

The CLT says that even if the population distribution is not normal, it could be skewed, bimodal, whatever, as long as the sample size n is large enough, the sampling distribution of the sample mean x-bar will be approximately normal.

Whoa, so the shape of the sampling distribution becomes normal even if the population it came from wasn't, just by taking a big enough sample.

Isn't that wild?

The process of averaging across many individuals in a large sample somehow smooths out the irregularities of the original population and produces that bell curve shape for the sample means.
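A quick simulation can illustrate this smoothing effect. The sketch below assumes an exponential population with mean 30 as a stand-in for a right-skewed distribution (for an exponential, the standard deviation equals the mean). The seed, the number of samples, and the helper names are illustrative choices, not anything from the book.

```python
import random
from math import sqrt
from statistics import pstdev

rng = random.Random(2024)  # fixed seed so the run is repeatable

POP_MEAN = 30.0     # assumed exponential population: mean = sd = 30
N_PER_SAMPLE = 40   # sample size n
NUM_SAMPLES = 2000  # number of simulated samples


def average(values):
    return sum(values) / len(values)


def skewness(values):
    """Average cubed z-score: near 0 for symmetric data, positive for right skew."""
    m, s = average(values), pstdev(values)
    return average([((v - m) / s) ** 3 for v in values])


def draw():
    return rng.expovariate(1 / POP_MEAN)


individuals = [draw() for _ in range(NUM_SAMPLES)]
sample_means = [
    average([draw() for _ in range(N_PER_SAMPLE)]) for _ in range(NUM_SAMPLES)
]

print(f"skewness of individual values: {skewness(individuals):.2f}")
print(f"skewness of sample means:      {skewness(sample_means):.2f}")
print(f"mean of sample means:          {average(sample_means):.2f}")
print(f"sd of sample means:            {pstdev(sample_means):.2f}")
print(f"sigma / sqrt(n):               {POP_MEAN / sqrt(N_PER_SAMPLE):.2f}")
```

In a run like this, the individual values stay strongly right-skewed while the sample means are much closer to symmetric, centered near 30 with a spread close to sigma over root n.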

That is remarkable, how large does it have to be for the CLT magic to work?

The common rule of thumb, widely used in AP statistics, is that if your sample size n is greater than or equal to 30, you can generally assume the sampling distribution of x-bar is approximately normal, regardless of the population shape.

n ≥ 30.

Got it.

So if n < 30, we can only assume normality for x-bar if we know the original population was normal.

That's the safe approach, yes.

Below 30, you need information about the population shape.

At or above 30, the CLT generally kicks in.

Okay, let's try an example using the CLT.

The book mentions Keith's Auto Center, where oil change times are right skewed.

The population mean is 30 minutes, and the population standard deviation is 20 minutes.

He takes a sample of n = 40 customers.

Right, so the population is skewed, but n = 40, which is at least 30.

First, let's find the mean and standard deviation of the sampling distribution of x-bar.

Okay, the mean, mu sub x-bar, is just mu, so 30 minutes.

The standard deviation, sigma sub x-bar, is sigma over root n, so 20 over root 40. That's 20 divided by about 6.32, which comes out to roughly 3.16 minutes.

And we need the 10% condition.

Is 40 customers less than 10% of all Keith's customers?

Let's assume yes.

A fair assumption for a business.

Now, the shape.

Since n = 40 is at least 30, the central limit theorem tells us the sampling distribution of x-bar is approximately normal.

So approximately normal with mean 30 and standard deviation 3.16.

Now we can calculate probabilities.

What's the chance the average time for these 40 customers exceeds 35 minutes?

We find the z-score for 35: (35 - 30) / 3.16, which is about 1.58.

The probability of z being greater than 1.58 on a standard normal curve is, looking it up, about 0.0571.

Right, so there's only about a 5.7% chance that the average time for 40 customers will exceed 35 minutes, even though individual times are quite variable and skewed.
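The oil-change calculation can be reproduced with the standard library as a rough check. The variable names are illustrative, and the normal model rests on the central limit theorem as discussed above.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 30, 20, 40  # population mean, population sd (minutes), sample size

sd_xbar = sigma / sqrt(n)  # about 3.16 minutes
z = (35 - mu) / sd_xbar    # about 1.58

# Upper-tail area under the approximately normal sampling distribution of x-bar.
prob = 1 - NormalDist(mu, sd_xbar).cdf(35)

print(f"sd of x-bar: {sd_xbar:.2f}")
print(f"z-score for 35 minutes: {z:.2f}")
print(f"P(x-bar > 35) is about {prob:.3f}")
```

The result agrees with the table lookup to about three decimal places; any small difference comes from rounding z to 1.58 before using the table.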

The averaging really pulls things toward the center and makes the distribution more predictable and normal thanks to the CLT.

Exactly, it's a powerful tool for inference about means.

Okay, let's quickly recap and compare p-hat and x-bar side by side.

What are the key similarities and differences in their sampling distributions?

Similarity one, center.

Both p-hat and x-bar are unbiased estimators.

The mean of the sampling distribution of p-hat is p, and the mean for x-bar is mu.

They target the right value on average.

Similarity two, variability.

For both, the variability, standard deviation, decreases as sample size n increases because n is in the denominator of both standard deviation formulas under a square root.

More data means more precision.

And the 10% condition is needed for both standard deviation formulas to be accurate.

Okay, main difference then must be the shape conditions.

Exactly.

For p-hat, we need the large counts condition, np ≥ 10 and n(1 - p) ≥ 10, for the sampling distribution to be approximately normal.

Whereas for x-bar, it's approximately normal if either the original population was normal or if the sample size is large, n ≥ 30, because of the central limit theorem.

That's the key distinction on shape.

Perfect summary.

And just hammering home some final general tips.

Always, always specify which distribution you mean.

Population, sample data, or sampling distribution of a named statistic.

Use notation correctly.

p versus p-hat, mu versus x-bar, sigma versus s.

If you're unsure,

write it out in words like the population proportion or the sample mean.

Check those conditions.

10%, large counts, n ≥ 30 for CLT.

They justify your calculations, don't skip them.

And finally, interpret your results.

Don't just give a number.

Explain what it means in the context of the problem.

Absolutely, context is king.

Okay, wow.

We've really unpacked a lot in this deep dive into chapter seven.

We've seen how sampling distributions bridge the gap between a single sample and the whole population.

You should now have a solid grasp on how statistics like p-hat and x-bar behave across many samples, why they're useful as estimators, and how their reliability changes with sample size.

Yeah, the big takeaways for me are that variability shrinks with bigger samples, making our estimates more precise.

And that incredible central limit theorem, allowing us to use the normal distribution for sample means so often, even when we don't start with a normal population, it's like a statistical superpower.

So what does this all mean for you, the listener?

Well, the next time you encounter a poll result, a study's average finding, anything based on a sample.

You can think beyond just the number reported.

You can ask, okay, but how much sampling variability might there be?

How precise is that estimate likely to be, given the sample size?

Did they meet the conditions to use the methods they used?

This chapter really gives you the tools to be a much more critical and informed consumer of data, which is everywhere.

It's about understanding not just the what, but the how reliable.

From all of us at The Deep Dive and the Last Minute Lecture team, thank you for joining us on this exploration of sampling distributions.

We hope it helps you nail these.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Sampling distributions form the essential foundation linking probability concepts to statistical inference methods used in research and data analysis. The distinction between population parameters and sample statistics establishes why understanding how statistics vary across repeated samples proves critical for drawing reliable conclusions. When a statistic's expected value equals the true population parameter across many samples, it qualifies as an unbiased estimator, a fundamental property ensuring accuracy in estimation procedures. Sample size emerges as a primary determinant of estimation precision, with larger samples consistently producing less fluctuation in calculated statistics and consequently more dependable inferences about populations. The sampling distribution of proportions demonstrates predictable behavior as sample sizes increase, approaching a normal shape under specific conditions related to both sample magnitude and proportion magnitude. Students calculate the mean and standard error of proportions to determine probabilities associated with observed sample values, essential skills for practical statistical applications. The Central Limit Theorem stands as the chapter's transformative principle, establishing that sample means follow approximately normal distributions regardless of the original population's actual shape when samples are sufficiently large. This remarkable result enables researchers to make probability-based inferences about population means even when underlying data display substantial skewness or deviation from normality. Correctly interpreting sampling distributions requires distinguishing between the distribution of individual observations within a single sample and the distribution of statistics computed from many independent samples, a conceptual distinction frequently misunderstood. The finite population correction adjusts standard error calculations when sampling represents a substantial proportion of a finite population. 
Understanding these principles provides the necessary prerequisite knowledge for constructing confidence intervals and performing hypothesis tests, converting theoretical probability knowledge into practical statistical tools capable of generating valid conclusions about unknown population characteristics from limited sample data.

Using this chapter to study? Last Minute Lecture is free and student-run. If it helped, consider supporting the project.
