Chapter 8: Estimating with Confidence

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome to the Deep Dive.

Today we're diving into some questions that, well, they feel almost impossible to answer directly.

Yeah, things like, what's the actual average battery life on the newest iPhone model?

Not just one, but all of them.

Exactly.

Or, you know, what proportion of college students really went to all their classes last week?

Across the whole country.

And even something like,

how much does a quarter pound hamburger patty actually weigh after it's cooked?

Does it vary much?

You see the problem, right?

Trying to measure every single phone or poll every single student or weigh every single burger?

It's just not feasible.

Not practical at all.

You can't get a full census most of the time.

But here's the good news.

That doesn't mean we're stuck.

And that's where the deep dive comes in today.

We're going to give you a clear path, a shortcut really, to understand how statisticians tackle these big questions.

We're moving into what's called statistical inference.

So instead of just describing data we have, we're using a smaller piece, a sample, to draw conclusions about the whole population.

And our focus today is estimating with confidence.

We're not just making a single guess.

No, definitely not.

We're giving you a range of believable values, plausible values for those population characteristics we can't measure directly.

And our guide for this journey is The Practice of Statistics, Sixth Edition, by Starnes and Tabor.

Think of this as your essential kit for Chapter 8.

Yeah.

We'll break down the key ideas, use some real examples, and throw in some tips, especially if you're thinking about AP stats.

Okay.

So let's just jump right in.

If you can't measure everyone, what's your single best guess?

What do you start with?

That's your point estimator.

It's the statistic you calculate from your sample data, like a sample mean or maybe a sample proportion.

And the actual number you get from your specific sample.

That number is the point estimate.

So the estimator is the recipe.

The estimate is the cake you baked this time.

Got it.

So for mean iPhone battery life, the population mean, mu,

our best point estimate is the sample mean, x-bar, from the phones we tested.

Exactly.

And for the proportion of students attending class, the population proportion,

p, our best point estimate is the sample proportion, p-hat, from the students we surveyed.

But here's something that trips people up.

If I took a different random sample of iPhones, I'd probably get a different sample mean, right?

A different point estimate.

Almost certainly, yeah.

That's called sampling variability.

Just due to chance, different samples will give slightly different results.

So how do we deal with that?

How can we be confident if our estimate jumps around depending on the sample?

That's the perfect question.

That chance variation, that sampling variability is precisely why we need more than just a single point estimate.

It leads us straight to the idea of a confidence interval.

OK.

Confidence interval.

What is that exactly?

So a confidence interval gives you an interval, a range of plausible values for the parameter you're trying to estimate.

And it's based entirely on your sample data.

Plausible values.

Meaning values we wouldn't be shocked to find are the true value.

That's a great way to put it.

It's a range where we think the true population parameter might reasonably lie.

You said they have a standard structure.

Yep.

Always the same basic form.

It's your point estimate plus or minus a margin of error.

OK.

Point estimate plus or minus margin of error.

Can you give an example?

Sure.

Let's say a poll finds a 95% confidence interval for the proportion of U.S. adults facing some financial difficulty is, like, 0.613 to 0.687.

So the point estimate will be right in the middle.

Exactly.

The midpoint there is 0.65.

That was likely their sample proportion, their p-hat.

And the margin of error is how far you go out from the middle.

So 0.65 up to 0.687 is 0.037.

Perfect.

And 0.65 down to 0.613 is also 0.037.

So the interval is 0.65 plus or minus 0.037.

That 0.037 is the margin of error.
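
The arithmetic in this exchange can be sketched in a few lines of Python. The helper name `split_interval` is hypothetical and introduced only for illustration; the endpoints are the ones quoted in the poll example.

```python
def split_interval(lower, upper):
    """Recover the point estimate and margin of error from interval endpoints."""
    point_estimate = (lower + upper) / 2   # midpoint of the interval
    margin_of_error = (upper - lower) / 2  # half the interval's width
    return point_estimate, margin_of_error


estimate, margin = split_interval(0.613, 0.687)
print(round(estimate, 3), round(margin, 3))  # prints 0.65 0.037
```

Any interval of the form "estimate plus or minus margin" can be taken apart this way, which is handy when a report gives only the endpoints.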

OK.

That makes sense structurally.

But the confidence level, that 95% confident part, what does that really signify?

It's tricky.

It is tricky and it's super important to get right.

The confidence level, let's call it C%, tells you about the overall success rate of the method you used to create the interval.

The method, not this specific interval.

Right.

Imagine you could repeat the whole process, take a random sample, calculate the interval many, many times.

If you did that, about C% of all the intervals you constructed would successfully capture the true unknown population parameter.

Ah.

OK.

So it's about the long run performance of the technique.

Precisely.

It's how confident we are that our procedure works in the long run.

So let's try interpreting.

For an interval, say, 0.48 to 0.54 for candidate A's support, we'd say, "We are 95% confident that the interval from 0.48 to 0.54 captures the true proportion of voters who support candidate A."

Perfect.

That's interpreting the interval, making a statement about where the parameter likely is.

And interpreting the level itself, the 95 % part.

That would sound like, "If we were to take many random samples of voters and construct a 95% confidence interval from each sample, then about 95% of those intervals would capture the true proportion of voters who support candidate A."

It's about the process.

OK.

Crucial distinction.

And the common mistake is thinking there's a 95% chance this specific interval we just calculated holds the true value.

Exactly.

Once you've calculated your interval, say, 0.48 to 0.54, the true population proportion either is in that specific range or it isn't.

The probability is 1 or 0.

The 95% refers to our confidence in the method before we collected the data and calculated the specific interval.
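
The "long-run success rate" reading of the confidence level can be checked with a small simulation. This is a rough sketch, not the textbook's method: the true proportion 0.51, the sample size 500, the repetition count, and the function names are all assumptions chosen for illustration, and the interval is the usual one-sample z interval for a proportion.

```python
import math
import random


def z_interval(successes, n, z_star=1.960):
    """One-sample z interval for a proportion (95% by default)."""
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def capture_rate(true_p, n, repetitions, seed=1):
    """Fraction of simulated intervals that contain the true proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repetitions):
        # Draw one random sample of size n and count the successes.
        successes = sum(rng.random() < true_p for _ in range(n))
        lower, upper = z_interval(successes, n)
        if lower <= true_p <= upper:
            hits += 1
    return hits / repetitions


rate = capture_rate(true_p=0.51, n=500, repetitions=2000)
print(f"Fraction of intervals that captured p: {rate:.3f}")
```

Each individual interval either contains 0.51 or it does not; only the fraction across many repetitions should land near 0.95.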

Got it.

So we always want our intervals to be precise, meaning a smaller margin of error.

What makes that margin of error bigger or smaller?

Good question.

Two main things influence the margin of error, which affects the width of your interval.

First is the confidence level itself.

How does that work?

Well, if you want to be more confident, say, 99% confident instead of 95%, you need to cast a wider net to be more sure of capturing the true value.

So higher confidence level means a wider interval, a larger margin of error.

Makes sense.

More certainty requires more wiggle room.

What's the other factor?

The big one.

Sample size.

If you increase your sample size, you get more information about the population.

And more information means?

More precision.

A larger sample size leads to a smaller margin of error and a narrower, more precise confidence interval.

It's like getting a sharper picture with more data points.

OK, so bigger sample, narrower interval.
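
A quick numerical sketch of both effects, assuming the standard margin-of-error formula for a proportion (critical value times the standard error). The function names and the grid of confidence levels and sample sizes are illustrative choices, not values from the chapter.

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Critical value leaving (1 - confidence)/2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin_of_error(p_hat, n, confidence):
    """Margin of error for a one-sample z interval for a proportion."""
    return z_star(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)


for confidence in (0.90, 0.95, 0.99):
    for n in (100, 400, 1600):
        me = margin_of_error(0.5, n, confidence)
        print(f"confidence={confidence:.2f}  n={n:5d}  margin={me:.3f}")
```

Reading down the output, the margin grows with the confidence level and shrinks with n; because n sits under a square root, quadrupling the sample size only halves the margin.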

That seems intuitive.

But is there anything the margin of error doesn't cover?

Sometimes polls seem way off.

Oh, absolutely.

This is a critical limitation.

The margin of error only accounts for the variability we expect due to random sampling or random assignment, pure chance variation.

So it doesn't account for mistakes or problems with the survey itself.

Exactly.

It doesn't account for practical problems like nonresponse bias (people refusing to answer), undercoverage (some group being left out of the sampling process), or response bias (people lying or giving inaccurate answers).

Like if students lied about their GPA in a survey.

Perfect example.

If lots of students inflate their GPA, your calculated confidence interval might be centered way higher than the true average GPA.

And the margin of error gives you no warning about that systematic bias.

OK, that's a really important caveat.

Margin of error isn't everything.

Not at all.

It only handles the random chance part.

Right.

Let's switch gears now and apply this to estimating proportions.

So we're looking at percentages like the proportion of adults who recycle or the proportion of emails that are spam.

OK.

So for proportions, the general formula, point estimate plus or minus margin of error, gets more specific.

Our point estimate is the sample proportion, p-hat.

And the margin of error part is the critical value times the standard deviation of the statistic.

For proportions, the critical value comes from the standard normal distribution.

We call it z*.

z-star.

OK.

And the standard deviation of the statistic, or more accurately, the standard error when we use p-hat, is the square root of p-hat times 1 minus p-hat, all over n.

So the whole formula is $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$.

OK.

That looks like the formula.

But you mentioned conditions before.

Are there conditions we need to check before using this?

Absolutely crucial.

Yes.

Three conditions you must check and state before you calculate a one-sample z interval for a proportion.

What are they?

First, random.

The data have to come from a well-designed random sample or randomized experiment.

This is key for generalizing results.

Makes sense.

Randomness is fundamental.

What's next?

Second, the 10% condition.

When you're sampling without replacement, which is almost always the case in practice, your sample size n should be no more than 10% of the total population size N.

Why is that important?

It ensures that even though you're not replacing individuals, the selections are still nearly independent.

If your sample is too large relative to the population, the standard error formula isn't quite accurate and your interval might be a bit too wide.

OK.

n less than 10% of N.

Got it.

And the third?

The large counts condition.

This one is about ensuring the shape of the sampling distribution of p-hat is approximately normal so we can actually use that z* critical value reliably.

How do we check that?

You need to check that the number of successes in your sample, n times p-hat, and the number of failures, n times 1 minus p-hat, are both at least 10.

Both need to be 10 or more.

Successes and failures, both at least 10.

What if they aren't?

If they aren't, the normal approximation isn't safe to use and the confidence interval calculated using Z might not actually achieve the advertised confidence level.

Its true capture rate could be lower.
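
The two conditions that involve numbers can be checked mechanically. The sketch below is a hypothetical helper (its name and return format are assumptions); the Random condition is about how the data were collected and cannot be verified from the counts alone. The 7-out-of-50 sample and the population size of 5,000 are made-up illustrations.

```python
def check_proportion_conditions(successes, n, population_size=None):
    """Check the Large Counts and 10% conditions for a one-sample z interval.

    The Random condition is not checked here: it depends on the study design.
    """
    failures = n - successes
    results = {
        # Successes (n * p-hat) and failures (n * (1 - p-hat)) must both be >= 10.
        "large_counts": successes >= 10 and failures >= 10,
    }
    if population_size is not None:
        # Sampling without replacement: n should be at most 10% of N.
        results["ten_percent"] = n <= 0.10 * population_size
    return results


print(check_proportion_conditions(170, 738))                       # large counts met
print(check_proportion_conditions(7, 50))                          # too few successes
print(check_proportion_conditions(170, 738, population_size=5000)) # 738 > 500, so 10% fails
```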

OK.

Those conditions are vital.

Let's walk through an example using that four-step process.

State, plan, do, conclude.

Sounds good.

Let's use that distracted walking example.

A poll randomly sampled 738 cell phone users and 170 admitted walking into something while talking.

We want a 95% confidence interval.

Step one.

State.

What parameter are we estimating and at what confidence level?

We want to estimate p, the true proportion of all cell phone users who would admit to walking into something or someone while talking on their phone.

And we want to do this with 95% confidence.

Perfect.

Step two.

Plan.

Name the inference method and check the conditions.

OK.

The method is a one-sample z interval for a proportion, p.

Now, conditions?

Random.

Yes.

The problem states it was a random sample of 738 users.

10% condition.

738 users.

Is that less than 10% of all cell phone users?

Yeah.

Definitely.

There are way more than 7,380 cell phone users out there.

So yes.

Good.

Large counts.

We had 170 successes, admitted distracted walking, and 738 minus 170 equals 568 failures.

Both 170 and 568 are way bigger than 10.

So yes, large counts is met.

Conditions are met.

Step three.

Do.

Perform the calculations.

All right.

First, p-hat is 170 over 738, which is about 0.230.

For 95% confidence, the critical value z* is 1.960.

Right.

Now, plug those into the formula: $0.230 \pm 1.960 \sqrt{0.230(1-0.230)/738}$.

OK.

Calculating that square root part, the standard error, it comes out to about 0.0155, and then multiplied by 1.960, gives a margin of error of about 0.030.

So the interval is 0.230 plus or minus 0.030, or 0.200 to 0.260.

Great.

Last step.

Conclude.

Interpret the interval in context.

We are 95% confident that the interval from 0.200 to 0.260 captures the true proportion of all cell phone users who would admit to walking into something or someone while talking on their cell phone.
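
The Do step for the distracted-walking example (170 successes out of 738, with z* = 1.960) can be reproduced with a short sketch. The function name and return layout are assumptions made for illustration.

```python
import math


def one_sample_z_interval(successes, n, z_star):
    """Return p-hat, its standard error, the margin of error, and the interval."""
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat, standard_error, margin, (p_hat - margin, p_hat + margin)


p_hat, se, margin, (lower, upper) = one_sample_z_interval(170, 738, z_star=1.960)
print(f"p_hat={p_hat:.3f}  SE={se:.4f}  margin={margin:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Carrying full precision puts the upper endpoint near 0.261 rather than 0.260. The hand calculation rounds p-hat and the margin to three decimals before adding, which accounts for the small difference.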

Beautifully stated.

That four-step process really keeps things organized.

It does.

Now, thinking about planning, what if you need to figure out the sample size before you do this survey?

Like, how many people do you need to poll to get a specific margin of error?

Very practical question.

You essentially work the margin of error formula backwards.

You set the margin of error part, $z^* \sqrt{\hat{p}(1-\hat{p})/n}$, to be less than or equal to your desired margin of error, ME.

But wait.

The formula has p-hat in it.

We won't know p-hat until after we collect the data.

How do you solve for n then?

That's the clever part.

We need a value for p-hat to plug in.

Since we don't know the real p-hat, we use the value that makes the required sample size largest.

What value is that?

0.5.

If you plug in 0.5 for p-hat, the term p-hat times 1 minus p-hat becomes 0.25, which is its maximum possible value.

This gives you a conservative sample size estimate.

It guarantees your margin of error will be no larger than you want, regardless of the p-hat you actually get.

Ah, okay.

So using p-hat equal to 0.5 is a safe bet to ensure you meet your goal.

Exactly.

Unless you have a good reason from prior research to believe p-hat is definitely closer to 0 or 1, use 0.5.

Let's try one.

A company wants to estimate customer satisfaction.

They want a margin of error of no more than 3 percentage points, 0.03, with 95% confidence.

How many customers do they need?

Okay.

We want ME ≤ 0.03.

Confidence is 95%, so z* = 1.960, and we'll use p-hat = 0.5.

So $1.960 \sqrt{0.5(0.5)/n} \le 0.03$.

Now we solve for n.

Right.

Square both sides, rearrange.

You end up with $n \ge (1.960/0.03)^2 (0.5)(0.5)$.

Do the math.

$(1.960/0.03)^2 (0.25)$, that comes out to about 1067.111.

Okay, so n must be greater than or equal to 1067.111.

Now here's a crucial rule for sample size.

You can't survey 0.111 of a person.

Exactly.

And you always need to round up to the next whole number, always.

So even though it's 1067.111, we round up to n = 1068.

Yes.

You need to survey at least 1067.111 people.

So you need 1,068 whole people to guarantee the margin of error is at most 0.03.

Rounding down would mean your margin of error might be slightly larger than planned.

Always round up for sample size.
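
The sample-size calculation for a proportion, including the always-round-up rule, fits in one small function. This is a sketch under the assumptions stated in the example (margin 0.03, z* = 1.960, conservative guess 0.5); the function and parameter names are hypothetical.

```python
import math


def sample_size_for_proportion(margin, z_star, p_guess=0.5):
    """Smallest n with z* * sqrt(p(1 - p)/n) <= margin, using a guessed p."""
    raw = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    return math.ceil(raw)  # always round up to a whole person


print(sample_size_for_proportion(margin=0.03, z_star=1.960))  # prints 1068
```

Passing a different `p_guess` (for example, from prior research) can only lower the answer, because 0.5 maximizes p(1 - p).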

Got it.

Okay.

Let's shift gears completely now.

We've done proportions, percentages.

What about estimating means, averages, like average GPA, average height, average cost?

Right.

Estimating a population mean.

You'd think we could just swap out p-hat for x-bar, the sample mean, and use a similar formula with z*.

Yeah.

Seems logical.

x-bar plus or minus z* times something.

The problem is the something, the standard deviation part.

For proportions, the standard deviation depended only on p, which we estimated with p-hat.

But for means, the standard deviation of the population, sigma, is usually unknown.

Oh, right.

We usually don't know the true population standard deviation either.

Exactly.

So the natural thing is to estimate sigma using the sample standard deviation, s_x.

But if you just plug s_x in place of sigma and still use z*, it turns out your intervals won't capture the true mean, mu, as often as your confidence level promises.

Why not?

What goes wrong?

Using s_x introduces extra variability, because s_x itself varies from sample to sample.

The Z values from the normal distribution don't account for this added uncertainty.

Your intervals end up being a bit too narrow too often.

So using Z with the sample standard deviation gives you false confidence.

In a sense, yes.

The actual capture rate will be lower than, say, 95% if you use z*.
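
The shortfall is easy to see in a simulation. The sketch below draws samples from a normal population with known mean, builds "z with s" intervals, and counts how often they capture the mean. The population (mean 0, standard deviation 1), the sample sizes, and the repetition count are assumptions chosen for illustration.

```python
import math
import random


def z_with_s_capture_rate(n, repetitions, mu=0.0, sigma=1.0, z_star=1.960, seed=1):
    """Capture rate of x-bar +/- z* * s_x/sqrt(n) when sigma is replaced by s_x."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # Sample standard deviation with the n - 1 divisor.
        s_x = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        margin = z_star * s_x / math.sqrt(n)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / repetitions


small_n_rate = z_with_s_capture_rate(n=5, repetitions=5000)
large_n_rate = z_with_s_capture_rate(n=100, repetitions=5000)
print(f"n=5:   capture rate {small_n_rate:.3f}")
print(f"n=100: capture rate {large_n_rate:.3f}")
```

With only 5 observations the capture rate falls well short of 95%, while with 100 observations it is close. The gap is what the wider t critical values are meant to repair.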

So statisticians needed a fix.

And that fix is?

The t-distribution, developed by William Gosset, who published under the pseudonym "Student," hence it's sometimes called Student's t-distribution.

Okay, the t-distribution.

How is it different?

It's actually a whole family of distributions, similar in shape to the normal distribution, bell-shaped and symmetric around zero, but they're more spread out.

They have heavier tails.

More spread out.

Why?

To account for that extra uncertainty from using s_x to estimate sigma.

The amount of extra spread depends on the sample size through something called degrees of freedom.

Degrees of freedom.

What's that?

For a one-sample t interval for a mean, the degrees of freedom, df, are simply n minus 1, sample size minus 1.

So smaller sample size means fewer degrees of freedom.

Right.

And fewer degrees of freedom means the t-distribution is more spread out; its tails are thicker. As n, and thus df, gets larger, the t-distribution gets closer and closer to the standard normal z-distribution.

Ah, okay.

So instead of z*, we use a critical value from this t-distribution, which we call t*.

Exactly.

The formula becomes $\bar{x} \pm t^* \, s_x/\sqrt{n}$.

That t* value depends on both the confidence level and the degrees of freedom, n minus 1.

And that term, s_x over root n, does that have a name?

Yes.

That's called the standard error of the mean, often abbreviated as SEM or $SE_{\bar{x}}$.

It's our estimate of the standard deviation of the sampling distribution of x-bar.

Okay.

So x-bar plus or minus t* times SEM.

How do we find the right t* value?

Is there a table?

Yep.

There's usually a Table B in statistics texts, or more commonly now, you'd use technology, like the invT function on a calculator, or software.

If using the table, you look up based on degrees of freedom and confidence level.

Correct.

Find the row for your df, which is n minus 1, and the column corresponding to your confidence level at the bottom.

Where they intersect, that's your t*.

What if your exact degrees of freedom aren't in the table?

Like, if df is 45 and the table jumps from 40 to 50.

Good point.

In that situation, to be safe, you always use the critical value for the next lowest degrees of freedom listed.

So if you had df = 45, you'd use the t* for df = 40.

Why the lower one?

Using the lower df gives you a slightly larger t* value, which results in a slightly wider, more conservative interval.

It ensures your confidence level is at least what you claimed.

Got it.

Be conservative.

Use the lower df if yours isn't listed.
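
The "drop to the next lowest df" rule can be sketched as a table lookup. The dictionary below is a short excerpt of commonly tabulated 95% critical values, not the full Table B, and the function name is hypothetical.

```python
# Excerpt of 95% t* critical values as printed in standard t tables.
T_STAR_95 = {30: 2.042, 40: 2.021, 50: 2.009, 60: 2.000,
             80: 1.990, 100: 1.984, 1000: 1.962}


def conservative_t_star(df, table=T_STAR_95):
    """Return t* for the largest tabulated df that does not exceed the actual df."""
    usable = [row for row in table if row <= df]
    if not usable:
        raise ValueError("df is below the smallest row in this excerpt")
    return table[max(usable)]


print(conservative_t_star(45))  # prints 2.021 (the df = 40 row)
```

Because t* shrinks as df grows, falling back to a smaller df never gives a critical value that is too small, so the interval is never narrower than it should be.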

And just to be clear, proportions use z, means use t.

Absolutely vital distinction.

Keep that straight.

Okay, what about conditions for the t interval for a mean?

Are they the same as for proportions?

Two are the same.

One is different and quite important.

The random condition and the 10% condition are exactly the same as before.

Check for random sampling or random assignment, and check n ≤ 0.10N if sampling without replacement.

Okay, so what's the third one for means?

It's the Normal/Large Sample condition.

This is about ensuring the sampling distribution of x-bar is approximately normal, so we can use the t-distribution.

How do we check this one?

It sounds more complicated.

It has a few parts.

First, if the population distribution itself is stated to be normal, then you're good, regardless of sample size.

The sampling distribution of x-bar will also be normal.

Okay, that's easy if we know the population is normal.

What if we don't?

Then we look at the sample size, n.

If n is large, a common guideline is n ≥ 30, then the central limit theorem, CLT, kicks in.

Ah, the CLT.

That says...

The CLT states that if n is large enough, the sampling distribution of the sample mean is approximately normal, even if the original population distribution isn't normal.

So n ≥ 30 is usually good enough to proceed.

Okay, large sample size saves us thanks to the CLT.

But what if the population shape is unknown and the sample size is small, like n = 15 or n = 20?

This is the critical case.

If n < 30 and the population shape is unknown, you must examine the sample data itself.

You need to make a graph, like a histogram, dot plot, or box plot of the sample data.

And what are we looking for in that graph?

You're looking to see if it's plausible that the data came from a roughly normal population.

Specifically, you need to check that the graph shows no strong skewness and no outliers.

Moderate skew might be okay if there are no outliers, but strong skew or clear outliers are red flags.

So I actually have to draw a graph and comment on it?

Yes.

If you have the raw data and n < 30, you need to include a sketch of the graph and a sentence describing its shape, like "the dot plot shows roughly symmetric data with no outliers," or "the box plot indicates moderate right skew but no outliers."

You need to state that because there's no strong skew or outliers, it's reasonable to use the T procedure.
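
The decision logic for this condition can be written out as a small sketch. The function and argument names are hypothetical; the judgment about strong skewness or outliers still has to come from a person looking at a graph of the sample data, so it is passed in as an argument rather than computed.

```python
def normal_large_sample_met(n, population_is_normal=False,
                            graph_shows_strong_skew_or_outliers=None):
    """Return (condition_met, reason) for the Normal/Large Sample condition."""
    if population_is_normal:
        return True, "population distribution is stated to be normal"
    if n >= 30:
        return True, "n >= 30, so the central limit theorem applies"
    if graph_shows_strong_skew_or_outliers is None:
        return False, "n < 30: graph the sample data before proceeding"
    if graph_shows_strong_skew_or_outliers:
        return False, "n < 30 and the graph shows strong skew or outliers"
    return True, "n < 30 but the graph shows no strong skew and no outliers"


print(normal_large_sample_met(1520))
print(normal_large_sample_met(20))
print(normal_large_sample_met(20, graph_shows_strong_skew_or_outliers=False))
```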

Wow.

Okay.

That's a key step for small samples.

Let's try the four-step process for means.

Example: a 2016 study randomly sampled 1,520 adults and found they read an average of x-bar = 12 books last year, with s_x = 18 books. Find a 95% CI.

Okay.

State.

Define the parameter and confidence level.

We want to estimate the true mean number of books read by all American adults in the previous 12 months with 95% confidence.

Plan.

Name the method and check conditions.

Method is a one-sample t interval for a mean, mu. Conditions?

Random?

Yes.

Stated random sample of 1,520 adults.

10%?

Yes.

1,520 is less than 10% of all U.S. adults.

Normal/Large Sample?

Here, n equals 1,520, which is much larger than 30.

So by the CLT, the sampling distribution of x-bar is approximately normal. Condition met.

Great.

Do. Calculations.

We need t* for 95% confidence with df = n − 1 = 1,519.

Using technology or a table, it will be very close to z*; t* is about 1.962.

Okay.

Now plug in to $\bar{x} \pm t^* \, s_x/\sqrt{n}$.

That's $12 \pm 1.962 \cdot 18/\sqrt{1520}$.

Calculating $18/\sqrt{1520}$ gives about 0.462.

Then 1.962 times 0.462 is about 0.91.

So the interval is 12 plus or minus 0.91.

Which is 11.09 to 12.91 books.

Perfect.

Conclude.

Interpret in context.

We are 95% confident that the interval from 11.09 to 12.91 books captures the true mean number of books read by all American adults in the previous 12 months.
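
The Do step for the books example can be reproduced from its summary statistics (sample mean 12 books, sample standard deviation 18 books, n = 1,520, t* about 1.962). The function name is an assumption made for illustration.

```python
import math


def one_sample_t_interval(x_bar, s_x, n, t_star):
    """Return the standard error, margin of error, and interval for a mean."""
    standard_error = s_x / math.sqrt(n)  # SEM: s_x / sqrt(n)
    margin = t_star * standard_error
    return standard_error, margin, (x_bar - margin, x_bar + margin)


se, margin, (lower, upper) = one_sample_t_interval(x_bar=12, s_x=18, n=1520, t_star=1.962)
print(f"SE={se:.3f}  margin={margin:.2f}  CI=({lower:.2f}, {upper:.2f})")
# prints SE=0.462  margin=0.91  CI=(11.09, 12.91)
```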

Excellent.

And notice, if someone asked if this data suggests the mean is different from, say, 14 books, maybe a previous average.

Well, 14 is outside our interval.

11.09 to 12.91.

So yes, this interval provides evidence that the true mean is likely lower than 14.

Exactly.

The interval gives us a range of plausible values, and values outside that range appear less plausible.

And just quickly, if that book example had only sampled, say, 20 people, then in the Plan step we would have needed the raw data, made a graph, checked for strong skew and outliers, and commented on it before proceeding.

Absolutely.

That graphical check is non-negotiable for small n when the population shape is unknown.

OK.

Last piece.

Sample size planning for means.

How do we figure out n if we want to estimate a mean, mu, with a specific margin of error?

Similar logic to proportions.

We start with the margin of error formula, but the one involving sigma.

$ME = z^* \, \sigma/\sqrt{n}$.

We want this to be less than or equal to our target ME.

Wait.

You use z* there, not t*, and you use sigma, which we said we usually don't know.

Good catch.

Yes, for planning sample size for a mean, we use z* instead of t*, because we don't know n yet, so we don't know the degrees of freedom to find t*.

Using z is simpler and standard practice here.

OK, use z for planning.

But what about the unknown sigma?

That's the tricky part.

You must have some reasonable estimate for the population standard deviation before you can calculate the required sample size.

Where would you get that estimate?

Usually from a previous study on a similar population, or maybe a small preliminary pilot study, just to get a rough idea of the variability.

Without some estimate for sigma, you can't determine the sample size needed for a specific margin of error for a mean.

So you need a starting guess for sigma.

Let's try an example.

Researchers want to estimate the mean cholesterol level of a species of monkey to within 1 mg/dL, so ME = 1, with 95% confidence.

A previous study suggests sigma is about 5 mg/dL.

OK, ME = 1, 95% confidence means z* equals 1.960, and we'll use the estimate sigma = 5.

We set $1.960 \cdot 5/\sqrt{n} \le 1$.

Right.

Now solve for n.

Rearranging gives $\sqrt{n} \ge 1.960 \cdot 5/1$, which is $\sqrt{n} \ge 9.8$.

Squaring both sides, $n \ge 9.8^2$, which is $n \ge 96.04$.

And what's the rule for sample size?

Always round up.

So they need at least 97 monkeys.

Precisely.

n = 97 monkeys are needed to achieve a margin of error of at most 1 mg/dL, with 95% confidence, assuming sigma is around 5.
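
The same planning calculation for a mean, as a sketch. It assumes a margin of 1 mg/dL, a guessed sigma of 5 mg/dL from a previous study, and z* = 1.960; the function and parameter names are hypothetical.

```python
import math


def sample_size_for_mean(margin, sigma_guess, z_star):
    """Smallest n with z* * sigma/sqrt(n) <= margin, using a guessed sigma."""
    return math.ceil((z_star * sigma_guess / margin) ** 2)  # always round up


print(sample_size_for_mean(margin=1, sigma_guess=5, z_star=1.960))  # prints 97
```

The answer is only as good as the guess for sigma: doubling `sigma_guess` roughly quadruples the required sample size.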

Wow, OK.

We covered a lot there.

From the basic idea of point estimates through confidence intervals, the meaning of the level, margin of error.

To the specific formulas, and importantly, the conditions for checking proportions using z intervals and means using t intervals.

And finally, how to figure out sample sizes needed for both situations.

It really feels like a complete toolkit for estimation.

It is.

Statistical inference, starting with confidence intervals, is all about using that sample data, that small piece of the puzzle, to say something meaningful and reliable about the bigger picture, the whole population.

It's a powerful idea,

but definitely relies on doing things carefully, checking conditions, understanding what the margin of error does and doesn't account for.

That's key.

Remember, the margin of error only covers random sampling variability.

It doesn't magically fix biases from bad sampling methods or untruthful responses.

So the next time you see a poll result or a study finding, you can now ask, what was the confidence level?

What was n?

And critically,

what potential biases might not be captured by that reported margin of error?

You're thinking like a statistician.

That's our deep dive into estimating with confidence, guided by Chapter 8 of The Practice of Statistics.

Thanks so much for joining us on this learning journey.

Yeah, thanks for tuning in.

Keep exploring these ideas, keep asking questions, and keep making those statistical connections.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary: What this audio overview covers
Confidence intervals serve as the fundamental bridge between observed sample data and unknown population parameters, enabling statisticians and researchers to quantify uncertainty in their estimates. At the heart of this approach lies sampling variability—the inherent reality that different samples drawn from the same population will yield different sample statistics. A confidence interval expresses this uncertainty by combining a point estimate derived from sample data with a calculated margin of error, presenting a range of values that likely contains the true but unknown population parameter. Understanding the confidence level proves essential; it reflects the long-run frequency with which such intervals would capture the true parameter if the sampling procedure were repeated indefinitely under identical conditions. This frequentist interpretation often diverges sharply from common misinterpretations that incorrectly assign a fixed probability to any single computed interval. Constructing valid confidence intervals requires satisfying three foundational conditions: observations must originate from a random sampling mechanism, the sampling distribution of the statistic must be approximately normal (either established through theoretical grounds or validated by adequate sample size), and individual measurements must satisfy independence, typically verified through the ten percent condition. The technical machinery differs depending on whether the population standard deviation is known or unknown. Known population standard deviations lead to z critical values extracted from the standard normal distribution, whereas unknown population standard deviations necessitate t critical values from Student's t distribution, which provides appropriately wider margins of error to account for additional estimation uncertainty. 
Sample size and confidence level operate in opposite directions relative to interval width; increasing sample size or lowering the confidence level narrows the interval, while decreasing sample size or raising the confidence level widens it. These tradeoffs reflect fundamental statistical principles about precision and certainty. Mastering confidence interval construction, interpretation, and assessment forms the essential foundation for subsequent hypothesis testing frameworks and enables practitioners to make evidence-informed decisions grounded in quantifiable statistical reasoning.
