Chapter 9: Testing a Claim

Audio Overview

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome to The Deep Dive, the show that cuts through the noise and gets straight to the most important insights from the information you need to know.

Today, we're tackling a cornerstone of statistical inference, how we use data to test specific claims.

We've previously explored confidence intervals as a way to estimate population parameters.

Now we're embarking on a deep dive into the second crucial method,

significance tests.

Our mission today is to make Chapter 9, "Testing a Claim," from The Practice of Statistics, 6th edition, crystal clear.

Consider this your essential guide to mastering significance tests, especially if you're an AP statistics learner.

We're going to untangle these complex concepts, show you their practical power, and yeah, make it engaging.

That's precisely it.

While confidence intervals give you a range,

significance tests offer a formal way to, well, weigh evidence for or against a very precise claim.

Like a specific number.

Exactly.

Imagine a company advertising a specific battery life, or maybe a scientist claiming a new drug reduces a side effect by a certain percentage.

Significance tests give us the framework to use data to evaluate those claims and make informed decisions.

Okay, let's unpack this.

At its core, what is a significance test?

Well, it's a formal procedure for using observed data to decide between two competing claims.

And these claims are called hypotheses.

Right.

Hypotheses.

You might also hear it called a test of significance or a hypothesis test.

Same thing, really.

Got it.

And the logic behind it.

The underlying logic is surprisingly intuitive,

actually.

Think about a criminal trial.

Innocent until proven guilty.

Precisely.

In statistics, we begin with a kind of default assumption, a presumption of no effect or no difference.

This is our null hypothesis.

Okay, the starting point.

Yeah.

And we only abandon that presumption if the data provides strong, convincing evidence to the contrary.

We need compelling evidence to reject our initial assumption.

Makes sense.

So let's talk about stating hypotheses.

This is where it all begins, right?

Absolutely fundamental.

First up is the null hypothesis, which we write as H0.

H0.

Okay.

This is the specific claim we're gathering evidence against.

It always states that a parameter is equal to some claimed value.

Think no difference, no change, or the status quo value.

So like, if a basketball player claims they make 80% of their free throws.

Then H0 would be p = 0.80, where p is their true long-run proportion of made free throws.

Got it.

And the alternative?

That's the alternative hypothesis, Ha.

This is the claim we suspect, or hope, might be true instead.

So if we think that player may be exaggerating.

Then Ha might be p < 0.80.

We suspect their true percentage is lower.

And these alternatives can be one-sided or two-sided.

Exactly.

One-sided means you're specifically looking for evidence that the parameter is less than the null value, like our basketball player example, or greater than the null value.

A two-sided alternative just claims the parameter is different from the null value.

Not equal to it.

Could be higher.

Could be lower.

We're just looking for any difference.

Right.

And here's a big AP exam tip, isn't it?

About parameters versus statistics.

Yes.

Absolutely crucial.

Hypotheses always refer to a population parameter, like p for a proportion or μ for a mean.

Always.

So you'd never write something like H0: p-hat = 0.80, using the sample stat?

Never.

p-hat and x-bar belong to your sample data.

Hypotheses are about the bigger picture.

The whole population parameter.

That's a common mistake to avoid.

Let's use an example.

The juicy pineapples one.

Good one.

So last year, the mean pineapple weight was 31 ounces.

They installed a new irrigation system.

And the managers just wonder if it will affect the mean weight.

Not necessarily increase or decrease it.

Just change it.

Exactly.

So because they're asking if it will change it, that makes it two-sided.

Our hypotheses would be H0: μ = 31.

The old mean.

Right.

And Ha: μ ≠ 31, where μ is the true mean weight of all pineapples grown this year with the new system.

And a key point here.

Don't peek at the data first, right?

Like if the sample mean happened to be 31.9, you shouldn't then change Ha to be μ > 31.

Definitely not.

You formulate your hypotheses based on the question before you collect or analyze the data.

Okay, moving on.

This is where it gets really interesting.

Interpreting P values.

Ah, the P value.

Often misunderstood, but so important.

Let's go back to that free throw shooter.

Say they took 50 shots and made 32.

So their sample proportion, p-hat, is 0.64.

Okay, 64%.

But their claim, H0, was p = 0.80.

Right.

So the P value answers the question: if they really are an 80% shooter, how likely is it they'd shoot 64%, or even worse, just by chance in a random sample of 50 shots?

That's the essence of it.

Formally, the P value is the probability of getting evidence for the alternative hypothesis as strong or even stronger than what you actually observed.

Assuming the null hypothesis, H0, is true.

Exactly.

That assuming H0 is true part is critical.

So what does a small P value tell us?

A small P value means your observed result is pretty surprising, pretty unlikely if the null hypothesis were true.

That gives you convincing evidence against H0 and in favor of Ha.

And a large P value.

A large P value means your observed result isn't very surprising at all.

It's the kind of thing that could easily happen just by chance if H0 were true.

So you fail to find convincing evidence against H0.

Like in that simulation for the shooter, the P value was tiny, like 0.00075.

Right.

Very small.

It means if the player really is an 80% shooter, there's less than a 1% chance they'd perform as poorly as 64% or worse in 50 shots, just due to random luck.

That low probability makes us doubt their 80% claim.
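
As a rough sketch of where a probability like that comes from: the hosts cite a simulation, but the same question can be answered with an exact binomial calculation, assuming the 50 shots are independent and each goes in with probability 0.80. The function name below is made up for illustration, and the exact answer will not match a simulated value digit for digit.

```python
from math import comb

def binomial_lower_tail(k: int, n: int, p: float) -> float:
    """Return P(X <= k) when X counts successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Free-throw example: 32 makes in 50 shots, with H0: p = 0.80 assumed true.
p_value = binomial_lower_tail(32, 50, 0.80)
print(f"P(32 or fewer makes in 50 shots, given p = 0.80) = {p_value:.4f}")
```

The result is well under 1%, which is the "surprising if H0 were true" pattern described above.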

Okay.

Another example.

Healthy bones.

Teens need 1300 mg of calcium per day.

Researchers suspect they get less.

H0: μ = 1300, Ha: μ < 1300.

One-sided test.

A sample of 20 teens had a mean x-bar = 1198 mg, and the calculated P value was 0.1404.

How do we interpret that?

You'd say: assuming the true mean daily calcium intake for all teens is 1300 mg (that is, H0 is true), there's about a 14% probability of getting a sample mean as low as 1198 mg or even lower, just by chance, in a random sample of 20 teens.

So a 14% chance.

That's not super unusual, is it?

Not really.

It doesn't give strong evidence against the claim that the true mean is 1300 mg.

And for two-sided tests, we need to consider extremes in both directions.

Correct.

If Ha is "not equal to," say Ha: p ≠ 0.5, and your sample gives p-hat = 0.65, that's 0.15 above 0.5.

You need to find the probability of being at least 0.15 away from 0.5 in either direction.

So you'd find P(p-hat ≥ 0.65) and P(p-hat ≤ 0.35), assuming p = 0.5, and add those probabilities together.

Or, often, you can just find the probability in one tail and double it, because the sampling distribution is usually symmetric.
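
A minimal sketch of that two-tail calculation, using a Normal approximation for the sampling distribution of p-hat. The null value 0.5 and sample proportion 0.65 come from the discussion. The sample size n = 100 is a hypothetical value chosen only so the numbers can be computed.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.5       # null value from the discussion
p_hat = 0.65   # sample proportion from the discussion
n = 100        # hypothetical sample size, not given in the discussion

# Approximate sampling distribution of p-hat, assuming H0: p = 0.5 is true.
null_dist = NormalDist(mu=p0, sigma=sqrt(p0 * (1 - p0) / n))

distance = abs(p_hat - p0)
upper_tail = 1 - null_dist.cdf(p0 + distance)  # chance of 0.65 or higher
lower_tail = null_dist.cdf(p0 - distance)      # chance of 0.35 or lower

two_sided_p = upper_tail + lower_tail
doubled_one_tail = 2 * upper_tail
print(two_sided_p, doubled_one_tail)
```

Because the Normal curve is symmetric about 0.5, adding the two tails and doubling one tail give the same P value.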

Right, okay.

So P values measure the strength of evidence against H0.

How do we make a final decision?

That brings us to making conclusions.

Yes, the decision part.

This involves comparing your P value to a predetermined threshold called the significance level, denoted by alpha.

Alpha.

And that's usually 0.05.

Often, yes.

0.05 is a common convention, but it could be 0.10 or 0.01 or something else, depending on the context.

The key is you choose alpha before you see the data.

To avoid bias.

Okay, so what's the rule?

Simple.

If your P value is less than alpha, you reject H0.

If your P value is greater than or equal to alpha, you fail to reject H0.

Reject H0 or fail to reject H0.

And your conclusion should always be stated in two parts.

First, state your decision about H0 based on the P value and alpha.

Like since the P value is less than alpha, we reject H0.

Or since the P value is greater than alpha, we fail to reject H0.

And the second part.

State your conclusion about Ha in the context of the problem.

So if we reject H0, we say, "There is convincing evidence that [state Ha in words]."

And if we fail to reject H0, "There is not convincing evidence that [state Ha in words]."

Back to the trial analogy.

Rejecting H0 is like a guilty verdict.

Failing to reject H0 is like not guilty.

Precisely.

And crucially, not guilty doesn't mean proven innocent.

It means the prosecution didn't provide enough evidence for a conviction.

So failing to reject H0 doesn't mean H0 is true.

Absolutely not.

Never say you accept H0 or conclude H0 is true.

That's a major statistical no-no, especially on the AP exam.

Just stick to "fail to reject H0" and "not convincing evidence for Ha."

OK, let's take Better Batteries.

They test if their batteries last longer than 30 hours.

Ha: μ > 30.

Say they get a P value of 0.0717 and they chose alpha = 0.05 beforehand.

OK, the P value, 0.0717, is greater than 0.05.

Right.

So we fail to reject H0.

And the conclusion in context.

Because the P value, 0.0717, is greater than alpha = 0.05, we fail to reject H0.

There is not convincing evidence that the true mean lifetime of the company's deluxe AAA batteries is greater than 30 hours.
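
The two-part conclusion can be written as a small helper. This is only an illustrative sketch: the function name and the exact wording of the returned sentences are made up here and are not a required AP format.

```python
def conclude(p_value: float, alpha: float, ha_in_words: str) -> str:
    """Return a two-part conclusion: the decision about H0, then Ha in context."""
    if p_value < alpha:
        return (f"Because the P value {p_value} is less than alpha = {alpha}, "
                f"we reject H0. There is convincing evidence that {ha_in_words}.")
    # Note the wording: we never "accept H0", we only fail to reject it.
    return (f"Because the P value {p_value} is not less than alpha = {alpha}, "
            f"we fail to reject H0. There is not convincing evidence that {ha_in_words}.")

print(conclude(0.0717, 0.05,
               "the true mean battery lifetime is greater than 30 hours"))
```

A P value exactly equal to alpha falls in the "fail to reject" branch, matching the rule stated earlier.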

Makes sense.

Now things could go wrong, right?

We can make mistakes.

Yes.

That brings us to Type I and Type II errors.

These are the two ways our conclusion from the test might be incorrect.

OK, what's a Type I error?

A Type I error happens if you reject H0 when in reality H0 is true.

It's like a false alarm or a false positive.

You conclude Ha is true when it really isn't.

And the probability of making a Type I error?

That's exactly equal to your significance level, alpha.

If alpha = 0.05, you have a 5% chance of making a Type I error if H0 is true.

Got it.

And Type II?

A Type II error happens if you fail to reject H0 when in reality Ha is true.

This is like a missed opportunity or a false negative.

The test doesn't find convincing evidence for Ha, even though Ha is actually correct.

OK.

Fail to reject when you should have.

How do we remember which is which?

The textbook has a nice mnemonic.

"Fail to" goes with Type II.

If your conclusion involves "fail to reject H0," the only error you could have made is a Type II error.

Let's use the Perfect Potatoes example.

A producer tests whether more than 8% are blemished.

H0: p = 0.08, Ha: p > 0.08.

Alpha = 0.05.

What would a Type I error look like here?

OK, Type I is rejecting H0 when H0 is true.

So the producer concludes more than 8% are blemished and maybe rejects the truckload.

When actually the true proportion was only 8% or less.

Right.

The consequence?

They needlessly rejected a good batch of potatoes, wasted time, and maybe damaged relations with the supplier.

And a Type II error?

Type II is failing to reject H0 when Ha is true.

So the producer concludes there's not enough evidence that more than 8% are blemished and they accept the truckload.

When in reality, the true proportion was greater than 8%.

Exactly.

The consequence?

They end up using substandard potatoes, which could lead to lower quality chips, customer complaints, maybe lost sales.

And in this case, the Type II error seems more serious for the producer's reputation, maybe.

Could be.

Deciding which error is more serious depends entirely on the specific context and the consequences of each mistake. That often influences the choice of alpha.

Because lowering alpha reduces Type I error risk.

But it increases the risk of a Type II error.

It's a trade-off.

Making it harder to reject H0 makes it less likely you'll reject it wrongly (Type I), but more likely you'll fail to reject it when you should have (Type II).

OK, let's put this all together.

Section 9.2: Tests about a population proportion.

We need a structure, right?

Yes, the four-step process: State, Plan, Do, Conclude.

This organizes your work perfectly for any significance test.

State is defining hypotheses and alpha.

We've covered that.

What's critical in the plan step for proportions?

Checking the conditions for inference.

There are three.

One, random: the data must come from a random sample or a randomized experiment.

Two, the 10% condition: when sampling without replacement, your sample size n should be no more than 10% of the population size N, so the observations can be treated as independent.

Three, the large counts condition.

This is the one for checking normality of the sampling distribution.

Both the expected number of successes, n × p0, and the expected number of failures, n × (1 - p0), must be at least 10.

And crucially, for the large counts check in a test, you use p0 from the null hypothesis, right?

Not p-hat from the sample.

Exactly, because the whole test operates under the assumption that H0 is true.

So we check if the sample size is large enough given that assumption.

OK, example: Get a Job.

The national rate of high schoolers with jobs is 0.25.

An administrator suspects it's lower at her school.

H0: p = 0.25, Ha: p < 0.25.

Sample: 200 students, 39 have jobs.

Let's plan.

OK, random.

Assume it was a random sample.

10%.

200 students is likely less than 10% of all students in a large high school.

Large counts.

Use p0 = 0.25. n × p0 = 200(0.25) = 50, which is at least 10, and n × (1 - p0) = 200(0.75) = 150.

Also at least 10.

Perfect, conditions met.
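
The two numeric checks in the plan step can be sketched as follows. The function names are made up for illustration, and the school size of 2,500 is a hypothetical figure, since the discussion only says the school is large.

```python
def large_counts_ok(n: int, p0: float, minimum: float = 10) -> bool:
    """Large counts check for a test: uses p0 from H0, not the sample p-hat."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    return expected_successes >= minimum and expected_failures >= minimum

def ten_percent_ok(n: int, population_size: int) -> bool:
    """10% condition: the sample is at most 10% of the population."""
    return n <= 0.10 * population_size

# Get a Job example: n = 200 and p0 = 0.25 give expected counts of 50 and 150.
print(large_counts_ok(200, 0.25))
# Hypothetical school of 2,500 students (population size is an assumption).
print(ten_percent_ok(200, 2500))
```

The random condition has no formula. It depends on how the data were collected.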

Now the Do step: calculations.

Right.

First, calculate the sample statistic.

Here, p-hat = 39/200 = 0.195.

Then the test statistic.

Yes, the standardized test statistic, which is a z-score for proportions.

The formula is z = (p-hat - p0) / sqrt(p0(1 - p0) / n).

So for this example.

z = (0.195 - 0.25) / sqrt(0.25(0.75) / 200). Plug that in and it comes out to -1.80.

OK, z = -1.80.

What does that tell us?

It tells us our sample result, p-hat = 0.195, is 1.80 standard deviations below the value assumed in the null hypothesis, p = 0.25.

And then we find the P value from the z-score.

Yes.

Since Ha is p < 0.25, a "less than" alternative, the P value is the probability of getting a z-score of -1.80 or even less.

P(z ≤ -1.80).

You'd use a standard normal table or technology.

And that probability is 0.0359.

Correct.

P value 0.0359.

So final step.

Yeah.

Conclude.

Let's assume alpha = 0.05.

OK, compare the P value to alpha: 0.0359 is less than 0.05.

So we reject H0.

Right.

And the conclusion in context.

Because the P value, 0.0359, is less than alpha = 0.05, we reject H0.

There is convincing evidence that the true proportion of students at this high school who hold part-time jobs is less than the national rate of 0.25.

Perfect conclusion.

That whole procedure is called the one-sample z test for a proportion.

Remember those steps and conditions.

Let's quickly run through One Potato, Two Potato using the four steps.

Claim: 8% blemished, p = 0.08.

Sample: 500 potatoes.

47 blemished, so p-hat = 0.094.

Test if more than 8% are blemished at alpha = 0.10.

OK, quick fire.

State: H0: p = 0.08.

Ha: p > 0.08.

Here p = the true proportion blemished.

Plan: one-sample z test for p. Random: assume yes.

10%: 500 is less than 10% of the shipment.

Large counts: 500(0.08) = 40, at least 10, and 500(0.92) = 460, at least 10. Conditions met.

Do: p-hat = 47/500 = 0.094. z = (0.094 - 0.08) / sqrt(0.08(0.92) / 500) = 1.15.

P value: P(z ≥ 1.15) = 0.1251. Conclude.

The P value, 0.1251, is greater than alpha = 0.10, so we fail to reject H0.

There's not convincing evidence that the true proportion of blemished potatoes in the shipment is greater than 0.08.
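
For readers who want to check the arithmetic, here is a minimal sketch of the one-sample z test for a proportion, run on the two examples above (39 of 200 against p0 = 0.25, and 47 of 500 against p0 = 0.08). The function name and the `alternative` labels are illustrative choices, not a standard API. The code keeps z unrounded, so its P values differ slightly in the third decimal place from table values based on z rounded to two places.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(successes: int, n: int, p0: float, alternative: str):
    """Return (z, p_value). alternative is 'less', 'greater', or 'two-sided'."""
    p_hat = successes / n
    # The standard error uses p0 because the test assumes H0 is true.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value

z_jobs, p_jobs = one_sample_z_test(39, 200, 0.25, "less")
z_potato, p_potato = one_sample_z_test(47, 500, 0.08, "greater")
print(f"Get a Job: z = {z_jobs:.2f}, P value = {p_jobs:.4f}")
print(f"Potatoes:  z = {z_potato:.2f}, P value = {p_potato:.4f}")
```

The decisions match the walkthroughs: the first P value is below 0.05, and the second is above 0.10.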

Nicely done.

What happens if the data clearly goes the wrong way?

Like if only 6.6% of potatoes were blemished.

p-hat = 0.066.

But Ha was still p > 0.08.

Good question.

If your sample statistic, p-hat = 0.066, is actually less than the null value, p = 0.08, it offers zero support for an alternative hypothesis that says the parameter is greater than the null value, Ha: p > 0.08.

So you just stop.

You should recognize immediately there's no evidence for Ha.

You don't even need to calculate the Z statistic or P value formally.

Your P value would be very large anyway leading you to fail to reject H zero.

But recognizing the direction saves you work.

Makes sense.

And we touched on two-sided tests earlier.

For the nonsmokers example.

H0: p = 0.68, Ha: p ≠ 0.68.

We found z = -2.10 and a P value of 0.0358.

Right.

And since 0.0358 is less than 0.05, we rejected H0.

Concluding there was evidence the proportion at that school differed from 0.68.

Exactly.

And remember that connection to confidence intervals.

Oh yes.

A 95% confidence interval for p in that example was (0.522, 0.678).

And notice that 0.68, the value in H0, is not inside that interval.

So the confidence interval gives the same conclusion as the two-sided test.

Reject H0 because 0.68 is not a plausible value based on the interval.

Precisely.

For a two-sided test at significance level alpha, a 100(1 - alpha)% confidence interval gives the same conclusion.

And the interval gives you more info: that range of plausible values.

Okay.

Switching gears slightly.

Section 9.3.

Tests about a population mean.

Similar structure but different details.

That's right.

Same four-step process.

State, Plan, Do, Conclude.

What changes in the plan step for means?

The random and 10% conditions are the same.

The big difference is the normal/large sample condition.

How does that work for means?

Well, if your sample size n is large, say n ≥ 30, the central limit theorem usually kicks in and the sampling distribution of x-bar will be approximately normal regardless of the population shape.

So the condition is met.

Okay, n ≥ 30 is good.

What if n < 30?

If n < 30, you can only proceed if the original population distribution itself is approximately normal, or at least not strongly skewed and without outliers.

And how do we check that if we don't know the population shape?

You graph your sample data.

Make a histogram, a dot plot, a stem plot,

maybe a normal probability plot.

Right.

You look for obvious departures from normality: strong skewness or major outliers.

And for the AP exam you actually need to show the graph and comment on it.

Yes, sketch the graph and write something like, "The dot plot shows rough symmetry and no outliers, so we can proceed with the t test."

You need to justify using the procedure.

Let's use better batteries again.

They tested 15 batteries, n = 15.

Random sample, 10% condition okay, but n = 15 is small.

Right, so we'd need to look at a graph of the 15 sample lifetimes.

The book shows a stem plot, histogram, box plot, and normal probability plot.

And describing those verbally.

You'd say the stem plot and histogram look roughly symmetric.

The box plot confirms no outliers and the normal probability plot is reasonably straight.

So even though n = 15, it seems plausible the underlying population of battery lifetimes is approximately normal.

Condition met.

Okay conditions checked.

Now for the Do step for means.

We don't usually know sigma, the population standard deviation.

Almost never in practice.

So we have to estimate it using the sample standard deviation, s_x.

And using s_x instead of sigma means we don't use a z statistic anymore.

Correct, we use a t statistic.

The formula is t = (x-bar - μ0) / (s_x / sqrt(n)).

Notice it's the same structure: (statistic - parameter) / (standard error of the statistic).

And this t statistic follows a t distribution.

Yes, a t distribution.

It looks a lot like the standard normal curve: symmetric, bell-shaped, centered at zero.

But it has slightly more spread, thicker tails.

Why thicker tails?

Because we introduced extra variability by using s_x to estimate sigma.

The amount of extra spread depends on the sample size.

Specifically, the degrees of freedom, which for a one-sample t test is df = n - 1.

So smaller sample size, smaller df, thicker tails.

Exactly. As df gets larger, as n increases, the t distribution gets closer and closer to the standard normal z distribution.

For Better Batteries, n = 15, so df = 14. They found x-bar = 33.93 and s_x = 9.82.

H0 was μ = 30.

So the t statistic is t = (33.93 - 30) / (9.82 / sqrt(15)) = 1.55.

Now how do we find the P value from t = 1.55 with df = 14?

Can we use Table B?

You can use Table B, but it only gives ranges for P values.

You'd find row df = 14, look for 1.55, and see which columns it falls between.

It tells you the one-sided P value is between 0.05 and 0.10.

So not exact.

No, Table B is limited. Technology, like a calculator's tcdf function, is much better for finding precise P values from t distributions.
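
As a rough stand-in for a calculator's tcdf, the sketch below computes the Better Batteries t statistic and its one-sided P value using only the Python standard library. The tail area comes from numerically integrating the t density with Simpson's rule. That is an implementation choice made here for self-containment, not how calculators or the textbook do it, and the function names are made up.

```python
from math import gamma, pi, sqrt

def t_pdf(x: float, df: int) -> float:
    """Density of the t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """P(T >= t), via Simpson's rule on [0, |t|]. steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 - area if t >= 0 else 0.5 + area

# Better Batteries summary statistics from the discussion.
x_bar, s_x, n, mu0 = 33.93, 9.82, 15, 30
t_stat = (x_bar - mu0) / (s_x / sqrt(n))
p_value = t_upper_tail(t_stat, df=n - 1)
print(f"t = {t_stat:.2f}, df = {n - 1}, one-sided P value = {p_value:.4f}")
```

The result lands between 0.05 and 0.10, consistent with the Table B range, and close to the 0.0717 quoted earlier for this example.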

Okay, so this whole thing is the one-sample t test for a mean.

Check conditions, especially normal/large sample, using graphs if n < 30.

Calculate t = (x-bar - μ0) / (s_x / sqrt(n)).

Find the P value using the t distribution with df = n - 1.

You got it.

Let's run through Healthy Streams.

A DO level below 5 is bad.

Sample of 15 locations. Test H0: μ = 5 versus Ha: μ < 5 at alpha = 0.05.

Okay, State: H0: μ = 5, Ha: μ < 5, where μ is the true mean DO level in the stream.

Plan.

One-sample t test for μ. Random: locations were randomly chosen, yes. 10%:

Assume many locations. Normal/large sample: n = 15 is small, but the histogram looked okay.

Roughly symmetric, no outliers. Conditions met.

Calculate sample stats: x-bar = 4.771, s_x = 0.9396.

Calculate t = (4.771 - 5) / (0.9396 / sqrt(15)) = -0.94.

Degrees of freedom: df = 15 - 1 = 14.

Find the P value using technology: P(t ≤ -0.94) with df = 14 is 0.1816.

Conclude: the P value, 0.1816, is greater than alpha = 0.05.

Fail to reject H0.

There is not convincing evidence that the true mean dissolved oxygen level in the stream is less than 5 mg/L.

Excellent and what type of error might have occurred here?

We failed to reject H0, so we might have made a Type II error.

Meaning?

Meaning the true mean DO level is actually less than 5, but our test wasn't sensitive enough, wasn't powerful enough, to detect it.

The consequence: aquatic life might be at risk and we didn't sound the alarm.

Okay, and the link between two-sided t tests and confidence intervals for means still holds?

Yes, and it's even stronger here because both the t test and the t interval use the sample standard deviation s_x in their calculations.

The rule is the same.

A 100(1 - alpha)% confidence interval for μ contains all the values of μ0 for which you would fail to reject H0 in a two-sided test at significance level alpha.

So back to juicy pineapples.

H0: μ = 31, Ha: μ ≠ 31.

What if a 95% CI for μ was (31.255, 32.616)?

Does that interval contain the null value 31?

No, it doesn't.

It starts above 31.

So based on the interval, you'd reject H0: μ = 31 at the alpha = 0.05 level.

There is convincing evidence the true mean weight is not 31 ounces.

It matches the test conclusion.
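
The interval-to-test connection amounts to a simple containment check. The sketch below uses an illustrative function name, and it applies only to a two-sided test whose alpha matches the interval's confidence level (95% paired with alpha = 0.05).

```python
def two_sided_decision_from_ci(interval: tuple, null_value: float) -> str:
    """Decide a two-sided test from a confidence interval at the matching level."""
    low, high = interval
    if low <= null_value <= high:
        return "fail to reject H0"  # the null value is a plausible value
    return "reject H0"              # the null value lies outside the interval

# Juicy Pineapples: 95% CI of (31.255, 32.616) and null value 31.
print(two_sided_decision_from_ci((31.255, 32.616), 31))
```

Since 31 falls below the lower endpoint, the helper returns the same "reject H0" decision reached above.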

We still can't say that irrigation caused the change.

Absolutely crucial point.

It was an observational study, not a randomized experiment.

The change in weight is associated with the new system, but other things could have caused it, like different weather that year.

Association isn't causation.

Okay final topic.

The power of a test.

What exactly is power?

Power is the probability that your test will correctly reject the null hypothesis H0 when a specific alternative value of the parameter is actually true.

So it's the probability of detecting an effect when there really is an effect.

Yes.

It's P(reject H0 | the parameter equals a specific alternative value).

It's your test's sensitivity, and it's directly related to Type II error.

Power = 1 - P(Type II error).

Say the power to detect that the new batteries last μ = 31 hours, when H0 says μ = 30, is 0.762.

It means if the true mean lifetime is 31 hours, there's a 76.2% chance that our significance test at the chosen alpha will find convincing evidence that the mean is greater than 30.

So a 76.2% chance of making the right conclusion in that specific scenario.

Correct.

And what affects the power?

How can we increase it?

Several factors.

Think about the powerful batteries activity.

One, increase the sample size n. More data, more information, better chance of detecting a true difference.

Power goes up.

Two, increase the significance level alpha, making it easier to reject H0. For example, alpha = 0.10 versus 0.05 increases power.

But it also increases the chance of a Type I error. That trade-off again.

Exactly.

Three, increase the difference between the null value and the true alternative value.

We call this the effect size.

It's just easier to detect big differences than small ones.

Detecting μ = 35 versus μ0 = 30 is easier,

with higher power, than detecting μ = 31 versus 30.

Four, decrease variability.

Anything you do in designing the study, like blocking in experiments or using more precise measurements, that reduces random variation will make real effects stand out more clearly, increasing power.

Okay.

So if we used alpha = 0.01 instead of alpha = 0.05, power would...

Decrease. Harder to reject H0.

If the true effect was bigger than we thought?

Power increases. Easier to detect.

If we used a smaller sample size? Power decreases. Less information.
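
These four effects can be seen in a simplified power calculation. The sketch below assumes a one-sided z test for a mean with a known population standard deviation, which is simpler than the t test used in the chapter. The values sigma = 5 and n = 50 are hypothetical, so it will not reproduce the 0.762 figure mentioned earlier.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0: float, mu_true: float, sigma: float,
                      n: int, alpha: float) -> float:
    """Power for H0: mu = mu0 vs Ha: mu > mu0, assuming sigma is known."""
    se = sigma / sqrt(n)
    # Reject H0 when the sample mean exceeds this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Power is the chance of exceeding the cutoff when the true mean is mu_true.
    return 1 - NormalDist(mu_true, se).cdf(cutoff)

baseline = power_one_sided_z(30, 31, sigma=5, n=50, alpha=0.05)
bigger_n = power_one_sided_z(30, 31, sigma=5, n=100, alpha=0.05)
bigger_alpha = power_one_sided_z(30, 31, sigma=5, n=50, alpha=0.10)
smaller_alpha = power_one_sided_z(30, 31, sigma=5, n=50, alpha=0.01)
bigger_effect = power_one_sided_z(30, 35, sigma=5, n=50, alpha=0.05)
less_spread = power_one_sided_z(30, 31, sigma=3, n=50, alpha=0.05)

for label, value in [("baseline", baseline), ("larger n", bigger_n),
                     ("alpha = 0.10", bigger_alpha), ("alpha = 0.01", smaller_alpha),
                     ("true mean 35", bigger_effect), ("smaller sigma", less_spread)]:
    print(f"{label:>14}: power = {value:.3f}")
```

Each change moves power in the direction described above: up with larger n, larger alpha, larger effect size, or less variability, and down with smaller alpha.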

Got it.

Finally, some advice on using tests wisely.

Yes.

Very important.

First, statistical significance versus practical importance.

Just because a p value is small doesn't mean the result matters in the real world.

Exactly.

Especially with very large sample sizes.

A tiny difference can become statistically significant if you have enough data.

That antibacterial cream reducing healing time by 0.1 days might have p < 0.05.

But who cares?

So always look at the actual difference.

Maybe with a confidence interval.

Plot your data.

Calculate the confidence interval.

See the magnitude of the effect.

Don't just rely on the p values.
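
To see how sample size alone can push a tiny effect into statistical significance, here is a small sketch. It holds the observed difference fixed at 0.1 days, the figure from the cream example, and assumes a hypothetical standard deviation of 1 day and a one-sided z test. Neither assumption comes from the discussion.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_p_value(observed_diff: float, sigma: float, n: int) -> float:
    """P value for a one-sided z test, assuming sigma is known."""
    z = observed_diff / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)

# Same 0.1-day difference, increasingly large (hypothetical) samples.
p_small = one_sided_p_value(0.1, sigma=1.0, n=25)
p_medium = one_sided_p_value(0.1, sigma=1.0, n=400)
p_huge = one_sided_p_value(0.1, sigma=1.0, n=10_000)
print(p_small, p_medium, p_huge)
```

The difference never changes, yet the P value shrinks as n grows. The P value reflects the amount of data as well as the size of the effect, which is why the magnitude still has to be judged separately.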

Yes.

No significance.

And the other warning.

Beware of multiple analyses.

Sometimes called p hacking.

Running lots and lots of tests on the same data until something pops up as significant.

Yes.

If you run 20 tests where H0 is actually true for all of them using alpha = 0.05, you'd expect about one test, 5% of 20, to be significant just by pure chance.
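
The arithmetic behind that warning takes only a few lines. The second calculation additionally assumes the 20 tests are independent of one another, which the discussion does not state.

```python
alpha = 0.05
num_tests = 20

# Expected number of false alarms when every H0 is true.
expected_false_positives = num_tests * alpha

# Chance that at least one test is "significant", assuming independent tests.
prob_at_least_one = 1 - (1 - alpha) ** num_tests

print(f"Expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one false positive): {prob_at_least_one:.3f}")
```

Under these assumptions, getting at least one significant result out of 20 is more likely than not, even when nothing real is going on.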

Like that funny XKCD comic about jelly beans causing acne but only the green ones after testing 20 colors.

Precisely.

If you torture the data long enough, it will confess to something.

That's why it's crucial to decide which hypotheses you're testing before looking at data.

Be skeptical of studies that seem to have looked for significance everywhere.

Wow.

Okay.

We've covered a huge amount today.

Hypotheses, p-values, errors, the whole process for proportions and means with z tests and t tests.

And finally, power and using tests wisely.

That four-step State, Plan, Do, Conclude framework really seems key.

It absolutely is.

It keeps you organized.

And remember, significance tests help us weigh evidence.

But interpreting them correctly, understanding p-values, alpha, Type I and II errors, practical importance.

That's where the real statistical thinking comes in.

And definitely be cautious about that p-hacking idea.

So as you go forward, think about this.

How might really understanding significance tests help you not just on the AP exam but also when you see claims in the news, ads everywhere?

How can you critically evaluate them now?

Thank you so much for joining us on this Deep Dive.

This has been a special production from the Last Minute Lecture Team.

We wish you all the best in your studies.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Hypothesis testing provides researchers with a systematic method for evaluating population claims through sample data and probabilistic reasoning. The foundational structure begins by establishing a null hypothesis that represents the status quo or absence of an effect, paired with an alternative hypothesis embodying the specific claim under investigation. The directionality of the alternative hypothesis determines test design: a one-sided test focuses on change in a single direction, while a two-sided test examines change in either direction. Once hypotheses are formulated, researchers compute a test statistic from their sample data and translate this into a p-value, which represents the probability of obtaining results as extreme or more extreme than observed data under the assumption that the null hypothesis is true. The conventional significance level of 0.05 establishes a decision boundary: when the p-value falls below this threshold, evidence is deemed sufficient to reject the null hypothesis and support the alternative. Understanding error types is essential to sound inference. Type I errors represent the rejection of a true null hypothesis, while Type II errors occur when a false null hypothesis is not rejected. Statistical power, defined as the probability of correctly detecting a true effect, complements Type II error probability and indicates a test's sensitivity. Practical applications extend to testing claims about both population proportions and population means, though each requires verification of specific conditions before results can be trusted. Accurate p-value interpretation remains critical—the p-value itself does not indicate the probability that the null hypothesis is true, nor does it measure the size of an effect. The chapter emphasizes distinguishing statistical significance from practical significance and cautions against common misuses of p-values in research contexts. 
Overall, hypothesis testing balances its power as a decision-making tool with recognition of its limitations and appropriate boundaries for application.
