Chapter 15: Holistic Statistics

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome to the Deep Dive, your shortcut to being well-informed.

We cut through the noise, get straight to the insights hidden in the data.

And today, well, we're tackling a big one.

The average human body temperature is exactly 98.6 degrees Fahrenheit.

You've heard it forever, right?

Since you were a kid.

But have you ever really stopped to question it?

Our mission today is exactly that.

We're going to play statistical detective, unearth the real story behind 98.6.

We'll investigate using the tools and critical thinking from a specific chapter in Mario Triola's Elementary Statistics.

The goal isn't just to see if 98.6 is right or wrong.

It's really to show you how good data analysis cuts through assumptions, how it helps us understand things.

By the end of this, you won't just know the truth about 98.6.

More importantly, you'll get the hang of the systematic approach to statistics, learn how to dodge common pitfalls, and see how data challenges even these really deep-seated beliefs.

Think of it as upgrading your critical thinking skills.

Okay, let's unpack this common belief and see what the data really says.

So before anyone even thinks about crunching numbers, good statisticians, they kind of prepare the scene, right?

They start by preparing the analysis.

Exactly.

It's all about understanding the context first.

What's the data?

Who's it from?

How was it collected?

Right, like checking the ingredients before you bake the cake.

Precisely.

If you don't do that, well, your results might not mean much.

So for this deep dive, what exactly were they measuring?

Okay, so we're looking at actual body temperatures measured in Fahrenheit, and the data comes from a specific study designed to test this exact 98.6 belief.

And who took part?

You mentioned that's important.

Crucial detail.

These were healthy volunteers.

They were recruited for inpatient vaccine trials at the University of Maryland School of Medicine.

Healthy volunteers.

Okay, so not people already sick with a fever.

Right.

The study's direct goal was to check the validity of that common 98.6 number.

Got it.

And where did this data come from?

The source?

The primary source is a 1992 journal article.

It was published in the Journal of the American Medical Association, JAMA, which is, you know, highly respected, by Mackowiak, Wasserman, and Levine.

Okay, JAMA.

That sounds solid.

What about funding?

Sometimes that matters.

Good point.

It was supported by the U.S. Department of Veterans Affairs and the U.S. Army.

So from a credibility perspective.

It looks very credible, unbiased.

That's a really vital first check for any kind of serious analysis.

Definitely.

And the sampling method.

How they actually got the temperatures.

Yeah, key details here.

First, like we said, subjects were healthy.

That avoids confounding factors like an illness messing with the temperature.

Makes sense.

Also, and this is a big deal, the temperatures were directly measured by the researchers, not self-reported.

Ah, okay.

So no one just writing down what they thought their temperature was.

Exactly.

That reduces potential for bias or just plain inaccuracy.

People aren't always great at measuring themselves or they might round things.

So why does all this groundwork matter so much?

Because a sound, unbiased sampling method is absolutely essential for reliable conclusions.

If your data collection is shaky, well, anything you build on top of it is going to be shaky too.

It's really fascinating how these initial prepare steps are so fundamental.

They make sure the data we're about to dig into is reliable, relevant to the question, all before you even, you know, open a spreadsheet.

Right, before a single calculation.

Yeah.

It's about setting the stage properly, making sure your analysis isn't built on, well, sand.

It's just vital critical thinking for any data you come across.

That makes perfect sense.

So once that groundwork is solidly in place, we move into the analyze phase.

Right.

This is where we really dive into the data itself, explore it, see what it's telling us, and then apply the right statistical tools.

So exploring first, what does that actually involve?

You don't just jump straight to a fancy test.

No, definitely not.

Good statisticians are like, uh, detectives examining the scene first.

Look at the raw evidence.

They visualize the data.

Like charts and things.

Yeah, exactly.

Using graphs like histograms, box plots.

These help you see the overall shape of the temperature distribution.

Is it symmetric?

Spread out? Clumped together?

They also use something called normal quantile plots.

That's to check if the data seems to come from a normally distributed population, the classic bell curve.

That assumption matters for some statistical tests later on.
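
As a rough illustration of what a normal quantile plot computes, the Python sketch below pairs each sorted data value with the z score expected at that rank under a normal distribution. The function name `normal_quantile_points`, the plotting position (i + 0.5)/n, and the five temperatures are illustrative assumptions, not values from the study. If the resulting points fall close to a straight line, the normality assumption looks reasonable.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Pair each sorted value with its expected standard normal z score."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (i + 0.5) / n is one common plotting position for 0-based rank i
    # (an assumption here; other conventions exist).
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]

# Made-up temperatures for illustration only, not data from the 1992 study.
temps = [97.6, 98.0, 98.2, 98.4, 98.8]
points = normal_quantile_points(temps)
for z, x in points:
    print(f"z = {z:+.2f}, temperature = {x}")
```

Plotting the z scores on one axis against the temperatures on the other gives the normal quantile plot itself.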

Okay.

So what did this initial look reveal?

Anything jump out?

Oh yeah.

Right out of the gate.

The main finding was that the sample mean body temperature in this study was 98.2 degrees Fahrenheit.

98.2.

Okay.

So already different from 98.6.

Noticeably different.

Yeah.

Even before any testing.

Did they look for any weird values, outliers?

They did.

That's part of exploration.

For instance, there was a potential outlier, a minimum temperature of 96.5 degrees.

That sounds quite low.

It does.

But when they looked at it alongside the other lowest temperatures, they determined it wasn't so extreme, not statistically speaking, that it would significantly throw off the results.

It fit the pattern.

So this careful look gives you a feel for the data, any quirks before you draw big conclusions.

Okay.

So exploration hints that 98.6 might be off.

Now we get to the more formal analysis, right?

Applying the methods.

Yes.

And this brings us to a really crucial takeaway from the chapter we're using as our guide.

Go beyond the p-value.

Go beyond the p-value?

What does that mean?

Well, it's a common mistake, a pitfall really: to just run one test, get a p-value, which basically gives a yes-no answer to your hypothesis, and stop there.

Like is it different?

Yes.

Okay.

Done.

Pretty much.

But a really robust analysis uses a variety of methods.

It looks at things like confidence intervals, uses graphs for deeper insight.

You want to understand the how much and the what range, not just if.

It's about building a much more complete picture.

Makes sense.

Don't rely on just one piece of evidence.

Exactly.

So let's look at the different tools they used here.

First up was a parametric hypothesis test, specifically a t-test.

A t-test.

Okay.

What's the idea there?

Think of it like putting the 98.6 belief on trial.

In statistics, you start with a null hypothesis.

That's your default assumption, the status quo.

So in this case, the null hypothesis would be?

The true average body temperature is 98.6 degrees Fahrenheit.

Okay.

And the alternative?

What you suspect might be true.

The alternative hypothesis here was the true average body temperature is not 98.6 degrees Fahrenheit.

The t-test then uses the data to help us decide which story is better supported.

And the verdict?

What did the test show?

The results were, well, pretty striking.

The test statistic was minus 6.611.

And the p-value, it was incredibly small: 0.0000 to four decimal places.

Basically zero.

Whoa.

Okay.

Remind me what a p-value means again, especially a tiny one like that.

The p-value tells you the probability of getting your observed results or something even more extreme if the null hypothesis were actually true.

So a tiny p-value means?

It means it's extremely unlikely you'd get a sample mean of 98.2 degrees Fahrenheit if the true population mean really was 98.6 degrees Fahrenheit.

It's like the data is shouting, this doesn't fit.

Okay.

So tiny p-value leads to?

A clear conclusion.

Reject the null hypothesis.

There was overwhelming evidence to say the mean body temperature is not 98.6 degrees Fahrenheit.
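
For readers who want to see the arithmetic behind a one-sample t statistic, here is a minimal Python sketch. The sample size (106) and sample standard deviation (0.62) are assumptions that are not stated in this overview, and the p-value uses a normal approximation rather than the exact t distribution, so the output will only roughly match the quoted statistic of minus 6.611, which was presumably computed from unrounded data.

```python
import math
from statistics import NormalDist

def one_sample_t(sample_mean, claimed_mean, sample_sd, n):
    """t = (sample mean - claimed mean) / (s / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - claimed_mean) / standard_error

# 98.2 and 98.6 come from the discussion above; 0.62 and 106 are assumed values.
t_stat = one_sample_t(sample_mean=98.2, claimed_mean=98.6, sample_sd=0.62, n=106)

# Two-sided p-value via a normal approximation (reasonable for a sample this large,
# but not the exact t-distribution calculation).
p_approx = 2 * NormalDist().cdf(-abs(t_stat))

print(f"t is about {t_stat:.2f}")        # roughly -6.64 with these rounded inputs
print(f"approximate p-value: {p_approx:.1e}")
```

A statistic more than six standard errors below zero is why the reported p-value rounds to 0.0000.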

Right.

So the t-test says, nope, not 98.6.

But if it's not 98.6, then what is it?

Or what's the likely range?

Excellent question.

And that's exactly where the 95% confidence interval comes in.

This also uses the t distribution like the test.

Okay.

Confidence interval.

So it doesn't give one single number.

No, it gives a range, a window.

We calculate it so that we're 95% confident that the true population mean lies somewhere within that window.

Like saying we're pretty sure the actual average is somewhere between x and y.

Exactly that.

And for this data, the 95% confidence interval for the mean body temperature was calculated to be from 98.08 degrees Fahrenheit up to 98.32 degrees Fahrenheit.

98.08 to 98.32.

Okay.

And the key insight there.

The crucial thing is that 98.6 degrees Fahrenheit falls outside this interval, way outside.

So the range where we're pretty sure the true mean lies doesn't include 98.6.

Correct.

Which provides really strong supporting evidence against that old belief.

It backs up what the t-test told us.
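
The interval described above has the form sample mean plus or minus a critical value times the standard error. The Python sketch below works through that formula under stated assumptions: the sample size (106), the sample standard deviation (0.62), and the hard-coded critical value (about 1.983 for 105 degrees of freedom) are not given in this overview and are supplied here only for illustration.

```python
import math

def t_confidence_interval(sample_mean, sample_sd, n, t_critical):
    """Return (low, high) for mean +/- t_critical * s / sqrt(n)."""
    margin = t_critical * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# 98.2 is the sample mean quoted above. The other inputs are assumptions:
# s = 0.62, n = 106, and t* of about 1.983 (95% level, 105 degrees of freedom).
low, high = t_confidence_interval(98.2, 0.62, 106, t_critical=1.983)

print(f"95% CI: {low:.2f} to {high:.2f}")
print("98.6 inside interval?", low <= 98.6 <= high)
```

With these inputs the interval rounds to the same 98.08 to 98.32 quoted in the discussion, and 98.6 sits well above its upper end.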

So we have the t-test and the confidence interval both pointing away from 98.6.

Did they use other methods too?

You mentioned variety is important.

They did.

They also used some really powerful, more modern techniques called resampling methods.

Specifically, randomization and bootstrap resampling.

Resampling.

Sounds interesting.

What's the advantage?

Well, one big advantage is they often don't rely on the same strict assumptions about the data's distribution, like needing that perfect bell curve.

Makes them very flexible.

Okay.

Tell me about randomization.

How does that work?

The idea is you take your actual sample data and you mathematically kind of shift it so that its mean does equal the value you're testing, in this case 98.6.

Okay.

So you pretend 98.6 is true for a moment using your data shape.

Sort of.

Then, from this adjusted data, you simulate drawing many, many samples, like a thousand or more.

You see how often, just by chance, you get a sample mean as extreme as the one you actually observed, 98.2.

And what happened here?

They found that almost none of the simulated samples, like zero out of a thousand, or maybe one if you were generous, had a mean as low as 98.2.

Wow.

So it shows it's incredibly unlikely to get 98.2 if 98.6 were the real average.

Precisely.

The logical conclusion, then, is that the starting assumption that the mean is 98.6 must be wrong.
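
The shift-then-resample idea can be sketched in a few lines of Python. Because the study's raw temperatures are not reproduced here, the sketch generates a synthetic stand-in sample (106 values drawn around 98.2 with standard deviation 0.62, all assumed numbers), so its counts illustrate the procedure rather than reproduce the study's result.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for the real sample (assumed n, center, and spread).
sample = [rng.gauss(98.2, 0.62) for _ in range(106)]
observed_mean = mean(sample)

# Shift every value so the data's mean equals the claimed 98.6.
shifted = [x - observed_mean + 98.6 for x in sample]

# Resample with replacement from the shifted data and count how often
# the resampled mean is at least as low as the mean actually observed.
REPS = 1000
count_as_low = 0
for _ in range(REPS):
    resample = rng.choices(shifted, k=len(shifted))
    if mean(resample) <= observed_mean:
        count_as_low += 1

print(f"observed mean: {observed_mean:.2f}")
print(f"resamples with mean that low: {count_as_low} of {REPS}")
```

A count at or near zero is the resampling analogue of a tiny p-value.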

Okay.

That's randomization.

What about the other one?

Bootstrap resampling.

Bootstrap is another clever way to use your sample data.

It's mainly used to create confidence intervals.

You take your original sample and you repeatedly draw new samples from it with replacement.

Meaning you can pick the same value more than once in a new sample.

Yes.

It's like putting all your data points in a hat, drawing one, writing it down, putting it back in the hat, and repeating until you have a new sample of the same size.

You do this thousands of times.

Okay.

Why?

It creates thousands of new samples that reflect the variation in your original data.

From these thousands of bootstrap samples, you can calculate a very robust confidence interval.

And what did the bootstrap confidence interval look like here?

It came out as 98.07 degrees Fahrenheit to 98.31 degrees Fahrenheit.

Wait.

That's almost identical to the one from the t distribution method.

98.08 to 98.32.

Remarkably close, isn't it?

And that's fantastic.

When different methods using different assumptions give you basically the same answer, it dramatically increases your confidence in the findings.

That's the power of using multiple approaches.
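
Here is a minimal sketch of the "values in a hat" procedure, using the percentile method to read off the interval. The data are a synthetic stand-in (assumed n = 106, center 98.2, standard deviation 0.62), and `bootstrap_ci` is a hypothetical helper name, so the printed interval will be similar in width to the one quoted above but will not match it exactly.

```python
import random
from statistics import mean

def bootstrap_ci(data, reps=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    # Each bootstrap sample has the same size as the data, drawn with replacement.
    boot_means = sorted(mean(rng.choices(data, k=len(data))) for _ in range(reps))
    lo_idx = int((1 - level) / 2 * reps)        # index of the 2.5th percentile
    hi_idx = int((1 + level) / 2 * reps) - 1    # index of the 97.5th percentile
    return boot_means[lo_idx], boot_means[hi_idx]

# Synthetic stand-in for the real temperatures (assumed values, not study data).
data_rng = random.Random(0)
sample = [data_rng.gauss(98.2, 0.62) for _ in range(106)]

low, high = bootstrap_ci(sample)
print(f"bootstrap 95% CI: {low:.2f} to {high:.2f}")
```

The percentile method is only one of several ways to form a bootstrap interval; it is used here because it is the simplest to read.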

Okay.

So t-test, confidence interval, randomization, bootstrap, they covered a lot of ground.

Anything else?

Just to be thorough, they also employed nonparametric tests like the Wilcoxon signed-ranks test and the sign test.

Nonparametric, meaning they don't assume a specific distribution like the normal curve.

Exactly.

They're often less sensitive to outliers or skewed data.

They provide another angle, another way to check the central tendency.

Did they agree with the others?

They did.

Both tests, the Wilcoxon with a z score of minus 5.67 and the sign test with minus 4.61, strongly led to rejecting the claim that the median body temperature is 98.6 degrees Fahrenheit.

So whether you look at the mean or the median using different kinds of tests.

The conclusion was consistent.

98.6 just doesn't seem to be the right number for the center of this data.

It's like having multiple independent witnesses all confirming the same story, which really strengthens the case.
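
To make the sign test less abstract, the Python sketch below computes its large-sample z statistic with a continuity correction. The counts used (23 temperatures above 98.6 and 68 below, with ties discarded) are assumptions not stated in this overview; they are chosen because they reproduce the sign test statistic quoted above. The helper name `sign_test_z` is likewise hypothetical.

```python
import math

def sign_test_z(n_above, n_below):
    """Large-sample sign test: z = ((x + 0.5) - n/2) / (sqrt(n)/2)."""
    n = n_above + n_below          # values equal to the claimed median are dropped
    x = min(n_above, n_below)      # the less frequent sign
    return ((x + 0.5) - n / 2) / (math.sqrt(n) / 2)

# Assumed counts relative to the claimed median of 98.6 (not given in the text).
z = sign_test_z(n_above=23, n_below=68)
print(round(z, 2))  # prints -4.61
```

If 98.6 really were the median, about half the temperatures would fall on each side; a split this lopsided is what drives the statistic so far below zero.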

Wow.

Okay.

One more method mentioned was simulation.

How did that fit in?

Simulation here was used as another way to gauge likelihood.

The idea was, let's assume the population mean really is 98.6 degrees Fahrenheit, and let's assume the spread, standard deviation, is similar to what we found in our sample.

Okay, setting up a hypothetical world where 98.6 is true.

Exactly.

Then you generate many hypothetical samples from this world, say 50 samples.

You see what their means look like.

What do those simulated means look like?

Well, in the source's examples, they clustered around 98.5 degrees Fahrenheit, 98.7 degrees Fahrenheit, 98.8 degrees Fahrenheit, values close to the assumed 98.6.

And how did our actual sample mean of 98.2 compare?

It wasn't even close to any of those simulated means.

It was way lower.

So the reasoning is?

If 98.6 were true, getting a sample mean like 98.2 would be incredibly improbable, almost impossible based on these simulations.

Since we did get 98.2, it's far more likely that our initial assumption that the mean is 98.6 is incorrect, rather than thinking our one actual sample was some kind of extreme statistical fluke.
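
The hypothetical world described here is easy to simulate. The sketch below draws 50 samples from a normal population with mean 98.6 and then looks at the sample means. The sample size (106), the standard deviation (0.62), and the choice of a normal population are assumptions made for illustration; only the 98.6 claim, the 98.2 observed mean, and the count of 50 samples come from the discussion.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

CLAIMED_MEAN = 98.6      # the belief being tested
ASSUMED_SD = 0.62        # assumed spread, not stated in the text
SAMPLE_SIZE = 106        # assumed sample size, not stated in the text
OBSERVED_MEAN = 98.2     # the sample mean reported above

# Generate 50 samples from the hypothetical population and record each mean.
sim_means = [
    mean([rng.gauss(CLAIMED_MEAN, ASSUMED_SD) for _ in range(SAMPLE_SIZE)])
    for _ in range(50)
]

print(f"lowest simulated mean:  {min(sim_means):.2f}")
print(f"highest simulated mean: {max(sim_means):.2f}")
print("any simulated mean as low as 98.2?",
      any(m <= OBSERVED_MEAN for m in sim_means))
```

Seeing the simulated means bunch up near 98.6 makes it easier to appreciate how far out of place 98.2 would be in that world.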

Okay, this is where it gets really interesting, seeing all these different approaches, parametric, resampling, nonparametric, simulation, all pointing to the exact same conclusion.

Right.

It wasn't just one test whispering doubt.

It was a whole chorus of statistical methods basically singing 98.6 is not the average.

That convergence of evidence is incredibly powerful in statistics.

Like different roads, all leading to the same surprising destination.

All right.

So we've prepared meticulously.

We've analyzed using a whole toolkit of methods.

Now we need to conclude.

What does all this analysis actually mean?

Right.

Interpretation is key.

First, let's talk statistical significance.

What's the verdict there?

It's crystal clear.

Every single method we discussed, the t-test, the confidence intervals, randomization, bootstrap, nonparametric tests, simulation, all consistently showed sufficient, frankly overwhelming, evidence to reject the common belief that the population mean body temperature is 98.6 degrees Fahrenheit.

Statistically, it's a slam dunk.

98.6 is out.

Pretty much.

The statistical evidence is incredibly strong.

Okay.

But then there's practical significance.

Is the difference actually meaningful in the real world?

We found the mean was 98.2 and the confidence interval was roughly 98.1 to 98.3.

That's a difference of maybe 0.3 to 0.5 degrees from 98.6.

Does that matter?

That's a great question.

Is that difference substantial enough?

Well, the discrepancy seems to be between roughly 0.28 degrees Fahrenheit and 0.55 degrees Fahrenheit based on those results.

Half a degree, maybe a bit less.

Yeah.

And you could argue about whether that's big in everyday life, but in a medical context, when deciding if someone has a fever or monitoring health, even half a degree can be clinically relevant.

So I'd argue, yes, this difference does appear to have practical importance.

It's not just a statistical curiosity.

Interesting.

So statistically significant and practically significant.

What about other studies?

Has this been looked at again since 1992?

Science likes replication, right?

Absolutely.

Replication is crucial.

So the 1992 Mackowiak study gave us that mean somewhere between 98.08 degrees Fahrenheit and 98.32 degrees Fahrenheit.

But then fast forward.

A more recent study from 2018 by Jonathan Hausmann and colleagues used a different approach, smartphone crowdsourcing.

They gathered over 5,000 temperatures from about 330 people.

Wow.

Modern tech.

What did they find?

Their study suggested the population mean was actually around 97.7 degrees.

97.7?

That's even lower.

It is.

So here's the interesting part.

Both studies strongly agree.

98.6 degrees Fahrenheit is incorrect.

Right.

They both debunk it.

But they don't agree on a single new true value.

One says maybe 98.2-ish.

The other says 97.7-ish.

So what does that tell us?

It highlights a really important nuance.

Body temperatures vary.

They vary between people.

They vary throughout the day for the same person.

There probably isn't one single normal body temperature for everyone all the time.

Our understanding is evolving, getting more refined.

OK, this is fascinating.

If 98.6 isn't right, and maybe there isn't one single number anyway, where did 98.6 even come from in the first place?

It had to start somewhere.

It did.

It traces back to the 19th century.

A German physician named Carl Wunderlich.

Wunderlich.

And how did he arrive at 98.6?

Get this.

He reportedly collected data from about 25,000 patients.

But his method: he used these huge, like, foot-long thermometers, and he held them in patients' armpits for 20 minutes to get a reading.

20 minutes in the armpit.

With a foot-long thermometer.

Wow.

Contrast that with today's quick digital, often oral, thermometers.

The measurement techniques are just vastly different and likely much more accurate now.

So the original number came from 19th century methods that might not have been super precise by today's standards.

Seems likely.

And yet.

And yet 98.6 persists.

Why do you think, despite all this evidence, the 1992 study, the 2018 study, the statistical rigor, so many people still cling to 98.6 like it's absolute fact?

It's a great question.

I think part of it is just inertia.

It's been taught for so long.

It's deeply ingrained in our culture, in textbooks, doctor's offices.

It's just known.

Exactly.

It becomes a kind of medical folklore, and it shows how hard it can be for scientific evidence to overturn a long-established fact.

Even when the evidence is really strong, people just get comfortable with what they think they know.

This whole deep dive has been really eye-opening.

So wrapping up, what are the key takeaways for listeners?

How can they apply this kind of thinking?

I think there are a few really crucial lessons here straight from the statistical playbook we followed.

First, preliminary analysis is key.

Always, always start by questioning the context, the source, the sampling method before you accept any data or analysis.

Explore it first.

Don't just jump to the conclusion.

Check the foundation.

Precisely.

Second, and we really hammered this one, go beyond the p-value.

Don't let a single yes-no answer be the end of the story.

Look deeper.

Yes.

Look at confidence intervals to understand the range of possibilities.

Use graphs to get visual insights.

And importantly, apply a variety of statistical methods: parametric, nonparametric, resampling.

Build a convergence of evidence.

That's how you avoid pitfalls and get a robust understanding.

Makes sense.

More tools give a better picture.

Definitely.

Also, related to that, consider using multiple technologies or software if you're doing analysis yourself.

Sometimes one program might have quirks or present things confusingly.

Cross-checking is smart.

Good practical tip.

And finally, the big principle in science.

Replicate, replicate, replicate.

Repeating studies with new data, comparing results like we saw with the 1992 and 2018 studies, that's how scientific knowledge gets stronger and more reliable over time.

So there you have it.

That number you've known your whole life, 98.6 degrees? Well, the data says it's pretty much a myth, and it took this whole multi-layered statistical approach to really show why, convincingly.

This deep dive into just one number, body temperature, really shows the power of critical thinking and solid data analysis, doesn't it?

It really does.

And it leaves us with a thought.

If current research shows there isn't one single normal temperature and it varies by person and time of day, what does that mean for how you track your own health, how you interpret health advice, and maybe how will new technologies like wearables and continuous monitoring keep changing our understanding of something as basic as human physiology?

Something to mull over.

Thank you for joining us on this deep dive.

We hope you leave feeling not just more informed, but also a little more curious about the numbers all around you.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Comprehensive statistical reasoning emerges through investigation of a widely accepted medical claim: whether normal human body temperature truly equals 98.6°F. By examining temperature data from a 1992 study, the analysis demonstrates how sound statistical practice integrates multiple complementary perspectives to examine a single research question. The investigation begins with foundational considerations of data origin and collection procedures, then proceeds through visual exploration of distributions to identify potential anomalies before selecting appropriate analytical methods. The dataset reveals a sample mean of 98.2°F, markedly lower than the conventional standard. Parametric hypothesis testing and confidence interval estimation both contradict the 98.6°F benchmark, with confidence bounds failing to encompass this traditionally cited value. Robustness of these findings strengthens through application of nonparametric alternatives including the Wilcoxon signed-rank test and sign test, which yield consistent conclusions independent of distributional assumptions. Bootstrap resampling and randomization procedures replicate the parametric results, while simulation analysis quantifies the improbability of observing a sample mean as low as 98.2°F if the true population parameter were 98.6°F. Historical context reveals that Carl Wunderlich, a nineteenth-century physician credited with establishing the 98.6°F standard, relied on measurement instruments and techniques far inferior to modern technology. Contemporary smartphone-based measurements suggest even lower estimates near 97.7°F, raising fundamental questions about the universality of this historical benchmark. The investigation demonstrates that human body temperature encompasses natural biological variation without a single definitive normal value, and statistical evidence decisively refutes the ubiquitous 98.6°F assertion. 
Beyond the specific temperature finding, the chapter emphasizes core statistical principles applicable to diverse research contexts: employ multiple analytical approaches to triangulate conclusions, interpret results beyond isolated P-values, validate findings through independent methods, and carefully evaluate whether statistical significance translates to meaningful practical implications in real-world settings.
