Chapter 10: The Binomial Distribution

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

So, imagine the universe is just secretly flipping billions of microscopic coins right now and, well, those invisible coin flips are the exact reason we can confidently measure anything at all in a physics lab.

I mean, it sounds completely like science fiction, right?

Totally.

But today, we're going to prove why that is, like, a mathematical reality.

So, welcome to the Deep Dive.

Glad to be here.

Yeah, and our mission for this session is to basically conquer Chapter 10 of Introduction to Error Analysis.

So, for anyone listening, just consider this your personalized, you know, one-on-one tutoring session.

Exactly.

We are taking all the concepts surrounding the binomial distribution and really breaking them down.

Right, so that by the time you sit for your next exam or write up your lab report, you actually understand the mechanics of what's happening under the hood of your data.

And to ground this for you, think back to the core concept of a distribution, which you encountered earlier in your textbook.

That was back in Chapter 5, I think.

Yeah, Chapter 5.

A distribution simply describes the spread of answers you get when you repeat a measurement over and over.

Okay, so, like, if you measure the voltage of a circuit 50 times, a distribution maps out how those 50 answers sort of scatter around a central value.

Precisely.

And we also explored the theoretical ideal, which is the limiting distribution.

Right, the limiting distribution.

That's the mathematical shape your data would take if you could somehow make an infinitely large number of measurements without your equipment breaking down.

Okay, but I'm struggling a bit to see the specific path here, though.

I mean, we already know about the Gaussian distribution.

The classic bell curve.

Yeah, that elegant bell curve.

We know it's the most practical limiting distribution for an experimental physicist.

It's like the star of the show.

It absolutely is.

So, if Gauss is the ultimate destination, why are we dedicating an entire session to its humbler cousin, the binomial distribution?

Like, why stop to look at the bricks when we already know what the building looks like?

Well, because the bricks dictate the integrity of the building.

I mean, the binomial distribution might seem simple because it deals with highly constrained scenarios.

Like flipping coins or rolling dice?

Exactly.

But its theoretical power is massive.

By understanding the mechanics of a simple coin flip, we can mathematically derive the Gaussian bell curve from scratch.

Oh, wow.

So we don't just have to accept that experimental errors form a bell curve.

Right.

The binomial distribution allows us to prove exactly why the complex universe of random errors behaves the way it does.

Okay, going from a literal coin flip to the fundamental nature of experimental error is a massive leap.

So let's build that bridge.

Let's do it.

I think we should start moving from these big abstract ideas to something super concrete.

Let's roll some dice.

That's good.

The textbook sets up a very specific thought experiment.

We throw three standard dice, and our goal is to count how many aces, which just means rolling a one, show up.

Right, just ones.

And since we only have three dice, the possible outcomes are pretty limited.

We could get zero aces, one ace, two aces, or three aces.

And calculating the extreme case is a great warm up here.

Think about the physical reality of the dice.

They don't interact at all.

Yeah, the first die doesn't care what the second die is doing.

Exactly.

Because they are entirely independent events, getting three aces means the first die must be an ace, and the second and the third.

And the chance of a single die landing on an ace is, well, one in six.

Right.

And mathematically independent probabilities multiply.

You take one sixth for the first die, multiply it by one sixth for the second, and again by one sixth for the third.

Which is one over 216, right?

Correct.

That gives you roughly a 0.5% chance.

It is a highly improbable event.

Yeah, that makes intuitive sense.

But calculating the probability of getting exactly two aces feels like a trap for a student.

How so?

Because it's not just a simple multiplication problem anymore.

Ah, right.

It requires a shift in how you frame the problem.

You need a two-step argument here.

Okay, walk me through it.

First, imagine a very specific sequence.

You roll the first die, and it is an ace.

You roll the second, and it is also an ace.

But the third die is not an ace.

Exactly.

The probability for those first two rolls is one sixth squared.

But that third die, the failure die, has a five in six chance of being any other number.

Right, because it could be a two, three, four, five, or six.

Right, so the mathematical probability of that exact sequence, ace, ace, not ace, is one sixth squared multiplied by five sixths.

Okay, let me pause you there and unpack this.

So the one sixth squared represents our successful aces, and the five sixths represents our single failure.

You got it.

But since we roll them all at once, that failure die could actually be the very first die we rolled, or the second, or the third.

Yes, the order matters.

It reminds me of a shell game.

The not ace has like three different places it can hide among the three dice.

We can't just calculate one sequence.

We have to account for all three hiding spots.

Let's take that shell game analogy further, because it perfectly captures the mechanics of permutations.

Okay.

If you don't account for every possible order of successes and failures, your probability will be artificially low.

Because you're missing valid outcomes.

Exactly.

Because there are three distinct ways this combination can occur, you multiply your initial calculation by three.

So the final equation for getting exactly two aces is three times one sixth squared times five sixths.

Run those numbers, and it jumps to about a 6.9% chance.

That's way higher than 0.5%.

It is.

And following that same logic for all possible outcomes, you find a 34.7% chance of getting exactly one ace.

And a massive 57.9% chance of getting zero aces, right?

Precisely.
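The four dice probabilities quoted above can be checked with a short script. This is a minimal sketch, assuming three fair dice; the helper name `prob_aces` is made up for illustration and is not from the textbook.

```python
from fractions import Fraction
from math import comb

# Chance of an ace (a one) on a single fair die, and of any other face.
p_ace = Fraction(1, 6)
p_other = 1 - p_ace

def prob_aces(k, dice=3):
    """Probability of exactly k aces among `dice` independent fair dice."""
    # comb(dice, k) counts the positions the aces can occupy
    # (the "hiding spots" in the shell game).
    return comb(dice, k) * p_ace**k * p_other**(dice - k)

probs = {k: prob_aces(k) for k in range(4)}
for k, pr in probs.items():
    print(f"{k} aces: {pr} = {float(pr):.1%}")
```

Using exact fractions avoids rounding, so the four results add up to exactly one, and the percentages round to the 57.9%, 34.7%, 6.9%, and 0.5% mentioned in the discussion.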

So if you were to draw this out as a bar graph, it paints a very clear picture.

You'd have this massive skyscraper at zero aces, a mid-size building at one ace, a tiny shed at two aces, and practically a pebble at three.

That's a great way to visualize Figure 10.1 from the text.

It gives you a great feel for the lopsided nature of these specific odds.

But obviously, we can't manually chart out shell games for every single experiment.

If we are testing 100 firecrackers to see if they ignite or analyzing thousands of particle collisions, we need a master formula.

We do.

So how do we generalize this beyond just three dice?

We need to formalize the terminology first to build that master formula.

Imagine you're running a certain number of independent trials, which we will call n.

Okay, n is the number of trials.

And for every single trial, there are only two possibilities.

There is a success, which happens with a probability we call p.

And there is the failure, which happens with a probability we call q.

Because those are the only two options, they must add up to 100%.

Or just one in math terms.

Exactly.

Therefore, q is always simply one minus p.

So the formula the textbook introduces is meant to find the probability of getting exactly ν (the Greek letter nu) successes out of n trials.

Yes, the binomial distribution formula.

And I have to push back here on behalf of anyone looking at this math for the first time.

The formula starts with this massive fraction made entirely of factorials.

Ah, yes.

n factorial over the product of ν factorial and (n minus ν) factorial.

Yeah, to a student, that just looks like terrifying alphabet soup.

Can we break down the actual engine of this formula?

Like what is it functionally doing without reading the raw math?

The best way to view that intimidating formula is to split it into two distinct machines working together.

Two machines, okay.

The second half of the formula handles the pure probability of your specific streak.

It takes the probability of your successes, p, to the power of ν and multiplies it by the probability of your failures, q, to the power of n minus ν.

Okay, so that's just the one sixth squared times five sixths part of our dice roll.

Exactly.

But as we established with the dice, a streak can happen in many different orders.

Right, the shell game.

So that first half of the formula, the alphabet soup with all the factorials, is just a giant mathematical sorting machine.

Oh, I see.

It calculates every possible order those successes and failures could occur in, ensuring nothing gets missed.

It's just a generalized version of you multiplying by three for the shell game.

That reframes it perfectly.

It's not trying to confuse you.

It's just doing the heavy lifting of counting the combinations for you.
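Written out, the two machines combine into one expression. The block below is a sketch in LaTeX notation, assuming n trials, ν successes, success probability p, and failure probability q = 1 − p; the label B_{n,p}(ν) is simply the name assumed here for the binomial probability.

```latex
B_{n,p}(\nu)
  = \underbrace{\frac{n!}{\nu!\,(n-\nu)!}}_{\text{sorting machine: number of orderings}}
    \;\underbrace{p^{\nu}\, q^{\,n-\nu}}_{\text{probability of one specific streak}},
\qquad q = 1 - p .
```

As a check against the dice example, setting n = 3, ν = 2, and p = 1/6 gives 3 × (1/6)² × (5/6), the same product worked out for exactly two aces.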

And if we ground this in a simpler scenario, like tossing four coins, the mechanics become highly visible.

With a coin, the probability of success is perfectly balanced with failure.

Yes.

p equals one half and q equals one half.

When you graph the probabilities for getting zero, one, two, three, or four heads, the result is aesthetically beautiful.

Because it's perfectly symmetrical.

Right.

Figure 10.2 shows this perfectly.

Exactly.

Because p and q are completely equal, the resulting bar graph is perfectly symmetrical.

It peaks right in the middle at two heads and drops off in a perfect mirror image on either side.
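The symmetry for four coins is easy to see numerically. Here is a small sketch, assuming a fair coin (success and failure probabilities both one half); the variable names are illustrative only.

```python
from math import comb

n = 4          # number of coin tosses
p = q = 0.5    # heads and tails are equally likely

# Probability of exactly k heads, for k = 0, 1, 2, 3, 4.
heads_probs = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

for k, pr in enumerate(heads_probs):
    print(f"{k} heads: {pr:.4f}")
```

The list reads the same forwards and backwards (1/16, 4/16, 6/16, 4/16, 1/16), which is the mirror image described above, with the peak at two heads.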

But you know, laboratory physics rarely gives us perfect 50-50 odds.

What happens to the shape of our data when the odds are heavily skewed?

Like trying to roll a specific number on a die.

Yeah.

And more importantly, what happens to our data when we run hundreds or thousands of trials instead of just four?

To understand that transition, we have to look at the properties that define the shape of our data.

Two vital metrics come into play here.

Okay.

What's the first one?

First is the expected average number of successes.

That's simply your number of trials multiplied by your probability of success.

So n times p.

Got it.

So if you run 100 trials with a 25% chance of success, you expect an average of 25 successes.

Exactly.

The second metric is the standard deviation, which measures how widely your actual results will spread out around that average.

Which the book defines as the square root of n times p times q.

Right.
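Both metrics can be confirmed by brute force. The sketch below assumes the example of 100 trials with a 25% success chance and compares the shortcut formulas (n times p, and the square root of n times p times q) with a direct sum over every possible outcome; `binomial_pmf` is a helper written for this illustration.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 100, 0.25

# Shortcut formulas for the mean and the standard deviation.
mean_formula = n * p
sigma_formula = sqrt(n * p * (1 - p))

# Direct computation: weight every outcome k by its probability.
mean_sum = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
var_sum = sum((k - mean_sum) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
sigma_sum = sqrt(var_sum)

print(f"mean:  formula {mean_formula:.3f}, direct sum {mean_sum:.3f}")
print(f"sigma: formula {sigma_formula:.3f}, direct sum {sigma_sum:.3f}")
```

The two routes agree to within floating-point rounding: an average of 25 successes with a spread of a little over 4.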

And if our chance of success is only 25%, our graph is going to look deeply asymmetric, right?

Yeah.

It'll lean heavily to one side.

It starts out deeply asymmetric.

If you look at Figure 10.3, it shows exactly this.

If you only run three trials with a 25% success rate, the graph looks like a steep lopsided mountain leaning heavily to the left.

Right.

Because zero or one success is way more likely than two or three.

But here's the phenomenon that fundamentally shapes modern statistics.

As you increase the number of trials going from three to 12 to 48 and beyond, the binomial distribution undergoes a mathematical metamorphosis.

It changes shape entirely.

Yes.

Those jagged, discrete lopsided steps begin to smooth out.

The sheer volume of data forces the graph to stand up straighter and widen.

Wow.

As your trials grow massive, that lopsided binomial graph is virtually swallowed up and perfectly hugged by the smooth, continuous symmetrical Gaussian bell curve.

So it essentially forces itself into symmetry just through the sheer weight of numbers.

It does.
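One rough way to watch that metamorphosis is to measure how far the binomial bars sit from a Gaussian curve with the same mean and standard deviation. This is only a sketch: the "largest gap" metric is a choice made for this illustration, not something from the textbook, and it uses the 25% success rate and the trial counts 3, 12, and 48 from the discussion.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def gauss_pdf(x, mean, sigma):
    """Height of the Gaussian curve with the given mean and width at x."""
    return exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def max_gap(n, p):
    """Largest vertical distance between a binomial bar and the matching Gaussian."""
    mean = n * p
    sigma = sqrt(n * p * (1 - p))
    return max(
        abs(binomial_pmf(k, n, p) - gauss_pdf(k, mean, sigma))
        for k in range(n + 1)
    )

p = 0.25
gaps = {n: max_gap(n, p) for n in (3, 12, 48)}
for n, gap in gaps.items():
    print(f"n = {n:2d}: largest gap = {gap:.4f}")
```

The gap shrinks steadily as the number of trials grows, which is the numerical counterpart of the bars being "hugged" by the bell curve.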

But I want to challenge the practical application here.

Why is this metamorphosis actually useful for a student sitting in a lab?

I mean, we have computers now.

Does knowing that a binomial graph approximates a Gaussian curve just save us from doing nightmarish math on a calculator?

It certainly saves you from the nightmarish math.

And honestly, that is not trivial.

How bad can it be?

Well, imagine trying to find the binomial probability of getting 23 or more heads out of 36 coin tosses.

Okay.

To run that through the binomial formula, your mathematical sorting machine requires you to calculate 36 factorial.

Oh, wait.

That's 36 times 35 times 34 all the way down to 1.

Yes.

The result has 42 digits, and once you get to about 70 factorial, many scientific calculators throw an overflow error.

Because the number of possible combinations is just astronomically huge.

Unfathomably huge.

But because we know the binomial distribution morphs into the Gaussian curve, we can bypass the factorial entirely.

By using the mean and standard deviation.

Exactly.

We simply calculate our expected mean, which is 36 times 1 half, giving us 18.

Okay.

Mean is 18.

And our standard deviation is the square root of 36 times 1 half times 1 half, which is 3.

Nice.

Clean numbers.

We then look at our target of 23 or more heads.

Head counts are whole numbers, so on the smooth curve the region for 23 or more starts at 22.5, which is 4.5 heads above our expected mean of 18.

Divide 4.5 by our standard deviation of 3.

And we know our target starts exactly 1.5 standard deviations above the center of the bell curve.

Oh, I see.

So instead of calculating billions of combinations.

You just look at a standard Gaussian table in the appendix.

Find 1.5 standard deviations.

And it instantly tells you the area under the curve beyond that point represents a 6.7% probability.

As shown in Figure 10.4.

It takes an impossibly tedious calculation that would literally break a machine and turns it into a quick conceptual lookup.
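The 1.5 standard deviations and the 6.7% figure correspond to the chance of 23 or more heads in 36 tosses, with the Gaussian area starting at 22.5. The sketch below compares that table lookup with the exact binomial sum; it uses `math.erfc` in place of a printed Gaussian table, which is a substitution made for this illustration.

```python
from math import comb, erfc, sqrt

n, p = 36, 0.5
mean = n * p                     # expected number of heads: 18
sigma = sqrt(n * p * (1 - p))    # standard deviation: 3

# Exact binomial answers. Python integers do not overflow,
# so the large binomial coefficients are handled directly.
exact_tail = sum(comb(n, k) for k in range(23, n + 1)) / 2**n   # 23 or more heads
exact_23 = comb(n, 23) / 2**n                                   # exactly 23 heads

# Gaussian approximation: area beyond 22.5, i.e. 1.5 sigma above the mean.
z = (22.5 - mean) / sigma
gauss_tail = 0.5 * erfc(z / sqrt(2))

print(f"z = {z}")
print(f"Gaussian area beyond z:  {gauss_tail:.4f}")
print(f"exact P(23 or more):     {exact_tail:.4f}")
print(f"exact P(exactly 23):     {exact_23:.4f}")
```

The Gaussian shortcut lands within a small fraction of a percentage point of the exact sum, while the chance of exactly 23 heads on its own is only about half as large.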

That is incredibly practical.

It's the bedrock of statistical analysis.

But let's take this back to the physical lab space.

We've proven that large numbers of coin flips or dice rolls turn into a Gaussian bell curve.

How does this abstract probability connect back to actual physical measurements?

I mean, what do coin flips have to do with reading a spectrometer?

This is where we fulfill the promise of why we study the binomial distribution in the first place.

I want you to imagine measuring a physical quantity in the lab.

And let's say its true, perfect, objective value is X.

Okay, X is the absolute truth.

Now assume your equipment is perfectly calibrated so there is no systematic error.

But you do have numerous independent sources of random error.

Give me a concrete sense of those random errors.

We're talking about things like parallax when you're trying to read a needle on a dial.

Or a tiny fluctuation in the ambient room temperature.

Or even your own slight neurological delay in hitting a stopwatch.

Exactly.

Let's assume there are dozens of these microscopic error sources acting on your measurement simultaneously.

And let's say each individual error source pushes your final measurement up or down by a fixed incredibly tiny amount.

We'll call it epsilon.

And crucially, because they are random, let's assume each tiny error has an exact 50-50 probability of pushing the measurement too high or pushing it too low.

I see where this is going and it is fascinating.

The random errors in our physics lab are literally just microscopic coin flips.

They are.

So every time I measure the swing of a pendulum, a tiny fluctuation in air current flips the coin heads.

It pushes my measurement slightly higher.

Tails, it pushes it slightly lower.

That is precisely the mechanism driving physical reality.

Your final recorded measurement is nothing more than the true value X plus the net result of all those tiny independent coin flips.

Some push it up, some push it down, and they constantly battle each other.

The probability of getting a specific net error is perfectly described by the binomial formula we just broke down.

So because our measurements are just a collection of independent 50-50 microscopic coin flips.

As the number of those microscopic error sources becomes huge and the physical size of each individual error becomes infinitesimally small, the discrete steps of the binomial graph seamlessly transition into the continuous Gaussian curve.

Just like Figure 10.5 shows, going from n equals 1 all the way to n equals 32.

It is the ultimate proof.

Experimental measurements naturally form a normal distribution because they are the macroscopic result of billions of microscopic binomial events.
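The coin-flip model of random error can be written down exactly. In this sketch every reading is the true value plus n independent pushes of size epsilon, each up or down with probability one half; the specific numbers (true value 10.0, epsilon 0.1, 32 error sources) are hypothetical values chosen only for illustration.

```python
from math import comb, sqrt

def measurement_distribution(true_value, eps, n):
    """Return (reading, probability) pairs for n independent +/- eps errors."""
    pairs = []
    for ups in range(n + 1):
        # `ups` errors push the reading up and (n - ups) push it down.
        reading = true_value + (2 * ups - n) * eps
        prob = comb(n, ups) / 2**n   # binomial probability with p = q = 1/2
        pairs.append((reading, prob))
    return pairs

X, eps, n = 10.0, 0.1, 32   # hypothetical true value, error size, number of sources
dist = measurement_distribution(X, eps, n)

mean = sum(r * pr for r, pr in dist)
sigma = sqrt(sum((r - mean) ** 2 * pr for r, pr in dist))

print(f"mean reading: {mean:.4f}")
print(f"spread:       {sigma:.4f}")
print(f"eps*sqrt(n):  {eps * sqrt(n):.4f}")
```

The readings center on the true value, and their spread equals epsilon times the square root of n, so many tiny pushes produce a bell-shaped scatter of modest width rather than a runaway error.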

That is an incredibly satisfying realization.

The universe is just flipping infinite tiny coins and the bell curve is just the shadow those coins cast on our data.

That's a very poetic way to put it, but mathematically accurate.

But what do we do with this power?

Now that we understand exactly how our experimental errors are distributed, how does a scientist use this knowledge to test if their hypotheses about the real world are actually true?

Well, we use this mathematical foundation to make highly objective statistical decisions about claims.

Removing human bias entirely.

This introduces the testing of hypotheses.

Right, Section 10.6.

The text uses a highly practical example, evaluating a new ski wax.

Oh yeah, I love this example.

A manufacturer claims their new chemical wax significantly reduces friction compared to normal skis.

As scientists, we cannot just take their word for it, so we set up a controlled test.

Obviously.

We take 10 pairs of skis, treat one ski from each pair with the wax, leave the other untreated, and race them down a hill.

But before we run the races, we have to establish a baseline assumption, right?

Yes, and that baseline is called the null hypothesis.

It is the bedrock of scientific skepticism.

Okay, so what is it in this case?

The null hypothesis assumes that the manufacturer is completely wrong, that the new wax is useless and makes absolutely no difference.

Got it.

If it truly makes no difference, then every single race is essentially a coin flip.

The probability of the treated ski winning a race is exactly 50%.

So we run the races.

If the waxed skis win all 10 out of 10 races, our binomial formula tells us the probability of that happening purely by random chance is one half to the 10th power.

Which is roughly 0.1%.

Since a 1 in 1000 chance is absurdly unlikely, we can confidently reject the null hypothesis and say the manufacturer is telling the truth.

The wax works.

Correct.

But science rarely gives us 10 out of 10.

Suppose the waxed skis only won 8 out of the 10 races.

I want to challenge the rigor here.

To any student, an 80% win rate feels like a solid victory.

I mean, it's a B minus.

True.

If I win 8 out of 10 races, my human intuition screams that the wax is clearly doing something.

Why doesn't the textbook consider an 80% win rate strong enough evidence to confidently back the manufacturer's claim?

Because human intuition is a terrible statistical instrument.

We are wired to see patterns where none exist.

We cannot just look at the isolated probability of getting exactly 8 wins.

Wait, why not?

To understand if the wax is a fluke, we have to look at the probability of getting 8 or more wins purely by chance.

Ah, so 8 wins plus the chance of 9 wins plus 10 wins.

Yes.

When you calculate the binomial probabilities for 8 wins, 9 wins and 10 wins and add them all together, the total chance of this happening randomly is 5.5%.

And 5.5% feels small, but it crosses a very specific line in the sand.

Exactly.

By scientific convention, we generally draw a hard boundary at 5%.

This is our threshold for significance.

The famous 5% boundary.

We demand that the chance of a fluke be less than 5% before we are willing to rewrite the textbooks or validate a commercial claim.

So because a 5.5% chance of a fluke is greater than our 5% boundary?

The risk of a false positive is simply too high.

The result is not statistically significant.

We cannot definitively say the wax works.
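The ski wax verdicts follow from two short sums. This sketch assumes the null hypothesis described above, where every race is a fair coin flip, and uses the conventional 5% threshold; `tail_prob` is a helper name invented for this illustration.

```python
from fractions import Fraction
from math import comb

def tail_prob(wins, races=10):
    """Chance of `wins` or more wins in `races` races if each race is a fair coin flip."""
    favourable = sum(comb(races, k) for k in range(wins, races + 1))
    return Fraction(favourable, 2**races)

threshold = Fraction(5, 100)   # conventional 5% significance boundary

p_10 = tail_prob(10)           # a clean sweep
p_8_or_more = tail_prob(8)     # 8, 9, or 10 wins

for label, pr in [("10 of 10", p_10), ("8 or more", p_8_or_more)]:
    verdict = "significant" if pr < threshold else "not significant"
    print(f"{label}: {float(pr):.2%} -> {verdict}")
```

A clean sweep has a chance of 1 in 1024, far below the boundary, while 8 or more wins comes to 56 in 1024, about 5.5%, which sits just on the wrong side of 5%.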

Man, that is a harsh grading curve.

It is, but it is the only way to protect the scientific record from noise.

It forces us to be humble and demand more data.

The text scales this up with a final example, using an election poll, which really drives home why we need the Gaussian approximation.

Let's hear it.

Imagine a politician claims they have 60% support from the electorate.

We want to test this hypothesis, so we poll a random sample of 600 people.

When the results come back, only 330 people support the politician.

Let's run the null hypothesis on the politician's claim.

If their 60% claim were genuinely true, we would expect our poll of 600 people to yield a mean of 360 supporters.

Right, 60% of 600.

Using the standard deviation formula for a binomial distribution, we calculate that the spread, or standard deviation, should be 12.

But our actual result was 330.

That is 30 votes below what the politician's claim predicted.

And this is where we evaluate the distance from the truth.

30 votes is exactly 2.5 standard deviations below the expected mean of 360.

Because 30 divided by 12 is 2.5.

Yes.

Using the smooth Gaussian curve to approximate this massive binomial problem, we find that the probability of getting a result 2 .5 standard deviations away from the center is astronomically low.

How low?

Only about 0.6%.

Which completely shatters our 5% threshold.

It is deeply unlikely that a true 60% popularity would result in a poll this low, just by random chance.

It does.

And the text makes an important conceptual distinction here, regarding how we frame our skepticism.

There is a concept called a one-tailed probability.

One-tailed?

Yes.

Where you only care about deviations in one specific direction.

For instance, if you only care whether the politician is lying about being popular, you only look at the probability of the results being 30 votes below the mean.

Okay, that makes sense.

But there is also a two-tailed probability.

If you are a statistician trying to figure out if your polling methodology is completely broken, you would care if the results deviated drastically in either direction, above or below the mean.

And Figure 10.6 shows exactly this visual difference.

But whether you look at one tail, about 0.6%, or two tails, about 1.2%, the probability is so small that we can confidently reject the politician's claim.

Absolutely.

The math simply does not support their reality.
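The poll numbers can be reproduced in a few lines. This sketch assumes the plain Gaussian approximation described in the discussion (no continuity correction) and uses `math.erfc` in place of a printed table; the variable names are illustrative.

```python
from math import erfc, sqrt

n = 600            # people polled
p_claim = 0.6      # support claimed by the politician (the hypothesis under test)
observed = 330     # supporters actually found

mean = n * p_claim                           # expected supporters if the claim is true
sigma = sqrt(n * p_claim * (1 - p_claim))    # binomial standard deviation
z = (mean - observed) / sigma                # shortfall in units of sigma

# One tail: a result at least this far BELOW the mean.
one_tailed = 0.5 * erfc(z / sqrt(2))
# Two tails: a result at least this far from the mean in EITHER direction.
two_tailed = erfc(z / sqrt(2))

print(f"mean = {mean:.0f}, sigma = {sigma:.0f}, z = {z:.1f}")
print(f"one-tailed probability: {one_tailed:.2%}")
print(f"two-tailed probability: {two_tailed:.2%}")
```

The two-tailed figure is simply double the one-tailed one, and both fall well under the 5% boundary, so the conclusion does not depend on which framing is chosen.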

So for everyone listening, take a breath and just look at the conceptual journey you've just taken.

We started by simply throwing three dice and realizing that independent events multiply.

Right.

We decoded the intimidating binomial formula, revealing it as just a two-part engine, one part tracking streaks and one part sorting combinations.

The sorting machine.

Yeah.

We watched how scaling up our trials forced jagged data to smooth out into the famous Gaussian bell curve.

The metamorphosis.

And ultimately we discovered that the microscopic random errors in the physical universe are nothing more than independent coin flips, constantly battling each other to create that bell curve.

It's all connected.

That fundamental mathematical truth gives you the framework to objectively test reality, allowing you to slice through human bias and reject false claims.

Whether it's about ski wax or politics, everything builds on the power of independent probabilities.

The beauty of the math is undeniable.

But as you prepare to head back into the lab, I want to leave you with a more philosophical question to consider regarding your own experimental designs.

Oh, lay it on us.

We have seen how independent random errors beautifully balance each other out into a perfect bell curve, allowing us to find the true value of a measurement.

But what happens if your errors are not independent?

Wait, like what?

What if one tiny mistake in your setup, like a miscalibrated scale or skewed thermometer, systematically forces every single subsequent measurement to be slightly too high?

Oh, a systematic error.

Exactly.

The math of the universe is perfectly balanced, but it only works if our tools and our methods are truly impartial.

If you skew the coin, the math cannot save you.

You can calculate the probabilities perfectly, but if your thermometer is broken, your truth is broken.

That is a brilliant warning to keep in mind.

We hope this deep dive into the binomial distribution helped clear the muddy waters, giving you the tools to not just plug numbers into formulas, but to actually understand the reality those formulas describe.

Good luck with the math.

Yes.

From the Last Minute Lecture team here at the Deep Dive, thank you so much for joining us for this session, and best of luck on your exam or lab report.

You've got this.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary: What this audio overview covers

The binomial distribution describes random experiments composed of independent trials where each outcome falls into one of exactly two categories, with probability p denoting success and probability q equal to 1 minus p denoting failure. The fundamental binomial probability formula combines the binomial coefficient, which counts all distinct arrangements of successes and failures, with the probabilities raised to their respective powers, yielding the likelihood of observing precisely k successes across n trials. The distribution possesses a calculable mean of np and standard deviation equal to the square root of npq, with the distribution shape remaining symmetric only when p equals 0.5; asymmetry emerges for all other probability values.

A significant mathematical property emerges as sample size increases: the discrete binomial distribution progressively approaches the shape of a continuous normal or Gaussian distribution centered at np with spread determined by the standard deviation. This convergence permits statisticians and researchers to substitute normal distribution calculations for binomial computations when sample sizes are sufficiently large, streamlining otherwise complex probability assessments. The underlying reason for this convergence relates to how measurements subject to numerous small, independent random errors naturally approximate the normal distribution; as error sources multiply and individual contributions diminish, the resulting distribution gravitates toward the Gaussian form.

Statistical hypothesis testing employs the binomial framework by establishing a null hypothesis that specifies an assumed event probability, calculating the expected number of successes under this assumption, and contrasting actual observed outcomes against these expectations. Determining statistical significance involves comparing observed results to critical probability thresholds, conventionally set at 5 percent or 1 percent, which indicate how unlikely the observed data would be if the null hypothesis were true. The distinction between one-tailed and two-tailed probability testing reflects whether the alternative hypothesis predicts deviation in a specific direction or in either direction, fundamentally changing which outcome regions qualify as extreme or improbable under the null hypothesis.
