Chapter 7: The Binomial and Geometric Distributions

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

You know, usually when we look out at the real world, there is this expectation of like infinite complexity.

It's just messy.

Right.

It's completely messy.

Yeah.

A business launches a new product and the outcomes feel limitless.

You could have wild success, moderate traction, maybe a slow fizzle.

Or catastrophic flop.

Exactly.

A catastrophic flop.

We really like to think of life as this massive spectrum of possibilities.

Well, I mean, that is the natural way to see things.

The real world is a gradient.

It is almost never completely black and white and, you know, human beings are wired to perceive all that nuance.

But then you step into the world of probability and mathematical modeling and suddenly to make sense of all that chaos, you have to completely break the universe down.

You do.

You essentially have to put on these incredibly restrictive glasses that only let you see two colors.

Right.

It is the absolute definition of a binary viewpoint.

You are taking the messy real world and forcing it into two strict boxes.

Yeah.

And we simply call them success or failure.

Welcome to this special tutoring session on the Deep Dive.

Today we are going completely hands on.

We really are.

So whether you are prepping for a looming exam or you are just, you know, insanely curious about how mathematicians force reality into predictable models, you are in the right place.

Absolutely.

Our mission today is to master a core set of concepts from the Cambridge International AS & A Level Mathematics: Probability & Statistics 1 coursebook, specifically chapter seven.

And we are going to do it with no jargon and no stress.

None at all.

We are just going to take a clear step by step walk through two incredibly powerful mathematical tools, the binomial and geometric distributions.

So returning to that idea of the binary viewpoint, that is really the foundational rule for literally everything we are going to cover today.

The success and failure thing.

Right.

To make sense of complex repeated experiments, mathematicians look at a single trial and simplify it entirely.

It genuinely does not matter what the actual experiment is at all.

Not at all.

There are only two possible outcomes we care about.

We label the outcome we are looking for as a success and literally everything else is a failure.

The text actually has some great grounded examples of this.

Like think about a business investment.

You either make a profit, which is a success, or you make a loss, which is a failure, or think of a batter in a cricket match where they're either out or not out.

Exactly.

There is no such thing as being almost out.

But I have to ask, how does taking this incredibly strict yes or no view actually help us build a useful mathematical model?

That's a fair question.

Like doesn't that ignore all the interesting nuance of real life?

Well it certainly ignores the nuance, but what it buys us in return is incredible predictive power.

Okay.

If we're looking at repeated independent trials, meaning one attempt does not affect the next attempt at all and the probability of a success remains absolutely constant each time, then stripping away the nuance allows us to calculate exactly what is likely to happen over time.

We trade a little bit of real world gray area for a tremendous amount of mathematical clarity.

That makes sense.

We're simplifying the rule so we can actually play the game.

And that brings us to the first major tool we use to harness that clarity.

Binomial distribution.

Yes.

The binomial distribution.

Building on that success or failure concept, this is how we count the number of successes in a strictly fixed timeline.

That is a really great way to summarize it.

For a random variable, which we usually represent with a capital X, by the way, to follow a binomial distribution, it has to meet four very strict criteria.

Okay.

Let's unpack these.

So we write this mathematically as X followed by a little tilde symbol, then a capital B, and in parentheses, n, comma, p.

Those two letters are our vital parameters.

Got it.

X ~ B(n, p).

Right.

First, there must be a finite fixed number of repeated independent trials.

That fixed number is n.

Wait.

So if you were sitting in an exam, the first thing you're looking for is whether you know exactly how many times you're rolling the dice.

Exactly.

Or flipping the coin or checking a cereal box.

You have to know that before you even start.

The finish line has to be set in stone.

You've got it.

Second, as we established, there can only be two outcomes, success or failure.

Third, the trials absolutely must be independent.

So the coin cannot remember that it just landed on heads.

Right.

The coin has no memory.

And fourth, the probability of success in every single trial, which we denote as a lower case P, must be constant.

It cannot change from trial to trial.

No, it can't.

If all those conditions are met, the random variable X represents the total number of successes we get out of those n trials.

OK.

Looking at the formula we are supposed to use here, I mean, honestly, it looks like alphabet soup.

It could definitely look overwhelming at first glance.

Yeah.

I am looking at a capital X, a lowercase r, and a p to the power of r.

It's a bit intimidating.

Can we break this apart?

Let's do it.

Let's pull it apart into two main halves.

We can start with the back half of the formula.

OK.

That is p to the power of r, multiplied by (1 minus p) to the power of (n minus r).

Right.

If P is your chance of success, then 1 minus P is simply your chance of failure.

We actually often use the letter Q as a shorthand for failure.

Oh, so Q is just 1 minus P.

You got it.

So if you want exactly r successes, you just multiply the probability of success by itself r times.

That gives you P to the power of r.

Exactly.

Let me make sure I'm visualizing this right.

If I am rolling a die and I want three sixes, I am multiplying my chance of a six by my chance of a six by my chance of a six.

Perfectly stated.

And because we have a fixed number of trials, N, if you have r successes, the rest of the trials absolutely must be failures.

You cannot escape them.

Right.

Because those are the only two options.

So the number of failures is just N minus r.

That is why we multiply by the probability of failure to the power of N minus r.

Wow.

It is perfectly balanced.

You get exactly the number of successes you asked for and exactly the number of failures left over.

It really is elegant.

OK, that makes perfect intuitive sense.

But what about the front of the formula?

Because that back half only assumes all the successes happen in a row, right?

Yes, it assumes a specific order.

Like success, success, success, failure, failure.

But the successes could be scattered anywhere in the sequence.

That is the missing piece.

And that is where the first part of the formula comes in.

The combinations.

Ah.

You will see it written as N choose r, which often looks like a little n over a little r inside tall parentheses.

Right, I've seen that.

This is the mathematical way of counting the number of different ways, or different orders, those successes can be arranged among the failures.

I was actually fascinated by a historical note in the text about this.

These combination numbers, they form Pascal's triangle.

They do, yeah.

But the book points out that while it's heavily associated with the 17th century French thinker, Blaise Pascal, it was actually known in China and Persia way earlier.

Oh really?

Yeah.

There is a surviving display of it known as Jia Xian's triangle in a work compiled by Yang Hui in the year 1261.

That is amazing.

It is a wonderful reminder that mathematics is a deeply global human endeavor.

Totally.

Those numbers in the triangle represent the coefficients for our binomial expansions.

They tell us exactly how many different branches on our probability tree lead to the specific outcome we want.
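The two halves of the formula described above can be put together in a few lines of Python. This is only a minimal sketch: the function name `binomial_pmf` and the die example are our own illustrations, not taken from the textbook.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ B(n, p)."""
    q = 1 - p                 # probability of failure on one trial
    arrangements = comb(n, r)  # "n choose r": orders the r successes can fall in
    return arrangements * p**r * q**(n - r)


# Hypothetical example: exactly 3 sixes in 5 rolls of a fair die.
print(binomial_pmf(3, 5, 1 / 6))
```

Summing `binomial_pmf(r, n, p)` over every r from 0 to n should give 1, which is a quick sanity check on any hand calculation.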

Let's test this out with a scenario so you, the listener, can see how this actually works on paper.

Great idea.

Imagine you are analyzing a population where 85% of the people have rhesus positive blood.

Okay, so our constant probability of success, p, is 0.85.

Exactly.

And we take a random sample of 40 people.

That is our fixed number of trials, so N equals 40.

Got it.

Now let's say a hospital planner needs to know the probability that fewer than 39 people in the sample have rhesus positive blood.

So we are looking for the probability that X is strictly less than 39.

Well, if you approach this without stepping back to look at the big picture, you might try to calculate the probability of 0 people having it, then 1 person, then 2.

All the way up to 38.

All the way up to 38 and add them all together.

That sounds like a complete nightmare.

If you are sitting in an exam, punching those numbers into your calculator to find 39 different probabilities is going to eat up half your time.

It is entirely inefficient, which is why we use the strategy of the complement.

The complement, right.

Remember that all the probabilities in any distribution must add up to exactly 1.

It represents 100% of the possibilities.

Okay.

So instead of calculating the 39 different ways to get fewer than 39 people, we look at what is left over.

Right.

What is the complement of fewer than 39?

The only things left would be 39 or more.

So exactly 39 people or exactly 40 people.

There you go.

So you calculate the probability that X equals 39 using that binomial formula we just broke down.

Right.

Then you calculate the probability that X equals 40.

You add those two tiny probabilities together and you simply subtract that sum from 1.

Doing the math from the text, the probability of exactly 39 is about 0.0106 and exactly 40 is about 0.0015.

So add them up, subtract from 1, and you get 0.988, or a 98.8% chance.

Spot on.
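For anyone who wants to check the complement calculation, here is a short Python sketch using the same figures as the blood-group example (n = 40, p = 0.85). The helper name `binomial_pmf` is our own, and the "long way" sum is included only to confirm that both routes agree.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)


n, p = 40, 0.85
p_39 = binomial_pmf(39, n, p)
p_40 = binomial_pmf(40, n, p)

# Complement: P(X < 39) = 1 - [P(X = 39) + P(X = 40)]
p_fewer_than_39 = 1 - (p_39 + p_40)

# The long way, adding P(X = 0) up to P(X = 38), for comparison only.
direct_sum = sum(binomial_pmf(r, n, p) for r in range(39))

print(round(p_39, 4), round(p_40, 4), round(p_fewer_than_39, 3))
# prints 0.0106 0.0015 0.988
```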

But I need to jump in with a massive warning here straight from the textbook's tip box.

Oh, this is important.

Do not round your probabilities prematurely.

That is a critical point that trips up so many students.

If you round those tiny intermediate probabilities to just two decimal places while you are still working through the steps, your final answer is going to be completely thrown off.

Absolutely.

You have to keep the full string of numbers in your calculator until the very final step of the problem.

That premature rounding is how you drop easy marks.

It really is the difference between a model that works and a model that crashes.
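The effect of rounding too early can be seen with the same blood-group numbers. The sketch below is illustrative only and the variable names are ours: it compares keeping full precision until the end with rounding each intermediate probability to two decimal places.

```python
from math import comb

n, p = 40, 0.85
p_39 = comb(n, 39) * p**39 * (1 - p)  # P(X = 39)
p_40 = p**40                          # P(X = 40)

# Round only at the very end.
full_precision = round(1 - (p_39 + p_40), 3)

# Round each intermediate value to 2 decimal places first.
rounded_early = round(1 - (round(p_39, 2) + round(p_40, 2)), 3)

print(full_precision, rounded_early)
# prints 0.988 0.99
```

Rounding early turns 0.0015 into 0.00, so the chance of all 40 people being rhesus positive disappears from the answer altogether.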

Now, finding the probability of exact events is a powerful tool.

It is.

But let's go back to your hospital planner.

In reality, a hospital planner does not usually need the exact probability of 39 specific people walking through the door.

No, they just need to know roughly what to expect on an average day.

Right.

So how do we shift from exact probabilities to general expectations?

We want to find the central tendency, you know, the middle of the target.

For a binomial distribution, the formula for expectation, which is the mean denoted by the Greek letter mu, is wonderfully simple.

Okay, let's hear it.

It is just n multiplied by p, the number of trials times the probability of success.

Wait, let me make sure this is as intuitive as it sounds.

So if I have an ordinary fair die, the chance of rolling a 6 is 1 in 6.

If my fixed number of trials is 60 rolls, my expectation is just 60 multiplied by 1/6, which is 10.

I should expect to roll a six 10 times.

It really is that simple.

It is exactly that simple.

That is your expectation.

But of course, if you actually sit down and roll a die 60 times, you probably will not get exactly 10 sixes.

Right.

I might get 8 or I might get 12.

Exactly.

That wobble away from the average is what we call variance.

So if the average is just n times p, I imagine calculating the variance, that wobble you mentioned, must factor in the chance of failure too.

You are right on the money.

The variance, denoted by sigma squared, is n times p times (1 minus p).

Or if you use q for the probability of failure, it's just n times p times q.

Precisely.

Wait, I want to pause and think about the implications of that.

If variance is n times p times q, that means the wobble changes depending on how likely the event is.

Well, if an event has a 99% chance of happening, p is 0.99, but q is tiny, 0.01.

So when you multiply them together, the variance shrinks drastically.

That is a brilliant observation.

Why does it shrink?

Because if you have a 99% chance of success, you are almost completely guaranteed to get a number of successes very close to your total number of trials.

Oh, right.

There is very little room for surprise.

The greatest variance, the widest wobble, happens when p is exactly 0.5, like a coin flip, because that is the state of maximum uncertainty.
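The mean and variance formulas are simple enough to check directly. In this sketch the function names are our own, and n = 100 is an arbitrary value chosen just to compare the wobble at different values of p.

```python
def binomial_mean(n, p):
    """Expectation of X ~ B(n, p): n times p."""
    return n * p


def binomial_variance(n, p):
    """Variance of X ~ B(n, p): n times p times q, with q = 1 - p."""
    return n * p * (1 - p)


# The die example: 60 rolls, success means rolling a six.
print(binomial_mean(60, 1 / 6), binomial_variance(60, 1 / 6))

# Assumed n = 100: the variance shrinks as p moves away from 0.5.
for p in (0.5, 0.9, 0.99):
    print(p, binomial_variance(100, p))
```

For a fixed n, the product p(1 - p) is largest at p = 0.5, which is why the coin-flip case gives the widest spread.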

Okay, I love this because the textbook has this brilliant algebra puzzle that really tests if you understand how these two formulas relate to each other.

I know exactly what you mean.

It's reverse engineering a distribution.

Yes.

Let's say we have a random variable x following a binomial distribution, but we do not know n or p.

We are flying completely blind.

Okay.

All we are told is that the expectation is 12 and the variance is 7.5.

And from that, we have to find both n and p.

It seems impossible at first.

Wait, how does that work?

You're telling me we can find the total number of trials and the probability of success just from the average and the wobble.

We can.

We have two missing variables.

Doesn't that usually require like a complex system of equations?

It looks like it should, doesn't it?

But let's look at the formulas side by side.

This is where mathematical logic exposes the hidden parameters.

Okay, I'm ready.

We know the formula for variance is n times p times q, and we know the formula for expectation is n times p.

Do you see the relationship between the two?

Ah, the variance formula literally has the expectation formula hidden inside of it.

It does.

Because n times p times q is really just the expectation n times p multiplied by q.

There is the trick.

So if you divide the variance by the expectation, the n and the p mathematically cancel out entirely.

And you are left with just q, the probability of failure.

Exactly.

That is such an elegant workaround.

So we just divide our variance, 7.5, by our expectation, 12, and that gives us 0.625.

So q, our chance of failure, is 0.625.

And since success and failure have to add up to 1, p must be 1 minus 0.625.

So p is 0.375.

We found the probability.

Now once you have the probability p, finding the number of trials n is trivial.

Right, because we know expectation is n times p, which equals 12.

So n times 0.375 equals 12.

Divide 12 by 0.375, and we discover that n is 32.

We completely reverse engineered the entire scenario just from knowing the average and the variance.
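The reverse-engineering steps just described can be sketched in Python. The function name `recover_parameters` is hypothetical, and exact fractions are used here purely to avoid floating-point noise.

```python
from fractions import Fraction


def recover_parameters(mean, variance):
    """Recover (n, p) for X ~ B(n, p) from its mean np and variance npq."""
    q = variance / mean  # npq divided by np leaves q
    p = 1 - q            # success and failure add up to 1
    n = mean / p         # np = mean, so n = mean / p
    return n, p


n, p = recover_parameters(Fraction(12), Fraction("7.5"))
print(n, float(p))
# prints 32 0.375
```

This only describes a valid binomial distribution when the variance is smaller than the mean (so that q lies between 0 and 1) and n comes out as a whole number.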

It is incredibly satisfying when you realize how these concepts interlock.

But you know, we have to acknowledge that so far, we have only been operating under the assumption that we know the finish line.

That's true.

We have been working with a fixed number of trials.

Right, but life isn't always like that.

Yeah, if you are playing a game to win a prize, you don't necessarily say, I'm going to play exactly five times.

You say, I'm going to keep playing until I finally win.

Exactly.

What happens when we do not have a fixed number of trials and we are just waiting for that first success?

Then you leave the binomial distribution behind entirely.

You do.

And you enter the geometric distribution.

The waiting game.

Exactly.

In a geometric distribution, denoted as X ~ Geo(p), we are modeling the number of trials up to and including the very first success.

OK, so we are assuming we have an infinite number of independent trials available to us.

Right.

We keep going and going until we hit a success and then we stop immediately.

Because there is no fixed N, the formula has to look different.

And honestly, it is surprisingly simple compared to the binomial formula.

It really is.

To find the probability that our first success happens on exactly the rth trial, the formula is just p multiplied by (1 minus p) to the power of (r minus 1).

So let's connect this to the bigger picture.

The logic here is ironclad.

Think about what it actually means to get your first success on, say, the fifth try.

Well, it means the first four tries had to be absolute failures.

There is no other way for the fifth try to be the first success.

Precisely.

If your first success is on the Rth attempt, it absolutely guarantees that your first R minus 1 attempts were failures.

So mathematically, you just calculate the probability of a streak of R minus 1 failures.

Which is (1 minus p) to the power of (r minus 1).

And then you multiply it by the probability of the one success at the very end, P.

That's all it is.

It's just a straight line of failures hitting a solid wall of success.

Beautifully put.
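The "streak of failures, then one success" idea translates almost word for word into code. This is a minimal sketch; the function name `geometric_pmf` and the die example are our own.

```python
def geometric_pmf(r: int, p: float) -> float:
    """P(X = r) for X ~ Geo(p): r - 1 failures in a row, then one success."""
    if r < 1:
        raise ValueError("r counts trials, so it must be at least 1")
    q = 1 - p  # probability of failure on one trial
    return q**(r - 1) * p


# Hypothetical example: the first six appears on exactly the fifth roll.
print(geometric_pmf(5, 1 / 6))
```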

And because this is an infinite distribution, the text gives us some really vital problem solving shortcuts.

Oh, I love a shortcut.

If you're taking a test, calculating probabilities to infinity is impossible.

So what if you need to find the probability that it takes more than r trials to get a success?

Like, what is the chance it takes me more than five rolls to get a six on a die?

Right.

If it takes more than five rolls to get your first success, what does that practically tell you about those first five rolls?

That they were all failures.

Every single one of them.

There is your shortcut.

The probability that X is strictly greater than r is simply the probability of failing r times in a row.

Oh, wow.

So it's just a Q to the power of R.

Exactly.

You do not need to calculate infinite probabilities for roll six, seven, eight, all the way to infinity and add them up.

You just calculate the chance of enduring a losing streak of length r.

That saves so much time on an exam.
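To see that the shortcut really matches the long calculation, the sketch below works through the die question from the conversation (more than five rolls to get a six). The function name `geometric_tail` is our own.

```python
def geometric_tail(r: int, p: float) -> float:
    """P(X > r) for X ~ Geo(p): the first r trials are all failures."""
    return (1 - p)**r


p = 1 / 6

# Shortcut: a losing streak of length 5.
shortcut = geometric_tail(5, p)

# Long way: 1 minus the chance the first six arrives on roll 1, 2, 3, 4 or 5.
long_way = 1 - sum((1 - p)**(k - 1) * p for k in range(1, 6))

print(round(shortcut, 4), round(long_way, 4))
# prints 0.4019 0.4019
```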

Okay, now we need to talk about the quirks of the geometric distribution because this next concept completely messed with my intuition when I first read it.

I think I know what's coming.

So I am going to pose a riddle to you, the listener, right out of the textbook.

In a geometric distribution, what is the most likely attempt for you to get your first success?

Is it the attempt right near your expectation?

You would think so.

But what's fascinating here is that the mode, which is the single most probable outcome of every single geometric distribution, is one.

You are always most likely to get your first success on the very first try.

Okay, admit it.

That sounds wrong.

It does feel wrong.

Think about buying a scratch-off ticket.

If the chance of winning is only 5%, how can my most likely winning attempt be my first try?

It feels like it should be my 20th try.

It definitely feels counterintuitive until you look closely at the sequence of the math.

Remember, the probabilities in a geometric distribution form a decreasing geometric progression.

Okay, let's use your scratch-off example.

The probability of succeeding on the first try is 0.05.

Right.

And to find the probability of your first success happening on the second try, you have to factor in failing the first time.

So it is 0.95 times 0.05.

And 0.95 times 0.05 is smaller than 0.05.

That is the core of it.

If you multiply any positive number by a fraction less than 1, it gets smaller.

Exactly.

The probability of succeeding on the third try is the second try multiplied by 0.95 again.

It gets even smaller.

Because every subsequent attempt requires another failure to happen first.

Right.

So the probabilities strictly decrease forever.

Therefore, the very first term, the first try, where absolutely zero failures have to occur beforehand, is always the highest probability.

It's a continuous downward slope.

So the single most likely event is that you win immediately, simply because winning on attempt 50 requires you to navigate an incredibly unlikely minefield of 49 consecutive losses first.

That makes so much sense now.
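The scratch-off argument can be checked numerically. This sketch uses the 5% win chance from the example and looks only at the first 50 attempts, an arbitrary cut-off we chose for illustration.

```python
p = 0.05  # chance of winning on any single ticket

# P(first win on attempt r) for r = 1 to 50.
probs = [(1 - p)**(r - 1) * p for r in range(1, 51)]

# Attempt number with the highest probability.
most_likely = max(range(1, 51), key=lambda r: probs[r - 1])

print(most_likely)
print(round(probs[0], 4), round(probs[1], 4), round(probs[19], 4))
```

Each term is the previous one multiplied by 0.95, so the list falls steadily and attempt 1 comes out on top, even though the 20th attempt is where the average lands.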

But hold on.

If the mode is always 1, what about the expectation?

The average.

Oh, the expectation for a geometric distribution is simply 1 divided by P.

So it is just the reciprocal of the probability of success.

Let's ground this with a classic scenario.

Imagine a cereal company, Zingo, puts a free toy in one out of every four boxes.

Okay.

So our chance of success, p, is one-fourth.

Let's look at the mode versus the expectation here.

Well, as we just established, the mode is 1.

If you start opening boxes of Zingo, the single most likely box to contain your first toy is the very first box you open.

But the expectation is 1 divided by one-fourth, which is 4.

So on average, over the long term, a child will need to open four boxes to find a toy.

This beautifully illustrates the difference between the most likely single event, the mode, box 1, and the long-term average, the expectation, four boxes.

It perfectly separates what is most probable right now versus what will happen on average if you repeat this process thousands of times.
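One way to convince yourself that the expectation is 1/p is to add up r times P(X = r) for a large number of terms. The sketch below does this for the cereal example with p = 1/4; stopping at 1000 terms is our own choice, since the remaining terms are vanishingly small.

```python
p = 1 / 4  # one box in four contains a toy

# Approximate E(X) = sum of r * P(X = r), truncated at 1000 terms.
approx_mean = sum(r * (1 - p)**(r - 1) * p for r in range(1, 1001))

# The mode is still box 1: P(X = 1) is larger than P(X = 4).
p_first_box = p
p_fourth_box = (1 - p)**3 * p

print(round(approx_mean, 6), p_first_box, round(p_fourth_box, 4))
```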

So what does this all mean for you, the learner?

Let's take a step back and recap this journey.

Good idea.

We started by putting on our binary glasses, looking at the complex world only in terms of strict success and failure.

We learned how to use the binomial distribution to count our successes when we are operating on a fixed timeline with a set finish line.

Right.

When we know n.

And we learned how to use the geometric distribution to model the waiting game, calculating how long it will take to hit that first success when our trials are limitless.

Limitless trials, yeah.

And this raises an important question, a final philosophical thought to leave you with as you close the textbook.

Oh, I love a good philosophical thought.

The mathematics of the geometric distribution inherently assumes that we have an infinite number of trials available to achieve our first success.

Right.

That is the core assumption.

Mathematically, because that probability never truly reaches zero, if you just keep trying, success is virtually inevitable.

But you know, in the real world, our resources are not infinite.

Our time is limited, our patience wears thin, our budget runs out.

Eventually, there are simply no more boxes of Zingo cereal on the shelf.

Right.

The math assumes infinity, but reality enforces strict limits.

Exactly.

So the provocation for you to think about is this.

How do we balance mathematical certainty with real world constraints?

That's a great question.

When we use statistical models that assume infinite attempts, how do we factor in the reality that at some point we simply have to stop trying?

It is a great question to mull over the next time you are relying on probability, hoping for that successful outcome.

It really brings us right back to the beginning, doesn't it?

The math is perfectly clean, perfectly logical, and beautifully predictable.

But the real world is still beautifully messy.

Thank you for joining us for this special deep dive into probability distributions.

On behalf of the Last Minute Lecture Team, happy studying, and we'll see you next time.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Discrete probability models for binary outcome situations fall into two essential categories depending on whether the number of trials is predetermined. The binomial distribution applies to fixed-sample scenarios where a specific number n of independent trials is conducted, and the analysis centers on counting the total number of successes achieved. Each trial must have exactly two possible outcomes, success or failure, with constant success probability p maintained throughout all trials. Computing the probability of obtaining precisely r successes involves combining the binomial coefficient with probability calculations, allowing practitioners to determine the likelihood of any particular success count. Summary measures like the mean and variance can be derived directly from the parameters n and p, providing efficient tools for prediction without exhaustive enumeration.

The geometric distribution addresses a fundamentally different question: how many trials are needed, up to and including the first success? This model applies when trials continue until that first success rather than stopping at a predetermined point, making it particularly useful for analyzing waiting time scenarios. The underlying conditions remain the same as in the binomial case (two outcomes, independence, and constant probability), but the variable number of trials creates a different mathematical structure. The probability that success first occurs on trial r requires exactly r-1 failures followed by the success on trial r, producing a formula whose values decrease as r grows larger. Complementary probability techniques provide efficient computational pathways for cumulative calculations rather than summing individual probabilities. The expected waiting period until first success equals the reciprocal of the success probability, a clean relationship absent in many other distributions. The mode consistently equals one, reflecting the reality that longer waits become progressively less probable.

Together, these distributions provide complementary analytical frameworks suited to different probabilistic structures encountered in applied contexts and theoretical analysis.
