Chapter 6: Probability Distributions

Audio Overview

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

You know, when a massive company launches a global marketing campaign or like a trading company decides to spin up a totally new line of business, from the outside it just looks like a massive leap of faith.

Oh yeah, like they're just jumping into the dark.

Exactly.

We see them pour millions of dollars into these new projects and you naturally assume they either have a corporate crystal ball or they're just hoping for the best.

It definitely looks that way when you only see the final decision, but I mean the reality is way less mystical.

They aren't leaping into the dark at all.

Right, they're navigating with, well, a highly specialized mathematical radar.

Exactly, radar built on probability.

And that radar is exactly what we are building with you today.

Welcome to this Last Minute Lecture Deep Dive.

Our mission today is to act as your one-on-one tutor and completely unpack chapter six on probability distributions from the Cambridge International AS and A level mathematics course book.

Yeah, we're going to explore how you can, you know, map out chaos, quantify your risks, and actually predict long-term outcomes.

Which is such a powerful tool because the text sets up this perfect scenario right at the start.

It asks, how does a trading company know if a new business line will hit the $50,000 revenue threshold they need to turn a profit?

Right, they can't just guess.

They have to measure that risk.

They have to map out every single scenario and assign a precise weight to each one.

And to do that, they build a probability distribution.

So here's our roadmap for today.

First, we'll identify the playing pieces, which we call discrete random variables.

Then we'll build the board itself, which is the probability distribution table.

From there, we figure out how to predict the long-term outcome, which is called expectation.

And finally, we'll measure the risk or the spread of those outcomes using variance.

Sounds like a solid plan.

So before we can map out any probabilities, we really have to define what it is we are measuring.

Right, we need boundaries.

Exactly.

Which brings us to our first core concept, the discrete random variable.

Now, I know the phrase discrete random variable sounds a bit dense, but if you just break the words down, it makes perfect sense.

It really does.

Let's look at discrete and random.

In math, a variable is discrete if it can only take on specific countable values.

There are hard stops between the possibilities.

Right, there's no middle ground.

And the random part just means that the specific value that occurs in any given trial happens completely by chance.

Yeah, exactly.

I always like to explain discrete variables by comparing them to shoe sizes, like comparing your shoe size to the actual physical length of your foot.

Oh, that's a great analogy.

Right, because the length of your foot is continuous.

It could be 10 inches or 10.1 inches or 10.1234 inches.

It's infinite.

But your shoe size is discrete.

You can buy a size 8 or an 8.5 or a 9.

Exactly.

You can't walk into a shoe store and ask for a size 8.314.

Your foot has to fit into one of those specific predefined buckets.

And those predefined buckets are the whole foundation of what we're doing.

The textbook actually uses a really relatable example.

Yeah.

Buying a carton of six eggs.

Oh, yeah.

The broken eggs?

Right.

When you open that cardboard lid, the number of broken eggs inside is a discrete random variable.

The buckets are rigidly defined.

You can only have zero, one, two, three, four, five, or six broken eggs.

Because you can't physically have 2.5 broken eggs.

Exactly.

Or, take another example from the text, rolling four dice.

If you want to count the number of sixes you get, that count is a discrete random variable.

Because, again, the universe of outcomes is closed.

You roll zero sixes, one, two, three, or a maximum of four.

There is simply no other possibility.

Which naturally brings us to the mathematical notation used in the book.

And you'll see a deliberate mix of capital and lowercase letters here.

Yes.

This trips people up all the time.

It really does.

So a capital letter, like a big X, represents the variable itself as a concept.

It's the placeholder for the number of broken eggs or the number of sixes.

And then the lowercase letter, like a small x, represents the specific numerical value that the variable can actually take.

Exactly.

So the capital X is the empty box, and the lowercase x is the specific number we pop into it.

And the textbook uses set notation for this, right?

Like you'll see a capital X followed by a symbol that looks like a little curved E, and then numbers in curly brackets, like one, two, and three.

Right.

And that curved E just means it is an element of.

So if you read it out loud, it says the random variable X can take the predefined values one, two, or three.

Perfect.

So now that we know what a discrete random variable is, we have our playing pieces.

Now we need to build the board.

Yes, the probability distribution table.

This is where we map out every possible value right next to how likely it is to happen.

You can just think of those probabilities as the relative frequencies of each outcome.

But before we build this table, we have to talk about the golden rule.

Oh, absolutely.

The immovable law of this mathematical universe.

The sum of all probabilities in a distribution must equal exactly one.

Right.

In notation, it's the Greek letter sigma followed by a lowercase p equals one.

Sigma just means the sum of.

So sigma p equals one.

And the philosophy here is so simple, but crucial.

The number one represents 100%.

It's the entirety of reality.

Exactly.

If you map out all your possible outcomes and the probabilities don't add up to one, your model is broken.

You either missed an outcome or your math is wrong somewhere.

So let's look at the book's introductory example to see this in action.

Tossing two fair coins.

Let's say our variable X is the number of heads we get.

Okay, so our discrete buckets for X are zero, one, or two heads.

Right.

Getting zero heads means flipping tails and then tails again.

Since each coin is fair, it's a 0.5 chance times a 0.5 chance, which gives us 0.25.

And getting two heads is the exact same logic.

Heads then heads.

0.5 times 0.5 is 0.25.

But getting exactly one head is slightly different, right?

Right.

Because there are two ways to do it.

You can get heads then tails or tails then heads.

Both pathways have a probability of 0.25, so you add them together.

And that gives us a total probability of 0.5 for getting exactly one head.

Now, we apply the golden rule to verify our board.

We have 0.25, 0.5, and 0.25.

Which adds up to exactly one.

The table is perfectly sealed.

Right.
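The coin-toss table can be rebuilt by brute-force enumeration. The following is a minimal sketch, not something from the textbook: it lists the four equally likely outcomes, counts heads in each, and checks the golden rule. The variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of tossing two fair coins.
outcomes = list(product("HT", repeat=2))

# Tally P(X = x), where X is the number of heads in an outcome.
distribution = {}
for outcome in outcomes:
    heads = outcome.count("H")
    distribution[heads] = distribution.get(heads, Fraction(0)) + Fraction(1, len(outcomes))

for x in sorted(distribution):
    print(f"P(X = {x}) = {distribution[x]}")

# Golden rule: the probabilities in the table must sum to exactly 1.
total_probability = sum(distribution.values())
```

Using exact fractions rather than floats keeps the sum check free of rounding error.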

But exams aren't always going to give you simple coin tosses with neat little decimals.

No, they are not.

And that brings us to worked example 6.2.

This is where they give you a table, but the probabilities are unknown algebraic expressions.

Yeah.

So our variable can be 2, 3, 4, 5, or 6.

But the probabilities underneath them use a constant c.

You see things like 2c or c squared plus 0.1.

Which looks terrifying at first.

How do you solve for c when you just have a bunch of random expressions?

Well, this is where the golden rule saves you.

You know that the sum of all probabilities must equal one.

So you just add all those together and set the whole thing equal to one.

Oh, right.

So when you combine the terms, you get a quadratic equation.

It comes out to c squared plus 3c minus 0.64 equals zero.

Exactly.

And when you factor that quadratic, you get two mathematically valid solutions.

The math says c is either 0.2 or negative 3.2.

And here's the trap.

The book explicitly warns about this.

Mathematically, both are correct.

But statistically, we are mapping real world probabilities.

And a probability cannot be negative.

The absolute floor of reality is zero, meaning an event is impossible.

Right.

You can't have an event happen less than never.

Exactly.

If we used negative 3.2 and plugged it back into the expression 2c from the table, that specific probability would be negative 6.4.

Which is just absurd.

So we have to throw out the negative root.

The only valid answer is c equals 0.2.

It really changes how you look at the algebra.

You're solving for physical reality.
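The root-rejection step can be sketched in a few lines. This assumes only what the discussion quotes: the quadratic c² + 3c − 0.64 = 0 and the two table entries 2c and c² + 0.1. The rest of the table is not reproduced here, so the check covers just those two entries.

```python
import math

# Coefficients of c^2 + 3c - 0.64 = 0, the quadratic quoted in the discussion.
a, b, k = 1.0, 3.0, -0.64

discriminant = b * b - 4 * a * k
roots = [
    (-b + math.sqrt(discriminant)) / (2 * a),
    (-b - math.sqrt(discriminant)) / (2 * a),
]

def entries_are_probabilities(c):
    # Only the two table entries named in the discussion are checked here.
    return all(0 <= p <= 1 for p in (2 * c, c * c + 0.1))

# Keep only the roots that leave those entries as valid probabilities.
valid = [c for c in roots if entries_are_probabilities(c)]
print(valid)
```

Both roots satisfy the algebra, but only one survives the requirement that every probability lies between 0 and 1.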

Okay.

So the coin toss and the algebraic table both assume independent events.

But what happens when our choices actually change the environment for the next choice?

Ah, right.

This takes us to the mechanics of selection without replacement.

This is worked example 6.3 in the text.

The bus example.

I love this one.

Okay.

So there are spaces for exactly three more to board a bus.

And waiting at the stop, there are eight youths, one man and one woman.

Ten people total.

And the driver randomly selects three of them.

Let's make our random variable y, which is the number of youths selected.

So first step, what are our discrete buckets?

What are the possible values for y?

Well, the driver is picking three people.

But there are only two non-youths available, the man and the woman.

Which means it's physically impossible to pick zero youths.

Even if the driver picks the man and the woman, that third seat has to go to a youth.

Exactly.

I love that logical deduction.

Yeah.

The environment limits the math.

So y can only be one, two, or three.

Okay.

So how do we calculate the probabilities for those three buckets?

We use combination formulas.

We use combinations, not permutations, because the order they board the bus doesn't matter.

We just care about the final group of three.

Right.

So to find the denominator, the total possible universe of selections, we calculate ten choose three.

Ten people choosing three.

And to find the probability of y equals one, meaning exactly one youth, we choose one youth from the eight available.

So eight choose one.

And we multiply that by choosing two non-youths from the two available.

So two choose two.

You take that result and divide it by our total, ten choose three.

Exactly.

And the text details the fractions here.

The probability of one youth is one over 15.

For two youths, it's seven over 15.

And for three youths, it's also seven over 15.

And if we pause and check the golden rule, one plus seven plus seven is 15.

So 15 over 15 equals one.

It works perfectly.

Even though the universe of choices was shrinking with every passenger, the distribution table holds its integrity.
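The bus probabilities can be checked with Python's `math.comb`. This is a sketch under the setup described above (eight youths, two non-youths, three seats); the constant and variable names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

YOUTHS, OTHERS, SEATS = 8, 2, 3

# Denominator: every way to pick 3 of the 10 people waiting.
total = comb(YOUTHS + OTHERS, SEATS)

distribution = {}
for y in range(SEATS + 1):
    # Choose y youths, and fill the remaining seats with non-youths.
    ways = comb(YOUTHS, y) * comb(OTHERS, SEATS - y)
    if ways:  # y = 0 gives zero ways: only two non-youths exist
        distribution[y] = Fraction(ways, total)

for y, p in distribution.items():
    print(f"P(Y = {y}) = {p}")
```

`comb(2, 3)` evaluates to 0, so the impossible case of zero youths drops out of the table without any special handling.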

Okay.

So the board is built and verified.

What do we actually do with it?

Now we predict the future.

We move to the concept of expectation.

Expectation.

Written as a capital E with the variable x in parentheses.

And in plain English, expectation is just the mean.

Yes.

It's the long-term average value of a discrete random variable over a large number of trials.

And the formula is super straightforward.

E of x equals the sum of x times p.

You just multiply every possible value by its probability and add them all up.

The book uses a biased spinner for this.

The spinner's score can be zero, one, two, or three.

And the probabilities are 0.1, 0.3, 0.4, and 0.2.

So we just multiply across.

Zero times 0.1, one times 0.3, two times 0.4, and three times 0.2.

And when you sum up those products, you get 1.7.

Our expectation is 1.7.
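As a quick sketch of the sum of x times p, here is the spinner's expectation computed with exact fractions. The helper name `expectation` is an illustrative assumption, not notation from the textbook.

```python
from fractions import Fraction

# Spinner scores mapped to their probabilities (0.1, 0.3, 0.4, 0.2).
spinner = {
    0: Fraction(1, 10),
    1: Fraction(3, 10),
    2: Fraction(4, 10),
    3: Fraction(2, 10),
}

def expectation(dist):
    """Return E(X): each value weighted by its probability, then summed."""
    return sum(x * p for x, p in dist.items())

mean = expectation(spinner)
print(float(mean))
```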

But wait, I have a massive problem with this.

Let me guess.

You can't score 1.7 on the spinner.

Exactly.

A spinner only has whole numbers.

Zero, one, two, three.

So what does a score of 1.7 actually mean in the real world?

This is such a common misconception.

Expectation isn't a prediction of what the spinner will do on the very next spin.

Okay.

So what is it then?

Think about the law of large numbers.

The text breaks this down brilliantly.

Imagine you don't just spin it once.

Imagine you spin it 1,600 times.

Oh, wow.

Okay.

1,600 spins?

Based on our probabilities, we'd expect 10% of those to be zeros.

That's 160 zeros.

We expect 30%, or 480, to be ones.

And 640 twos and 320 threes?

Exactly.

Now, if you add up the total score of all 1,600 spins and divide by 1,600 to find the average, you get exactly 1.7.

I see.

It's the anchor of the long game.
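The 1,600-spin argument can be reproduced directly. This sketch uses the expected counts rather than a random simulation, so it shows the idealized long-run picture; a real run of 1,600 spins would land near these counts, not exactly on them.

```python
from fractions import Fraction

spins = 1600
spinner = {
    0: Fraction(1, 10),
    1: Fraction(3, 10),
    2: Fraction(4, 10),
    3: Fraction(2, 10),
}

# Expected number of times each score appears in 1,600 spins.
expected_counts = {x: int(p * spins) for x, p in spinner.items()}

# Total score across all spins, then the average score per spin.
total_score = sum(x * n for x, n in expected_counts.items())
average = Fraction(total_score, spins)

print(expected_counts, total_score, float(average))
```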

It's like saying the average family has 2.4 kids.

Nobody actually has 0.4 of a child.

But it accurately describes the center of mass for the population.

Perfect analogy.

Expectation is the anchor.

But an anchor only tells you the center.

It doesn't tell you how wildly the boat is going to swing around that anchor.

Right.

Knowing the average doesn't tell us anything about the risk.

And that's why we need variance.

Variance is a measure of the spread of values around the mean.

And the textbook has this great mnemonic to remember the formula for variance.

It's a bit of a tongue twister.

The mean of the squares minus the square of the mean.

Yes.

The mean of the squares minus the square of the mean.

Algebraically, it's the sum of x squared times p minus the expectation squared.

Let's walk through worked example 6.4 to see this in action.

We have a table with values 0, 5, 15, and 20.

And the probabilities are 1 over 12, 3 over 12, 5 over 12, and 3 over 12.

Okay.

Step one is finding the expectation, our mean.

So we multiply the values by their fractions and sum them up.

Which gives us 150 over 12.

And that simplifies to an expectation of 12.5.

Now for the variance.

First, the mean of the squares.

We take our original x values 0, 5, 15, 20, and we square them.

So that's 0, 25, 225, and 400.

Right.

We multiply those squared numbers by our original probabilities and sum them up.

That calculation gives us exactly 200.

That's our mean of the squares.

And then we do the second part of the phrase, minus the square of the mean.

Exactly.

Our mean was 12.5.

If we square that, we get 156.25.

So 200 minus 156.25 gives us a variance of 43.75.

But we have one final puzzle piece here.

Because we squared our original values, our variance is technically in squared units.

Which is really hard to visualize.

If we were talking about the broken eggs again, a variance of 43.75 squared eggs makes zero sense.

Right.

So to get back to our original units, we just take the square root of the variance.

And that gives us the standard deviation.

Ah.

So the square root of 43.75 is approximately 6.61.

That's our standard deviation.

Yep.

It's a highly usable metric for how far our outcomes typically deviate from our expected anchor of 12.5.
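The whole variance calculation can be traced in a short sketch: the mean of the squares, minus the square of the mean, then a square root for the standard deviation. The table values are the ones quoted above; the variable names are illustrative assumptions.

```python
from fractions import Fraction
from math import sqrt

# Values of X mapped to their probabilities, as quoted for the worked example.
table = {
    0: Fraction(1, 12),
    5: Fraction(3, 12),
    15: Fraction(5, 12),
    20: Fraction(3, 12),
}

mean = sum(x * p for x, p in table.items())                 # E(X)
mean_of_squares = sum(x * x * p for x, p in table.items())  # E(X^2)

# Variance: the mean of the squares minus the square of the mean.
variance = mean_of_squares - mean ** 2

# Standard deviation: square root of the variance, back in the original units.
std_dev = sqrt(float(variance))

print(float(mean), float(mean_of_squares), float(variance), round(std_dev, 2))
```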

This is just so cool.

We started by defining our playing pieces, the discrete random variables.

Then we built the distribution table, using the golden rule where everything adds to one.

Then we handled dependent events with selections without replacement.

We found the long-term anchor using expectation.

And finally, we measured the spread using variance and standard deviation.

We basically took a chaotic random system and mapped it into a beautifully structured distribution.

And I want to leave you with a final provocative thought on this.

Straight from Explore 6.3 in the textbook.

Oh, the Galton board.

Yes.

Imagine dropping a ball into a device with a single nail in the wall.

The ball hits the nail and has a simple, discrete 50-50 chance to fall left or right into two cups below.

Totally random.

Now imagine expanding that device.

Make it four rows of nails, creating all these different bouncing paths to multiple cups at the bottom.

The probabilities change based on how many paths lead to each cup.

Exactly.

Now expand it to 10 rows of nails.

Or a hundred rows.

How does that simple, discrete 50-50 choice at each single peg scale up?

It creates this massive, predictable curve at the bottom.

It does.

It's a perfect visual representation of the math we just covered.

Discrete random events building up over time into a completely predictable, mathematically sound distribution.
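One way to see the path-counting idea is to model an idealized board. The sketch below assumes every row is an independent 50-50 bounce and that the cup a ball lands in equals its number of rightward bounces; a real Galton board only approximates this. The function name `galton_distribution` is hypothetical.

```python
from fractions import Fraction
from math import comb

def galton_distribution(rows):
    """Return P(ball lands in cup k) for k = 0..rows on an idealized board.

    The number of paths into cup k is comb(rows, k), and each of the
    2**rows left/right paths is assumed to be equally likely.
    """
    return [Fraction(comb(rows, k), 2 ** rows) for k in range(rows + 1)]

one_row = galton_distribution(1)
four_rows = galton_distribution(4)

print([str(p) for p in one_row])
print([str(p) for p in four_rows])
```

With four rows the path counts are 1, 4, 6, 4, 1, so the middle cup is already the most likely. Adding more rows makes that central bulge more pronounced.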

Chaos is just a pattern we haven't mapped yet.

Thank you so much for joining us on this deep dive.

Keep looking for the underlying structure in the noise.

Yes.

And best of luck in your studies.

From the Last Minute Lecture Team, thanks for listening.

Until next time.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Random variables serve as mathematical tools for quantifying outcomes governed by chance, with discrete random variables assuming only specific isolated values rather than continuous ranges. Understanding how these variables behave requires examining probability distributions, which enumerate all feasible outcomes alongside their corresponding likelihoods, often organized as frequency tables or visualized through bar and line graphs. A fundamental requirement of any valid probability distribution is that the probabilities assigned to all possible outcomes must aggregate to exactly one, reflecting the certainty that some outcome will occur. The expected value, synonymous with the mean or expectation, captures the long-run average that would emerge from repeatedly observing the random variable over many trials, calculated by weighting each possible outcome by its probability and combining these weighted values. However, expectation provides only partial insight into a distribution's behavior since it describes central tendency without revealing how tightly or loosely outcomes cluster around that central point. Variance and standard deviation quantify this dispersion, measuring whether the random variable's values concentrate near the mean or scatter across a wider range of possibilities. Variance is derived through a two-step process involving the expected value of squared outcomes and subtracting the square of the expected value itself, while standard deviation offers a more interpretable measure of spread by extracting the square root of variance, returning the measure to the original scale of measurement. These two parameters work synergistically to characterize random variables: expectation predicts the typical long-term result, while variance and standard deviation communicate how much confidence one should place in that prediction by quantifying variability. 
Mastering these foundational concepts enables meaningful analysis of uncertainty in practical contexts, from financial forecasting to quality control, where understanding both average performance and reliability of outcomes proves essential.
