Chapter 5: Probability: What Are the Chances?
Welcome to Last Minute Lecture.
This free chapter overview is designed to help students review and understand key concepts.
These summaries supplement, not replace, the original textbook and may not be redistributed or resold.
For complete coverage, always consult the official text.
Welcome to the Deep Dive.
Today we're getting into something that, well, it's everywhere.
Chance.
Uncertainty.
That's right.
From figuring out who gets the last slice of pizza with rock, paper, scissors.
Or a coin toss before a football game.
Even like the traits kids get from their parents, it's all chance.
And today's Deep Dive is all about the mathematics behind that chance behavior.
We call it probability.
It's a really fundamental idea, helps us understand this unpredictable world.
So our mission today is to give you, our listener, a really crystal clear grasp of probability.
Exactly.
We want to cover everything from the basic rules to some really powerful applications.
It's perfect if you're navigating AP statistics or honestly, if you just want to be better informed about how things work.
We'll unpack the key concepts, walk through some real world examples, and maybe even bust a few myths along the way.
Yeah, and we'll even start with something familiar, like those "one in six wins" soda cap promotions.
We can actually use simulation to see if a company's claim holds water.
Okay, so let's start with the core idea of probability itself.
What's the most important thing to get?
I think the absolute key is understanding the difference between the short run and the long run.
Chance behavior, it's totally unpredictable moment to moment in the short run.
Like flipping a coin, heads or tails, no idea.
Precisely.
Each flip is independent, a fresh start.
But and this is the amazing part, if you repeat that chance process many, many times, a pattern emerges, a very regular predictable pattern.
That sounds kind of contradictory, unpredictable, but predictable.
It does, right.
But think about that coin toss again. In just, say, 10 tosses, you might easily get seven heads, maybe even three. It's all over the place.
Sure.
But if you keep tossing it a hundred times, five hundred, five thousand times, the proportion of heads, you know, the number of heads divided by the total number of tosses, starts to settle down. It gets closer and closer to 0.5, or 50 percent.
So you could almost graph that, couldn't you?
Like number of tosses on the bottom axis, proportion of heads on the side axis.
Exactly.
And that graph would show the line bouncing around like crazy at first for the first few tosses.
Right.
But as the number of tosses increases along that horizontal axis, the line representing the proportion of heads would smooth out. It would get closer and closer to just being a flat horizontal line right at 0.5.
Ah, so the randomness averages out over time.
That's the essence of it.
And that long run stability, that's what lets us define probability.
It's a number always between zero and one.
Zero meaning it never happens, one meaning it always happens.
Correct.
And that number describes the proportion of times an outcome would occur in a very, very long series of repetitions.
The settling isn't just a coincidence.
It's guaranteed by something called the law of large numbers.
The law of large numbers.
It just formally states that as you observe more and more repetitions of any chance process, the proportion of times a specific outcome happens gets closer and closer to its true probability.
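To see that settling-down in action, here is a minimal sketch that simulates a fair coin and tracks the running proportion of heads. The function name, the seed, and the checkpoints are illustrative choices, not something from the chapter.

```python
import random

def running_proportion_of_heads(num_tosses, seed=0):
    """Return the proportion of heads after each toss of a simulated fair coin."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = 0
    proportions = []
    for toss in range(1, num_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1, False as 0
        proportions.append(heads / toss)
    return proportions

props = running_proportion_of_heads(5000)
# Early checkpoints can be far from 0.5; later ones should sit close to it.
for n in (10, 100, 500, 5000):
    print(n, round(props[n - 1], 3))
```

Plotting `props` against the toss number would give the graph described above: jumpy at the start, then flattening out near 0.5.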
OK, so let's make this concrete.
The Book of Odds says the probability a random US adult drinks coffee on a given day is 0.56.
What does that actually mean in practice?
It means if you took a massive random sample, thousands and thousands of adults, you'd expect about 56% of them to have had coffee that day.
But not if I just ask like a hundred people.
Probably not exactly 56 out of a hundred.
That's the key difference.
Probability describes the long run.
In your sample of a hundred, maybe you get 52, maybe 61.
That's just normal short-term variability.
It doesn't mean the 0.56 probability is wrong.
Right.
And this is where our intuition can really mess us up, isn't it?
Oh, definitely.
We try to find patterns where none exist.
Like if you toss a coin six times and get tails, tails, tails, tails, tails, tails.
Six tails in a row, yeah.
Your gut screams the next one has to be heads to even things out.
Right.
The law of averages or something.
Exactly.
People call it that or the gambler's fallacy.
Yeah.
But it's a total myth.
Coins don't have memory.
Each flip is independent.
So those first six tails don't matter for the next flip.
Not one bit.
In the long run, say after 10,000 tosses, those first six tails just get drowned out by all the other results.
What's really weird is that runs, like a string of tails or a basketball player having a hot hand, are actually more common in truly random data than we intuitively expect.
Our brain is just wired to see patterns.
Okay.
So we understand probability describes the long run, but doing thousands of coin flips or sampling thousands of people is often impractical, right?
Or impossible.
Absolutely.
And that's where simulation comes in.
It's basically the imitation of chance behavior.
We create a model that accurately reflects the situation and then use that model to generate data.
It's like a stand-in for the real thing.
Precisely.
And there's a nice clean process for doing simulations, which is especially important for, say, the AP Statistics exam. Three steps.
Okay.
What are they?
First, you describe. You explain exactly how you'll use a chance device, maybe random numbers from a calculator, maybe rolling a die, maybe using a table of random digits, to imitate one trial of the process. And you have to state clearly what result you'll record from that trial.
Step one, describe the process for one trial.
Got it.
Step two, perform.
You actually carry out many, many trials of the simulation.
The more trials you do, the more reliable your results will be.
Okay.
Do lots of trials. And step three?
Step three, answer.
You use the results you collected from all those trials to estimate the probability you're interested in.
And then you answer the original question based on that estimate.
Let's try an example, that NASCAR cereal box thing you mentioned.
Perfect.
Okay.
So the company claims there are five different collectible driver cards and they're all equally likely in any box.
Right.
But a fan buys 23 boxes and still doesn't have all five cards.
The question is, does this suggest the company's claim of equally likely might be false or was the fan just unlucky?
How would we simulate that?
Okay.
Step one, describe.
We need something with five equally likely outcomes.
We could use random integers from one to five.
Let's say one is Joey Logano, two is Kevin Harvick and so on up to five.
Then we generate these random integers one at a time, representing buying boxes.
We keep generating numbers until we've seen all five distinct numbers.
One, two, three, four, and five.
That's one trial.
And what do we record?
We record how many numbers we had to generate, how many boxes it took to get all five drivers.
Got it.
So one trial might take say 12 boxes.
Another might take eight.
Another might take 20.
Exactly.
Step two, perform.
Yeah.
We repeat this whole process many times.
Let's say we do it a hundred times.
Okay.
A hundred trials done.
Now step three, answer.
We look at the results from our hundred trials.
How many boxes did it typically take?
Maybe we make a dot plot.
Each dot shows the number of boxes for one trial.
I can picture that.
A bunch of dots clustered.
Maybe around 10 or 15, maybe a few higher.
Right.
And now we look at our fan's result.
23 boxes.
We check our simulation results.
How often did it take 23 or more boxes in our hundred trials?
What if it never happened?
Or if it only happened, like, once out of a hundred times.
Then that suggests getting 23 or more boxes is really unusual if the company's claim is true.
It provides pretty convincing evidence against the claim.
Of course, it's still possible the fan was just incredibly unlucky, but the simulation helps us gauge how likely that bad luck really is.
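The three steps for the cereal box question can be sketched in code. This is only an illustration under the company's claim that all five cards are equally likely; the function names, the seed, and the choice of 100 trials are assumptions made for the example.

```python
import random

def boxes_until_full_set(rng, num_cards=5):
    """One trial: open boxes until every card has appeared; return the box count."""
    seen = set()
    boxes = 0
    while len(seen) < num_cards:
        seen.add(rng.randint(1, num_cards))  # each card equally likely
        boxes += 1
    return boxes

def estimate_prob_at_least(threshold, num_trials=100, seed=1):
    """Perform many trials and estimate P(boxes needed >= threshold)."""
    rng = random.Random(seed)
    results = [boxes_until_full_set(rng) for _ in range(num_trials)]
    estimate = sum(r >= threshold for r in results) / num_trials
    return estimate, results

estimate, results = estimate_prob_at_least(23)
print("Estimated P(23 or more boxes):", estimate)
```

A small estimate here would match the reasoning above: needing 23 or more boxes is unusual if the claim is true. Raising `num_trials` makes the estimate more stable.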
Okay.
Another quick one, the golden ticket parking lottery.
95 students are eligible.
28 are in AP stats.
They draw two names for reserved spots and both winners are AP stats students.
Rigged?
Interesting question.
Let's simulate.
Step one, describe.
We need to represent the students.
Let's label the 28 AP stats students, 01 to 28.
Okay.
Got to use two digits, right?
Yes.
Good point.
Consistent label length is key for using random digits.
So 01 to 28 for AP stats.
Then the other 67 students would be 29 up to 95.
28 plus 67, 95.
Makes sense.
Now we use a random number table or generator to pick two different two digit numbers between 01 and 95.
We have to ignore any repeats because you can't pick the same student twice.
And we ignore numbers outside the 01 to 95 range.
And what do we record for each trial?
We record whether both selected numbers fall in the range 01 to 28, meaning both winners were AP stats students.
Step two, perform many trials.
Say 100 again.
Step three, answer.
Let's imagine that in our 100 simulated lotteries, we found that both winners came from the AP stats group nine times.
So about a 9% chance.
Right.
Is 9% unusual enough to scream rigged?
Probably not.
It's a bit low, maybe, but it's certainly plausible that it could happen just by chance.
So the simulation suggests we don't have strong evidence that the lottery was unfair.
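Here is a hedged sketch of the parking lottery simulation. It assumes labels 1 to 28 stand for the AP stats students and 29 to 95 for everyone else, and it uses `random.sample` so the two winners are automatically different (the "ignore repeats" step). The trial count and seed are arbitrary, and the exact value at the end is only a cross-check based on multiplying the chance of the first pick by the chance of the second.

```python
import random
from fractions import Fraction

def both_winners_ap(rng, total=95, ap=28):
    """One trial: draw two different students; True if both are AP stats."""
    winners = rng.sample(range(1, total + 1), 2)
    return all(w <= ap for w in winners)

def simulate_lottery(num_trials=10_000, seed=2):
    """Estimate P(both winners are AP stats students) from many trials."""
    rng = random.Random(seed)
    hits = sum(both_winners_ap(rng) for _ in range(num_trials))
    return hits / num_trials

# Cross-check: 28 of 95 for the first pick, then 27 of the remaining 94.
exact = Fraction(28, 95) * Fraction(27, 94)

print("Simulated:", simulate_lottery())
print("Exact:", round(float(exact), 4))
```

Both numbers land in the same neighborhood as the 9% imagined above, which is why the result does not look like strong evidence of a rigged draw.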
And that AP exam tip about describing simulations clearly, using consistent labels and handling repeats, sounds really important.
Absolutely critical.
You need to describe it so someone else could replicate your simulation exactly.
So simulation is great for estimating probabilities by mimicking chance, but sometimes we can calculate exact probabilities using math, right?
Exactly.
We don't always need to run thousands of trials.
We have mathematical rules, the building blocks of probability.
This starts with the idea of a probability model.
A model, like a blueprint.
Kind of.
It has two parts.
First, the sample space, which is just a list of every single possible outcome of a chance process, everything that could possibly happen.
Okay.
Like for a coin toss, the sample space is just heads, tails.
Perfect.
The second part is assigning a probability to each outcome in the sample space.
So for a fair coin, P(heads) = 0.5 and P(tails) = 0.5.
Correct.
And a key rule is that all the probabilities in the sample space must add up to one.
Makes sense because something has to happen.
Right.
And if all the outcomes are equally likely, like with a fair coin, calculating the probability of an event, which is just a collection of outcomes, is simple.
It's just the number of outcomes in your event divided by the total number of outcomes in the sample space.
Like rolling a fair six sided die.
The sample space is one, two, three, four, five, six.
The probability of rolling an even number is three outcomes (two, four, six) divided by six total outcomes, which is 1/2.
Exactly.
You can visualize more complex sample spaces too.
Imagine rolling two dice, say one red and one blue.
You can think of a six by six grid.
Each cell represents one outcome, like red one, blue one, red one, blue two, all the way to red six, blue six.
There are 36 possible equally likely outcomes in that grid.
So the probability of rolling snake eyes (red one, blue one) is 1/36.
Precisely.
Or think about spinning a spinner with three equal sections, red, blue, yellow, two times.
Okay.
First spin could be R, B or Y.
Second could be R, B or Y.
So the sample space has three by three equals nine equally likely outcomes: RR, RB, RY, BR, BB, BY, YR, YB, YY.
And the probability of spinning blue at least once?
Let's see: RB, BR, BB, BY, YB.
That's five outcomes out of nine, so 5/9.
Perfect calculation.
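For equally likely outcomes, the counting can be checked by listing the whole sample space. A minimal sketch follows; the helper name `probability` and the tuple representation of outcomes are choices made for this example.

```python
from fractions import Fraction
from itertools import product

def probability(event, sample_space):
    """P(event) when every outcome in sample_space is equally likely."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

two_dice = list(product(range(1, 7), repeat=2))  # (red, blue) pairs, 36 in all
two_spins = list(product("RBY", repeat=2))       # (first spin, second spin), 9 in all

snake_eyes = probability(lambda o: o == (1, 1), two_dice)
blue_at_least_once = probability(lambda o: "B" in o, two_spins)

print(snake_eyes)          # 1/36
print(blue_at_least_once)  # 5/9
```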
Now, besides the basics, probabilities between zero and one summing to one.
Are there other core rules?
Yes.
A few really useful ones.
First, the complement rule.
The complement of an event A, written A^C, means event A does not happen.
The rule is simply P(A^C) = 1 - P(A).
Ah, so the probability of not getting an outcome is one minus the probability of getting it.
That seems handy.
It's incredibly handy.
Think about rolling those two dice again.
What's the probability of not getting a sum of five?
Calculating the probability of getting a sum of five is easier.
The outcomes are (1, 4), (2, 3), (3, 2), and (4, 1).
That's four outcomes out of 36.
So P(sum is 5) is 4/36.
Right.
So using the complement rule, P(sum is not 5) = 1 - 4/36 = 32/36.
Much faster than counting all 32 outcomes that don't sum to five.
Definitely easier.
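As a quick check of the complement rule on the two-dice example, this sketch computes the answer both ways, through the complement and by counting directly. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # all 36 (red, blue) outcomes

# Short way: count the sums of five, then take the complement.
p_five = Fraction(sum(1 for r in rolls if sum(r) == 5), len(rolls))
p_not_five = 1 - p_five

# Long way: count every outcome whose sum is not five.
p_not_five_direct = Fraction(sum(1 for r in rolls if sum(r) != 5), len(rolls))

print(p_five, p_not_five, p_not_five_direct)  # 1/9 8/9 8/9
```

`Fraction` reduces automatically, so 4/36 prints as 1/9 and 32/36 as 8/9.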
Okay.
What else?
We need to understand mutually exclusive events, sometimes called disjoint events.
These are events that have no outcomes in common.
They simply cannot happen at the same time.
Like rolling a die, you can't roll a two and a three on the same single roll, or an M&M can't be both orange and brown.
Exactly.
For mutually exclusive events A and B, the probability of both happening, P(A and B), is zero.
It's impossible.
And this leads to the addition rule for mutually exclusive events.
If A and B are mutually exclusive, then the probability of A or B happening is simply P(A or B) = P(A) + P(B).
So if the probability of a randomly chosen M&M being orange is 0.205 and brown is 0.124, the probability of it being orange or brown is just 0.205 plus 0.124, which is 0.329.
Yep.
Simple addition because an M&M can't be both colors at once.
But what if events aren't mutually exclusive?
What if they can happen together?
Like drawing a card from a deck.
It can be a king and it can be a heart, the king of hearts.
You can't just add P(king) plus P(heart).
Excellent point.
That simple addition rule only works for mutually exclusive events.
If you just add P(A) plus P(B) when there's overlap, you double count the outcomes where both A and B occur.
Right.
Adding P(king) plus P(heart) would count the king of hearts twice.
Exactly.
So we need the general addition rule.
This rule works for any two events, whether they overlap or not.
It says P(A or B) = P(A) + P(B) - P(A and B).
Ah, you add the individual probabilities and then subtract the probability of the overlap, the A and B part.
Precisely.
That subtraction corrects for the double counting.
This sounds like where Venn diagrams come in handy.
Absolutely.
Imagine two overlapping circles inside a rectangle.
The rectangle is the whole sample space.
One circle is event A, the other is event B.
The overlapping part is where both A and B happen.
Yes, that's the intersection, A and B, often written A ∩ B.
The total area covered by either circle is the union, A or B, written A ∪ B.
I see.
So if you add the area of circle A and the area of circle B, you've added that overlapping intersection area twice.
Right.
So the formula P(A ∪ B) = P(A) + P(B) - P(A ∩ B) visually makes sense.
Add the two circles, subtract the overlap you counted twice.
Let's try an example.
Say a survey finds 68% of residents use Facebook, 28% use Instagram, and 25% use both.
What's the probability a randomly chosen resident uses Facebook or Instagram?
Okay.
Using the general addition rule.
P(Facebook or Instagram) = P(Facebook) + P(Instagram) - P(Facebook and Instagram).
So 0.68 + 0.28 - 0.25, which equals 0.71, or 71%.
And an AP exam tip here.
Always show your work.
Don't just write down 0.71. Write the formula you're using and plug in the numbers.
A naked answer might lose you points, even if it's correct.
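Both addition rules fit in one small function, since the mutually exclusive case is just the general rule with an overlap of zero. This is a sketch; the function name and the default argument are assumptions made for the example.

```python
def prob_a_or_b(p_a, p_b, p_a_and_b=0.0):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B).

    Leaving p_a_and_b at 0 gives the rule for mutually exclusive events.
    """
    if not 0.0 <= p_a_and_b <= min(p_a, p_b):
        raise ValueError("P(A and B) must lie between 0 and min(P(A), P(B))")
    return p_a + p_b - p_a_and_b

# Facebook or Instagram: the events overlap, so subtract the "both" part.
print(round(prob_a_or_b(0.68, 0.28, 0.25), 3))  # 0.71

# Orange or brown M&M: mutually exclusive, so nothing to subtract.
print(round(prob_a_or_b(0.205, 0.124), 3))      # 0.329
```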
Okay.
This is great.
We've covered basic probability, simulation, and how to handle "or" situations.
Now, what about when we get new information, like the probability given something else happened?
Right.
This is where conditional probability comes in.
And it's super important.
It's the probability that one event happens given that another event is already known to have occurred.
So the condition changes the probability.
Often.
Yes.
It effectively shrinks our sample space.
We're no longer looking at all possible outcomes, only those where the given event happened.
And there's a formula.
Yep.
We write it as P(A | B), which reads "the probability of A given B." The formula is P(A | B) = P(A and B) / P(B).
You find the probability that both things happen and divide by the probability of the condition, the given event B happening.
Okay.
P(A and B) divided by P(B).
Let's try the Titanic example you mentioned.
Great idea.
Let's look at the survival data for adult passengers, broken down by ticket class, first, second, third.
Suppose we know a randomly selected adult passenger survived.
What's the probability they were in third class?
So the given is that they survived.
We only care about the survivors now.
Exactly.
We ignore everyone who didn't survive.
Let's say the records show 442 adult passengers survived and, among those survivors, 151 were third-class passengers.
So P(third class | survived) would be 151 divided by 442.
Precisely.
Which is about 0.342, or 34.2%.
So among the adult survivors, about 34% were from third class.
Knowing they survived changed the relevant group we were looking at.
That makes sense.
The condition narrows the focus.
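The "shrinking sample space" idea and the formula give the same answer, which this sketch checks with the two counts quoted above. The total number of adult passengers is not given here, so `placeholder_total` is a hypothetical value used only to show that it cancels out.

```python
from fractions import Fraction

def conditional_from_counts(count_a_and_b, count_b):
    """P(A | B) found by looking only at the outcomes where B happened."""
    return Fraction(count_a_and_b, count_b)

def conditional_from_formula(count_a_and_b, count_b, total):
    """P(A | B) = P(A and B) / P(B), using proportions of the whole group."""
    p_a_and_b = Fraction(count_a_and_b, total)
    p_b = Fraction(count_b, total)
    return p_a_and_b / p_b

survivors = 442
third_class_survivors = 151
placeholder_total = 1000  # hypothetical; any total at least as large as 442 works

p = conditional_from_counts(third_class_survivors, survivors)
same = conditional_from_formula(third_class_survivors, survivors, placeholder_total)

print(round(float(p), 3))  # 0.342
print(p == same)           # True
```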
And this idea of conditions changing probabilities leads directly to the concept of independence.
Right.
You mentioned this earlier with the coin flips having no memory.
Two events, A and B are independent if knowing that event B happened does not change the probability of event A happening.
So mathematically, how do we check for independence?
You compare the conditional probability P(A | B) with the original probability P(A).
If P(A | B) is the same as P(A), then knowing B occurred didn't change A's probability, so they're independent.
And if P(A | B) is different from P(A)?
Then they are not independent.
They are dependent.
Knowing B happened did change the likelihood of A.
Let's use that gender and handedness example again.
In that Australian student sample, P(left-handed | male) was about 15.2%, but P(left-handed | female) was only about 5.6%.
Right.
So does knowing the student's gender change the probability of them being left -handed?
Clearly yes.
15.2% is very different from 5.6%.
So gender and handedness in that sample were not independent events.
They were dependent.
And again, you can't just rely on intuition here.
You really have to calculate and compare the probabilities.
Okay.
Conditional probability helps us understand dependence.
Does it also help us calculate the probability of both events happening, the "A and B" part?
Yes.
Through the general multiplication rule. It stems directly from the conditional probability formula.
It says P(A and B) = P(A) × P(B | A).
So the probability of A happening multiplied by the probability of B happening given that A already happened.
Exactly.
For instance, suppose 55% of high school students play a sport. That's P(A).
And among those who play a sport, 6% go on to play in the NCAA. That's P(B | A).
Then the probability that a randomly selected high school student plays a sport and plays in the NCAA is P(A and B) = 0.55 × 0.06.
Which is 0.033, or 3.3%.
Right.
That rule is incredibly versatile for finding "and" probabilities, especially in sequential events.
And sequential events often bring tree diagrams into play, right?
Oh, tree diagrams are fantastic for this stuff.
They really help visualize sequences of events and conditional probabilities.
How do they work again?
You start with the first event, drawing branches for each outcome with its probability.
Then from the end of each of those branches, you draw new branches for the second event.
But the probabilities on these second stage branches are conditional probabilities.
They depend on which first stage branch you came down.
Like Shannon and her snooze button. She hits snooze 60% of the time.
Right.
So the first branches are snooze, probability 0.60, and no snooze, probability 0.40.
Then if she snoozes, she's on time 70%, so late 30%.
If she doesn't snooze, she's on time 90%, so late 10%.
Exactly.
The probabilities on the second set of branches, on time or late, are conditional on whether she snoozed or not.
P(on time | snooze) is 0.70, but P(on time | no snooze) is 0.90.
And to find the probability of a whole path, like snooze and late, you just multiply along the branches.
Yes. P(snooze and late) = P(snooze) × P(late | snooze) = 0.60 × 0.30 = 0.18.
And if we want the overall probability of being late.
You find all the paths that end in late, which are "snooze and late" and "no snooze and late," calculate their probabilities by multiplying along the branches, and then add those probabilities together.
P(late) = P(snooze and late) + P(no snooze and late).
That makes sense.
Find all the ways it can happen and add them up.
And tree diagrams are also great for answering those backward conditional probability questions.
Sometimes related to Bayes' theorem, like given that Shannon was late, what's the probability she hit the snooze button?
How do you find that from the diagram?
You use the conditional probability formula: P(snooze | late) = P(snooze and late) / P(late).
You already calculated P(snooze and late) by multiplying along that branch, 0.18, and you calculated the overall P(late) by adding up all the late paths. Just divide.
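Shannon's tree diagram can be written down as two small dictionaries, which makes the "multiply along the branches, then add the paths" recipe concrete. This is a sketch using only the probabilities stated above; the dictionary layout and function name are choices made for the example.

```python
# First-stage branches and, for each one, the conditional second-stage branches.
first_stage = {"snooze": 0.60, "no snooze": 0.40}
second_stage = {
    "snooze": {"on time": 0.70, "late": 0.30},
    "no snooze": {"on time": 0.90, "late": 0.10},
}

def path_probabilities(first, second):
    """Multiply along each branch: P(a and b) = P(a) * P(b | a)."""
    return {
        (a, b): p_a * p_b_given_a
        for a, p_a in first.items()
        for b, p_b_given_a in second[a].items()
    }

paths = path_probabilities(first_stage, second_stage)

# Add up every path that ends in "late".
p_late = sum(p for (_, outcome), p in paths.items() if outcome == "late")

# Backward question: given late, what is the chance she hit snooze?
p_snooze_given_late = paths[("snooze", "late")] / p_late

print(round(paths[("snooze", "late")], 2))  # 0.18
print(round(p_late, 2))                     # 0.22
print(round(p_snooze_given_late, 3))        # 0.818
```

So under these numbers, most of Shannon's late arrivals come from snooze days, even though she is on time 70% of the time when she snoozes.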
This seems really powerful, especially for situations where intuition fails, like medical testing.
Absolutely.
Let's take that mammogram example.
Say breast cancer affects 1% of women in a certain group.
P(cancer) = 0.01.
The test isn't perfect.
It has a 6% false positive rate for healthy women, P(positive | no cancer) = 0.06, and a 3% false negative rate for women with cancer.
P(negative | cancer) = 0.03.
Okay.
So P(positive | cancer) must be 1 - 0.03 = 0.97, the true positive rate.
Correct.
Now the big question, a woman gets a positive test result.
What's the probability she actually has cancer?
P(cancer | positive).
My gut feeling says it should be pretty high, like maybe 90%.
The test seems fairly accurate.
That's the intuition trap.
Let's use a tree diagram or the formulas.
We need P cancer and positive and the overall P positive.
Okay.
P(cancer and positive) = P(cancer) × P(positive | cancer) = 0.01 × 0.97 = 0.0097.
Good.
And P(no cancer and positive) = P(no cancer) × P(positive | no cancer) = 0.99 × 0.06 = 0.0594.
So the total P(positive) is 0.0097 + 0.0594 = 0.0691.
Right.
Now P(cancer | positive) = P(cancer and positive) / P(positive) = 0.0097 / 0.0691.
Which is, wow, about 0.14, or only 14%.
Surprising, isn't it?
Even with a positive test, the chance she actually has cancer is quite low in this scenario.
Why is it so low?
Because the underlying condition is rare.
Only 1 % have cancer.
The false positives, 6% of the healthy 99%, end up vastly outnumbering the true positives, 97% of the sick 1%.
Most positive results actually come from healthy women.
That really highlights why follow-up testing is so important for rare conditions.
Exactly.
Don't rely on one test result, especially when the prior probability is low.
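The mammogram calculation generalizes to any screening test. Here is a minimal sketch with a hypothetical function name and parameter names; it follows the same two tree paths that end in a positive result.

```python
def prob_condition_given_positive(prevalence, false_positive_rate, false_negative_rate):
    """P(condition | positive test), from the two tree paths that end in 'positive'."""
    true_positive_path = prevalence * (1 - false_negative_rate)
    false_positive_path = (1 - prevalence) * false_positive_rate
    p_positive = true_positive_path + false_positive_path
    return true_positive_path / p_positive

p = prob_condition_given_positive(
    prevalence=0.01, false_positive_rate=0.06, false_negative_rate=0.03
)
print(round(p, 3))  # about 0.14

# The same test applied to a group where the condition is more common:
print(round(prob_condition_given_positive(0.20, 0.06, 0.03), 3))
```

The second call, with a made-up prevalence of 20%, is there only to show how strongly the answer depends on how rare the condition is.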
Okay.
So we have the general multiplication rule, P(A and B) = P(A) × P(B | A).
What about when events are independent?
Does it get simpler?
Yes.
That's the special case.
If A and B are independent, then knowing A happened doesn't change the probability of B.
So P(B | A) is just the same as P(B).
Ah, so the rule becomes P(A and B) = P(A) × P(B).
Just multiply the individual probabilities.
Exactly.
This is the multiplication rule for independent events.
But, and this is crucial, you can only use this simple rule if you know or can reasonably assume the events are independent.
Like if the probability of getting a green light at an intersection on Monday is 0.42 and the probability of getting a red light there on Tuesday is 0.55, and we assume the lights on different days are independent.
Then P(green Mon and red Tue) = P(green Mon) × P(red Tue) = 0.42 × 0.55.
Okay.
Or the Challenger disaster example.
The failures of different O-ring joints were considered independent events.
If the probability of one joint working was, say, 0.977.
Then the probability of all six working properly would be 0.977 multiplied by itself six times, or 0.977^6, which is about 0.87.
And the probability of at least one failing?
That sounds hard to calculate directly.
Ah, but use the complement rule.
P(at least one fails) = 1 - P(none fail), which is 1 - P(all six work).
Nice.
The complement rule strikes again.
It's incredibly powerful for "at least one" type problems. But remember that critical caution: do not use the simple P(A) × P(B) rule unless you're certain A and B are independent.
Like you can't multiply the probability of rain today in city X by the probability of rain today in nearby city Y, because if it's raining in one, it's probably more likely to be raining in the other.
They aren't independent.
Perfect example.
Use the general rule, P(A) × P(B | A), unless independence is established.
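For the O-ring example, the multiplication rule for independent events and the complement rule combine into two one-line functions. This sketch assumes, as stated above, that the six joints work or fail independently with the same probability; the function names are made up for the example.

```python
def prob_all_work(p_single, n):
    """P(all n independent parts work), each working with probability p_single."""
    return p_single ** n

def prob_at_least_one_fails(p_single, n):
    """Complement rule: 1 - P(none fail)."""
    return 1 - prob_all_work(p_single, n)

print(round(prob_all_work(0.977, 6), 4))            # about 0.8697
print(round(prob_at_least_one_fails(0.977, 6), 4))  # about 0.1303
```

If the joints were not independent, for example if cold weather affected all of them at once, these simple products would no longer be valid.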
One last clarification.
Mutually exclusive versus independent.
They sound related, but are they the same?
Definitely not.
In fact, they're almost opposites in a way.
Think about it.
If two events, A and B, have positive probabilities and are mutually exclusive, they can't happen together.
Like being male and being pregnant.
Right.
If you know event A male has happened, what does that tell you about the probability of event B pregnant?
It tells you the probability is zero.
It can't happen.
Exactly.
So knowing A occurred drastically changed the probability of B, from whatever small probability it might have had in the general population down to zero. Since P(B | A) is different from P(B), they cannot be independent.
So if events are mutually exclusive and have non-zero probabilities, they must be dependent.
You got it.
They're distinct concepts.
Mutually exclusive is about whether events can occur together.
Independence is about whether one event's occurrence affects the probability of the other.
Wow.
Okay.
That was quite a journey through probability.
We went from just the basic idea of chance to the difference between short-run randomness and long-run predictability, the law of large numbers.
Then we explored simulation as a tool to model chance, and then dove into the formal rules: probability models, complements, the addition rules for "or" situations, and finally the really crucial ideas of conditional probability, P(A | B), and independence, including tree diagrams and those multiplication rules.
It's a framework that really lets you make sense of uncertainty.
It helps predict long-term patterns, evaluate claims, and make decisions based on data, not just gut feelings.
Whether that's understanding game odds, interpreting medical tests like we discussed, or analyzing any kind of real-world data, it really helps uncover truths that might not be obvious at first glance.
And it helps you critically assess claims people make based on statistics or luck.
So here's a final thought to leave you with.
Now that you have these tools, what other common beliefs about chance or luck, or maybe even data you see every day might be statistical myths just waiting for you to investigate and maybe debunk.
We really hope this deep dive has helped clarify these essential probability concepts, especially if you're working through AP statistics or just trying to become more statistically literate.
Thanks so much for joining us for this deep dive.
Keep exploring, keep questioning, and keep learning.
ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.