Chapter 5: Probabilistic Analysis and Randomized Algorithms

Search this chapter

Audio Overview

0:00 / 0:00

Autoplay next chapter

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement not replaced the original textbook, and may not be redistributed or resold.

For complete coverage, always consult the official text.

Ever get that feeling?

Like, you're making this big choice, like who to add to your team, or even just picking a place to eat in a new city, and you just have these bits and pieces of info to go on.

You're just hoping you make the right call.

Yeah, I know exactly what you mean.

Turns out, even when it comes to algorithm, that whole thing about dealing with limited information, uncertainty, it's a huge hurdle.

But what if I told you that probability, and even just a sprinkle of randomness, could actually be like surprisingly effective ways to tackle these problems?

Well that's a really intriguing point to start from.

Today we're diving deep into that very idea, how probability can be this super powerful lens to analyze algorithms, and how intentionally adding randomness can sometimes lead to really elegant and efficient solutions.

So good to me.

The groundwork for our exploration comes from a chapter that digs into probabilistic analysis, randomized algorithms, and sorting, giving us this framework to understand algorithm performance in a more nuanced way.

So basically, instead of just fixating on the absolute worst case scenario, which, let's face it, might not happen that often in reality, we're going to be looking at what happens on average, what's called the probabilistic analysis part.

And then the really cool part,

randomized algorithms, algorithms that basically have randomness baked into them, can sometimes outperform those rigid deterministic ones in the long run.

Exactly.

To really wrap our heads around this, maybe let's start with something we can all relate to.

Hiring.

Perfect.

I mean everybody's done it at some point or another, right?

The hiring problem, as it's often called, it's a great way to grasp these concepts because it's so intuitive.

Imagine you're interviewing people one by one for an office assistant position.

Now, a straightforward deterministic approach, what the chapter calls hire assistant, is pretty simple.

You hire the person you're interviewing, if they're better than whoever you've got already, or if you haven't hired anyone yet, and you just keep going until you've interviewed every applicant.

Seems logical enough.

Always go for the best person you've seen so far.

But what about the cost of all this?

What kind of cost are we talking about with this strategy?

So there are two main costs we have to consider.

Okay.

First, there's the cost of actually interviewing each candidate.

Let's call it SES.

Now, this is a cost you're going to have for each person you interview, no matter what.

Then you've got the cost of actually hiring someone new, let's call that SES.

This cost only comes up when you actually decide to bring someone on board.

So if you interview,

say, N candidates, and you end up hiring M of them, your total cost would be N times SES plus M times SES.

Right.

So it's like the interview cost times the number of interviews plus the hiring cost times the number of hires.

Yeah.

Makes sense.

But what's the potential downside here?

What's the worst case scenario with this approach?

Well, the worst possible situation happens when the candidates just happen to come in like ascending order of quality, the worst one first, and the absolute best one dead last.

Just think about it.

You interview the first person, they are by default the best you've seen so far because they're the only one, so you hire them.

Makes sense.

Then the very next person is even better, so you have to let the first one go and hire the second one.

And this just keeps happening every single interview.

Oh, I see where you're going with this.

In this nightmare scenario, you end up hiring every single one of the N candidates.

That sounds awful.

So the number of hires M ends up being the same as N, and your total cost is N times C plus N times C.

Okay.

Now if the cost of hiring someone is really high, this could get super expensive.

That means the worst case cost ends up being on the order of O times N.

Hiring everyone.

I can't even imagine.

But as the chapter points out, that whole perfect storm of candidates getting better and better might not happen very often in reality.

So that's where probabilistic analysis comes in, right?

You got it.

In a lot of real world situations, we don't really have control over the order things happen in, whether it's job applications, you know, business opportunities, or even just the order data comes in for an algorithm to process.

That's very true.

So it often makes more sense to assume the candidates are showing up in random order.

What we mean by that is all the possible N permutations of their rankings, all equally likely.

Okay, so there are a lot of potential combinations.

Right.

So this assumption lets us do a probabilistic analysis, understand the expected cost, not just that worst case cost.

Okay, so we're shifting gears from that potentially disastrous worst case to what we can realistically anticipate happening on average.

If things are random, how do we even begin to figure that out?

Alright, so this is where this idea of indicator random variables comes in handy.

Okay.

So for any event we're interested in, let's say whether a certain candidate gets hired, we can define this thing called an indicator random variable.

Let's call it IA, where A represents that event.

This variable is pretty simple.

It's one if the event A actually happens and euro if it doesn't.

The chapter highlights this really useful property.

The expected value of that indicator variable written as EIA is simply the probability that A happens, PRA.

Okay.

So the expected value is directly tied to the probability.

That sounds like a neat trick.

Yeah.

But how does that actually help us with our hiring problems?

So we can define one of these indicator random variables for every single candidate we interview.

Let's say she is the indicator variable.

That's one if the IF candidate we interview gets hired and ORI if they don't.

The big goal here is to find the expected total number of people we hire.

That would be the expected value of the sum of all these little indicator variables.

EX1 plus X2 plus XN, where N is the total number of candidates.

I'm with you so far.

And this is where the power of something called linearity of expectation really shines.

I've heard of that.

It's a powerful concept.

Linearity of expectation basically tells us the expected value of the sum of a bunch of random variables is just the sum of their individual expected value.

Makes sense.

The really cool part, it doesn't even matter if those variables are independent of each other or not.

So EX1 plus X2 plus XN is just EX1 plus EX2 plus DIC plus EXN.

So each part can contribute to the whole.

Exactly.

It's like you can look at the average outcome of each individual part of a process, just add those averages up, and boom, you get the average outcome of the entire thing.

Doesn't even matter how complicated those connections may be.

Oh yeah, that sounds super useful when you're trying to analyze complex algorithms.

Totally.

Okay, so that definitely makes things simpler.

So to get that expected number of hires, all we need to do is figure out the expected value of G for each candidate and then just add them all up, right?

You got it.

So let's think about the probability of actually hiring the i -th candidate.

For that to happen, this candidate has to be better than every single one of the previous i -1 candidates that we've interviewed.

Now if we're working with that assumption that every possible order the candidates could come in is equally likely, then any one of those first i candidates has an equal chance of being the best.

So the probability that the i -th candidate is the top one out of those first i, and therefore gets hired, is simply 1i.

Ah, I'm starting to see a pattern here.

So the probability of hiring the very first candidate is 11, which is just 1 because they are the best by default at that point.

Then the probability of hiring the second candidate is 12, because they have to be better than the first one, then for the third it's 13 and so on.

You got it, spot on.

And because we know that the expected value of G is equal to the probability that G is 1, which is the same as the probability of hiring the i -th candidate, we get AXI equals 1i.

Now, using that linearity of expectation we were talking about, the expected total number of hires, which we can call EX, is just the sum of all those individual expected values.

EX1 plus EX2 plus i1 plus EXN equals 11 plus 12 plus 13 plus 1 plus 1N.

Wait a minute, that looks familiar, isn't that the harmonic series?

Ding ding ding, you got it, that's the harmonic series.

And the sum of the first N terms of the harmonic series, it's got this well -known approximation.

It's basically equal to the natural log of N plus this small constant term written as O1.

So the expected number of hires is roughly LNNN plus O1.

This means the average case hiring cost for our higher assistant algorithm ends up being on the order of OFOX times LNN.

Now this change, going from linear to logarithmic complexity, that's a big deal in computer science.

Imagine dealing with millions of candidates.

A linear cost, that's a nightmare.

A logarithmic cost, much more manageable.

It's a huge improvement over that OFOF and worst -case cost we talked about earlier.

Wow, that's a massive difference.

Going from potentially hiring every single person you interview to an expected number that just grows logarithmically with the number of candidates, it's kind of counterintuitive how much better the average case turns out to be.

It makes you wonder if we spend too much time worrying about those extreme situations when things are usually much smoother on average.

But here's where it gets really interesting.

What if we can't just assume that the input, the order of the candidates in this case, is actually random?

What if there's some kind of bias at play?

That's a great point.

You're right, in a lot of real -world scenarios, we can't just bank on the input being randomly ordered.

Yeah, that's true.

Maybe the best candidates tend to apply later in the game or there's something else throwing off the order.

This is where randomized algorithms step in.

The main idea here is that instead of crossing our fingers and hoping the input is random,

a randomized algorithm builds randomness directly into the algorithm itself.

So we're taking control of the randomness.

Exactly.

So how would we use this randomized algorithm idea for our hiring problem?

Well, the chapter talks about randomized hire assistant.

The first crucial step here is to randomly shuffle the list of candidates before we even start interviewing anyone.

So we're just mixing things up right from the start?

Yeah, we're shuffling the interview order so that every possible order has an equal chance of happening.

After this initial random shuffle, we just run our original hire assistant algorithm like we did before.

Ah, I see what's happening here.

By shuffling the candidates randomly at the very beginning, we're basically tricking the hire assistant algorithm into thinking they're coming in random order no matter what the original order was.

Exactly.

And by introducing this randomization step, we can get an expected hiring cost of O times LNN for any input you throw at it.

Is that pretty clever?

The expectation now is about the random permutation that we created, not some hopeful assumption about how the input is structured.

It's a subtle but important difference.

Right, so it's way more robust, more reliable.

Exactly.

But how do we actually do that shuffling?

The chapter mentions something called a randomly permute procedure.

Yeah, there are some clever ways to create a uniform random permutation of an array with N elements.

A common method, which usually takes about N time, involves going through the array from the first element to the last.

For each element at a certain position, let's say index I, we pick a random index J somewhere between I and N and swap the elements at those two indices.

Okay, so for the very first element, you're randomly swapping it with any other element in the entire array.

Then for the second element, you're swapping it with any element from the second position onwards and so on.

That's exactly it.

The chapter also talks about using something called a loop invariant to formally prove that this method actually generates a truly uniform random permutation.

This means that each of the N possible orderings has an equal probability of one N of being the result.

Okay.

It's a way to mathematically guarantee that our shuffling process is working correctly making sure we get a truly random order.

Right.

So we're not just hoping for the best.

It's guaranteed.

Okay.

So we've seen how probabilistic analysis helps us understand how well a deterministic algorithm performs on average, and we've seen how randomized algorithms can introduce randomness to get good expected performance no matter what the input looks like.

Now, the chapter goes on to explore some of the really cool applications of these probabilistic ideas.

Maybe we could jump into the birthday paradox.

That's always sounded fascinating, a bit mind -bending.

The birthday paradox is definitely a classic example of how probability can throw us curveballs, results that our intuition just doesn't see coming.

The basic question is, if you've got a group of hey people,

what are the odds that at least two of them share the same birthday?

At first glance, you'd think you'd need a really large group for that to be very likely since there are 365 days in a year.

That's what most people would think, but the truth is pretty surprising.

Right?

Again, we can use those indicator random variables to help us break this down.

Instead of directly looking at the probability of at least two people having the same birthday, it's often easier to look at the probability of no two people sharing a birthday and then just subtract that from one.

But we can also use indicator variables to figure out the expected number of pairs of people who do share a birthday.

So let's think about all the possible pairs of people in a group of K.

There are K choose two pairs, which mathematically speaking is K, K one two.

Now for any specific pair, the chance they have the same birthday is one N, where N is the number of days in a year, usually 365.

So for each of those K, K one two pairs, we can make an indicator random variable.

You got it.

That variable is one if they share a birthday and no if they don't, then the expected value of that variable would be one N.

Exactly.

And then if we use that linearity of expectation idea again, the expected total number of pairs with the same birthday is just the sum of those probabilities for all the possible pairs.

That gives us K, K one, a K one two N.

The crazy part is this expected value is actually bigger than one when K is just a bit bigger than the square root of two N.

For NIOLs is 365, that happens when K is about 23 people.

Seriously?

Yeah.

So in a room of just 23 people, there's actually a better than 50 % chance that at least two of them will share a birthday.

Wow, that's wild.

It really shows how fast those pairwise comparisons add up as the group size grows.

That's mind blowing.

It really shows that even things that seem pretty unlikely can become pretty probable when you have enough chances for them to happen.

Okay, up next, the chapter dives into balls and bins.

What's the core concept behind that model?

The balls and bins model, it's like this fundamental building block in probability theory that pops up in all sorts of areas, computer science, things like hashing and load balancing, even in ecology.

Wow, I wouldn't have expected that.

The basic idea is you've got N identical balls and you're randomly tossing each one into one of B different bins.

Okay.

Then we can ask all sorts of probability questions about how those balls end up spread out among the bins.

So what kind of questions are we talking about here?

Well, for example, we could ask how many balls on average end up in a specific bin?

The chapter mentions that the number of balls in a particular bin follows what's called a binomial distribution.

We could also look at how many tosses it takes on average before a specific bin finally gets at least one ball.

That follows a geometric distribution.

And then there's that classic coupon collectors problem, which if I remember correctly, has something to do with collecting prizes.

Yeah, I've heard of that.

That's a perfect analogy.

So the coupon collectors problem asks if you've got B different types of coupons and every time you get a coupon, it could be any of those B types with equal probability.

How many coupons do you have to collect before you've got at least one of each type?

The chapter says the expected number of tries to get all B coupons is about B times the natural log of B.

Okay.

This basically tells us that to get a complete set, you usually need a lot more tries than the number of different coupon types.

Makes sense.

So even if there's a fixed number of possibilities, the number of tries it takes to see them all can be way higher.

Exactly.

What's next on our probabilistic journey?

All right.

Well, the chapter then goes into streaks and things like coin flips.

Ah, streaks.

That's another area where we can find some counterintuitive stuff.

Yeah, streaks can be pretty deceiving.

Analyzing streaks, like the longest run of heads in a row when you flip a fair coin N times, that's another interesting way to use probability,

using indicator random variables and some probabilistic bounds, we can actually figure out the expected length of that longest streak.

The chapter mentions that in N coin flips, the longest streak of heads you'd expect to see is about the order of the base two logarithm of N written as L, G, N.

So if you flip a coin, let's say 100 times, you wouldn't really expect a streak of like 50 heads in a row.

The longest streak we expect to see grows much slower than the total number of flips.

Exactly.

It shows how unlikely those super long streaks really are, even in a totally random sequence.

Okay, so unlikely, but not impossible.

Finally, the chapter talks about the online hiring problem, a more, I guess, realistic and limited version of that original hiring scenario we talked about.

Right.

So how does that online aspect change things?

Well, with the online hiring problem, the main constraint is that you have to make a decision about each candidate right after you interview them.

No going back, no rehiring someone you already rejected.

That's tough.

The goal is still the same, hire one person, the best one, but now you need a strategy that gives you the highest chance of picking the absolute best candidate from the whole pool.

That sounds a lot more like those high pressure decisions we make in real life, you know, where you don't get do -overs.

Exactly.

The chapter talks about this strategy.

You decide on a number, K, somewhere between 1 and N1, where N is the total number of candidates.

You interview the first K candidates and you say no to every single one, no matter how good they seem.

Oh, wow.

That's a bold strategy.

This first phase is like setting a baseline, getting a feel for the overall quality of the candidates.

After you've rejected those first K, then you change the rules.

You hire the first candidate who comes along, starting with the K plus one, one who's better than all those first K you rejected.

Interesting.

So it's like you're sacrificing some potentially good candidates early on for the chance of finding an even better one later.

Precisely.

And if no one better than those first K shows up, you just hire the very last candidate you interview.

Okay, I see.

So the big question is, what's the best value for K to give you the highest chance of getting the absolute best candidate overall?

Turns out, there's a mathematical solution.

Setting K to about nay, where E is that special number, the base of the natural log, about 2 .718, that gives you at least a one E probability, about 37 % of hiring the very best applicant.

Wow, that's a really concrete result, but counterintuitive too.

Yeah, right.

So even when you have to make a decision on the spot, you could still use a strategic approach, use probabilistic thinking to give you a decent shot at landing the top candidate.

Exactly.

It perfectly highlights what we've been exploring.

How probabilistic analysis and strategically introducing randomness can be really useful frameworks for making better decisions, even when things are uncertain and we've got limitations.

Okay, let's take a step back and sum up what we've learned in this deep dive.

We started by looking at how tough it can be to make the best choices when you only have partial information, using that hiring problem as our example.

Then we saw how probabilistic analysis helped us understand how a straightforward deterministic hiring algorithm would perform on average, and it turned out it performed way better than that worst case scenario we initially imagined.

Then we explored how randomized algorithms, by adding randomness to the mix, can actually guarantee good performance on average, no matter how the input is structured.

We also learned about those crucial tools, indicator random variables, and that powerful concept of linearity of expectation, which are at the heart of this whole analysis.

And from there, we went on to some fascinating applications that show just how broadly applicable probabilistic thinking can be.

Yeah, we covered a lot of ground.

We delved into that seemingly paradoxical birthday paradox, demonstrating how coincidences can be way more likely than we'd expect.

We looked at the balls and bins model, a fundamental idea for understanding how things are randomly distributed and how it applies to the coupon collector's problem.

It was interesting to see just how many attempts you need to get a complete set of something.

It was.

We also talked about the expected lengths of streaks in random sequences, like those coin flips, and wrapped things up with those strategic insights from the online hiring problem, showing how a period of initial rejection can actually boost your chances of finding the best possible outcome down the line.

I gotta say, it's really cool how these seemingly abstract concepts from probability theory show up in so many real life situations, often leading to results that totally defy our gut instincts.

I still can't get over the fact that you only need about 23 people in a room for a better than even chance of a shared birthday.

And the idea that sometimes saying no to good options early on can increase your odds of finding the absolute best one later, that's a powerful lesson that goes way beyond just hiring.

Absolutely.

The really valuable thing about these tools is they let us move past just reacting to the worst possible scenarios and actually focus on understanding what's likely to happen on average.

This shift in perspective can lead to algorithms and strategies that are much more efficient and practical, especially in a world where uncertainty is just a fact of life.

So as you go about your week, maybe think about a situation where you're facing some uncertain outcomes.

Could adding a little bit of randomness to your approach actually help you get a better result?

Maybe reflect on how often the average scenario is actually more relevant than those worst case anxieties that can sometimes hold us back.

It's definitely a different way to approach decision making.

Absolutely.

And with that, we've reached the end of our deep dive into the concepts of probabilistic analysis and randomized algorithms as presented in this chapter.

We covered the hiring problem, the power of indicator random variables and that linearity of expectation and explored a whole bunch of applications, including the birthday paradox, balls and bins, streaks, and that online hiring problem.

It was quite a journey.

We touched on all the major points,

theories, findings, and examples laid out in the source material.

So until our next deep dive, keep those questions coming, keep learning, and stay open to the surprising power of probability.

Sounds good to me.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter SummaryWhat this audio overview covers

Probabilistic analysis and randomized algorithms provide powerful frameworks for designing efficient solutions and evaluating algorithm performance in the presence of uncertainty. The hiring problem exemplifies this approach: while a deterministic algorithm that hires each candidate exceeding the current best achieves O(n) worst-case cost, probabilistic reasoning reveals that with randomly ordered input, the expected number of hiring decisions drops to approximately ln n. This dramatic improvement emerges through indicator random variables, which encode whether a specific event occurs as either 1 or 0, allowing expectations to be computed as the probability of that event. By leveraging linearity of expectation, expected values can be combined even when random variables depend on one another, transforming complex probabilistic questions into manageable arithmetic. To achieve favorable performance guarantees regardless of input order, the randomized variant simply shuffles the input before processing, generating a uniformly random permutation in Θ(n) time and rendering the algorithm immune to adversarial orderings. Beyond the hiring problem, these analytical tools illuminate classical probability phenomena: the birthday paradox demonstrates that collisions become probable around √(2n) individuals; the coupon collector problem shows that obtaining all b distinct items requires approximately b ln b draws; and analyzing coin flip sequences reveals that the longest streak of consecutive heads in n trials scales as log n in expectation. The secretary problem concludes the chapter by demonstrating a strategy with provable optimality: by skipping the initial n/e candidates and selecting the next applicant who surpasses all previous ones, an algorithm achieves the best candidate with probability at least 1/e. These analyses showcase why randomized approaches often provide simpler algorithms, inherent robustness against adversarial inputs, and closed-form probability calculations compared to purely deterministic strategies that must account for worst-case scenarios or rely on unverified average-case assumptions about real data.

Using this chapter to study? Last Minute Lecture is free and student-run. If it helped, consider supporting the project.

Support LML ♥

Chapter 5: Probabilistic Analysis and Randomized Algorithms

Related Chapters