Chapter 21: Minimum Spanning Trees: Kruskal and Prim Algorithms

Search this chapter

Audio Overview

0:00 / 0:00

Autoplay next chapter

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Alright, so today we're going to tackle a pretty fundamental problem in network design and that's figuring out the most cost -effective way to connect a bunch of different points.

Like think cities, computers, anything really, without any unnecessary detours or loops.

And we're diving deep into a concept called Minimum Spanning Trees, or MSTs.

That's right, MSTs, yeah.

Imagine building a network and every connection has a price tag.

The goal of finding an MST is to connect all those points, all your points, with like the absolute minimum total cost.

To get there,

we're dealing with what are called connected, undirected graphs.

Connected just means you can get from any point to any other point in the network.

Undirected means if there's a connection between point A and point B, you can go in both directions,

the two -way street.

And each of these connections, or edges,

has a weight representing its cost.

Okay, so we've got this network, it's got costs on all these connections.

What's the puzzle that we're solving for everyone today?

So the puzzle is to select the cheapest set of these connections.

The cheapest possible set.

That still manages to link all the points in your network together.

And here's a key constraint.

We can't create any closed loops or cycles.

The resulting structure, the network structure, needs to be a tree.

Think of like a family tree.

Right, a family tree, yeah.

Everyone's connected,

but there's no circular relationship.

Gotcha, gotcha, no going in circles.

Exactly, yeah.

So to be clear, for everyone joining us, we're talking about a connected graph, where all those connections are two -way, right,

undirected.

Each one has a cost, the edge weight, and we want to find a subset of these connections that link all the points, making it a spanning tree without any loops, so it's a cyclic.

And the total cost is as low as possible.

And that's how we get to the minimum spanning tree.

Exactly, that lowest cost spanning tree, that's the minimum spanning tree.

And the chapter we're exploring today is delving into how we can find these MSTs efficiently.

We're going to be focusing on two classic algorithms, Krisco's algorithm and Prim's algorithm.

And both of these can achieve a time complexity of big O of E log V, where E is the number of connections, or edges, and V is the number of points, vertices,

and the network.

And what's really interesting is that Prim's algorithm can actually be optimized even further for even better performance in certain cases.

Big O of E log V, that sounds pretty efficient for tackling this kind of network problem.

Yeah, it's pretty good.

Now, both Krisco's and Prim's, they kind of fall under this umbrella of greedy algorithms.

When I hear the word greedy, it makes me wonder, are we just making the best local choice at each step?

That's a great question.

Does that actually guarantee that we end up with the absolute best solution for the whole network?

Yeah, and that's a brilliant question.

And it hits on a really powerful insight about this MST problem.

In many optimization scenarios,

a greedy approach,

just picking what seems best right now, can lead you down a suboptimal path in the long run.

However,

the fascinating thing about minimum spanning trees is that their specific structure allows this greedy strategy to consistently deliver the absolute best result.

It's one of those elegant aha moments.

I love those.

Where simplicity meets optimality in computer science.

Absolutely.

Those are my favorite moments, those elegant aha moments.

Yeah, for sure.

So before we dive into the specifics of Krisco's and Prim's, the chapter lays some groundwork with what it calls a generic MST method.

So what's the big idea there?

Okay, so think of the generic MST method as like a step -by -step recipe for building a minimum spanning tree.

The core idea is to maintain a set of connections, let's call it A.

Which, at every stage, is guaranteed to be a part of some minimum spanning tree.

Our goal is to keep adding what are called safe edges to this set A, until A itself becomes a complete MST, connecting all the points with that minimum total cost.

Safe edges, that sounds like the key ingredient in this recipe.

So what makes an edge safe to add?

So a safe edge is basically a connection that, when you add it to your current set of MST connections, A, it ensures that the resulting set is still a subset of some, maybe a different minimum spanning tree.

In other words, it keeps you on the right track toward that optimal solution.

Right, you're not going to veer off course.

Exactly.

To understand how to identify these safe edges, we need to introduce the idea of a cut in the graph.

Okay, a cut.

A cut.

So is that like dividing the network into like separate pieces?

Precisely.

Okay.

Imagine you take all the points in your network and you divide them into two completely separate groups.

Let's call them set S and set V minus S.

So everything that's not in S, that division, that's what we call a cut, denoted S, VS.

Okay.

Now, if you have a connection, an edge, between a point in set S and a point in the other set, V minus S, we say that this edge crosses the cut.

Okay, so we've split the network into two sides and some connections bridge that divide.

Exactly.

Okay.

So here's another important definition.

Okay.

A cut, S, VS, is said to respect a set of edges, A.

Okay.

If none of the connections that are already in our set

cross this cut.

Okay.

So all the connections we've already decided are part of our growing MST

state entirely within one of the two groups.

Gotcha.

Okay.

And then there's the term light edge.

Right, light edge.

That sounds like the kind of connection we would want.

Yeah, exactly.

Light edge crossing a cut is simply the connection with the minimum cost among all the connections crossing that particular cut.

Okay.

There might be multiple connections with the same minimum cost.

In that case, any of them can be considered a light edge.

Okay, so we have these cuts, right?

Right.

That our current MST connections respect.

Yeah.

And then we look for the cheapest connection that bridges that cut, the light edge.

Yeah.

How does all this help us find our safe edges?

All right, so this brings us to a fundamental rule.

Okay.

Theorem 21 .1 from the chapter.

Okay.

It states,

if your current set of MST connections, A, is part of some minimum spanning tree.

Okay.

And you find a cut S, VS.

Right.

That respects A.

Okay.

Then any light edge that crosses this cut.

Right.

Is guaranteed to be a safe edge to add to A.

Okay, so let's break this down, right?

Yeah.

We're in the process of building our MST represented by that set of connections.

Right.

We can find a way to divide the remaining points on our network such that none of the connections we've already chosen cross that division.

Right.

Then the cheapest connection that bridges that division.

Right.

Is guaranteed to be a part of some optimal final MST.

That's a pretty powerful principle.

It is indeed, yeah.

Okay.

And it gives us a solid foundation for how to grow our MST safely.

Okay.

And the chapter offers a very useful consequence of this theorem.

Okay.

Called corollary 21 .2.

Okay.

Think about the connected components formed by the connections you've added to A so far.

Right.

Initially, when A is empty.

Yeah.

Each point in your network is its own isolated component.

Right, right.

Like a forest of individual trees.

Yeah, yeah.

Now, if you find a light edge.

Okay.

That connects one of these separate components to another component.

Right.

Then this light edge is a safe edge to add to your growing MST.

Ah, okay.

That makes perfect sense.

Yeah.

Because a light edge connecting two separate components is essentially the cheapest way to bridge that particular cut between those sets of points.

Precisely.

Right.

And this corollary.

Yeah.

This idea of always picking the cheapest connection between different connected parts.

Right.

That's the core idea behind Kruskal's algorithm.

All right, so let's get into the specifics of Kruskal's algorithm.

Yeah.

How does it use this idea of connecting components?

Okay, so Kruskal's algorithm starts with an empty set of connections, A.

Right.

And it considers each point in your network as its own little individual unconnected component.

Okay.

A forest where each tree is just a single point.

Yeah.

Then it looks at all the possible connections in your original network and sorts them by their cost.

Okay.

From least expensive to most expensive.

So right off the bat, we're being greedy.

Exactly.

Okay.

Then it goes through this sorted list of connections one by one.

Okay.

For each connection, U, V.

Right.

It checks if the two points, U and V, are currently in different connected components.

Okay.

A really efficient way to do this is using a data structure called a disjoint set data structure.

Okay.

You can think of this as a way to keep track of which group each point belongs to.

Gotcha.

Okay.

If they are in different groups.

Yeah.

It means that adding this connection, U, V, will link these two separate parts of our growing forest together.

Okay.

Without creating a cycle.

Right, right.

According to our corollary.

Yeah.

This connection, since we're considering them increasing order of cost.

Right.

Is a safe edge to add.

Right.

So we add this connection to our set A.

Okay.

And then we merge the two groups that U and V belong to.

Right.

Using the union operation of this disjoint set data structure.

Okay.

And we continue this process of looking at the connections in order of their cost.

Right.

Checking if they connect different components.

Yeah.

Adding them if they do.

Right.

And merging components if they do.

Right.

Until all the points in your network belong to a single connected component.

Right.

Okay.

At that point,

the set of connections that we've added, A.

Yeah.

Forms your minimum spanning tree.

It forms the MST.

Okay.

Exactly.

That sounds like a very systematic approach.

Yeah.

What about the efficiency, right?

Yeah.

The chapter mentioned a time complexity of big O of E log V.

Right.

So where does that come from?

So the most time consuming part of Kruskal's algorithm is that initial step.

Okay.

Of sorting all the connections by their weight.

Okay.

Which takes big O of E log E time.

Oh.

Now in most networks, the number of connections, E.

Right.

Is not drastically larger than the number of points, V.

Okay.

In fact, in a simple connected graph, E can be at most on the order of V squared.

Okay.

So log E is actually in the same ballpark as log V.

Gotcha.

The operations on the disjoint set data structure.

Right.

Like checking which set of point belongs to phi and emerging two sets union.

Right.

Those are also incredibly efficient.

Okay.

They take almost constant time on average for each operation.

Okay.

Since we perform at most E find operations.

Yeah.

And V minus one union operations.

Their contribution to the overall time complexity is smaller than that sorting step.

Okay.

So the overall time complexity of Kruskal's algorithm ends up being big O of E log V.

Gotcha.

This makes it particularly useful for sparse networks.

Okay.

Where you have relatively few connections compared to the number of points.

Gotcha.

Okay.

That's Kruskal's.

Yeah.

Now let's switch gears and talk about Prim's algorithm.

All right.

So how does Prim's algorithm tackle the MST problem?

So Prim's algorithm also follows that generic MST method.

Okay.

But it takes a slightly different approach.

Okay.

Instead of starting with a forest and gradually merging trees.

Right.

Prim's algorithm starts with an arbitrary single point.

Okay.

As the root of your MST.

Gotcha.

And then grows a single tree one connection at a time.

Right.

Until it includes all the points in the network.

Okay.

So it's like a continuous tree building process.

Exactly.

Yeah.

How does it decide which connections to add in each step?

So at each step,

Prim's algorithm looks for the cheapest connection.

Okay.

That links a point already in the growing tree.

Okay.

To a point that's not yet in the tree.

Gotcha.

To efficiently keep track of these potential cheapest connections.

Yeah.

Prim's algorithm uses a data structure called a min priority queue.

This queue holds all the points that are not yet part of the tree.

Each point v in the priority queue has a key value v dot key.

Okay.

Which represents the minimum cost of any connection that links v to a point that's already in the tree.

Okay.

We also keep track of a parent pointer v dot pi.

Okay.

Which points to the point in the tree that v would be connected to by this minimum cost connection.

Okay.

So what's the initial setup then?

Initially, you choose any point in your network to be the starting root of your MST.

Right.

Its key is set to zero as it's already in the tree.

Okay.

All other points have their keys initialized to infinity.

Right, because they're currently disconnected.

Exactly, because we haven't found a way to connect them to the tree yet.

Yeah.

Then the algorithm enters a loop in each iteration.

Yeah.

It extracts the point with the smallest key from the priority queue.

Okay.

This point fq is the one that can be connected to our current tree.

Right.

With the lowest possible cost.

And then we add e to our growing MST.

Gotcha.

Okay.

And once we've added u to the tree, what happens to its neighbors?

Yeah.

The points that it's directly connected to.

Yeah, exactly.

For each neighbor v of u, that is still in the priority queue.

Okay.

Meaning it's not yet in the tree.

Right.

We look at the cost of the connection uv.

Okay.

If this cost wv is less than the current key value of v.

Okay.

Which represents the best cost we knew to connect v to the tree so far.

Right.

We update v's key to wu.

Okay.

And set its parent pointer v .pi to u.

Right.

This means we found a cheaper way to connect v to the tree through u.

Right.

So we're constantly updating the best guess for how to connect each remaining point.

Precisely.

Okay.

We continue this process of extracting the point with the minimum key.

Okay.

Adding it to the tree.

Yeah.

Updating the keys and parent pointers of its neighbors.

Right.

Until the priority key becomes empty.

Okay.

At that point, all points have been added to the tree.

And the connections we've tracked through those parent pointers.

Right.

Form our minimum spanning tree.

They form the MST.

Okay.

Time Complexity for Prims.

Yeah.

The chapter mentioned big O of e log v when using a binary heap for the priority queue.

Yes.

So if you use a binary heap to implement that min priority queue.

Okay.

The operation of extracting the point with the minimum key.

Right.

Takes big O of log v time.

Okay.

And we do this v times once for each point.

Right.

Updating the key of a neighbor might also involve an update operation on the priority queue.

Okay.

Which could take big O of log v time as well.

In the worst case.

Okay.

Since we examine each connection at most once during this process.

Okay.

The total time spent on these updates across all the connections is big O of e log v.

Okay.

Therefore, the overall time complexity of Prims algorithm using a binary heap is big O of e log v.

Gotcha.

Yeah.

And you mentioned there's an optimization using something called Fibonacci heaps.

Right.

Yeah.

Can you explain how that works?

Yeah.

So by using a more advanced data structure called a Fibonacci heap.

Okay.

The insert and decrease key operations.

Okay.

Which we use when updating a neighbor's key.

Right.

Take only big O of one amortized time.

Okay.

And the extractment operation still takes big O of log v amortized time.

Right.

This leads to an improved overall time complexity for Prims of big O of e plus v log v.

Okay.

Which is particularly advantageous for dense networks.

Okay.

Where the number of connections e is much larger.

Gotcha.

Okay.

So we have these two distinct but both effective greedy strategies.

Right.

Finding these minimum spanning trees.

Now the chapter also hinted at some more in -depth exploration like in the exercises and problems.

Yeah.

So what kind of questions do they delve into?

So the exercises in the chapter explore some of the fundamental properties.

Okay.

And nuances of minimum scanning trees.

Okay.

For example, they might ask you to consider whether a connection.

Okay.

With the absolute lowest cost in the entire network is always guaranteed to be a part of some minimum spanning tree.

Okay.

Interesting.

They also challenge you to think about the converse of the safe edge theorem.

Okay.

If an edge is safe to add,

does it have to be a light edge across some cut that respects the current set of MST edges?

Interesting.

Okay.

And they also delve into the conditions under which the minimum spanning tree for a given network is unique.

Right.

So these kinds of questions really help solidify your understanding of the underlying principles.

Okay.

Those sound like good mental workouts.

Yeah.

Definitely.

To grasp those core concepts.

For sure.

And the problems for chapter 21 seem to venture into some more like advanced territory.

Yeah.

So the problem section really builds upon the basics.

Okay.

One interesting problem explores how to find the second best minimum spanning tree.

Okay.

So a spanning tree whose total cost is just slightly higher.

Right.

Than that of the actual MST.

Gotcha.

This involves thinking about what changes you would need to make to an MST to get a slightly more expensive spanning tree.

Okay.

Another area they touch upon is optimizing MST algorithms for extremely sparse networks.

Okay.

Where the number of connections, E, is significantly less than V log V.

Right.

The chapter mentions a technique called MST reduci.

Okay.

Which, when combined with Prim's algorithm, has the potential to achieve a time complexity of big O of E log log V for such very sparse crafts.

Big O of E log log V.

That's a noticeable improvement.

Yeah.

For those really thinly connected networks.

Absolutely.

Okay.

The problems also introduce and ask you to analyze the correctness of a few hypothetical MST algorithms.

Okay.

Creatively named maybe MST A, B, and C.

All right.

These are designed to test your ability to apply the core principles of MST construction.

Okay.

And to identify whether a given approach is guaranteed to produce a correct minimum spanning tree.

Gotcha.

Lastly, they introduce the concept of a bottleneck spanning tree.

Okay.

So this isn't about minimizing the total cost, but rather about minimizing the cost of the most expensive connection.

Oh, okay.

Interesting.

In the spanning tree.

Interesting.

Interestingly, the chapter notes that a bottleneck spanning tree can actually be found in linear time.

Wow.

Okay.

Which is quite efficient.

That's a completely different optimization goal.

Right.

It's not about the sum of the cost, but about that single highest cost.

Exactly.

That could be incredibly relevant, right?

Yeah.

In situations where the reliability of the weakest link is just super critical.

Right.

Like in a communication network or something where you want to make sure that no single connection is prohibitively expensive or prone to failure.

Exactly.

Wow.

Okay.

And the chapter notes section provides some really valuable historical context as well.

It credits Odekar Borufka.

Okay.

With developing what's considered to be the first algorithm.

Wow.

For finding a minimum spanning tree way back in 1926.

1926?

That's way back.

That's pretty early on, yeah.

And it points out that while Prim's algorithm is generally attributed to Robert Prim, it's actually independently discovered earlier by Vojtěš Jarnick.

Interesting.

The notes also briefly mention the existence of even more advanced algorithms.

Okay.

That can achieve better theoretical running times for sparse graphs.

Wow.

Okay.

As well as the development of randomized algorithms for finding MSTs.

Interesting.

So hinting at further avenues of research and optimization beyond the scope of this particular chapter.

Wow.

So much to explore there.

Yeah.

It's always fascinating to see how all this stuff develops, you know?

Yeah.

These fundamental algorithms, how different researchers kind of come upon similar ideas.

Yeah, independently.

Independently.

Yeah, that's so cool.

It's really cool, yeah.

And finally, the chapter includes a glossary of terms.

Yeah.

I imagine having a clear understanding of those definitions is pretty key, right?

Absolutely crucial.

Yeah.

So the glossary provides clear, concise definitions.

For all the key concepts we've discussed today.

Yeah.

From, you know, the very definition of a minimum spanning tree and spanning tree itself.

Right.

To more specific terms like safe edge, cut, light edge.

Right.

And the names of the data structures.

Yeah.

That make these algorithms efficient.

Right.

Such as the disjoint set data structure and min priority queue.

Right.

So having a solid grasp of this vocabulary.

Yeah.

Is the foundation.

Right.

For anyone that wants to delve deeper.

Yeah.

Into the world of MST algorithms and their many applications.

Absolutely.

All right, let's bring it all together for everyone listening.

Okay.

We started by tackling this fundamental problem of finding the cheapest way to connect all points in a network.

Right.

Without creating any loops.

The minimum spanning tree.

Yeah.

We established the key terms.

Connected, undirected graphs, edge weights, acyclic structures, spanning trees.

Right.

And the ultimate goal, the minimum spanning tree.

We learned about the surprising power of these greedy algorithms in this context.

Yeah.

We explored that generic MST method.

Right.

Built on the ideas of safe edges, cuts, light edges.

Yeah.

That led us to these two really important algorithms.

Prescol's algorithm.

Yeah.

Builds the MST by iteratively adding those cheapest edges.

Yeah.

That connect previously unconnected components.

Right.

And Prim's algorithm.

Right.

Which grows a single tree from a starting point.

Right.

By always adding that lowest cost edge that reaches a new point.

Exactly.

We also touched on the efficiency of these algorithms.

Right.

Some of the more advanced topics and variations that are explored in the chapter's exercises and problems.

Right.

Kind of gave you a glimpse into the broader landscape of MST research.

For sure.

So for you, our listener.

Yeah.

Think about a network that you encounter regularly.

Maybe it's the internet.

Right.

Maybe it's the network of roads in your city.

Maybe it's the supply chain for your favorite products.

Right.

How might the principles of minimum spanning trees be relevant in designing or optimizing these systems?

That's a great question to think about.

What factors beyond just the cost of connections might influence the best way to actually link these systems together?

Yeah.

It's definitely food for thought.

It is.

And this deep dive has taken us through those core concepts and algorithms providing you with, hopefully, a really solid foundation for understanding this vital area of network optimization.

And that covers everything from the chapter on minimum spanning trees.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter SummaryWhat this audio overview covers

Finding a minimum spanning tree is a foundational problem in graph theory where the goal is to select a subset of edges connecting all vertices in a weighted, undirected graph while minimizing total edge weight and maintaining acyclic connectivity. The solution produces exactly |V|–1 edges forming a tree structure with broad applications across network design, clustering tasks, and approximation algorithms. Two primary greedy strategies dominate this area: Kruskal's algorithm sorts edges by weight and builds the tree incrementally by adding edges that connect previously disconnected components, employing a disjoint-set forest data structure to efficiently track component membership and detect cycles in O(E log V) time. Prim's algorithm adopts an expansion strategy, growing the tree outward from a designated root vertex by repeatedly selecting the minimum-weight edge linking the current tree to an external vertex, maintaining a min-priority queue to identify and process these boundary edges efficiently. The theoretical justification for both approaches derives from the safe edge property, which establishes that any minimum-weight edge crossing a partition of vertices can be safely incorporated into a growing partial solution without compromising optimality. Prim's performance varies based on priority queue implementation, achieving O(E log V) with binary heaps and the improved O(E + V log V) bound with Fibonacci heaps. Beyond the basic algorithms, related problems extend the framework to include finding the second-best spanning tree, applying MST-REDUCE techniques for performance optimization on sparse graphs, and solving the bottleneck spanning tree variant that prioritizes minimizing the heaviest edge rather than the sum of all weights. Understanding these algorithms provides essential knowledge for distributed systems, circuit design, and various clustering methodologies in computer science applications.

Using this chapter to study? Last Minute Lecture is free and student-run. If it helped, consider supporting the project.

Support LML ♥

Chapter 21: Minimum Spanning Trees: Kruskal and Prim Algorithms

Related Chapters