Chapter 23: All-Pairs Shortest Paths: Floyd-Warshall and Johnson's Algorithm

0:00 / 0:00
Report an issue

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement not replaced the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Okay, so today we're going to dive into something that's, well, it's actually everywhere when you think about it, but it's not something we always think about directly.

Like, we're talking about finding the absolute shortest way to get between any two points,

but I'm not just talking about using Google Maps to get around town.

Right, right.

Think bigger.

Think how information flows through the internet or how dependencies work in a huge project with tons of moving parts.

It's about finding those optimal paths, those most efficient connections in any kind of network.

Exactly.

And it gets pretty complex, right?

Because we're not just looking for the shortest path from A to B.

We want to know the shortest path between every single pair of points in the whole network.

That's the key.

All pairs, not just one pair.

And that's where these really clever algorithms come in.

And that's what we're going to unpack today, right?

Our source material goes deep on these different algorithms, breaks them down, shows how they work and why they work.

So our goal today is to, well, to basically give you the rundown on all that, like a guided tour through this fascinating world of all pairs shortest paths.

Yeah, absolutely.

And even if you're not a programmer, I think you'll find some cool takeaways here.

Like the way these algorithms approach problems, the logic they use, it's applicable in so many areas.

I totally agree.

Okay.

So to kick things off, let's imagine we have this network, right?

Could be a map, a computer network, anything.

And we want to know the shortest path between every single pair of points.

Now, the most obvious way to do it, right, is to just calculate the shortest path from each point individually to every other point.

Yeah, that's the naive approach, so to speak.

If you have, let's say, a hundred cities on your map, you'd pick city A and find the shortest route to all 99 other cities.

Then you'd move on to city B and repeat and so on.

And for that, we could use algorithms like Dijkstra's, right?

Especially if all the connections like the roads on our map have positive costs or distances.

And depending on how you organize the calculations, Dijkstra's can be pretty efficient for finding all the paths from a single source.

That's right.

But then there's the case where some connections might have negative costs.

Not a cycle where you could go round and round and end up with a cost of negative infinity, but just some negative edges.

In that case, you might need to use something like the Bellman forward algorithm.

Which is a bit more computationally heavy, right?

More steps involved.

Exactly.

And this is where things start to get interesting.

Because if you're talking about a massive network with thousands or even millions of points running these single source algorithms over and over again for each point, well, that's just not going to cut it in terms of speed.

Right.

It becomes ridiculously inefficient.

So that's where the need for these more sophisticated algorithms really comes in.

Absolutely.

And the source material introduces a really clever way to represent the network as an adjacency matrix.

They call it W.

And think of it as a table that lays out all the costs between each pair of points.

Okay.

So we're taking our network and turning it into this cost table,

rows and columns, each representing a point.

And the cell where they intersect shows the cost of going directly between those two points.

Right.

So if the starting and ending points are the same, that cost is zero.

If there's a direct connection between two points, the cost is whatever the cost of that connection is.

And if there's no direct connection, we're talking infinity basically, you can't go directly between those two points.

Got it.

And the source also mentions that for now we're assuming no negative cost cycles.

Those can really throw a wrench into things.

Oh, definitely.

We'll get to those later.

But for now, let's assume all cycles have a non -negative cost.

Now here's where the source material introduces this really cool idea dynamic programming.

And it ties in with matrix multiplication in a way that's pretty surprising.

Okay.

Now you've got my attention.

Matrix multiplication and finding shortest paths.

How do those two connect?

Well, it's about breaking the problem down into smaller pieces.

Instead of trying to find the absolute shortest path right away, we start by thinking about the shortest path between any two points using at most a certain number of steps or connections.

So let's say l i j to the power of r represents the minimum cost to get from point i to point j using at most r connections.

Okay.

So we're setting a limit on the number of hops we can take.

That makes sense.

Right.

So if you have a limit of zero connections, you can only get from a point to self at zero cost.

Trying to get anywhere else would cost infinity because you're not allowed to move.

Yeah.

That's our base case.

l i j to the power of zero is zero if i and j are the same point and infinity otherwise.

Exactly.

Now to figure out l i j to the power of r, the best way to get from i to j in r or fewer steps, we've got two options.

One, the best path already uses r minus one steps or fewer.

In that case, the cost is just what we already calculated.

l i j to the power of r minus one.

Two, it uses exactly r steps.

So if it uses exactly r steps, the last step has to be from some other point, let's call it k, to our destination j.

So we need to consider all the possible intermediate points k and see which one gives us the shortest overall path.

Right.

So we look at the best way to get from i to k in r minus one steps.

That's l i k to the power of r minus one and then we add the cost of going directly from k to j which is w k j and we do this for every possible k and we pick the minimum cost.

Okay, so the formula is l i j to the power of r equals the minimum of l i j to the power of r minus one and the minimum overall k of l i k to the power of r minus one plus w k j.

Precisely.

And the source points out something really important.

In a network with n points and no negative cost cycles, any shortest path that doesn't loop back on itself will have at most n minus one connections.

Because if it had more than n minus one connections it would have to visit the same point twice which would mean there's a cycle.

Right.

And we assume no negative cost cycles.

Exactly.

So l i j to the power of n minus one will give us the actual shortest path cost denoted by delta of i and j.

Okay, so we've built up the shortest path step by step considering paths with more and more connections until we reach n minus one connections and then we know we've found the shortest path.

But you mentioned matrix multiplication.

How does that fit in?

Right, so the source describes this procedure called extend shortest paths.

If you look closely at our formula for l i j to the power of r it resembles how you calculate entries in a matrix product.

Instead of multiplying and adding we're taking the minimum of sums.

It's like we're doing matrix multiplication but with different operations.

Instead of multiplying corresponding elements and adding the results we're adding corresponding elements and taking the minimum of the results.

Exactly.

And this is where that idea of the tropical simmering comes in.

It sounds exotic but it's basically a mathematical system where we swap the usual addition and multiplication.

Okay, hold on.

Tropical simmering.

You're gonna have to explain that one.

So in this practical simmering the sum is actually finding the minimum and the product is our usual addition.

And infinity is the identity element for the sum while zero is the identity for the product.

It's a bit abstract but it provides a framework for understanding how this matrix -like approach works for finding shortest paths.

Okay, I'm seeing how it connects but it's definitely bending my brain a bit.

So instead of multiplying rows by columns and summing up the results we're taking the minimum of sums.

And this extend shortest paths procedure basically does this tropical multiplication, right?

And it takes about the same amount of time as regular matrix multiplication.

Precisely.

And this leads us to what the source calls the S -L -O -W -A -P -S -P algorithm.

S -L -W -A -P -S -P.

Catchy.

Right.

So we start with L to the power of zero, that's our matrix, with zeros on the diagonal and infinities everywhere else.

And then we repeatedly apply this tropical multiplication with our original cost matrix W, a total of n minus one times.

And because each multiplication takes about n cubed time, the total time for slow APSP ends up being around n to the fourth.

N to the fourth.

That sounds pretty slow, especially for large networks.

But I have a feeling there's a faster way to do this.

You bet.

The source calls this faster APSP and it uses a trick called repeated squaring, much like you can do with regular multiplication to quickly calculate large powers.

Okay, I remember repeated squaring, like to calculate x to the eighth you can square x then square the result and so on instead of multiplying x by itself eight times.

Exactly.

We can do something similar with our tropical matrix multiplication.

L to the power of two R is the same as multiplying L to the power of R by itself.

So instead of multiplying by W n minus one times, we compute W squared, then W to the fourth, W to the eighth, and so on until we get to a power greater than or equal to n minus one.

So we're essentially doubling the exponent with each step.

Right.

So we only need about log base two of n of these matrix multiplications and each multiplication takes about n cubed time.

So the time for faster APSP is roughly n cubed times log n, much faster than n to the fourth.

Much faster and much more clever.

It's pretty amazing how optimizing the way you do the calculation can lead to such a significant speed up.

Okay, so we've seen how this matrix based approach works by thinking about the number of steps.

Now the source introduces another dynamic programming technique, the Floyd Warshall algorithm.

Yes, Floyd Warshall takes a slightly different approach.

Instead of focusing on the number of connections, it focuses on which vertices we're allowed to use as intermediate stops on our path.

Okay, interesting.

So instead of limiting the number of steps, we're limiting which points we can pass through.

Exactly.

We define D ij to the power of k as the shortest path cost from point i to point j, where we're only allowed to use points from the set one to up to k as intermediate vertices.

So when k is zero, it means we can't use any intermediate stops at all.

In that case, the shortest path is simply the direct connection if it exists.

That's right.

The base case D ij to the power of zero is just the cost from our adjacency matrix.

Got it.

So what happens when we increment k?

When we consider using the kth point as a potential intermediate stop, there are two possibilities for the shortest path from i to j.

Either the shortest path doesn't use point k, in which case its cost is the same as before, that is D ij to the power of k minus one.

Right, because we're not really changing anything in that case.

We're still using the same set of allowed intermediate points.

Right.

The other possibility is that the shortest path does use point k.

If that's the case, then the path has to go from i to k and then from k to j, using only the previous allowed intermediate points for both of those segments.

So in that case, the cost would be D ik to the power of k minus one plus D ij to the power of k minus one.

We're taking the shortest path from i to k and then the shortest path from k to j, both using only intermediate points up to k minus one.

Exactly.

And we take the minimum of those two possibilities to get D ij to the power of k.

So the formula is D ij to the power of k equals the minimum D ij to the power of k minus one and D ik to the power of k minus one plus D kj to the power of k minus one.

Okay, I'm following.

So we're gradually allowing more and more points as possible stops along the way, and we update the shortest path costs accordingly.

And the final shortest path costs for all pairs points will be in D to the power of n, where n is the total number of points, because we've considered all points as potential intermediate stops.

Precisely.

And the source describes the Floyd -Warshall algorithm, which implements this with three nested loops.

Three nested loops.

That sounds intensive.

It's actually pretty straightforward.

The outermost loop iterates through each point k from one to n, and the two inner loops go through all pairs of starting and ending points i and j, and they update D ij to the power of k using the formula we just discussed.

Okay, so we're going through each point k and seeing if using it as an intermediate stop gives us a shorter path between any two points i and j.

And because of the nested loops, the running time is on the order of n cubed.

That's right.

Similar to the faster APST algorithm, but the Floyd -Warshall algorithm is arguably simpler to implement.

It's just three nested loops.

And the source also mentions that we can keep track of not only the shortest path costs, but the actual paths themselves using a predecessor matrix.

Predecessor matrix.

So for each pair of points, we store the point that comes before the destination on the shortest path.

Exactly.

So we have another n by n matrix, let's call it pi, where pi ij stores the predecessor of j on the shortest path from i.

And we update this matrix whenever we find a shorter path through our intermediate point k.

Then we can reconstruct the shortest path from any point to any other point by simply following these predecessors back.

That's pretty cool.

And what about memory usage?

Storing two n by n matrices could get pretty big for large networks.

The source mentions that you can actually optimize Floyd -Warshall to use only a single n by n matrix, updating the cost matrix in place.

This space optimized version still takes n cubed time, but uses less memory.

So efficiency in both time and space.

Like it.

But the source also mentions that Floyd -Warshall can be used for more than just finding shortest paths.

It can also be used to find the transitive closure of a directed network.

Yes, transitive closure is all about figuring out whether there's any path at all between two points, even if it's not a direct connection.

Okay, so it's like a yes or no question.

Can you get from point A to point B even if it takes multiple steps?

Exactly.

And we can use Floyd -Warshall for this by thinking about reachability instead of cost.

So instead of storing the cost of the shortest path, we store a true or false value.

True if there's a path, false if there isn't.

So how do we adapt the algorithm to do that?

Do we just change how we initialize the matrix and how we combine the values?

Precisely.

We define t i j to the power of k as true if there's a path from i to j using only points one through k as intermediates and false otherwise.

Initially, t i j to the power of zero is true if i and j are the same or if there's a direct connection from i to j.

The update rule then becomes t i j to the power of k is true if either t i j to the power of k minus one is true, meaning there's already a path without using k or if there's a path from i to k and a path from k to j both using only intermediates up to k minus one.

So it's like we're replacing the min operation with or and the ak the i plus operation with and.

If there's a path from i to k and a path from k to j then there's a path from i to j.

Exactly.

And this transitive closure procedure, which is just Floyd Warshall with these logical operations, still runs in incubed time.

So the same algorithm can be used to solve different but reloaded problems just by tweaking the way we interpret the values in the operations.

That's really cool.

But wait, there's one more trick the source mentions for Floyd Warshall, right?

Something about detecting negative cost cycles.

Oh yes, that's an important one.

After running Floyd Warshall, if we look at the diagonal entries of the final cost matrix, d to the power of n, if any of those values are negative, then we know there's a negative cost cycle somewhere in the network.

Why is that?

Why does a negative value on the diagonal mean a negative cost cycle?

Well, if there are no negative cost cycles, the shortest path from any point back to itself should always have a cost of zero or more.

But if we find a negative value on the diagonal, that means there's a path starting and ending at the same point that has a negative total cost.

And that can only happen if there's a negative cost cycle somewhere along that path.

That's a really neat way to check for a potential issue in the network.

So we've covered the matrix multiplication approach and the very versatile Floyd Warshall algorithm.

Now the source introduces one more algorithm,

Johnson's algorithm.

Yes, Johnson's algorithm is particularly well suited for sparse graphs.

That is graphs with few edges.

Sparse graphs.

So like a social network where not everyone is friends with everyone else, or a road network where not every city has a direct road to every other city.

Exactly.

And Johnson's algorithm achieves its efficiency by using a clever technique called reweighting.

Okay, reweighting sounds interesting.

What is that exactly?

It's basically transforming the edge weights, the costs of the original graph to make them all non -negative, but in a way that doesn't change which paths are the shortest.

So we're manipulating the costs, but not the underlying structure of the shortest paths.

Exactly.

And once we have all non -negative edge weights, we can use Dykstra's algorithm, which as we mentioned earlier, can be very efficient for finding shortest paths from a single source, especially for sparse graphs.

Okay, I see the general strategy.

But how do we reweight the edges without messing up the shortest paths?

It sounds tricky.

It's actually pretty elegant.

We introduce what's called the potential function, denoted by h.

This function assigns a real number, h of v, to each vertex v.

And then we define a new edge weight.

Let's call it w -ot of uv, which is equal to the original weight, w of uv plus h of u minus h of v.

Okay, so we're adding and subtracting these potential values from the original weights.

And the magic is that the differences between the total costs of any two paths remain the same, even though the individual edge costs have changed.

You got it.

It's like adding a constant to every number in a set.

The differences between the numbers stay the same.

And the source points out that this reweighting also preserves the presence of negative cost cycles.

So if the original graph had one, the reweighted graph will too.

But the key is choosing the potential function, h of v, so that all the new edge weights are non -negative.

Right.

Because if we want to use Dykstra's algorithm, we need non -negative edge weights.

So how do we do that?

How do we choose h of v?

Johnson's algorithm cleverly constructs called an augmented graph.

We take the original graph, let's call it g, and we add a new vertex, a dummy source vertex, let's call it s.

And from s we add zero weight edges to every other vertex in g.

If the originally graph g didn't have any negative cost cycles, this augmented graph won't either.

Then we run the Bellman -Ford algorithm on this augmented graph, starting from s.

Okay, so we run Bellman -Ford on this augmented graph.

What do we do with the results?

The result of Bellman -Ford will give us the shortest path cost from s to every other vertex in the augmented graph.

We'll call these shortest path costs delta of s, v.

And we define our potential function, h of v, to be precisely this shortest path cost delta s, v.

Okay, that's an interesting choice.

But why does setting h of v to be delta of s, v guarantee that all the reweighted edge weights will be non -negative?

It boils down to something called the triangle inequality.

For any edge from u to v with the original cost w of u, v, the shortest path from s to v in the augmented graph cannot be longer than the shortest path from s to u plus the direct edge from u to v.

Right, because going from s to u and then directly to v is one possible path from s to v, so the shortest path can't be any longer than that.

Exactly.

And mathematically this means that delta of s, v is less than or equal to delta of s, u plus w of u, v.

And if we rearrange this inequality, we get w of u, v plus delta of s, u minus delta of s, v is greater than or equal to zero.

But that's precisely our reweighted edge weight w -witt of u, v, which is equal to w of u, v plus h of u minus h of v, where h of u is delta of s, u and h of v is delta of s, v.

So we've guaranteed that all our reweighted edge weights are non -negative.

That's pretty slick.

So to recap Johnson's algorithm, first we create the augmented graph by adding the dummy source vertex and the zero weight edges.

Second, we run Bellman -Ford from the dummy vertex to check for negative cost cycles.

If there are any, well, this basic version of the algorithm won't work.

Third, if there are no negative cost cycles, we set the potential function h of v to be the shortest path cost from the dummy vertex to v.

Fourth, we reweight all the edges using this potential function.

Fifth, we run Dijkstra's algorithm from each vertex in the original graph using these new non -negative edge weights.

And finally, we use the potential function again to adjust the resulting shortest path cost back to the original costs.

That's a perfect summary.

And the time complexity of this whole process is dominated by running Dijkstra's algorithm v times, where v is the number of vertices.

And if we use an efficient priority queue like a Fibonacci heap, each run of Dijkstra's takes about v log v plus e time, where e is the number of edges.

So overall, Johnson's algorithm runs in o of v squared log v plus v times e.

And for sparse graphs, where e is much smaller than v squared, this can be much faster than the n cubed algorithms like Floyd -Warshall.

Exactly.

And it nicely illustrates how we can combine different algorithmic techniques to solve a problem efficiently.

We use Bellman -Ford once to check for negative cost cycles and to set up the potential function.

And then we leverage the efficiency of Dijkstra's algorithm to find the shortest paths from each source vertex in the reweighted graph.

So we've got a whole toolkit of algorithms for finding all pairs shortest paths, each with its own strengths and weaknesses.

And the best choice depends on the characteristics of the graph we're dealing with.

And the source mentions an application of all this, finding the diameter of a network.

Right.

The diameter of a network is the longest of all the shortest paths between any two points.

So once we've calculated all the shortest path costs, we just have to find the maximum value in that matrix.

So for example, in a social network, the diameter could give us a sense of how many degrees of separation there are between the most distant individuals in the network.

Or in a transportation network, it could tell us the maximum travel time between any two points.

Exactly.

It's a measure of how spread out or connected the network is.

So to wrap up, we've explored several different approaches to solving the all pairs shortest paths problem.

We started with a naive approach of repeatedly running a single source shortest path algorithm, like Dijkstra's or Bellman -Ford, from each vertex.

Then we saw how matrix multiplication, albeit with a twist in the form of the tropical simmering, could be used to calculate the shortest paths.

We looked at the SLO -APSP algorithm and the much faster APSP algorithm that uses repeated squaring.

We explored the Floyd -Warshall algorithm, which is based on the idea of gradually allowing more and more vertices as intermediate stops, and how it can be adapted to find the transitive closure and detect negative cost cycles.

And finally, we saw how Johnson's algorithm leverages the power of reweighting to efficiently solve the problem for sparse graphs, combining Bellman -Ford and Dijkstra's algorithm in a clever way.

It's been a fascinating journey through the world of algorithms and how they can be used to solve a fundamental problem in graph theory and network analysis.

And for you the listener, I hope this deep dive has not only provided a solid understanding of these specific algorithms, but also shown how computational thinking and problem -solving techniques can be applied to a wide range of challenges.

So as you go about your day, think about the problems you encounter.

Can they be broken down into smaller subproblems?

Can you find creative ways to transform or reframe them to make them easier to solve?

The world is full of networks, connections, and paths, and understanding these algorithms can give you a new perspective on how to navigate them efficiently.

Thanks for diving deep with us.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter SummaryWhat this audio overview covers
Computing shortest paths between all pairs of vertices in a weighted directed graph is a fundamental problem with direct applications to network routing, graph diameter determination, transitive closure computation, and project scheduling. The all-pairs shortest paths problem requires building an n×n distance matrix where each entry contains the minimum-weight path connecting two vertices, along with an optional predecessor matrix for reconstructing actual paths. A straightforward approach of executing a single-source shortest-path algorithm from each vertex appears reasonable but becomes computationally expensive: repeating Dijkstra's algorithm yields O(V² log V + VE) time complexity while the Bellman-Ford approach costs O(V²E), motivating more efficient specialized techniques. A matrix-based dynamic programming formulation defines lᵢⱼ^(r) as the shortest path weight using at most r edges, iteratively building each layer from the previous through a recurrence that resembles matrix multiplication over the tropical semiring where minimum operations replace standard addition and addition operations replace multiplication, producing SLOW-APSP with Θ(n⁴) runtime. Applying repeated squaring to this matrix framework yields FASTER-APSP achieving Θ(n³ log n) performance. The Floyd-Warshall algorithm offers superior efficiency by iterating through intermediate vertices rather than edge counts, employing the recurrence dᵢⱼ^(k) = min(dᵢⱼ^(k–1), dᵢₖ^(k–1) + dₖⱼ^(k–1)) to progressively construct shortest paths using larger vertex sets, attaining Θ(n³) time complexity with Θ(n²) space usage while simultaneously maintaining predecessor information for path reconstruction. Floyd-Warshall detects negative-weight cycles by checking whether any diagonal entry becomes negative and naturally extends to transitive closure computation by substituting Boolean OR and AND operations for min and addition respectively. For sparse graphs containing negative edge weights but lacking negative cycles, Johnson's algorithm achieves O(V² log V + VE) complexity through an elegant reweighting transformation: introducing an auxiliary source vertex connected to all others with zero-weight edges, applying Bellman-Ford to compute vertex potentials, reweighting edges as ŵ(u,v) = w(u,v) + h(u) – h(v) to eliminate negative values, running Dijkstra from every vertex on the reweighted graph, and finally reversing the transformation to recover original distances.

Using this chapter to study? Last Minute Lecture is free and student-run. If it helped, consider supporting the project.

Support LML ♥