Chapter 14: Dynamic Programming: Rod Cutting, LCS, and Optimal BSTs

Search this chapter

Audio Overview

0:00 / 0:00

Autoplay next chapter

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Hey everyone, welcome back to the Deep Dive.

You know we like to get down to the nitty gritty.

Today, we're diving into dynamic programming, giving you a super efficient path to really grasp it.

Yeah, it's a core technique in algorithm design.

So our deep dive today, we're gonna take you through like a whole chapter on dynamic programming, cover every single section, every big idea, the theories behind it, how it works and all those helpful examples, all in a way you can follow.

We're really gonna hit every piece.

So dynamic programming, just to set the stage, it's this approach where you break down big problems into smaller ones, kind of like that divide and conquer strategy.

But dynamic programming gets interesting because unlike divide and conquer, which splits problems into like separate independent parts, dynamic programming shines when these smaller chunks overlap.

They share even tinier sub -problems, yeah.

And what's so cool about that overlap, well, dynamic programming's like it's side steps this trap of solving the same many problems again and again.

Instead, it figures out each unique sub -problem just once and stores that answer.

It's almost like having a lookup table.

So if we run into that same sub -problem later, boom, we grab the answer we already figured out, saving a ton of effort.

Precisely, and one thing I think sometimes tricks people up is that the programming part isn't about writing code in the traditional sense.

It's more about this structured tabular approach.

Right, right.

And this is really important.

Dynamic programming is a powerhouse for optimization problems.

Those situations where there are tons of possible solutions, but each one has some kind of value attached to it, a cost, a profit, efficiency, whatever.

Our job in optimization is to pinpoint that one solution with the absolute best value could be the lowest cost, the highest gain, using resources the smartest way.

That's what we're talking about when we say optimal solution.

That's the target.

Okay, so if we're gonna actually build these dynamic programming solutions, what's the game plan?

The chapter lays out these four key steps.

First, we gotta understand what an optimal solution looks like.

How does the best answer for the whole problem relate to the best answers for its smaller parts?

Yeah, and step two formalizes that.

We define in a recursive way what the value of that optimal solution is.

Basically, it's an equation connecting the best outcome for the big problem with the best outcomes for those smaller pieces we talked about.

Right, then step three, we gotta actually figure out the numbers, compute the value of our optimal solution.

The chapter focuses on a bottom -up approach here.

We start by solving those teeny tiny sub -problems first, and then systematically work our way up to the big kahuna, our original problem.

And remember that table we mentioned for storing results.

That's where all these optimal values for the sub -problems live.

And finally, step four, we take all that calculated info and build the actual optimal solution.

It's like following a trail.

We can often trace back through our stored results to see exactly which choices led us to that best outcome.

Though, sometimes if all we need is the value, like the maximum profit,

we can maybe skip this reconstruction step.

All right, so what kind of cool problems can we tackle with this dynamic programming magic?

The chapter dives into some interesting ones.

We're gonna see how to maximize profit when cutting a rod, how to multiply a whole bunch of matrices efficiently, find the longest shared part of two strings, and even build the best binary search trees all through the lens of dynamic programming.

You ready to get started?

Absolutely.

Let's start with a classic, the rod cutting problem.

Picture this.

You have a metal rod, let's say an inches long.

You also have this price list showing how much you can sell different lengths of rod for.

The challenge is to find the best way to cut that rod, or not cut it at all to make the most money.

And what's wild is how many different ways you can cut that rod, right?

If it's N inches long, there are like two to the power of N1 ways to chop it up.

That number gets out of control fast as the rod gets longer.

So brute forcing every single combo just isn't feasible.

Dynamic programming gives us a much smoother route to finding that optimal revenue, what we call R0 by 820 by 99 for a rod of length N.

Okay, so how do we tackle this in an optimal way?

Well, the chapter gives us this recursive formula for R0 by E20 by 820 by 820 by 99 is the maximum of a bunch of options.

First we have the price, if you don't cut it at all, just P0 by E20 by 820 by 99, the price for the whole rod length then.

But then we got to consider all the possible first cuts.

Right, like what if we cut off a piece of length thigh?

I could be one inch, two inches, all the way up to N1 inches.

We sell that piece for panger, the price for length thigh, and we're left with a piece NI inches long.

And of course we want to optimally cut that remaining piece too, to get R0 by E20 by 820 by 99, the best revenue for that length.

The formula looks at all these first cut possibilities.

No cut, a first cut of one inch plus the optimal revenue for the rest, a first cut of two inches plus the optimal revenue for what's left, and so on.

Right, we're trying to maximize over all of those.

And to put this recursive idea into code, we could use a top -down approach.

The chapter talks about this procedure called C -key rod.

It basically takes the formula and implements it directly, but here's the catch.

It can be really inefficient.

It ends up with a running time of O2N, which is exponential, not ideal.

Oof, that's not what we want.

It's like this, we keep resolving the same subproblems over and over.

Imagine this branching tree of recursion.

For a rod of length N, there are tons of branches and leaves, and many of those represent the same subproblem being solved again and again.

A lot of wasted effort there, yeah.

But that's where dynamic programming comes in.

The chapter gives us two strategies to make this way more efficient.

The first is still top -down, but uses something called memoization.

That's memoized cut rod.

Memoization, it's all about remembering what we've already calculated, right?

We use a table to store these results.

So before we jump into figuring out the best revenue for a certain length, we check, hey, have we done this before?

If we have, we just grab that stored value and skip the whole recalculation.

If not, we do the calculation, and this is key, store the result in our table before returning it.

The simple act of remembering transforms an exponentially slow algorithm into something much more manageable, giving us a running time of Owen.

Huge difference.

Now, the second approach is the bottom -up method, the bottom -up cut rod procedure.

Here, we're building from the ground up.

Yeah, we tackle those smallest problems first.

We start with the absolute tiniest subproblems.

For rod cutting, that's figuring out the best revenue for a rod of length zero, then one, then two, and so on, all the way up to our target LinkedIn.

And just like before, we keep track of the optimal revenue for each length in an array.

The big advantage here is that whenever we're working on a certain length, we know for sure that we already have the best solutions for all the shorter lengths tucked away in our table.

This one also gives us a running time of Owen.

So both memorization and bottom -up get us to the same efficiency.

The chapter then brings in this cool concept, the subproblem graph.

Imagine a visual map showing how all the subproblems connect and depend on each other.

For rod cutting, each rod length from zero up to one end would be a point on this graph.

And a line between two lengths means solving for one depends on knowing the solution for the other.

It's a way to see the structure of the problem, yeah.

And you can see how our strategies connect to this graph.

Bottom -up, it's like going through the points in reverse copological order, solving from smallest to largest so that we always have the answers we need ready.

Top -down with memorization, it's more like doing a depth -first search through this graph.

It's the same solution just arrived at in different ways.

Now, what if we want more than just the maximum revenue?

What if we wanna know the exact cuts to make?

The chapter shows us how to extend the solution to reconstruct those optimal cuts.

We're gonna add a little something extra.

It's the extended bottom -up cut rod procedure.

It's the bottom -up approach, but with a twist.

Alongside the optimal revenue for each length, we also store the size of the first piece to cut off to get that revenue.

So for a rod of length, Jay, we'd not only have the best price, but we'd know the length I of the initial cut that got us there.

Like a breadcrumb trail.

And then the print -cut rod solution procedure uses those breadcrumbs to actually print out the optimal cut sequence following the trail from the final revenue back to the first cut.

Makes sense.

The chapter also mentions greedy strategies approaches where you try to make the best local choice at each step.

You might think, okay, always cut off the piece with the highest value per inch, the highest density.

But the chapter subtly hints in exercise 14 .1 to two that this isn't always the best overall strategy.

Sometimes a less valuable initial cut can set you up for better cuts later, leading to higher total revenue.

Yeah, it's not always about immediate gratification.

And one more thing.

Even if we added a fixed cost for every cut we made, the chapter notes that we could still solve this modified problem with dynamic programming.

Exercise 14 .13 points that out.

It just needs a little tweak to the formula to account for that cut cost.

A slight adjustment, but the core idea is the same.

So we've gone from cutting rods to get this matrix chain multiplication.

This one's all about multiplying a bunch of matrices in the most efficient way.

You see, with matrix multiplication, you can group them using parentheses however you want that's associativity.

The final result's the same, but the number of calculations needed can change a lot depending on how you group them.

Yeah, parenthesization matters.

It's not just a minor difference either.

The chapter stresses that choosing the wrong grouping can make the computation ridiculously expensive.

And the number of ways to parenthesize N matrices explodes, it grows like E2, which is related to those Catalan numbers, which are fascinating because they pop up in all sorts of unexpected places, like counting the number of binary trees or the number of ways to parenthesize an expression.

Trying every single grouping is just impossible for even moderately long chains of matrices.

Yeah, it's a smarter approach.

Just like with rod cutting, we can use dynamic programming here.

First, we need to figure out how an optimal parenthesization is structured.

The key is that an optimal way to multiply a chunk of matrices, say from A to AA,

must involve splitting it somewhere say between AA 20 by 20 by 9A and AATA 20 by 20 by 9AA.

And then the way we parenthesize those two resulting chunks, AA to AA 20 by AA 20 by 9A and ARD 20 by AA 20 by AA 20 by 980AT, those have to be optimal too.

This is optimal substructure again.

It all boils down to finding the best split point.

So to make this repulsive, let's say MI AI.

J is the minimum number of multiplications to compute the product from AIT.

If I and J are the same, meaning it's just one matrix, the cost is zero, no multiplications needed.

If I is less than J, we need to consider splitting at every point K between I and J1.

And then we add up the costs of those smaller chunks.

Right, for each split at K, the total cost is the cost of optimally multiplying the left chunk, which is A to K, plus the cost of optimally multiplying the right chunk, which is MK plus one J plus the cost of multiplying the two results together.

Now, if the product AIG AA 20 by 820 by 820 by 820 by 9A has dimensions PI AIG AA 20 by 820 by 8B by 8BA by PI AIG AA 20 by 820, and the product AIPAKI 20 by 820 by 9AA is a P0 OB 20 by 820 by 820 by 8B times P0 AI820 by 9A times P scalar multiplications, we need to find the K that minimizes this whole cost.

We also keep track of this best split point K in a table, usually called SI J for each sub problem IJ.

So we're building up these two tables as we go.

For the actual computation, we use the matrix chain order procedure, a bottom up approach to fill those M and S tables.

We start with chains of length one, then two, then three, and so on, until we reach the full chain of length N.

We're solving it piece by piece.

For each chain length L, we look at all the possible starting points I.

For each sub chain E, where J is just I plus L1, we try splitting at every K within the sub chain.

We use our formula to calculate the total cost, which involves the optimal costs we already calculated for the shorter chains.

Then we store the minimum cost in my J and the corresponding split point K in CJ.

Since we're building up, whenever we're figuring out the cost for a chain of length L, we know we already have the best solutions for all the shorter chains we need.

The total running time is O, and we need N space for the tables.

To see the optimal parenthesization, we use the print optimal parents procedure.

This is a recursive function that uses our fresh table and the start and end indices I and J.

So we're taking that table and using it to actually see the solution.

It looks up the optimal split point K in C, if I and J are the same, just one matrix at print CA.

Otherwise, it recursively calls itself to print the optimal parenthesization for the left part from I to K, then prints ABI0IZ20 by 8A20 by 9AA, and then recursively calls itself again to print the optimal parenthesization for the right part from K plus one to J.

It puts parentheses around those recursive calls to show the grouping.

And that's how we actually see the best parenthesization.

All right, we've seen dynamic programming with rod cutting and matrix multiplication.

Now the chapter gets into the nuts and bolts, the underlying principles of dynamic programming that make it so effective.

It comes down to two things,

optimal substructure and overlapping subproblems.

The two key ingredients, yeah.

Let's look at optimal substructure.

Like we've seen, a problem has this property if an optimal solution for the whole problem also contains optimal solutions for its subproblems.

Remember rod cutting.

Finding the best way to cut the rod involves finding the best ways to cut the smaller pieces after that first cut.

And in matrix multiplication,

the best parenthesization relies on the best parenthesizations for smaller chains.

Right, it's a kind of recursion in the optimality.

The chapter talks about this cut and paste idea to see if a problem has optimal substructure.

Say you have a supposedly optimal solution that includes a non -optimal solution for a subproblem.

You could swap that bad subsolution for the optimal one and get a better overall solution, right?

That means our original optimal solution wasn't actually optimal.

A way to test it, yeah.

But important point, not all problems have this optimal substructure.

The unweighted longest simple path problem, for example, finding the longest path in a graph without repeating nodes that usually doesn't have optimal substructure.

The longest path between two points might not use the longest paths between points along the way.

The subproblems aren't independent like they are in good dynamic programming problems.

Choices you make early on can limit your options later.

It's different from the shortest path problem where subproblems are independent.

Right, there are definitely problems where dynamic programming isn't the best tool.

Now, the other key ingredient is overlapping some problems.

This is when a basic recursive solution ends up solving the same subproblems many times, like that first naive recursive solution for rod cutting or a direct recursive approach for matrix multiplication.

Without memorization or the bottom -up approach, they'd make tons of calls to calculate the same things over and over.

A lot of wasted work.

And here's where dynamic programming shines.

It takes advantage of these overlaps by solving each unique subproblem just once and storing the answer.

Then, if it needs that answer again, boom, it just looks it up, no more expensive recalculation.

It's different from divide and conquer, which usually creates brand new subproblems at each step.

The recursion tree for naive matrix chain multiplication really shows how many repeated subproblems can pop up.

Yeah, it's a great visual.

And to deal with those repeated calculations, we have memoization, which we saw with rod cutting.

It's this general technique, a top -down approach that combines recursion with storing results.

It's elegant, really.

A memoized algorithm first checks if it's already computed a subproblem.

If so, it retrieves the stored value.

If not, it calculates it.

But this is important, it stores the result before returning it.

The memoize matrix chain procedure in the chapter shows this perfectly.

It has the same own time as the bottom -up method, but with a more intuitive top -down structure.

It can be easier to grasp that way, yeah.

All right, moving on from rods and matrices,

let's talk about sequences.

Specifically, the longest common subsequence problem, or LCS for short.

The classic string problem.

Here's the question.

Given two sequences, let's call them X and Y, what's the longest subsequence they share?

Quick reminder, a subsequence is what you get when you delete some stuff from a sequence without changing the order.

So if X is A, B, C, B, D, A, then Y is B, D, T, Y, B, B, C, B, A is a common subsequence.

Actually, it's the longest common subsequence in this case.

Finding that overlap.

LCS, it turns out, also has that optimal substructure.

Let's say X is M elements long, ending with XL by D20 by A20 by 99, and Y is N elements long, ending with XL by E20 by A20 by 99.

Yeah, let's work it down.

If those last elements are the same, X by E20 by A20 by 99 equals A0 by E20 by A20 by 99, then that last element has gotta be in any longest common subsequence, right?

And the rest of the LCS is just the LCS of the sequences without those last elements.

XL by A20 by A20 by 99 and YG20 by A20 by A20 by 99.

It's like peeling back layers.

But what if X by A20 by A20 by A20 by 99 isn't equal to I0 by E20 by A20 by 99?

Well, then the LCS can't end with both of them.

So it's either the LCS of X by A20 by A20 by 99 in YU, or it's the LCS of XOYI key 20 by A20 by 99, whichever one is longer.

So we have two options there, and we take the best one.

This lets us define a recursive way to get the length of the LCS.

Let's say CJ is the length of the LCS of the first I elements of X, call it X, and the first J elements of Y, called Y.

If either I or J is zero, meaning one of the sequences is empty, the LCS length is zero, easy.

Now, if C equals Y, then CJ is just Psi1 J1 plus one.

We found a common element, so we add one to the length of the LCS of what came before.

But if CJ and Y are equal, CJ is the bigger of Psi1, JLCS of L pi K20 by A20 by A20 and Y, and CJ1 LCS of X and Y day by A20 by A20 by M plus one, where M is the length of X and N is the length of Y.

It does this systematically, usually row by row.

Right, it's like filling in a grid.

While it's doing that, it also fills out another table, B, which is M by N.

This table keeps track of which subproblem we used to get each entry in C.

So for a cell CJ by J might have a diagonal arrow if we used Psi1 J1 and up arrow if we used Psi1 J, or a left arrow if we used CJ1.

Filling these takes up in time and open space.

So we're keeping track of the choices we made along the way.

Then with the B table, we can use print LCS to actually get one of the longest common subsequences.

We start at the bottom right of BBMN and follow the arrows backwards.

A diagonal arrow means those characters were in the LCS, so we print it and move diagonally.

An up arrow means go up and a left arrow means go left.

We keep going till we hit the edge of the table.

Reconstructing the LCF this way takes OM plus N time.

Pretty efficient, yeah.

The chapter mentions that we could actually ditch the B table entirely and still get the LCS in OM plus N time.

We just look at the C table to figure out which subproblem led to the current value.

If C, J equals C1, J1 plus one, then C and Y must have been the same and in the LCS.

If not, we check if C, J equals C1, J or C, J1 and move in that direction.

We save some space this way, but figuring out the LCS length still needs EM space for C.

But if we only care about the length and not the actual subsequence.

Yeah, if all we need is the length, the chapter says we can get away with OM in space just by remembering the previous row or column of C as we go.

Cool optimization.

All right, last problem, optimal binary search trees.

This one's about building a binary search tree for a set of keys.

We know the chances of searching for each key and the chances of searching for values between keys.

Those are dummy keys.

We wanna arrange the tree so that searches are as fast as possible on average.

We're minimizing the expected search cost.

Think about searching a binary search tree.

You start at the top, compare your key to the one at the current node, go left or right and keep going until you find your key or hit the end.

The depth of a node, how many steps down it is, affects the cost.

Deeper means more comparisons.

So we wanna put frequently searched keys closer to the top and less frequent ones further down.

Making the tree as efficient as possible.

This optimal binary search tree problem also has optimal substructure.

The chapter explains that if an optimal tree T has a subtree T for a range of keys, then T itself has gotta be optimal for that range of keys.

If it wasn't, you could swap it with a better subtree and make T better overall, which means T wasn't truly optimal to start with.

It's like a nested optimality.

Now let's say EJ is the expected cost of an optimal binary search tree for keys up to T.

The base case is when J is I1, meaning there are no keys, just a dummy key.

Do you know knees, but hey 20 by 820 by 8BO.

In that case, E die I1 is just coiming by 820 by 820 by 8BO, the probability of searching for that dummy key.

The simplest possible tree.

But when I is less than or equal to J, we gotta consider every key from kind to K as the root of the subtree.

If we pick care by E20 by 820 by 9B as the root, then kind to care by E20 by 820 by 9B or a form the left subtree and care by 820 by 9B arrows form the right subtree.

The expected cost for this whole subtree is the expected cost of the left subtree arrow I R1 plus the expected cost of the right subtree arrow plus 1J plus the sum of probabilities for all the keys and dummy keys in this subtree that sum is WIJ.

Right, we're adding up the cost of the pieces and the cost of the root.

So our formula for AJ becomes the minimum of E R1 plus or plus one J plus WIJ over all possible root choices Carol by E20 by 820 by 9B where R can be I up to J.

We're trying out all the possible roots and picking the one that gives the lowest expected cost.

We also store the index of this optimal root in root EJ like we did with the split point in matrix multiplication.

The optimal BST procedure uses a bottom up approach to figure out all the AJ and root EJ values.

It uses three intervals, E for the expected costs, R for the optimal root indices and W error for the sums of probabilities.

Pre -calculating those probabilities helps.

The W table is easy to calculate.

Y I1 is just congeal 20 by 820 or by eight euros and then YJ is YJ1 plus P plus Q.

Then the optimal BST procedure goes through all the possible subtree lengths and starting points calculating EJ and root EJ using our formula and the W table.

This whole thing takes oh time.

Another efficient solution.

To wrap up, the chapter has this really helpful glossary of terms.

It starts with the definition of dynamic programming emphasizing optimal substructure and overlapping subproblems.

It explains memoization and bottom up dynamic programming as well as the subproblem graph, optimal solutions and how dynamic programming compares to divide and conquer and greedy strategies.

We see definitions for specific problem terms like subsequence, LCS, expected search costs and optimal binary search tree.

Yeah, it's a nice recap.

And it goes deeper defining terms like parenthesization and scalar multiplication for matrix chain multiplication.

It mentions recursion trees, topological sort and even those Catalan numbers.

For rod cutting, there's density and a nod to the partition function.

To show when dynamic programming might not be the best fit, we have unweighted shortest path and unweighted longest simple path.

It even touches on NP completeness.

It really covers a lot of ground.

And it doesn't stop there.

It includes terms like edit distance, bitonic tour,

conviviality rating, even the Viterbi algorithm and scene carving showing how widely applicable dynamic programming is.

It even mentions war from baseball statistics.

Exactly.

It's awesome to see how these dynamic programming ideas pop up in so many different areas.

It's a versatile tool.

So to sum up this deep dive, we've covered this whole chapter on dynamic programming.

We saw how it works, breaking down problems, solving sub problems and combining solutions.

We compared it to other methods and saw why it's great for optimization.

Right, we laid the groundwork.

We carefully went through the four steps to build a dynamic programming algorithm.

We started with rod cutting, exploring both memoization and bottom -up approaches, even figuring out how to find the best cuts.

Then we tackled matrix chain multiplication, learning how to get that optimal parenthesization.

We saw those classic problems in action.

Then we dug into optimal substructure and overlapping sub problems, the heart of what makes dynamic programming tick and revisited memoization.

We conquered the longest common subsequence problem from the recursive formula to the bottom -up solution and how to build that subsequence.

And we wrapped up with optimal binary search trees, figuring out how to make searches as efficient as possible.

And we went through that glossary, making sure all the key terms and vocabulary are clear.

A complete tour of the chapter, yeah.

This journey through dynamic programming has given you a powerful way to approach all kinds of problems.

It's not just about the specific algorithms, it's about this new way of thinking, breaking down complexity into manageable pieces and solving them systematically.

It's a mind shift as much as anything.

One last thing to ponder.

Now that you've got this dynamic programming knowledge,

look around your world, your work, your hobbies, whatever.

Can you spot complex situations that could be broken down into smaller repeating parts?

Even if you're not coding, think about how this idea of solving a sub problem once and reusing that solution might help you learn or solve problems more efficiently.

It's a powerful concept that goes way beyond just algorithms.

Thanks for joining us on this deep dive.

See you next time.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter SummaryWhat this audio overview covers

Dynamic programming addresses optimization problems by exploiting two fundamental structural properties: optimal substructure, which ensures that an optimal solution to a larger problem can be constructed from optimal solutions to its constituent subproblems, and overlapping subproblems, which recognizes that identical subproblems appear multiple times during computation and their results can be cached to eliminate redundant work. The methodology consists of four sequential steps—first characterizing the structure of an optimal solution, then formulating its value through a recursive relationship, subsequently computing that value through either bottom-up tabulation or top-down memoization, and finally reconstructing an actual solution if required. The rod-cutting problem demonstrates how a naive recursive solution degrades to O(2ⁿ) time complexity, while both memoization and bottom-up approaches achieve O(n²) efficiency by storing previously computed maximum revenues; notably, a greedy heuristic based on price-per-unit length fails to guarantee optimality. Matrix-chain multiplication extends these concepts to determine the most efficient way to parenthesize a sequence of matrix products, minimizing the total number of scalar multiplications required—accomplished in O(n³) time by maintaining tables that record both minimum costs and the optimal split points at each subproblem. The longest common subsequence problem identifies the longest shared contiguous subsequence between two input strings in Θ(mn) time and O(mn) space using a two-dimensional recurrence table, though space usage reduces to O(min(m,n)) when only the length computation is necessary. Optimal binary search trees represent a sophisticated application that accounts for nonuniform access probabilities across keys and includes dummy keys representing unsuccessful search attempts, computing minimum expected search cost in O(n³) time with O(n²) space requirements; this approach yields practical benefits in compiler design and database system optimization. Each of these problems reinforces the fundamental insight that decomposing complex problems into overlapping subproblems and storing intermediate results transforms exponential-time solutions into polynomial-time algorithms.

Using this chapter to study? Last Minute Lecture is free and student-run. If it helped, consider supporting the project.

Support LML ♥

Chapter 14: Dynamic Programming: Rod Cutting, LCS, and Optimal BSTs

Related Chapters