Chapter 8: Least-Squares Fitting

Audio Overview

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Imagine you're standing in a freezing laboratory.

Oh, sounds chilly.

Right, very chilly.

And you are trying to find absolute zero.

Okay.

Like the theoretical coldest possible temperature in the entire universe.

The exact point where, you know, all atomic motion just stops dead.

Which is a pretty profound universal truth.

Exactly.

But here's the problem.

The thermometer in your hand is shaking.

Right.

The pressure gauge you're reading is flickering between numbers.

Maybe your eyesight isn't perfect.

Yeah, the human element.

Exactly.

So how on earth are you supposed to pinpoint a flawless universal constant using like flawed, messy human measurements?

I mean, that gap right there, that gap between the messy reality of a lab and the clean elegance of universal laws, that's the central drama of physics.

For a long time, we talked about taking isolated measurements.

You know, like you measure the length of a desk 10 times, you find the average and boom, you have your answer.

You're just circling one static truth.

Right.

But the moment you start asking how this thing affects that thing.

Like how temperature affects pressure.

Exactly.

Or how mass affects the stretch of a spring.

The mathematics of uncertainty gets, well, incredibly daunting.

Because you're no longer just dealing with a point.

You're dealing with a relationship.

Right.

And that transition is exactly our mission for this deep dive.

I'm excited for this one.

Me too.

We're taking a massive conceptual leap today, looking at some brilliant source material on error analysis, specifically Chapter 8 of Introduction to Error Analysis.

It's a classic.

It really is.

And we're going to master this statistical superpower known as least squares fitting.

Least squares.

Because the goal here is to understand how to sift through a cloud of scattered imperfect data points to find the hidden mathematically most probable relationship driving them.

Right.

Because the data is never perfect.

Never.

So we're going to look at the intuition behind fitting a straight line, how we quantify our doubt when making predictions, and what happens when the universe just refuses to give us a straight line at all.

Which happens a lot.

So to start making sense of this, we really have to visualize the most basic scenario.

Okay.

Paint a picture.

You're running an experiment with two variables.

You have your independent variable, which we'll call X, and your dependent variable Y.

And you strongly suspect that if you plotted these on a graph, they'd form a straight line.

Right.

So the classic equation for this would be Y equals A plus B times X.

Exactly.

Where A is the intercept on the Y axis and B is the steepness, you know, the slope of the line.

Right.

And in a pristine theoretical universe, every single measurement you take would land directly on that line.

Like pearls on a string.

Yeah, exactly.

But we live in reality.

We do.

When you actually plot your data, it looks like a shotgun blast loosely grouped around an invisible diagonal line.

It's a mess.

It is.

So to even begin to tackle the math of finding the best possible line through that scatter, we have to make a very strategic assumption.

Right.

A simplification.

Yeah.

We assume that all of our measurements of X are perfectly precise and all the messy, frustrating uncertainty lives entirely in our measurements of Y.

It's deliberate, but it's a very powerful simplification.

Right.

We're basically saying the uncertainty in X is so small, it's negligible compared to the wild variations in Y.

Right.

Okay.

Furthermore, we assume that these errors in Y are completely random and follow a Gaussian distribution.

A bell curve.

Exactly.

A classic bell-shaped curve.

That means if you took, say, a million measurements at the exact same X value, the Y values would cluster around the true line in that bell shape.

You got it.

Centered right around the theoretical line.

So to make this concrete for you listening, imagine you're dropping a heavy steel ball off a balcony to measure gravity.

Classic physics lab.

Right.

And you're timing the fall with this hyper-precise digital laser-gate stopwatch.

Fancy.

Yeah.

Spared no expense.

That stopwatch is your X variable.

It is completely reliable.

Okay.

But to record the height, you have a friend standing on a ladder with a marker.

Yeah.

Physically jabbing at a piece of paper on the wall as the ball zooms past.

That's going to be messy.

Super messy.

That shaky hand, the slow reaction time, the terrible hand-eye coordination: that is your Y variable.

Right.

So all the significant error in this experiment is happening on the Y axis.

Exactly.

And building on that scenario, we can deploy something known as the principle of maximum likelihood.

Which is like the engine of this whole thing, right?

It is the engine that drives least squares fitting.

Because we assumed the errors in Y follow that Gaussian bell curve, we can actually write down a concrete mathematical formula.

A formula for what exactly?

For the probability of obtaining the exact messy data set you just collected.

Oh, wow.

So the ultimate goal here isn't just to like draw a line that looks okay to the naked eye.

No, no, not at all.

The goal is to mathematically prove that the line we draw makes our specific messy data the most probable outcome in the universe.

Exactly.

And the mechanics of how we do that are quite beautiful.

Think about how probabilities combine.

Like flipping coins.

Right.

If you want to know the probability of multiple independent events happening, flipping a coin 10 times and getting heads every time, you multiply the individual probabilities together.

Right.

Half times half times half.

Exactly.

Now, the formula for a Gaussian probability involves an exponent.

The mathematical deviation from the true value is sitting right up there in the exponent.

Okay.

And it's squared.

Now, when you multiply terms with exponents together, what do you do to the exponents?

You add them.

So if we want to maximize the overall probability, if we want to find the absolute peak of that combined bell curve...

We have to look at that summed exponent.

Right.

And because of a negative sign in the Gaussian formula, maximizing the whole probability is mathematically identical to making that summed squared exponent as tiny as possible.

As small as mathematically allowed.

Yes.

And we call this critical number chi-squared.

Chi-squared.

Right.

Okay.

So finding chi-squared means taking the vertical distance from your actual data point down to your theoretical line, squaring that distance, dividing it by the uncertainty squared, and then adding up all those values for every single point on your graph.

You got it.

The true best fit line is the one specific angle and placement that makes that final total sum as small as possible.

Hence, least squares fitting.

Exactly.
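Written out, the quantity just described might look like the following. This is a sketch under the assumptions stated so far (negligible error in x, Gaussian errors in y that all share one width); the symbols N, x_i, y_i, and sigma_y are notation assumed here rather than quoted from the audio.

```latex
\mathrm{Prob}(y_1,\dots,y_N) \;\propto\; \frac{1}{\sigma_y^{N}}\, e^{-\chi^2/2},
\qquad
\chi^2 \;=\; \sum_{i=1}^{N} \frac{\left(y_i - A - B x_i\right)^2}{\sigma_y^{2}}
```

Because the exponent carries a minus sign, the probability is largest exactly where chi-squared is smallest.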

I love the physical analogy for this.

It really helped me picture it.

Oh, the rubber band one.

Yeah.

So imagine taking a piece of wooden pegboard and plotting your data by sticking pegs into it.

Okay.

I'm picturing it.

Now, you take a rigid metal rod, which represents your line, and you lay it across the board.

You attach tiny rubber bands from each peg to the metal rod, pulling perfectly straight up or down along the y-axis.

Right.

Along the y-axis, because that's where all our error is.

Exactly.

The stored energy, the potential energy of a stretched rubber band, actually scales with the square of the distance it's stretched.

Which perfectly mimics the math.

Right.

So if you just let go of the metal rod, it will naturally snap into the exact resting position that minimizes the total stored energy of all those rubber bands pulling against each other.

It finds its own equilibrium.

Yes.

That physical equilibrium point is your least squares best fit line.

That's a great way to look at it.

And it perfectly illustrates a key feature of this math, too.

It punishes large deviations heavily.

Because you're squaring the distance.

Right.

A data point that is two inches away from the line counts four times as much as a point one inch away.

Wow.

A point three inches away counts nine times as much.

So it aggressively forces the line to try and accommodate the extreme outliers without abandoning the tightly clustered points.

That makes total sense.

Now, obviously, we don't build pegboards for every experiment.

No, that would be exhausting.

Right.

We use calculus to find that point of minimum tension.

If you want to find the very bottom of a valley in a mathematical function, you take the derivative and set it to zero.

And the brilliant part is that the calculus only has to be done once by the mathematicians.

Thank goodness.

Right.

Taking the derivative of our chi-squared function with respect to the intercept a and then again with respect to the slope b.

That leaves us with two algebraic equations.

Universally known as the normal equations.

You simply plug in the raw data from your experiment.

So like the sum of all your x values, the sum of your y values.

The sum of x squared and the sum of x multiplied by y.

You crunch those sums and out pops the mathematically perfect slope and intercept.

It's like magic.
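As a rough illustration of "crunch those sums", the normal-equation solution for an unweighted straight-line fit could be sketched like this. The function name `fit_line` and the spring data are hypothetical, chosen only to show the arithmetic; they are not taken from the chapter.

```python
def fit_line(xs, ys):
    """Unweighted least-squares fit of y = A + B*x; returns (A, B)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Common denominator of both normal-equation solutions.
    delta = n * sum_xx - sum_x ** 2
    if delta == 0:
        raise ValueError("all x values are identical")
    a = (sum_xx * sum_y - sum_x * sum_xy) / delta
    b = (n * sum_xy - sum_x * sum_y) / delta
    return a, b


# Hypothetical spring data: mass in grams (x), length in centimeters (y).
masses = [100, 200, 300, 400, 500]
lengths = [12.1, 14.0, 15.9, 18.1, 19.9]
intercept, slope = fit_line(masses, lengths)
print(f"length = {intercept:.2f} cm + {slope:.4f} cm/g * mass")
```

Only four running sums are needed, which is why this fit was practical long before computers.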

Consider a classic undergraduate lab experiment: calibrating a spring balance.

Hooke's law.

Exactly.

Hooke's law tells us that the length of a loaded spring is linearly related to the mass hanging from it.

Length is our messy y.

Mass is our perfectly known x.

Right.

So a student hangs five different pristine weights, carefully measures the slightly fluctuating length of the bouncing spring five times, and feeds those numbers into the normal equations.

And suddenly a cloud of raw trial data snaps into a highly predictable linear equation.

It's incredibly satisfying.

But in physics, an equation without a margin of error is a dangerous thing.

Oh, absolutely.

Finding the best fit line is only the first half of the battle.

The next critical maneuver is quantifying our doubt.

Right.

We have to figure out the uncertainty in our original y measurements, which is denoted as sigma y.

Now the formula for this uncertainty looks incredibly similar to a standard deviation.

It does.

You take the square root of the sum of those squared deviations we were just talking about.

But there's a massive conceptual trap hidden in the denominator of this formula.

The n minus two.

Yes.

When I first encountered it, it completely threw me.

Instead of dividing by n, which is the total number of measurements you took, you divide by n minus two.

Wait a second.

If I run an experiment and only take two measurements, my denominator becomes two minus two, which is zero.

And you can't divide by zero.

The universe implodes.

Yeah.

Why does the math force this n minus two condition on us?

It's a great question.

You're hitting on the core problem of perspective in statistics.

It all comes down to a concept called degrees of freedom.

Degrees of freedom.

Okay.

Think about the geometry of what you just described.

If you only have two data points on a graph, you can draw a perfectly straight line that passes exactly through the dead center of both of them.

Two points define a line. That's just basic high school geometry.

Exactly.

But does drawing that perfect line mean your experiment was flawless?

I mean, probably not.

Right.

Does it mean your y measurements have zero uncertainty?

Absolutely not.

It simply means you lack the perspective to see your own mistakes.

Oh, wow.

It's like asking two people what the best movie of the year is, getting two different answers, and having no third person to act as a tiebreaker.

You have no independent data to reveal the scatter.

Precisely.

You have to account for the information you already spent.

So the math is literally punishing you for having too little data.

It is.

To calculate the uncertainty in y, you first had to establish the line itself.

You had to mathematically estimate two distinct parameters, the intercept A and the slope B.

Right.

Finding those two parameters essentially uses up two independent pieces of information from your data pool.

I see.

Therefore, the number of independent ways your data can physically vary around the line, your remaining degrees of freedom to actually observe the scatter, is your total number of measurements n minus the two parameters you just locked down.

Hence, n minus 2.

Exactly, n minus 2.

So the moment you get that third data point, n becomes 3, the denominator becomes 1, the math works, and you finally see how far off your perfect line you actually are.

Exactly.
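The n minus 2 rule can be sketched in a few lines. This assumes the intercept and slope have already been obtained from a least-squares fit to the same points; the helper name `residual_sigma` and the three sample points are made up for illustration.

```python
import math


def residual_sigma(xs, ys, a, b):
    """Estimate sigma_y from the scatter of the points about y = a + b*x.

    Divides by N - 2 because two parameters (a and b) were already
    extracted from the same data.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points: N - 2 must be positive")
    squared_deviations = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared_deviations / (n - 2))


# Hypothetical points whose least-squares line is y = 1/6 + 1.0 * x.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.5, 2.0]
sigma = residual_sigma(xs, ys, 1.0 / 6.0, 1.0)
print(f"sigma_y is about {sigma:.3f}")
```

With only two points the function refuses to answer, which mirrors the point made above: a line through two points leaves no scatter to observe.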

Okay, so we find our uncertainty in y, but the messy reality of the lab doesn't stop there.

Of course not.

Because that scatter in our y values naturally infects our confidence in the line itself.

Oh, right.

Uncertainty propagates.

It does.

Because the individual y points are fuzzy, the constants A and B that we calculated from them must also be fuzzy.

Yeah, they can.

There are specific error propagation formulas for sigma A and sigma B.

Conceptually, it just means that the wider the vertical spread in your data points, the less certain you are about exactly where the line crosses the axis.

And how steep that slope truly is.

Right.

You might have found the best line, but you have a wide margin of error on whether it's the true line.
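The audio does not spell out the propagation formulas. For the unweighted straight-line fit, the standard textbook forms are sigma_A = sigma_y * sqrt(sum(x^2) / Delta) and sigma_B = sigma_y * sqrt(N / Delta), with Delta = N * sum(x^2) - (sum(x))^2. A minimal sketch under that assumption, with a hypothetical function name and sample numbers:

```python
import math


def parameter_uncertainties(xs, sigma_y):
    """Return (sigma_A, sigma_B) for an unweighted fit of y = A + B*x.

    Assumes every y measurement shares the same uncertainty sigma_y
    and that the x values are effectively exact.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    delta = n * sum_xx - sum_x ** 2
    if delta == 0:
        raise ValueError("all x values are identical")
    sigma_a = sigma_y * math.sqrt(sum_xx / delta)
    sigma_b = sigma_y * math.sqrt(n / delta)
    return sigma_a, sigma_b


# Hypothetical example: five evenly spaced x values, sigma_y = 0.5.
sigma_a, sigma_b = parameter_uncertainties([1, 2, 3, 4, 5], 0.5)
print(f"sigma_A = {sigma_a:.3f}, sigma_B = {sigma_b:.3f}")
```

Both results scale directly with sigma_y, so doubling the vertical scatter doubles the doubt about the intercept and the slope.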

Okay, I want to push back on the fundamental premise we started with, though.

Okay, go for it.

We built this whole mathematical fortress on the assumption that x, our independent variable, was perfectly precise.

We assumed the laser stopwatch never fails.

Well, we did assume that.

But what if my stopwatch is actually a cheap plastic timer from a cereal box?

Oh.

Right.

What if I have significant unavoidable uncertainty in both my x and y variables?

Does the whole least squares method just fall apart?

It doesn't fall apart, but it requires a brilliant geometric workaround to prevent the calculus from becoming unsolvable.

Because juggling independent errors on both axes simultaneously sounds like a nightmare.

It is a nightmare.

So instead of dealing with both, we use a trick to convert the error in x, let's call it delta x, into an equivalent error in y.

Let's visualize how that conversion actually works.

Picture a steep diagonal line on your graph.

If your hand slips and you make a mistake in measuring your x value, you're essentially shifting your data point horizontally to the left or right, away from the true line.

Okay.

But critically, that exact same distance away from the line could have been achieved by making a mistake in y, shifting the point vertically up or down.

Oh.

So the steeper the slope of the line, the more a tiny horizontal nudge pushes you vertically away from the line.

That is the exact mathematical relationship.

To convert your horizontal x error into a vertical y error, you simply multiply the uncertainty in x by the steepness of the line.

Which is dy over dx.

Or just your constant b.

If the line is almost a flat horizontal wall, an error in x barely matters at all, the equivalent y error is tiny.

Right.

If the line is practically a sheer cliff face, a microscopic error in x translates to a massive equivalent error in y.

That is incredibly clever.

So now we have our original y uncertainty, and we have this new converted x to y uncertainty.

So we just add them together to get our total massive margin of error?

We combine them, but not by simple addition.

We combine them in quadrature.

Right, quadrature.

The Pythagorean theorem of statistics.

You know it.

You square the original y error, you square the converted x error, add those squares together, and take the square root of the result.

But why do we go through that specific geometric dance instead of just stacking them?

Because we assume the errors are independent and random.

If you accidentally hit the stopwatch too early on the x-axis, you might have also accidentally read the marker too low on the y-axis.

Oh, I see.

Those two independent errors might actually push your data point closer to the true line, effectively canceling each other out.

Right.

Quadrature accounts for the probabilistic reality that random errors don't always conspire to create the absolute worst-case scenario.

It gives you a more realistic, slightly smaller combined uncertainty.

This equivalent error trick brings a seemingly impossible two-dimensional problem back into the solvable realm of a single axis.

It's brilliant.

It really is.
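The equivalent-error recipe described above (multiply the x uncertainty by the slope, then combine with the y uncertainty in quadrature) fits in a single function. The name `equivalent_sigma_y` and the numbers are illustrative assumptions only.

```python
import math


def equivalent_sigma_y(sigma_y, sigma_x, slope):
    """Combine the y error with the x error converted to an equivalent y error.

    The x uncertainty is scaled by the slope of the line, then added to the
    original y uncertainty in quadrature (independent random errors assumed).
    """
    return math.hypot(sigma_y, slope * sigma_x)


# Hypothetical numbers: sigma_y = 3, sigma_x = 2, slope B = 2.
combined = equivalent_sigma_y(3.0, 2.0, 2.0)
simple_sum = 3.0 + 2.0 * 2.0
print(f"quadrature: {combined}, straight addition: {simple_sum}")
```

The quadrature result is smaller than the straight sum, which is the "slightly smaller combined uncertainty" mentioned above.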

Yeah.

Now, let's take all of this heavy machinery and apply it to a real high-stakes experiment.

Let's go back to that quest for absolute zero using a constant volume gas thermometer.

This is a beautiful demonstration of least squares in action.

The textbook walks through it perfectly.

The setup is straightforward.

A student places a flask of gas into different temperature baths, ranging from boiling water down to freezing mixtures.

Right.

At five different known temperatures, the student measures the pressure of the gas.

The underlying physical law states that temperature is a linear function of pressure.

T equals A plus B times P.

And the stakes here are incredible.

We aren't just looking for the slope.

We're hunting for the intercept A.

Right.

The theory dictates that if you could cool the gas until the pressure drops completely to zero, that temperature intercept would be absolute zero.

Following our rules, we treat pressure as our X variable, assuming it has negligible uncertainty, and temperature as our Y variable, where the thermometer's uncertainty lives.

Makes sense.

The student's five measurements range from roughly 65 to 105 millimeters of mercury for pressure and negative 20 to 127 degrees Celsius for temperature.

So the student crunches the raw data.

They sum up the pressures, the squares of the pressures, the pressure multiplied by the temperature.

The numbers get quite large.

They do.

But they feed them into our trusty normal equations for the intercept.

And the math grinds through the data and spits out A equals negative 263.35 degrees Celsius.

Which is a stunningly good result for a simple lab setup.

Truly.

The universally accepted true value of absolute zero is negative 273.15 degrees Celsius.

Our student is less than 10 degrees away.

That's amazing.

But the analysis isn't over.

We must calculate the propagated uncertainty in that intercept to see if our prediction is statistically sound.

Right.

When the student runs the error propagation formulas for sigma A, the result is an uncertainty of 18 degrees Celsius.

So the rigorously reported answer is negative 263 plus or minus 18 degrees Celsius.

Exactly.

And if you look closely, that plus or minus 18 degree window neatly envelops the true value of negative 273.15.

The experiment is a success.

It is.
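The whole calculation can be replayed end to end. The audio only quotes the ranges of the data, so the five (pressure, temperature) pairs below are an assumption: illustrative values that span those ranges, not numbers read out in the transcript. The formulas are the standard unweighted straight-line ones.

```python
import math

# Assumed data: pressure in mm Hg (x), temperature in deg C (y).
pressures = [65.0, 75.0, 85.0, 95.0, 105.0]
temperatures = [-20.0, 17.0, 42.0, 94.0, 127.0]

n = len(pressures)
sum_p = sum(pressures)
sum_t = sum(temperatures)
sum_pp = sum(p * p for p in pressures)
sum_pt = sum(p * t for p, t in zip(pressures, temperatures))
delta = n * sum_pp - sum_p ** 2

# Normal-equation solutions for T = A + B * P.
a = (sum_pp * sum_t - sum_p * sum_pt) / delta
b = (n * sum_pt - sum_p * sum_t) / delta

# Scatter of the temperatures about the line, with N - 2 degrees of freedom.
residual_ss = sum((t - a - b * p) ** 2 for p, t in zip(pressures, temperatures))
sigma_t = math.sqrt(residual_ss / (n - 2))

# Propagated uncertainty in the intercept.
sigma_a = sigma_t * math.sqrt(sum_pp / delta)

print(f"A = {a:.2f} +/- {sigma_a:.0f} deg C (sigma_T = {sigma_t:.1f} deg C)")
```

With these assumed values the intercept and its uncertainty come out in line with the figures quoted in the audio, while the scatter in the individual temperatures is only several degrees.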

But looking at these numbers, a margin of error of 18 degrees feels aggressively large.

It does.

The student was using a decent thermometer that was probably only uncertain by a degree or two during the actual measurements.

Where did this massive cloud of doubt come from?

You're witnessing the extreme danger of extrapolation.

Extrapolation.

Look at the range of the raw data again.

The student's lowest physically measured temperature was negative 20 degrees.

But they're using that localized cluster of data to predict an event at negative 263 degrees.

Far off the edge of their graph.

Exactly.

It's exactly like trying to aim a laser pointer at a distant target.

Oh, that's a perfect analogy.

If you're pointing at a spot on a piece of paper, like one inch from your hand, a tiny tremor in your fingers barely moves the dot.

Right.

But if you're trying to hit a specific brick on a wall across a massive football-field-sized auditorium.

That exact same microscopic tremor at the source is magnified into a sweeping, chaotic 20-foot swing of the laser dot on the far wall.

The geometry of extrapolation magnifies error mercilessly.

Even if the calculated slope of your best fit line has a remarkably tiny uncertainty, projecting that angle over a huge distance acts like a lever arm.

A lever arm, yeah.

A slight wiggle at the fulcrum becomes a giant sweeping arc at the end of the lever, blowing up your uncertainty about exactly where the line will eventually cross the axis.

That is a harsh lesson in the limits of prediction.

It really is.
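The lever arm can be made explicit by rewriting the intercept uncertainty of the standard unweighted fit. This is an algebraic identity under that assumption; the notation (N points, mean x-bar, common sigma_y) is introduced here for illustration.

```latex
\sigma_A \;=\; \sigma_y \sqrt{\frac{\sum x_i^2}{\Delta}}
\;=\; \sigma_y \sqrt{\frac{1}{N} + \frac{\bar{x}^{\,2}}{\sum_i (x_i - \bar{x})^2}},
\qquad
\Delta \;=\; N \sum x_i^2 - \Bigl(\sum x_i\Bigr)^2 \;=\; N \sum_i (x_i - \bar{x})^2
```

The second term under the square root compares how far the data sit from x = 0 with how widely they are spread. When the measurements cluster far from the axis, that ratio is large and the intercept uncertainty grows to many times sigma_y.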

Okay, so we have thoroughly conquered straight lines.

We know how to find them, doubt them, and extrapolate them.

But what if the natural world simply refuses to cooperate?

As it often does.

What if the relationship between our variables is undeniably curved?

Well, the physical world is full of curves.

The good news is that the foundational logic we just spent all this time building, the maximum likelihood principle and minimizing the squared deviations, applies perfectly to nonlinear relationships.

Yeah.

Take a polynomial, for example.

If you track the height of a falling body over time, gravity dictates it's a quadratic function.

y equals a plus bx plus cx squared.

So instead of just intercept and slope, we have a third constant to worry about, c.

Exactly.

The objective remains identical.

You want to minimize the chi-squared sum.

The calculus just gets a bit more tedious.

I can imagine.

You take the derivative with respect to all three constants, which leaves you with three simultaneous normal equations to solve instead of two.

The algebra is heavier, but the philosophy is untouched.
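To make the "three simultaneous normal equations" concrete, here is a small pure-Python sketch that builds them for y = a + bx + cx^2 and solves them with Cramer's rule. The function names and the test data are hypothetical; in practice a linear-algebra library would usually do the solving.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_quadratic(xs, ys):
    """Unweighted least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points")
    s1 = sum(x for x in xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    # One normal equation per parameter: d(chi^2)/da, /db, /dc = 0.
    matrix = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]
    rhs = [t0, t1, t2]
    d = det3(matrix)
    if d == 0:
        raise ValueError("x values do not determine a quadratic")
    coeffs = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2.
xs = [0, 1, 2, 3, 4]
ys = [1, 6, 17, 34, 57]
a, b, c = fit_quadratic(xs, ys)
print(a, b, c)
```

The structure is the same as the straight-line case: sums of powers of x on one side, sums involving y on the other.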

But there is another type of curve that shows up constantly in physics and biology, and it is a totally different beast.

Ah, the exponential function.

Yes.

We're talking about radioactive isotopes decaying, capacitors draining their charge, or populations of bacteria multiplying.

Very common.

The formula usually looks like y equals a times e to the power of bx.

If you try to run our standard least squares calculus directly on that, the math grinds to a halt.

It generates equations that simply cannot be solved analytically.

You get stuck.

So what do we do?

Well, physicists are fundamentally pragmatic problem solvers.

If the math of the curve is too hard, we use a technique called linearization to force the curve to behave like a straight line.

You essentially warp the graph paper itself.

Mathematically, you take the natural logarithm of both sides of the equation.

So y equals ae to the bx transforms into natural log of y equals natural log of a plus bx.

And just like magic, if you look at the structure of that new equation, it's a straight line.

Exactly.

If you define a brand new variable, say z, where z is the natural log of y, your equation is just z equals a constant plus bx.

You can immediately plug this into the exact same normal equations we used for the spring balance.

So if you're tracking a population of bacteria that's dying off exponentially, you don't plot the raw bacteria count.

You take the natural logarithm of every single count, plot those log numbers against time, and you will get a beautiful, easily solvable straight line.

It's an incredibly elegant and frequently used workaround.
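The linearization trick can be sketched as follows: take z = ln(y), fit z against x with the ordinary straight-line formulas, then recover the original constant by exponentiating the intercept. The function name `fit_exponential` and the sample data are assumptions for illustration, and this is the plain unweighted version.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b*x) by an unweighted line fit to z = ln(y).

    Returns (a, b). All y values must be positive for the logarithm to exist.
    """
    if any(y <= 0 for y in ys):
        raise ValueError("all y values must be positive")
    zs = [math.log(y) for y in ys]
    n = len(xs)
    sum_x = sum(xs)
    sum_z = sum(zs)
    sum_xx = sum(x * x for x in xs)
    sum_xz = sum(x * z for x, z in zip(xs, zs))
    delta = n * sum_xx - sum_x ** 2
    if delta == 0:
        raise ValueError("all x values are identical")
    ln_a = (sum_xx * sum_z - sum_x * sum_xz) / delta
    b = (n * sum_xz - sum_x * sum_z) / delta
    return math.exp(ln_a), b


# Hypothetical noise-free decay generated from y = 2 * exp(-0.5 * x).
times = [0.0, 1.0, 2.0, 3.0]
values = [2.0 * math.exp(-0.5 * t) for t in times]
amplitude, rate = fit_exponential(times, values)
print(f"y = {amplitude:.3f} * exp({rate:.3f} * x)")
```

Note that the fitted intercept is ln(a), not a itself, so the final step of exponentiating is easy to forget.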

But there's always a but.

You have to be very careful because there is a severe statistical trap waiting for you.

I had a feeling the magic trick came with a cost.

What is it?

Think back to the very first assumption we made about least squares fitting.

We assumed that all of our y measurements had roughly the same amount of uncertainty.

They shared a uniform sigma y.

Right.

The scatter was equally bad across the whole line.

But when you take the natural logarithm of your y values, you are fundamentally warping the mathematical scale.

A uniform uncertainty in your raw data does not translate into uniform uncertainty in your logarithmic data.

Wait, really?

Yeah.

Large values compress aggressively under a logarithm, while small values stretch out.

Oh, I see it.

Imagine drawing your data points on a sheet of rubber, and you draw thick error bars on every point.

Okay, good visual.

If you grab the edges of the rubber and stretch it unevenly to straighten out the curve, which is what the logarithm is doing, you're stretching and squishing those drawn-on error bars too.

Exactly.

The error bars at the top of the graph might get squashed tiny, while the ones at the bottom get stretched out massively.

I'm literally destroying the fundamental assumption that all points are equally uncertain.

You are.

Applying standard, unweighted least squares to linearized data is technically a mathematical violation.

A violation.

To do it perfectly, you would need to use an advanced algorithm called weighted least squares, where the math assigns different levels of importance to each point based on how severely its specific uncertainty was warped by the logarithm.
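A weighted fit might be sketched like this. It assumes each point gets a weight equal to one over its variance, and that a small uncertainty sigma_y in y becomes roughly sigma_y / y in z = ln(y) by ordinary error propagation. The function name, the counts, and the value of sigma_y are hypothetical.

```python
import math


def fit_line_weighted(xs, ys, weights):
    """Weighted least-squares fit of y = A + B*x; returns (A, B).

    Each weight is assumed to be 1 / sigma_i**2 for that point.
    """
    sw = sum(weights)
    swx = sum(w * x for w, x in zip(weights, xs))
    swy = sum(w * y for w, y in zip(weights, ys))
    swxx = sum(w * x * x for w, x in zip(weights, xs))
    swxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    delta = sw * swxx - swx ** 2
    if delta == 0:
        raise ValueError("weighted x values do not determine a line")
    a = (swxx * swy - swx * swxy) / delta
    b = (sw * swxy - swx * swy) / delta
    return a, b


# Hypothetical decay counts, each with the same absolute uncertainty.
times = [0.0, 1.0, 2.0, 3.0]
counts = [100.0, 61.0, 37.0, 22.0]
sigma_y = 3.0

zs = [math.log(c) for c in counts]
# sigma_z is about sigma_y / y, so the weight 1 / sigma_z**2 is (y / sigma_y)**2.
weights = [(c / sigma_y) ** 2 for c in counts]
ln_a, rate = fit_line_weighted(times, zs, weights)
print(f"y = {math.exp(ln_a):.1f} * exp({rate:.3f} * t)")
```

The large counts, whose logarithms are known most precisely, end up dominating the fit, while the small counts near the tail are trusted less.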

So is the simple linearization trick we just learned completely useless?

Should we just throw it out?

Not at all.

In everyday laboratory practice, especially when the fractional uncertainties in your raw data are relatively small and consistent.

Like in the bacteria example from the text.

Right.

Performing an ordinary, unweighted fit on the linearized data is an accepted, highly practical compromise.

It provides an unambiguous, straightforward way to get a very reasonable estimate for your decay constants or growth rates.

It's the classic tension between perfect mathematical purity and the pragmatic need to get a working answer before the lab session ends.

Well said.

We've journeyed through an incredible mathematical landscape today.

We really covered a lot of ground.

We started by abandoning the safety of single measurements to tackle the complex relationships between interacting variables.

Right.

We learned how to harness Gaussian probability to find the best fit straight line by minimizing the sum of squared errors.

We saw how the concept of degrees of freedom, that vital n minus two denominator.

Super important.

Prevents us from having false confidence in a line drawn with too little perspective.

And we explored the geometric trick of equivalent errors, allowing us to survive the reality of flawed measurements on both axes.

And we combined those doubts using the probabilistic balancing act of quadrature.

Yep.

We lived through the terror of extrapolation in the absolute zero experiment, proving mathematically why a tiny tremor at the source becomes a massive swing in uncertainty over a long distance.

That lever arm.

And finally, we learned how to beat an exponential curve into a straight line while acknowledging the hidden cost of warping our own error bars.

It is a profound and durable set of conceptual tools.

But I actually want to leave you with one final thought to mull over.

Lay it on me.

Something that connects the chalkboards of classic physics to the cutting edge of the modern world.

OK.

I'm intrigued.

The foundational engine of everything we just learned.

The mathematical drive to minimize the sum of squared errors to find a predictive best fit.

Yeah.

It's not confined to calibrating spring scales or finding absolute zero.

That exact same mathematical logic is the beating heart inside modern machine learning algorithms.

Wait, really?

Yeah.

When an artificial intelligence model is trained to recognize a pattern or to draw a predictive trend line through billions of scraped data points, fundamentally under the hood, it is executing the calculus of least squares fitting.

That is absolutely mind blowing.

By mastering how to find truth in a messy college physics lab, you've actually just learned the foundational language of AI.

The same geometry that tells us about gravity is teaching neural networks how to think.

That's incredible.

You've got your conceptual toolkit ready for whatever data the universe throws at you.

On behalf of the Last Minute Lecture team, thank you for joining us.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary: What this audio overview covers

Least-squares fitting provides the mathematical framework for extracting optimal relationships between experimental variables while simultaneously quantifying the precision of those estimates. Beginning with linear regression, the method seeks to identify a line of the form y equals A plus Bx that minimizes the sum of squared vertical deviations from measured data points, grounded in the maximum likelihood principle under the assumption that uncertainty concentrates in the dependent variable. Solving the resulting normal equations yields explicit formulas for both slope and intercept. A crucial aspect involves estimating measurement uncertainty directly from residual scatter around the fitted line, with the denominator using N minus two rather than N to account for the two parameters already extracted from the data, reflecting the concept of degrees of freedom. Once baseline uncertainty is determined, error propagation techniques quantify individual uncertainties for each fitted parameter. The framework extends naturally to polynomial relationships, where additional parameters require proportionally more simultaneous equations, and to exponential relationships through logarithmic transformation, though such linearization implicitly changes uncertainty weighting and may necessitate explicit weighted methods. Multiple regression addresses situations where a single variable depends linearly on several independent predictors simultaneously. The treatment also covers specialized scenarios including fits constrained to pass through the origin, eliminating the need to determine an intercept, and weighted least-squares methods where unequal measurement uncertainties are accommodated by assigning each data point a weight inversely proportional to its variance. 
Throughout the chapter, the dual benefit of least-squares analysis emerges clearly: it produces parameter estimates that are statistically optimal under the assumed conditions, and it generates quantifiable uncertainty bounds for those estimates, transforming raw experimental observations into defensible physical relationships supported by rigorous statistical reasoning.
