Chapter 12: More About Regression
Welcome to Last Minute Lecture!
This free chapter overview is designed to help students review and understand key concepts.
These summaries supplement, not replace, the original textbook and may not be redistributed or resold.
For complete coverage, always consult the official text.
Have you ever stared at a scatter plot, maybe for your AP Stats class, and just wondered, can I really trust this pattern I'm seeing?
Or maybe the pattern isn't even straight.
How do I actually turn that raw data mess into real insight you can count on?
Well, you've definitely come to the right place.
Today we're diving deep into Chapter 12 of The Practice of Statistics.
It's called More About Regression. Our mission: to kind of cut through the complex stuff, regression inference, data transformations, and really pull out the core ideas, the vocabulary, the why.
We want you to feel well informed, ready for practical data analysis, and yeah, ready for that AP exam.
That's a great way to put it.
This isn't about just memorizing formulas.
We want to equip you so you understand not just what the tools are, but why they actually matter in the real world, making these kind of abstract stats concepts feel more concrete.
Okay, so let's unpack this then.
We already know that if a scatter plot looks pretty linear, think Old Faithful, eruption duration versus wait time, we use the least squares line from our sample to predict stuff.
But here's the kicker.
What if our data is just like one sample from a much bigger picture?
Exactly, and that's where inference comes in.
It helps answer the big questions like is the linear pattern we see in our sample real?
Does it reflect the whole population?
Or did we just get lucky with this particular sample?
Is it chance?
And if it is real, how much does it typically change when x goes up by one unit in the population?
What's that true slope really?
Right, that's a crucial difference.
Moving from our sample to the population.
So we talk about two lines.
There's the population regression line, the true underlying relationship, the ideal one.
We write it $\mu_y = \beta_0 + \beta_1 x$.
Here $\mu_y$ is the mean y for a given x, $\beta_0$ is the true intercept, and $\beta_1$ is that true slope we're after.
That's the theoretical one.
And then there's our sample regression line.
That's the one we actually calculate from our data.
$\hat{y} = b_0 + b_1 x$. Here $\hat{y}$ is our estimated mean y, $b_0$ is the sample intercept, and $b_1$ is the sample slope.
And the key idea is simple.
Our sample line, $\hat{y} = b_0 + b_1 x$, is our best guess for that true population line.
We use $b_1$ to estimate $\beta_1$.
And this leads to a really cool concept, the sampling distribution of $b_1$.
Think about Old Faithful again.
What if we took hundreds, maybe thousands of different random samples of eruptions, and calculated the slope $b_1$ for each sample?
They wouldn't all be the same, right?
Not at all.
They'd vary, but they'd vary in a predictable way.
If you plot all those different $b_1$ values, you'd see a shape.
It would look roughly symmetric, single peaked, kind of like a normal distribution actually.
Okay.
And the center of that distribution, the average of all those sample slopes, would be really, really close to the true population slope $\beta_1$.
That tells us $b_1$ is what we call an unbiased estimator of $\beta_1$.
On average, it hits the target.
Makes sense.
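The "take thousands of samples" thought experiment can be tried out directly. The sketch below is only an illustration: the population line (`BETA0`, `BETA1`), the noise level `SIGMA`, and the range of x values are hypothetical numbers chosen for the demo, not Old Faithful data.

```python
import random
import statistics

def fit_slope(xs, ys):
    """Least-squares slope: sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    xbar = statistics.fmean(xs)
    ybar = statistics.fmean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical population line and noise level (illustrative values only).
BETA0, BETA1, SIGMA = 33.0, 13.0, 6.0

def simulate_slopes(n_samples=2000, n=20, seed=1):
    """Draw many random samples of size n and record each sample slope b1."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_samples):
        xs = [rng.uniform(1.5, 5.0) for _ in range(n)]
        ys = [BETA0 + BETA1 * x + rng.gauss(0, SIGMA) for x in xs]
        slopes.append(fit_slope(xs, ys))
    return slopes

slopes = simulate_slopes()
mean_slope = statistics.fmean(slopes)   # should land close to BETA1
sd_slope = statistics.stdev(slopes)     # the sample-to-sample "wobble"
print(f"mean of sample slopes: {mean_slope:.3f} (true slope {BETA1})")
print(f"SD of sample slopes:   {sd_slope:.3f}")
```

If the estimator is unbiased, the mean of the simulated slopes should sit very near the true slope, while the SD of the slopes measures how much any single sample's slope tends to miss.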
And the spread, how much they wobble.
That's the variability, the standard deviation of all those $b_1$ values.
We call it $\sigma_{b_1}$.
It tells us how much our sample slope typically jumps around the true slope from sample to sample.
And that wobble depends on three things, mainly. One, how spread out the actual data points are around the true line.
That's the standard deviation of the residuals.
Two, how spread out your X values are: wider range of X, less wobble.
And three, your sample size n: bigger n, less wobble, more stable estimate.
Got it.
More data and more spread in X usually mean a better, less wobbly estimate of the true slope.
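Those three ingredients are often collected into a single formula. One standard way to write it, assuming $\sigma$ is the standard deviation of the responses around the population line and $\sigma_x$ is the standard deviation of the $n$ x values (computed with $n$ in the denominator), is:

```latex
\sigma_{b_1} \;=\; \frac{\sigma}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
            \;=\; \frac{\sigma}{\sigma_x \sqrt{n}}
```

Reading the right-hand side: a larger $\sigma$ in the numerator means more wobble, while a larger $\sigma_x$ or a larger $n$ in the denominator means less.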
Okay.
But before we jump into making inferences, we need rules, right?
Conditions.
Absolutely crucial.
Can't skip this.
We use the acronym LINER.
Think of it as like a pre-flight check for your regression inference.
LINER.
Okay.
What's the L?
L is for linear.
Is the relationship actually linear in the population?
We check the scatter plot first, obviously.
Does it look straight overall?
But even more important is the residual plot.
You want that to look completely random.
No curves, no megaphone shapes, just random scatter above and below zero.
Like spilled coffee grounds.
No pattern.
Exactly.
No pattern is good pattern here.
Then I for independent.
Are your observations independent?
If you're sampling without replacement, check the 10% condition: sample size less than 10% of the population. And generally avoid time series data where one point clearly affects the next.
Okay.
L, I, N.
N for normal.
This one's about the residuals again.
For any given X, are the Y values normally distributed around the regression line?
We check this by looking at a histogram of the residuals or maybe a normal probability plot of them.
We're looking for roughly symmetric, no major skewness, no crazy outliers.
Makes sense.
E.
E for equal SD or equal variance.
Is the scatter of points around the line roughly the same for all X values?
Back to the residual plot.
You want the vertical spread of the residuals to be pretty consistent across the whole plot.
No fanning out or narrowing in.
Right.
No megaphone shape.
And finally, R.
R for random.
Where did the data come from?
A random sample, a randomized experiment.
This is fundamental for inference.
Without random data collection, our conclusions are, well, much weaker.
Okay.
LINER: linear, independent, normal, equal SD, random.
Gotta check them all.
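Most of these checks are done by eye on plots, but the residuals behind those plots are easy to compute. The sketch below uses hypothetical data made up for illustration and only prints rough numeric summaries. It is not a substitute for actually looking at the residual plot and the histogram.

```python
import statistics

def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line of ys on xs."""
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b1 * xbar, b1

# Hypothetical data, made up for illustration (not from the textbook).
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Linear: residuals should bounce above and below zero with no curve.
signs = "".join("+" if e > 0 else "-" for e in residuals)

# Equal SD: compare residual spread for small x versus large x.
half = len(xs) // 2
sd_low = statistics.stdev(residuals[:half])
sd_high = statistics.stdev(residuals[half:])

print(f"fitted line: y-hat = {b0:.3f} + {b1:.3f} x")
print(f"residual signs in x order: {signs}")
print(f"residual SD, lower half of x: {sd_low:.3f}")
print(f"residual SD, upper half of x: {sd_high:.3f}")
```

A long run of one sign followed by a long run of the other would hint at a curve (trouble with L), and a large gap between the two SDs would hint at a megaphone shape (trouble with E). The residuals list is also what you would feed into a histogram or normal probability plot for the N check.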
Let's think about that helicopter experiment example.
They dropped paper helicopters from different heights, randomly assigned. The scatter plot looked linear and the residual plot showed no curve.
So L looks okay.
Random assignment, helicopters dropped independently.
I is good.
The histogram of residuals didn't show strong skew or outliers, so N seems reasonable.
Residual plot scatter looked pretty consistent across drop heights.
E is met.
And random assignment covers R.
So LINER checks out.
We can proceed.
Perfect.
So if LINER holds, we can estimate those population parameters.
Exactly.
Our sample slope $b_1$ estimates the true slope $\beta_1$, and $b_0$ estimates $\beta_0$.
And they're unbiased, which is good.
And that typical prediction error, how far points tend to fall from the line, we estimate that population standard deviation $\sigma$ using $s$, the standard deviation of the residuals.
Think of $s$ as your typical miss distance.
Okay.
And what about the wobble in the slope estimate $b_1$?
Ah, yes.
That's where $SE_{b_1}$, the standard error of the slope, comes in.
It estimates that $\sigma_{b_1}$ we talked about.
$SE_{b_1}$ tells us roughly how much our calculated sample slope $b_1$ is likely to differ from the true population slope $\beta_1$ just due to random sampling variation.
And we find that on computer output.
Yep.
Usually right next to the coefficient for the slope.
Super convenient.
Like in the helicopter example, if $b_1$ is 0.0057 and $SE_{b_1}$ is 0.0002, it means we estimate flight time increases by 0.0057 seconds per centimeter of drop height.
And that estimate typically varies from the true value by about 0.0002.
So $s$ is about individual prediction error.
$SE_{b_1}$ is about the error or wobble in our slope estimate.
Precisely.
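To make the difference between the two quantities concrete, here is a minimal sketch of how the standard deviation of the residuals, $s$, and the standard error of the slope, $SE_{b_1}$, can be computed by hand. The data are hypothetical and made up for illustration. On the exam you would read both values off the computer output instead.

```python
import math
import statistics

# Hypothetical data, made up for illustration (not from the textbook).
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
n = len(xs)

# Least-squares slope and intercept.
xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar

# s: typical size of a residual, using n - 2 degrees of freedom.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# SE of the slope: s divided by the spread of the x values.
se_b1 = s / math.sqrt(sxx)

# Equivalent form using the sample SD of x: s / (s_x * sqrt(n - 1)).
s_x = statistics.stdev(xs)
se_b1_alt = s / (s_x * math.sqrt(n - 1))

print(f"s = {s:.4f}, SE of slope = {se_b1:.4f}")
```

The two expressions for the standard error agree because the sum of squared deviations of x equals $(n - 1) s_x^2$.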
All right, so we have our estimates $b_1$ and $SE_{b_1}$.
How do we build a confidence interval for the slope?
It follows that classic pattern, you know: statistic $\pm$ critical value $\times$ standard deviation of statistic.
Right.
So for the slope, it's $b_1 \pm t^* \, SE_{b_1}$.
Right.
$t^*$.
Because we're using $s$ to estimate $\sigma$, which introduces a bit more uncertainty.
So we use a t distribution with $n - 2$ degrees of freedom.
Remember that $n - 2$ for regression.
$n - 2$ degrees of freedom.
Got it.
Let's take that "How much is that truck worth?" example, Ford F-150 prices versus miles driven.
Say we check LINER.
It looks OK.
We want a 90% confidence interval for the true slope $\beta_1$.
The computer output gives us $b_1 = -0.1629$ and $SE_{b_1} = 0.0310$ from a sample of $n = 16$ trucks.
OK, so degrees of freedom is 16 minus two, which is 14.
Perfect.
We find the $t^*$ critical value for 14 degrees of freedom and 90% confidence.
Let's say it's 1.761.
Then we just plug it in?
$-0.1629 \pm 1.761(0.0310)$, which gives $-0.217$ to $-0.108$.
Exactly.
And the interpretation.
We are 90% confident that the interval from $-0.217$ to $-0.108$ captures the true slope of the population regression line relating Ford F-150 price to miles driven.
Or maybe more practically.
Yeah, you could say we're 90% confident that for each additional mile driven, the average price decreases by somewhere between 10.8 cents and 21.7 cents.
Nice.
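The arithmetic for the truck interval is short enough to check in a few lines. This sketch takes the slope, standard error, sample size, and critical value exactly as quoted in the discussion above. The variable names are just illustrative.

```python
# Values quoted in the truck example.
b1 = -0.1629      # sample slope (dollars per mile)
se_b1 = 0.0310    # standard error of the slope
n = 16            # number of trucks
t_star = 1.761    # t* for 90% confidence with n - 2 = 14 df, as quoted

df = n - 2
margin = t_star * se_b1
lower, upper = b1 - margin, b1 + margin

print(f"df = {df}")
print(f"90% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

Rounded to three decimals, the endpoints match the interval from $-0.217$ to $-0.108$ stated above.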
And AP exam tip.
Use the computer output.
Definitely don't waste precious minutes typing in lists if they give you the summary stats.
Use $b_1$, $SE_{b_1}$, and $n$ directly from the output.
OK, confidence intervals make sense.
What about significance tests for the slope testing a specific claim?
Right.
The null hypothesis usually looks like $H_0: \beta_1 = 0$.
Why zero?
Because if the true slope $\beta_1$ is zero, it means there's no linear relationship between X and Y in the population.
The line is flat.
Changing X doesn't predict any change in the average Y.
So testing $\beta_1 = 0$ is like testing if there's any linear association at all.
Precisely.
It's equivalent to testing if the population correlation $\rho$ is zero.
OK, so how do we test it?
Test statistic.
It's a t statistic, again using $n - 2$ degrees of freedom.
The formula is $t = \dfrac{b_1 - \text{hypothesized slope}}{SE_{b_1}}$.
So for $H_0: \beta_1 = 0$, it simplifies to $t = b_1 / SE_{b_1}$.
You calculate this t value, find the P-value using the t distribution with $n - 2$ df, and compare it to your significance level $\alpha$.
Same logic as other T tests then.
State, plan, do, conclude.
Exactly.
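As a rough illustration of the "do" step, the sketch below computes the t statistic and a two-sided P-value for a test that the true slope is zero. It reuses the truck numbers quoted earlier purely as sample inputs (the discussion itself only builds an interval from them), and it finds the tail area by numerically integrating the t density so that nothing beyond the standard library is needed. In practice a calculator's tcdf function or statistical software would do this step.

```python
import math

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, upper=200.0, steps=20000):
    """Approximate P(T >= t) with Simpson's rule on [t, upper]; steps is even."""
    h = (upper - t) / steps
    total = t_pdf(t, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return total * h / 3

# Sample inputs reused from the truck example (illustrative use only).
b1, se_b1, n = -0.1629, 0.0310, 16
hypothesized_slope = 0.0

df = n - 2
t_stat = (b1 - hypothesized_slope) / se_b1
p_two_sided = 2 * upper_tail(abs(t_stat), df)

print(f"t = {t_stat:.2f} with {df} df, two-sided P-value = {p_two_sided:.5f}")
```

For a one-sided alternative you would use a single tail instead of doubling it, taking care that the sign of t matches the direction of the alternative.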
Take the crying and IQ example.
They looked at crying peaks in infancy and later IQ scores for 38 infants.
Let's say they wanted to see if there's evidence of a positive linear relationship.
Now, you'd check LINER.
Then find $b_1$ and $SE_{b_1}$ from output.
Calculate $t = b_1 / SE_{b_1}$.
Find the P-value for t with $38 - 2 = 36$ degrees of freedom, looking only at the upper tail because $H_a$ is $\beta_1 > 0$.
And if that P-value is small, say 0.002.
Then you'd reject $H_0$.
You'd conclude there is convincing evidence of a positive linear relationship between crying peaks and IQ scores in the population represented by this sample.
Does that mean crying more causes higher IQ?
Great question.
And the answer is no, not necessarily at all.
This was an observational study.
We found an association, a correlation, but we can't conclude causation.
There could be lurking variables maybe related to parenting style or genetics that influence both crying and IQ.
Super important distinction.
Always remember association is not causation, especially from observational studies.
Okay.
So this is all great if the relationship is linear to begin with.
But what if you make a scatter plot and it's clearly curved?
Yeah, that happens a lot.
You can't just slap a straight line on a curve and expect meaningful results.
Your linear model won't fit well.
So what do we do?
Give up?
Not at all.
Oh, this is where we bring out another powerful tool, transforming data.
Transforming, like changing the numbers.
Sort of.
We change the scale of measurement for one or sometimes both variables.
The goal is to straighten out the curved relationship in the scatter plot.
Once it looks linear on the transformed scale, then we can use all our familiar linear regression techniques.
Fit the line, check residuals, make inferences, but on the transformed data.
Interesting, like putting on special glasses that make the curve look straight.
That's a perfect analogy.
And there are a few common ways to do this.
One approach involves powers and roots.
Powers and roots, like squaring things or taking square roots.
Exactly.
These are often useful when you suspect an underlying power model, something like $y = a x^p$.
Where might you see that?
Think about basic geometry or physics.
Area of a square is side squared, $A = s^2$.
Volume of a cube is side cubed, $V = s^3$.
Or like in the "Go Fish" example, fish weight is often related to length cubed because weight is kind of like volume, 3D, and length is 1D.
Ah, okay.
So if weight is proportional to length cubed, plotting weight versus length would be curved.
Right.
But if you suspect $\text{weight} = a \cdot \text{length}^3$, you could try two things.
One, transform the X variable.
Plot $y$ versus $x^p$.
So plot weight versus $\text{length}^3$.
If the model is right, this plot should look pretty linear.
Okay, transform X.
What's the other way?
Transform the Y variable.
Plot $y^{1/p}$ versus $x$.
So plot $\text{weight}^{1/3}$, the cube root of weight, versus length.
Algebraically, if $\text{weight} = a \cdot \text{length}^3$, then $\text{weight}^{1/3} = a^{1/3} \cdot \text{length}$.
That's linear.
Clever.
So you can transform X or transform Y with powers or roots.
Yep.
And then you fit your line to the transformed data.
But remember, if you make a prediction, say you predict the cube root of weight.
You have to cube it at the end to get back to the original units of weight.
Right.
Undo the transformation.
Exactly.
Critical final step.
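Both routes, and the "undo the transformation" step, can be sketched with a few lines of code. The fish below are hypothetical: the weights are generated from an exact cubic rule with a made-up constant `A`, so the transformed plots are perfectly straight by construction. Real data would scatter around the line.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b1 * xbar, b1

# Hypothetical fish: weight = A * length^3 exactly (illustrative values only).
A = 0.01
lengths = [10, 15, 20, 25, 30, 35, 40]
weights = [A * length ** 3 for length in lengths]

# Option 1: transform x. Regress weight on length cubed.
b0_x, b1_x = fit_line([length ** 3 for length in lengths], weights)

# Option 2: transform y. Regress the cube root of weight on length.
b0_y, b1_y = fit_line(lengths, [w ** (1 / 3) for w in weights])

# Predict for a new fish using option 2, then cube to undo the transformation.
new_length = 28
predicted_cube_root = b0_y + b1_y * new_length
predicted_weight = predicted_cube_root ** 3

print(f"option 1 slope (estimates a): {b1_x:.4f}")
print(f"option 2 slope (estimates a^(1/3)): {b1_y:.4f}")
print(f"predicted weight at length {new_length}: {predicted_weight:.2f}")
```

Forgetting the final cubing step would leave the prediction in units of cube-root weight, which is the mistake the hosts are warning about.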
Okay.
Powers and roots.
What else is in the transformation toolbox?
The real heavy hitter, the most versatile tool, is usually transforming with logarithms.
Logarithms.
Okay.
ln or log base 10?
Either can work.
Often the natural log, ln, is used.
Logarithms are fantastic for linearizing both power models and exponential models.
Let's start with power models again.
$y = a x^p$.
How do logs help there?
If you take the log of both sides of $y = a x^p$, using log rules, you get $\log y = \log a + \log(x^p)$.
And $\log(x^p)$ is the same as $p \log x$.
So the transformed equation is $\log y = \log a + p \log x$.
Wait.
That looks like $y' = b_0 + b_1 x'$, where $y' = \log y$ and $x' = \log x$.
The intercept $b_0$ is $\log a$.
And the slope $b_1$ will be $p$.
Right.
Got it.
It's linear in log Y and log X.
So if your original data follows a power model, a plot of log Y versus log X should look straight.
So for the fish example, plot log weight versus log length.
Precisely.
Fit a line to that.
Then if you predict log weight, you need to undo the log, usually by exponentiating, like $10^{\text{predicted log weight}}$ or $e^{\text{predicted ln weight}}$, to get the predicted weight.
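Here is a small sketch of the log-log approach for a power model. The data are hypothetical, generated from an exact power rule with made-up constants `a_true` and `p_true`, so the fit recovers them almost exactly. The point is only to show which fitted quantity corresponds to which constant and how to back-transform a prediction.

```python
import math

def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b1 * xbar, b1

# Hypothetical power-model data: y = a * x^p (illustrative constants).
a_true, p_true = 2.0, 1.5
xs = [1, 2, 4, 8, 16, 32]
ys = [a_true * x ** p_true for x in xs]

# Take logs of BOTH variables, then fit a line.
log_x = [math.log10(x) for x in xs]
log_y = [math.log10(y) for y in ys]
b0, b1 = fit_line(log_x, log_y)

p_hat = b1          # slope estimates the power p
a_hat = 10 ** b0    # intercept is log10(a), so undo the log to get a

# Predict at a new x: predict log10(y), then raise 10 to that power.
new_x = 10
predicted_log_y = b0 + b1 * math.log10(new_x)
predicted_y = 10 ** predicted_log_y

print(f"estimated power p: {p_hat:.3f}, estimated a: {a_hat:.3f}")
print(f"predicted y at x = {new_x}: {predicted_y:.3f}")
```

If natural logs were used instead, the only change would be swapping `math.log10` for `math.log` and `10 **` for `math.exp`.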
Okay.
Logs for power models.
What about exponential models like $y = a b^x$?
Population growth, radioactive decay.
Right.
Where Y changes by a constant factor for each unit increase in X. Think Moore's law.
Number of transistors, Y, doubling, that's the factor $b$, every couple of years, X.
For these, you take the logarithm of only the Y variable.
If $y = a b^x$, then $\log y = \log a + x \log b$.
So log Y is linear with X.
The slope is $\log b$ and the intercept is $\log a$.
Exactly.
So for Moore's law, plotting ln(transistors) versus year, or maybe years since 1970 to keep numbers smaller, should linearize the data.
And again, predict ln(transistors), then undo with $e$ to get the transistor count prediction.
You've got the hang of it.
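The exponential case can be sketched the same way, logging only the response. The counts below are hypothetical, generated so that they double exactly every two years from a made-up starting value. They are not real transistor data.

```python
import math

def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    return ybar - b1 * xbar, b1

# Hypothetical counts that double every 2 years (illustrative values only).
years_since_start = list(range(0, 22, 2))
counts = [1000 * 2 ** (t / 2) for t in years_since_start]

# Take the log of y ONLY, then fit a line against x.
ln_counts = [math.log(c) for c in counts]
b0, b1 = fit_line(years_since_start, ln_counts)

growth_factor_per_year = math.exp(b1)   # slope is ln(b), so b = e^slope
starting_value = math.exp(b0)           # intercept is ln(a), so a = e^intercept

# Predict at a new year: predict ln(count), then undo with e.
new_year = 24
predicted_count = math.exp(b0 + b1 * new_year)

print(f"estimated yearly growth factor: {growth_factor_per_year:.4f}")
print(f"estimated starting value: {starting_value:.1f}")
print(f"predicted count at year {new_year}: {predicted_count:.0f}")
```

Doubling every two years corresponds to a yearly growth factor of the square root of 2, which is what the back-transformed slope should recover here.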
Okay.
This is powerful stuff, but it raises the question.
If my data is curved, which transformation should I choose?
Power, exponential, log this, log that.
That's the practical challenge.
Here's a common strategy.
First, look at the original curved scatterplot.
Does it look like it might fit a power model or an exponential model?
Sometimes theory helps like with the fish weight or Moore's law.
If you're unsure, try both common log transformations.
One, plot log Y versus log X, testing a power model fit.
Two, plot log Y versus X, testing an exponential model fit.
And see which one looks straighter.
Pretty much.
Use your eyes first.
Look at the "What's a planet, anyway?" example, with Kepler's third law relating orbital period Y to distance from the sun X.
The original plot is very curved.
If you plot ln(period) versus distance, that's $\ln y$ versus $x$, it might still look a bit curved.
But if you plot ln(period) versus ln(distance), $\ln y$ versus $\ln x$, it looks remarkably straight.
That suggests a power model is better.
Okay.
So visual inspection first.
What next?
Once you have one or two candidate transformations that look linear, fit the least squares line to the transformed data, then critically examine the residual plot for that transformed regression.
Ah, back to residuals.
We want random scatter there too.
Absolutely.
Even if the transformed scatter plot looks okay, the residual plot is the ultimate judge.
You want random scatter around zero, no leftover curves or patterns.
The transformation that yields the most patternless residual plot is usually your best choice.
What if both transformations look pretty linear and both residual plots look pretty random?
Good question.
Then you can bring in the numbers.
Look at the R squared value and the standard deviation of the residuals S for each transformed model.
Generally, you prefer the model with the higher R squared, explains more variation in the transformed Y, and the smaller S, less prediction error on the transformed scale.
Those often point to the same model.
Makes sense.
Look, check residuals, then maybe check R squared and S.
And don't forget to interpret the residuals from your chosen transformed model.
If you see patterns there, like maybe all the residuals for small X values are positive, it tells you something about how your model might be slightly off, maybe underpredicting in that region.
Got it.
And technology helps here too, right?
Calculators and software.
Oh, definitely.
As the technology corner notes, calculators can do these transformations, plot the transformed data, run the regression and show you the residuals.
It saves a ton of time in calculation, especially under pressure like on the AP exam.
Okay, great.
So let's try to wrap this up.
Quick recap.
We really dug into inference for regression, understanding how our sample slope relates to the true population slope using confidence intervals and tests.
We stressed the importance of the LINER conditions: can't build a good model on a shaky foundation.
And then we tackled curved data, unlocking the power of transformations, powers, roots, and especially logs, to straighten things out so we can use linear models effectively.
Yeah.
And I think the big takeaway and maybe the beauty of it all isn't just the math, it's realizing that statistics gives us these tools to find hidden linear structures within data that looks messy or curved on the surface.
It lets us model complex relationships, make predictions and, you know, understand the world from planets to fish to computer chips just a little bit better.
It's about finding clarity in the complexity.
That's a great way to end it.
Turning chaos into clarity.
Thank you so much for joining us on this deep dive.
This has been Last Minute Lecture, helping you be well informed one deep dive at a time.
ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.