© 2023 Khan Academy

# Calculating residual example

## Questions and answers

- How do we find the residual when there are two y values for one x value? Thanks, ~HarleyQuinn
  - Then it wouldn't be a function. Things that aren't functions really tick me off.
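
A scatterplot of real data does not have to pass the vertical line test: two customers with the same height can rent different frame sizes. Each data point gets its own residual, measured against the same predicted value on the line. The Python sketch below assumes Vera's line from the video, ŷ = 1/3 + (1/3)x. The second 155 cm customer, who rents a 53 cm frame, is hypothetical and exists only for illustration.

```python
from fractions import Fraction

# Vera's regression line from the video: y-hat = 1/3 + (1/3) x
INTERCEPT = Fraction(1, 3)
SLOPE = Fraction(1, 3)

def predict(height_cm):
    """Predicted frame size (cm) for a customer of the given height (cm)."""
    return INTERCEPT + SLOPE * height_cm

def residuals(points):
    """Return actual - predicted for each (height, frame) pair, one residual per point."""
    return [frame - predict(height) for height, frame in points]

# Two customers with the same height: the first comes from the video,
# the second (53 cm frame) is a hypothetical data point.
customers = [(155, 51), (155, 53)]
print(residuals(customers))  # [Fraction(-1, 1), Fraction(1, 1)]
```

Both points share the predicted value of 52 cm, so one lands 1 cm below the line and the other 1 cm above it.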

- At around 3:52, why didn't he add the 1/3 to 52? I guess it's not too big of a difference, but wouldn't that make the residual -1 and 1/3? Thanks, GoldenDoodle
  - He already added the 1/3 to 155/3 to get 156/3, which simplifies to 52.
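
The arithmetic in that answer can be checked with exact fractions. This is a minimal Python sketch that uses the standard-library `fractions` module so that 1/3 stays exactly 1/3 instead of a rounded decimal. The variable names are illustrative only.

```python
from fractions import Fraction

intercept = Fraction(1, 3)  # y-intercept of Vera's line
slope = Fraction(1, 3)      # slope of Vera's line
height = 155                # x: the customer's height, in cm
actual_frame = 51           # y: the frame size the customer actually rented, in cm

# 1/3 + 155/3 = 156/3, which simplifies to 52
predicted_frame = intercept + slope * height
residual = actual_frame - predicted_frame

print(predicted_frame)  # 52
print(residual)         # -1
```

The intercept of 1/3 is already included in the 52, so the residual is exactly -1.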

- Where does the whole 1/3 part come in?
  - The y-intercept and the slope are both 1/3. The general equation for the least-squares regression line is

    $\hat{y} = b + mx$,

    where b is the y-intercept and m is the slope. In this problem the 1/3 values are simply given: they come from the regression Vera already calculated from her data.
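
For context on where values like b and m come from: the least-squares line has a standard closed form, m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b = ȳ - m·x̄. The Python sketch below implements that formula with exact fractions. Vera's actual data is not given, so the heights and frame sizes here are hypothetical. In practice, as the video notes, a spreadsheet or a computer usually does this step.

```python
from fractions import Fraction

def least_squares_line(points):
    """Return (b, m) for the line y-hat = b + m*x that minimizes the
    sum of squared residuals over the given (x, y) points."""
    n = len(points)
    x_mean = Fraction(sum(x for x, _ in points), n)
    y_mean = Fraction(sum(y for _, y in points), n)
    # Slope: how x and y vary together, divided by how much x varies
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two different x values")
    m = sxy / sxx
    # The least-squares line always passes through (x_mean, y_mean)
    b = y_mean - m * x_mean
    return b, m

# Hypothetical (height cm, frame cm) data, not Vera's real measurements
data = [(150, 50), (160, 54), (170, 57), (180, 60)]
b, m = least_squares_line(data)
print(b, m)  # 4/5 33/100
```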

- Why did Sal put the line right there on the graph? I do not understand that.
- Where did he get the points for the graph?
  - Sal probably had the points before we saw the video. I think he made up these numbers.

- At 1:27 he said the line is trying to minimize the square of the distance. Why?
  - So that it fits the data best.
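
Why squares? Residuals for points above the line are positive and residuals for points below it are negative, so plain residuals can cancel each other out. Squaring makes every miss count. The Python sketch below uses three hypothetical points: two different lines both have residuals that sum to zero, but the least-squares line for these points, ŷ = 1.5 + 0.5x, has the smaller sum of squared residuals.

```python
# Three hypothetical (x, y) points, chosen only for illustration
points = [(0, 1), (1, 3), (2, 2)]

def sum_of_residuals(points, b, m):
    """Plain sum of (actual - predicted) for the line y-hat = b + m*x."""
    return sum(y - (b + m * x) for x, y in points)

def sum_of_squared_residuals(points, b, m):
    """Sum of (actual - predicted)**2, the quantity least squares minimizes."""
    return sum((y - (b + m * x)) ** 2 for x, y in points)

for b, m in [(1.5, 0.5), (1.0, 1.0)]:
    print(b, m, sum_of_residuals(points, b, m), sum_of_squared_residuals(points, b, m))
# 1.5 0.5 0.0 1.5
# 1.0 1.0 0.0 2.0
```

The plain sums cannot tell the two lines apart, while the squared sums can.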

- At 3:27, why did Sal plug in 155 as the x? Why is it not 51?
  - The equation calculates the size of the bike frame, so our output (y) is the frame size and our input (x) is the height of the customer. So when we plug in a value for x, we use 155 because that is the height of our customer, while 51 is the size of the frame.
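
The direction matters because the equation only works one way: height goes in and predicted frame size comes out. Below is a minimal Python sketch, assuming Vera's line ŷ = 1/3 + (1/3)x from the video. The function name is hypothetical.

```python
from fractions import Fraction

def predict_frame_size(height_cm):
    """Vera's line: predicted frame size (y, in cm) from customer height (x, in cm)."""
    return Fraction(1, 3) + Fraction(1, 3) * height_cm

# Correct: the customer's height is the input x
print(predict_frame_size(155))  # 52
# Mixed up: feeding the frame size in as x predicts a frame for a 51 cm tall person
print(predict_frame_size(51))   # 52/3
```

Only the first result, 52 cm, is a prediction that can be compared with the 51 cm frame this customer actually rented.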

- I'd like an answer as soon as possible, please! ^.^

  What if the "actual" numbers are a lot larger, like 12, or 28, or larger? I have a problem like this, but when I use the equation given, I get huge numbers like 101 and 429, so when I do y - r (y-value minus residual) I get numbers like -89, which are too large to plot on my graph. What am I doing wrong?
  - Here's your answer, six years later:

- How do you know which number is the "y" and which is the "x"? In this problem, he had a scatterplot showing that the frame size is on the "y" axis and the height is on the "x" axis. That is why he knew that 51 is the given "y" and 155 is the given "x", which he could use to figure out where the data point should have been according to the regression line. But what if you don't have a chart that tells you which is which?
  - The problem states that the equation predicts the bicycle frame size from the height of the customer, which means that the height is the independent variable (𝑥) and the frame size is the dependent variable (𝑦).

- How do we know that the bike frame size is the actual value and not the customer's height?
  - Although the customer might grow, at the time of the rental the customer's height is fixed. When buying a bike, you'd normally get a frame that matches your height, so the size of the bike frame you buy depends on your height.

## Video transcript

- [Instructor] Vera rents bicycles to tourists. She recorded the height, in centimeters, of each customer and the frame size, in centimeters, of the bicycle that customer rented. After plotting her results, Vera noticed that the relationship between the two variables was fairly linear, so she used the data to calculate the following least-squares regression equation for predicting bicycle frame size from the height of the customer. And this is the equation. So before I even look at this question, let's just think about what she did.

So she had a bunch of customers, and she recorded, given the height of the customer, what size frame that person rented. And so she might've had something like this, where on the horizontal axis you have height measured in centimeters, and on the vertical axis you have frame size, also measured in centimeters. And so there might've been someone who measures 100 centimeters in height who gets a 25-centimeter frame. I don't know if that's reasonable or not, for you bicycle experts, but let's just go with it. And so she would've plotted it there. Maybe there was another person of 100 centimeters in height who got a frame that was slightly larger, and she plotted it there.

And then she did a least-squares regression. A least-squares regression tries to fit a line to this data; oftentimes, you would use a spreadsheet or a computer to do it. The line is chosen to minimize the sum of the squares of the vertical distances between the points and the line. So the least-squares line might look something like this. Let me get my ruler tool and plot it; this is just a rough estimate. So that would be the line. Our regression line, y-hat, is equal to 1/3 plus 1/3 x. You could view this as a way of modeling the relationship, or of predicting: hey, if I get a new person, I could take their height, put it in as x, and figure out what frame size they're likely to rent.

But they ask us: what is the residual of a customer with a height of 155 centimeters who rents a bike with a 51-centimeter frame? So how do we think about this? Well, the residual is going to be the difference between what they actually rented and what our regression line would have predicted. Let me write it this way: the residual is going to be actual minus predicted. So if the predicted value is larger than the actual value, the residual is going to be a negative number. If the predicted value is smaller than the actual value, it's going to be a positive number.

Well, we know the actual value. They tell us that the 155-centimeter person rents a bike with a 51-centimeter frame, so the actual value is 51 centimeters. But what is the predicted value? That's where we can use the regression equation that Vera came up with. The predicted value is going to be equal to 1/3 plus 1/3 times the person's height, and their height is 155. Y-hat is what our linear regression, or our line, predicts. So this is going to be equal to 1/3 plus 155 over three, which is equal to 156 over three, which comes out nicely to 52. So the predicted value on our line is 52.

So this person is 155 centimeters tall; we can plot them right over here, at 155. They're coming in slightly below the line, and because they are below the line, the residual is going to be negative: 51 minus 52 is negative one. If we were to zoom in right over here (you can't see it that well, so let me draw it), the line looks like this, and our data point is right over here. We know we're below the line, so it's a negative residual, and the magnitude of that residual, one centimeter, is how far we are below the line. So in this case the residual is negative one. That is our residual: the actual data minus what was predicted by our regression line.
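
The sign rule from the video (residual = actual - predicted; a negative residual means the point is below the line, a positive one means it is above) fits in a few lines of Python. This is a sketch with hypothetical function names. The example uses the video's numbers: an actual frame of 51 cm against a predicted 52 cm.

```python
def residual(actual, predicted):
    """Residual = actual - predicted."""
    return actual - predicted

def position_relative_to_line(actual, predicted):
    """Describe where the data point sits relative to the regression line."""
    r = residual(actual, predicted)
    if r > 0:
        return "above the line"
    if r < 0:
        return "below the line"
    return "on the line"

print(residual(51, 52))                   # -1
print(position_relative_to_line(51, 52))  # below the line
```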