Introduction to residuals and least squares regression

Want to join the conversation?

  • AliciaKay:
    I am so confused. Which one is the actual y value and which one is the predicted y value? Why is 100 the actual value? And also, at , how did he get that point? He just said the predicted value was right there, but he didn't explain how he got it...
    • jane smith:
      100 is the actual weight because he measured someone who was 60" tall and that person weighed 100 pounds. He plotted that on the graph at (60, 100). He created a line by "eyeballing" the data points for what looked like a best fit for the data. He used that diagonal line to predict a person's weight from their given height. Using the line, a person who is 60" tall is predicted to weigh 150 pounds. You can find that by drawing a line straight up from the x-axis at 60 to where it meets the diagonal line, then drawing a horizontal line from that point to the y-axis and reading off the y value, which is the weight predicted by the line.
  • qicai1995:
    Is the residual the same as variance in machine learning?
  • Rohan Suri:
    Since the sum of squared residuals is more sensitive to outliers (squaring assigns a greater share of the sum to the outlier), why is the sum of absolute residuals used less in regression?
    • daniella:
      The sum of squared residuals is used more often than the sum of absolute residuals because squaring gives more weight to large residuals, so the fit pays extra attention to points that sit far from the line. It is also mathematically convenient: the sum of squares is differentiable everywhere, which leads to a simple closed-form solution for the slope and intercept (see the least-squares sketch after the video transcript). The trade-off is that the fit becomes more sensitive to extreme data points, not more robust to them.
  • castro, jackie:
    This confused me even more.
  • Nair Tarun:
    So we have to find the predicted value and then use the actual value to get our residual?
    • daniella:
      Yes. To calculate the residual for a data point, you first find the predicted value using the regression line equation (y = mx + b), substituting the corresponding value of x. Then subtract the predicted value from the actual observed value of y to obtain the residual. If the actual value is above the line, the residual is positive; if it's below the line, the residual is negative. (There's a short worked sketch after this discussion.)
  • Byson Burt:
    So what is the easiest way of doing this and understanding it? The way my math teacher explained it made it hard.
    • daniella:
      The easiest way to understand linear regression is to grasp the idea of fitting a line to a scatterplot so that it summarizes the relationship between two variables. Understanding the slope-intercept form of a linear equation (y = mx + b), where m is the slope and b is the y-intercept, is crucial. Then see how the line is chosen to minimize the differences (residuals) between the actual data points and the values predicted by the line.
  • calculator:
    Is this pretty much finding the slope in y = mx + b?
  • Nikhita Biju:
    That was kind of confusing. All I had to understand was how you could solve it when the actual number is above the line.
  • SrikarC:
    What is the purpose of these residuals?
    • daniella:
      The purpose of residuals in linear regression is to measure the discrepancy between the observed values of the dependent variable and the values predicted by the regression model. Residuals help assess how well the model fits the data points and identify any patterns or trends that the model fails to capture.
  • mynames:
    I just wanted to ask this question: can't we use the least squares approximation from linear algebra to find the line of best fit in this case?
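
To make the arithmetic in these answers concrete, here is a minimal Python sketch of a single residual. The slope and intercept are hypothetical values, chosen only so that the line predicts 150 pounds at 60 inches, as read off in the video; they are not fitted values.

    # Residual = actual y - predicted y (negative when the point is below the line).
    def residual(actual_y, predicted_y):
        return actual_y - predicted_y

    # Hypothetical eyeballed line: weight = m * height + b.
    # m and b chosen so the prediction at 60 inches is 150 pounds, as in the video.
    m, b = 5.0, -150.0

    height = 60          # inches (the sampled person from the video)
    actual_weight = 100  # pounds
    predicted_weight = m * height + b   # 5.0 * 60 - 150.0 = 150.0

    print(residual(actual_weight, predicted_weight))  # -50.0, matching the video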

Video transcript

- [Narrator] So I'm interested in finding the relationship between people's height in inches and their weight in pounds. I'm randomly sampling a bunch of people, measuring their heights, measuring their weights, and then for each person plotting a point that represents their height and weight combination. So for example, let's say I measure someone who is 60 inches tall, that'll be about five feet tall, and they weigh 100 pounds. I'd go to 60 inches and then 100 pounds, so that point right over there is the point (60, 100). One way to think about it: height is being plotted along our x-axis and weight along our y-axis, so the point for this person is (60, 100), representing 60 inches and 100 pounds. I've done it for one, two, three, four, five, six, seven, eight, nine people, and I could keep going, but even with this I could say, well look, it looks like there's a roughly linear relationship here. It looks like it's positive: generally speaking, as height increases, so does weight.

Maybe I could try to put in a line that approximates this trend. Let me try to do that with my line tool. I could think about a bunch of lines. Something like this doesn't seem right, because most of the data is below the line. I could do something like this instead, but that doesn't seem like a good fit either; most of the data is above the line. Once again, I'm just eyeballing it here; in the future you will learn better methods of finding a fit. But something like this, just eyeballing it, looks about right. So you could view that line as a regression line. We could view it as y = mx + b, where we would have to figure out the slope and the y-intercept, or we could even think of it as weight = slope × height + intercept; if you think of the vertical axis as the weight axis, you could think of that as your weight intercept. Either way, this is the model I got just by eyeballing, this is my regression line, something I'm trying to fit to these points.

But clearly one line won't be able to go through all of these points. For many of them, there's going to be some difference between the actual value and what would have been predicted by the line. And that difference, between the actual value for a point and what the line would have predicted given, say, that height, is called a residual. Let me write that down: there's a residual for each of these data points. So for example, if I call this right here point one, the residual for point one is going to be: for a height of 60 inches, the actual value is 100 pounds, and from that we subtract what would be predicted. What would be predicted is right over here. I could just substitute 60 into this equation, so it would be m times 60 plus b, which I could write as 60m + b. Once again, I would just take the 60 inches, put it into my model, and say, well, what weight would that have predicted? And just for the sake of having a number here, let me get my line tool out and try to get a straight line from that point.
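
Each sampled point gets its own residual relative to the eyeballed line. Here is a minimal Python sketch, with made-up heights and weights standing in for the video's sample, and the same hypothetical slope and intercept as above:

    # Made-up sample: (height in inches, weight in pounds). Illustrative only.
    sample = [(60, 100), (62, 170), (65, 190), (68, 185), (70, 205), (72, 220)]

    # Hypothetical eyeballed line: weight = 5 * height - 150.
    m, b = 5.0, -150.0

    for height, actual in sample:
        predicted = m * height + b
        r = actual - predicted   # residual: positive above the line, negative below
        print(f"height={height} in, actual={actual} lb, predicted={predicted:.0f} lb, residual={r:+.0f}")

Notice that the positives and negatives would partially cancel if you simply summed them, which is exactly the problem the narrator turns to next.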
So from this point let me get a straight line. That doesn't look quite straight... okay, a little bit... okay. It looks like it's about 150 pounds, so my model would have predicted 150 pounds, and the residual here is going to be equal to negative 50. A negative residual is when your actual is below your predicted. So this r1 is a negative residual. If you tried to find, say, this residual right over here for this point, this r2 would be a positive residual, because the actual is larger than what the line would have predicted.

So a residual is good for saying how well your line, your model, fits a given data point, or how a given data point compares to the line. What you probably want to do is think about some combination of all the residuals and try to minimize it. Now you might say, why don't I just add up all the residuals and try to minimize that? But that gets tricky, because some are positive and some are negative, and a big negative residual could counterbalance a big positive residual; they could add up to zero, and then it would look like there's no residual at all. So you could instead add up the absolute values: take the sum of the absolute values of all the residuals, and then change m and b for your line to minimize that sum. That would be one technique for creating a regression line.

But another way to do it, and this is actually the most typical way you will see in statistics, is to take the sum of the squares of the residuals. When you square something, whether it's negative or positive, it becomes positive, so that takes care of the issue of negatives and positives canceling out. And when you square a number, things with large residuals become even larger, relatively speaking. Take the regular numbers one, two, three, four: they're each one apart, but if you square them you get 1, 4, 9, 16, which get further and further apart. So the larger a residual is, the bigger the proportion of the sum its square represents.

What we'll see in future videos is that there is a technique called least squares regression, where you can find an m and a b for a given set of data that minimizes the sum of the squares of the residuals. And that's valuable, and the reason why it's used most, because it really takes into account things that are significant outliers, things that sit pretty far away from the model. With least squares regression, an outlier gets weighted a little bit heavier, because squaring it makes it an even bigger factor in the sum. But this is just a conceptual introduction. In future videos we'll do things like calculate residuals, and we'll actually derive the formula for the m and b of the line that minimizes the sum of the squares of the residuals.
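
The least squares line the narrator promises to derive can be written in closed form: the slope is m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and the intercept is b = ȳ − m·x̄. These are the standard least-squares formulas, anticipating the derivation in later videos; here is a minimal sketch of the computation, still using the made-up sample from above.

    # Least squares: find the m and b that minimize the sum of squared residuals.
    # Closed form: m = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²), b = ȳ - m·x̄.
    sample = [(60, 100), (62, 170), (65, 190), (68, 185), (70, 205), (72, 220)]

    xs = [h for h, _ in sample]
    ys = [w for _, w in sample]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)

    m = sum((x - x_bar) * (y - y_bar) for x, y in sample) / sum((x - x_bar) ** 2 for x in xs)
    b = y_bar - m * x_bar

    # Sum of squared residuals for the fitted line; no other (m, b) does better.
    ssr = sum((y - (m * x + b)) ** 2 for x, y in sample)
    print(f"m={m:.2f}, b={b:.1f}, sum of squared residuals={ssr:.1f}")

Minimizing the sum of absolute residuals, by contrast, has no closed-form solution of this kind, which is one practical reason the squared version dominates in statistics.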