Calculating the equation of a regression line

Calculating the equation of a least-squares regression line. Intuition for why this equation makes sense.

Want to join the conversation?

• What video is he referring to in the beginning?
• Why is the point (sample mean of x, sample mean of y) guaranteed to lie on a least-squares regression line?
• Why must the regression line go through the point (mean of x, mean of y)?
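One way to see why, sketched under the assumption that the line y hat = mx + b is chosen to minimize the sum of squared residuals. The symbols SSE, n, x_i, and y_i are notation introduced here for illustration; they are not from the video.

```latex
\begin{align*}
\mathrm{SSE}(m, b) &= \sum_{i=1}^{n} (y_i - m x_i - b)^2 \\
\frac{\partial\, \mathrm{SSE}}{\partial b} &= -2 \sum_{i=1}^{n} (y_i - m x_i - b) = 0 \\
\sum_{i=1}^{n} y_i &= m \sum_{i=1}^{n} x_i + n b \\
\bar{y} &= m \bar{x} + b
\end{align*}
```

The last line is obtained by dividing by n. It says that plugging the mean of x into the line returns the mean of y, so the point (mean of x, mean of y) is on the line whatever the slope turns out to be.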
• Why do we not use x hat in the equation of the least-squares regression line?
y hat = mx + b?
• A hat over a variable in statistics means that it is a predicted value. In general, the explanatory variable is on the x-axis and the response variable is on the y-axis. The response variable can be predicted based on the explanatory variable. The value the line gives for the response is only an estimate, while the explanatory variable is an observed input that we plug in as is. This is why the response variable (y) is written with a hat and x is not.
• Why is r always between -1 and 1?

I know that this question has been asked before, but the answers are either too technical or too naive. Could someone please provide an answer that is mathematical in nature but can be understood by someone who has an OK but not strong mathematical foundation?

• The number and the sign are talking about two different things. The sign of r matches the sign of the slope of the "line of best fit": positive when y tends to rise with x, negative when it tends to fall. The size of r (ignoring the sign) measures how tightly the dots cluster around a straight line. If the scatterplot dots fit a line exactly, r is 1.00 for a rising line or -1.00 for a falling line, and points cannot fit a line better than exactly, so r cannot go past those values. A scatterplot with r = 0.50 shows a moderate positive trend. Note that r is not a percentage, and it does not depend on how steep the line is.
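For a more mathematical version that needs only algebra, here is a sketch. It assumes r is defined as the sum of products of z-scores divided by n - 1, with sample standard deviations, so that the squared z-scores of each variable add up to n - 1. The names z_{x,i} and z_{y,i} are notation introduced here.

```latex
\begin{align*}
r &= \frac{1}{n-1} \sum_{i=1}^{n} z_{x,i}\, z_{y,i},
\qquad \sum_{i=1}^{n} z_{x,i}^2 = \sum_{i=1}^{n} z_{y,i}^2 = n - 1 \\
0 \le \sum_{i=1}^{n} (z_{x,i} - z_{y,i})^2
  &= (n-1) - 2(n-1)\,r + (n-1) = 2(n-1)(1 - r)
  \;\Longrightarrow\; r \le 1 \\
0 \le \sum_{i=1}^{n} (z_{x,i} + z_{y,i})^2
  &= (n-1) + 2(n-1)\,r + (n-1) = 2(n-1)(1 + r)
  \;\Longrightarrow\; r \ge -1
\end{align*}
```

A sum of squares can never be negative, and that alone forces r to stay between -1 and 1. Equality happens only when every point has z_y equal to z_x (r = 1) or to -z_x (r = -1), which is the case where the dots sit exactly on a line.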
• In later videos we see another formula for calculating m, which is m = (X_bar*Y_bar - XY_bar) / ((X_bar)^2 - X^2_bar), derived by taking the partial derivatives of the squared-error function with respect to m and b. Here we see another formula, m = r*Sy/Sx. Can someone please say whether there is any relationship between these two?
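The two formulas should give the same slope. A minimal numerical check, assuming sample standard deviations (n - 1 in the denominator) and r computed from z-scores; the data and the function names below are made up for illustration:

```python
from statistics import mean, stdev

# Illustrative data, chosen only for this check.
x = [1.0, 2.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 6.0]

def slope_from_means(x, y):
    """m = (x_bar*y_bar - mean(xy)) / (x_bar^2 - mean(x^2))."""
    x_bar, y_bar = mean(x), mean(y)
    xy_bar = mean([a * b for a, b in zip(x, y)])
    x2_bar = mean([a * a for a in x])
    return (x_bar * y_bar - xy_bar) / (x_bar ** 2 - x2_bar)

def correlation(x, y):
    """Sample correlation: sum of z-score products divided by n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    total = sum(((a - x_bar) / s_x) * ((b - y_bar) / s_y) for a, b in zip(x, y))
    return total / (n - 1)

def slope_from_r(x, y):
    """m = r * Sy / Sx."""
    return correlation(x, y) * stdev(y) / stdev(x)

# The two printed slopes agree up to floating-point rounding.
print(slope_from_means(x, y), slope_from_r(x, y))
```

The reason they agree: flipping the sign of the numerator and denominator turns the first formula into (mean of xy minus product of means) over (mean of x squared minus square of the mean), which is covariance divided by the variance of x. Since r is covariance divided by Sx*Sy, r*Sy/Sx is also covariance divided by Sx squared.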
• If r = 0, then the slope is 0. How can the line then pass through the mean of y and not the y-intercept?
• For those who don't get it.

The goal is to find the regression line that best fits the data points. He shows the formula for the correlation coefficient, but the calculation has already been done for us, so r is given. The means and standard deviations of x and y are provided as well.

Now, here is how they derive y = mx + b.

First they use the point (Xmean, Ymean) as a reference. Ymean is NOT the y-intercept. Then he draws lines one standard deviation away from the means along the x and y axes. If the correlation were perfect (r = 1), a run of Sx would give a rise of Sy, so the slope, which is rise over run, would be Sy/Sx. In general r scales that rise, so m = r*Sy/Sx. But we still have to find the y-intercept.

We know that the regression line passes through the point (Xmean, Ymean). So we substitute m, Xmean, and Ymean into y = mx + b and solve for b, which gives the y-intercept: b = Ymean - m*Xmean.

Honestly it's pretty smart. Wouldn't have thought about it and was going to skip this video. But glad I spent time to understand it.

This has applications in machine learning and AI - FYI.
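The steps above can be sketched in a few lines of Python. The function name and the summary numbers are illustrative assumptions, not values quoted from this thread:

```python
def line_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Return (m, b) for y hat = m*x + b from summary statistics."""
    m = r * s_y / s_x        # slope: r scales the Sy-over-Sx rise over run
    b = y_bar - m * x_bar    # intercept: force the line through (x_bar, y_bar)
    return m, b

# Hypothetical summary statistics for illustration.
m, b = line_from_summary(x_bar=2.0, y_bar=3.0, s_x=0.816, s_y=2.160, r=0.946)
print(f"y hat = {m:.2f} * x + ({b:.2f})")
```

With these inputs the slope comes out close to 2.5 and the intercept close to -2. Plugging x = 2 back in returns 3, the mean of y, as expected.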
• I am still quite confused. Why is m=r(Sy/Sx)? I think r is just to measure the strength of the correlation, no? What is r doing in this formula? Thanks for your help in advance!
• Given the spread of x values and the spread of y values, the correlation coefficient still influences the slope of the line of best fit. If the correlation is very weak (r is near 0), then the slope of the line of best fit should be near 0. The more strongly positive the correlation (the more positive r is), the more positive the slope of the line of best fit should be. The more strongly negative the correlation (the more negative r is), the more negative the slope of the line of best fit should be.
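A small sketch of that effect, holding the spreads fixed and changing only r. The values Sx = 2 and Sy = 3 and the function name are arbitrary choices for illustration:

```python
def slope(r, s_x, s_y):
    """Slope of the line of best fit: m = r * Sy / Sx."""
    return r * s_y / s_x

s_x, s_y = 2.0, 3.0  # hypothetical standard deviations
for r in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(f"r = {r:+.1f} -> slope = {slope(r, s_x, s_y):+.2f}")
```

Sy/Sx sets the steepest slope the data could support, and r decides how much of it the line actually uses and in which direction. At r = 0 the slope is 0, so the line is flat at the mean of y.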
• Why is this the least-squares regression line? It seems we do not use least squares anywhere.
• All the examples and practice problems have shown simple applications of least squares; check them.
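One way to convince yourself numerically: build the line from m = r*Sy/Sx and b = Ymean - m*Xmean, then check that nudging m or b in any direction makes the sum of squared residuals larger. This is only a spot check on made-up data, not a proof, and the function names are assumptions for illustration:

```python
from statistics import mean, stdev

# Illustrative data, chosen only for this check.
x = [1.0, 2.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 6.0]

def fit_line(x, y):
    """Line from m = r*Sy/Sx and b = y_bar - m*x_bar (sample statistics)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    r = cov / (s_x * s_y)
    m = r * s_y / s_x
    return m, y_bar - m * x_bar

def sse(m, b, x, y):
    """Sum of squared residuals for the line y hat = m*x + b."""
    return sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))

m, b = fit_line(x, y)
best = sse(m, b, x, y)

# Try nearby lines: every nudge away from (m, b) should increase the error.
steps = (-0.5, -0.1, 0.0, 0.1, 0.5)
worse = [
    sse(m + dm, b + db, x, y)
    for dm in steps
    for db in steps
    if (dm, db) != (0.0, 0.0)
]
print(best, min(worse))
```

The formula hides the least-squares step: it is what you get after minimizing the squared error with calculus, so the minimizing is already baked in.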