
# Proof (part 4) minimizing squared error to regression line

Created by Sal Khan.

## Want to join the conversation?

• At , why does Sal decide to subtract one equation from the other? And how is this okay? • Every equation has two sides joined by " = ", so call them
Left-Hand Side (L.H.S.) = Right-Hand Side (R.H.S.)
Consider two equations:
L.H.S.1 = R.H.S.1 ........................... say equation 1
L.H.S.2 = R.H.S.2 ........................... say equation 2
We can add or subtract the same quantity on both sides of any equation and it remains valid, i.e.
L.H.S.1 + x = R.H.S.1 + x or L.H.S.1 - x = R.H.S.1 - x
Now subtract L.H.S.2 (that is, take x = L.H.S.2) from both sides of equation 1:
L.H.S.1 - L.H.S.2 = R.H.S.1 - L.H.S.2 ......................... say equation 3
But from equation 2, L.H.S.2 = R.H.S.2,
so we can substitute R.H.S.2 for the L.H.S.2 on the right side of equation 3 to get
L.H.S.1 - L.H.S.2 = R.H.S.1 - R.H.S.2 ........................ equation 4
Comparing equations 1, 2, and 4, we see that equation 4 is exactly (equation 1 - equation 2). So subtraction of equations is quite okay :)
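The subtraction step above can also be checked numerically. Below is a minimal sketch (using made-up sample statistics, not values from the video) that applies it to the video's two normal equations, y bar = m * x bar + b and xy bar = m * x^2 bar + b * x bar, to eliminate b and solve for m:

```python
from fractions import Fraction as F

# Hypothetical sample statistics (stand-ins for x bar, y bar, x^2 bar, xy bar).
x_bar, y_bar = F(3), F(7)
x2_bar, xy_bar = F(11), F(23)

# Equation 1:  y bar  = m * x bar   + b
# Equation 2:  xy bar = m * x^2 bar + b * x bar
# Dividing equation 2 through by x bar gives
# Equation 3:  xy bar / x bar = m * (x^2 bar / x bar) + b
# Subtracting equation 1 from equation 3 eliminates b, so m drops out directly:
m = (xy_bar / x_bar - y_bar) / (x2_bar / x_bar - x_bar)
b = y_bar - m * x_bar

# Both original equations still hold with this m and b:
assert y_bar == m * x_bar + b
assert xy_bar == m * x2_bar + b * x_bar
print(m, b)  # prints: 1 4
```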
• I calculated M by subtracting the first formula [(mx^2) + bx = xy] from the second (y = mx + b), which is the opposite of what Sal does @. I get a different formula for M; is this OK? I can't equate the two formulas. Here is the formula I get (all the x's and y's should have the mean bar over them): M = [(xy/x) - y] / [(x^2/x) - x].
• 1. I understand that we 'square' the distances to the best-fitting line because that eliminates the negatives. I'm wondering, however, whether squaring skews the results somehow, so that the points furthest from the best-fitting line exert more of a pull in their direction.

2. The formula sought the minimum vertical distances between points and the best fitting line. Would the same result be achieved if, instead of minimizing the vertical distances, we minimized the absolute distance between the points and the line?

Thank you! • Those are some astute questions.

[1.] Yes and no. The more extreme points will exert a larger influence on the line, but there are some caveats. We have two variables, X and Y, and so points can be out of whack in either the x-direction or the y-direction. Points that are further out in the x-direction will exert a strong pull on the line; there is actually a statistic to measure this called "leverage." Outliers in the y-direction don't impact the regression nearly as much.

[2.] I'm not sure that you stated your question properly. The formula we used (called "Simple Linear Regression") minimizes the squared vertical distances between the points and the line. We could use the absolute value instead, though that would still be looking at the vertical distance.

There is also a type of regression that does not measure vertical distance, called Deming regression. In one special case of this type of regression, instead of vertical distances, we look at distances orthogonal (perpendicular) to the regression line.
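To see concretely how squared error weights outliers more heavily than absolute error (the contrast raised in the questions above), here is a rough, self-contained sketch with made-up data. It fits a line through the origin, y = m * x, by brute-force search over candidate slopes, once with squared vertical error and once with absolute vertical error:

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 3.0, 12.0]  # the last point is a deliberate outlier

def best_slope(loss):
    # Try many candidate slopes and keep the one with the smallest total loss.
    candidates = [i / 1000 for i in range(0, 5001)]  # slopes 0.000 .. 5.000
    return min(candidates,
               key=lambda m: sum(loss(y - m * x) for x, y in zip(xs, ys)))

m_sq = best_slope(lambda e: e * e)    # least squares
m_abs = best_slope(lambda e: abs(e))  # least absolute deviations

# The squared-error fit is pulled much harder toward the outlier:
print(m_sq, m_abs)  # squared-error slope is about 2.07; absolute-error slope is 1.0
```

The first three points lie exactly on y = x, and the absolute-deviation fit stays there, while the outlier drags the least-squares slope well above 1; that is the disproportionate "force in their direction" the question describes.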
• A few videos back, Sal presented a formula for the least-squares regression line where the slope m = r(Sy/Sx), that is, the correlation coefficient times the sample standard deviation in Y divided by the sample standard deviation in X. Is this formula equivalent to the one presented in this video, and if so, how does one establish their equivalence?
• We have worked out m = (x bar * y bar - xy bar) / ((x bar)^2 - x^2 bar). But what if somehow I get the denominator to be zero? Does that mean there is a limitation of this 'analytical solution'? Thanks
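Both questions above can be explored numerically. The sketch below (made-up data, not from the video) checks that this video's slope formula agrees with the earlier m = r * (Sy/Sx), and a comment notes exactly when the denominator vanishes:

```python
import math

xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 7.0, 9.0]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
xy_bar = sum(x * y for x, y in zip(xs, ys)) / n
x2_bar = sum(x * x for x in xs) / n
y2_bar = sum(y * y for y in ys) / n

# Slope from this video's formula. The denominator (x bar)^2 - x^2 bar is
# minus the variance of x, so it is zero exactly when every x is identical,
# i.e. a vertical column of points for which no finite slope exists.
m_bars = (x_bar * y_bar - xy_bar) / (x_bar ** 2 - x2_bar)

# Slope from the earlier video's formula m = r * (Sy / Sx).
sx = math.sqrt((x2_bar - x_bar ** 2) * n / (n - 1))     # sample std dev of x
sy = math.sqrt((y2_bar - y_bar ** 2) * n / (n - 1))     # sample std dev of y
r = (xy_bar - x_bar * y_bar) * n / (n - 1) / (sx * sy)  # Pearson correlation
m_ratio = r * sy / sx

assert abs(m_bars - m_ratio) < 1e-12
```

Algebraically, r * Sy/Sx simplifies to covariance(x, y) / variance(x), which is exactly the bar formula with numerator and denominator negated, so the two always agree whenever the denominator is nonzero.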
• @ : I'm having trouble seeing that both P1 = (x bar, y bar) and P2 = ((x^2 bar / x bar), (xy bar / x bar)) are on the best-fit line.
I imagine that any (u, v) such that v = mu + b is on the line y = mx + b; and therefore so are P1 and P2.
The two derivatives tell us that (1) y bar = m * x bar + b, and (2) xy bar = m * x^2 bar + b * x bar (or, dividing through by x bar, (3) xy bar / x bar = m * (x^2 bar / x bar) + b). Solving (1) and (2) for m and b gives m1 and b1 (in terms of the x and y statistics). So (1) and (3) are both true using m1 and b1. Since (1) and (3) are both of the form v = m1*u + b1, P1 and P2 are both on the best-fit line.
• What if the mean is 0? The second point will cease to exist.
• In a previous video on the equation of the regression line, m is derived from r*Sy/Sx, r being the correlation coefficient. Is this a different way of calculating m?
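The argument that P1 and P2 lie on the best-fit line can be checked numerically. This minimal sketch (made-up data, not from the video) computes the least-squares m and b from the sample statistics and confirms that both P1 = (x bar, y bar) and P2 = (x^2 bar / x bar, xy bar / x bar) satisfy y = m*x + b:

```python
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 7.0, 9.0]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
xy_bar = sum(x * y for x, y in zip(xs, ys)) / n
x2_bar = sum(x * x for x in xs) / n

# Least-squares slope and intercept from the video's formulas.
m = (x_bar * y_bar - xy_bar) / (x_bar ** 2 - x2_bar)
b = y_bar - m * x_bar

# Both points from the question should sit on y = m*x + b.
p1 = (x_bar, y_bar)
p2 = (x2_bar / x_bar, xy_bar / x_bar)
for (u, v) in (p1, p2):
    assert abs(v - (m * u + b)) < 1e-9
```

P1 lies on the line because b is defined so that equation (1) holds, and P2 lies on it because m and b also satisfy equation (2), which divided by x bar is exactly the statement that P2 is on the line.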