
# Proof (part 3) minimizing squared error to regression line

Created by Sal Khan.

## Want to join the conversation?

• Why can't you divide the "mean of x^2" by the "mean of x", like the usual `x^2/x`, and end up with just the "mean of x"? • In notation, the mean of x is: `xbar = Σ(xi) / n`
That is: we add up all the numbers `xi`, and divide by how many there are.

But the "mean of x^2" is not the square of the mean of x. We square each value, then add them up, and then divide by how many there are. Let's call it x2bar: `x2bar = Σ(xi^2) / n`

Now, x2bar is not the same as xbar^2. The reason is that we're squaring and then adding up, and those two operations are not interchangeable: the "sum of the squares" is not equal to the "square of the sum". You can check this with a really small algebra exercise. Take two numbers `a` and `b`, and check whether `(a + b)^2` equals `(a^2 + b^2)`. The second expression is simple - just the square of `a` plus the square of `b`. The first expression needs to be expanded first:
``(a + b)^2 = (a + b)*(a + b) = a^2 + 2*a*b + b^2``

Now compare that to `(a^2 + b^2)`. We have the two squared terms, but we also have that `2*a*b` term, the "cross term". That cross term is why the "sum of the squares" and the "square of the sum" are not equal, and hence why dividing the "mean of x^2" by the "mean of x" doesn't give you the "mean of x".
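The point above can be checked numerically. Here is a quick Python sketch (the numbers are just an illustration, not from the video) showing that the mean of the squares is not the square of the mean:

```python
# Quick numeric check: mean of the squares vs. square of the mean.
xs = [1, 2, 3, 4]
n = len(xs)

xbar = sum(xs) / n                   # mean of x: (1+2+3+4)/4 = 2.5
x2bar = sum(x * x for x in xs) / n   # mean of x^2: (1+4+9+16)/4 = 7.5

print(xbar ** 2)  # 6.25 -- square of the mean
print(x2bar)      # 7.5  -- mean of the squares: not the same!
```

The gap between the two (here 7.5 − 6.25 = 1.25) is exactly the cross-term effect described above; in fact it is the variance of the data.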
• Intuitively it makes sense that there would only be one best-fit line. But isn't it true that setting the partial derivatives with respect to m and b equal to zero would only locate a LOCAL minimum in the 3D "bowl"? There could be other minima present with both partial derivatives equal to zero. Correct? And who's to say which minimum is the global one, if it exists?

Also, intuitively, there is no maximum for the best-fit line, but the partial derivatives would equal zero at a maximum point on the 3D surface as well, right? • Under the assumptions of linear regression, that won't happen. The "loss function" (that is, how we measure the closeness of the predictions, in this case the sum of squared residuals) is convex, so the surface won't be bumpy like you're envisioning. It will be a smooth bowl with a single minimum.

And yes, for any local maximum or local minimum, the derivative will be zero.
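That claim can be sanity-checked numerically. Below is a small Python sketch, using made-up data, that computes m and b from the mean-based formulas Sal derives (by setting the partials to zero) and confirms that both partial derivatives of the squared error really are zero there:

```python
# Illustrative data (made up for this sketch, not from the video).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(xs)

# The four means that appear in the derivation.
xbar  = sum(xs) / n
ybar  = sum(ys) / n
xybar = sum(x * y for x, y in zip(xs, ys)) / n
x2bar = sum(x * x for x in xs) / n

# The slope and intercept from setting both partials of SSE to zero.
m = (xbar * ybar - xybar) / (xbar ** 2 - x2bar)
b = ybar - m * xbar

# Partial derivatives of SSE = sum((y - (m*x + b))^2) at (m, b).
dSSE_dm = sum(-2 * x * (y - (m * x + b)) for x, y in zip(xs, ys))
dSSE_db = sum(-2 * (y - (m * x + b)) for x, y in zip(xs, ys))

print(abs(dSSE_dm) < 1e-9, abs(dSSE_db) < 1e-9)  # both True: the gradient vanishes
```

Because the surface is a convex paraboloid, this single stationary point is the global minimum: there is no other place where both partials are zero.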
• The 3D surface is explained to be parabolic. How do we know that it's going to be parabolic? Why not some other 3D surface, like a cylinder or a cone?
• Thanks for the extremely helpful video series. At , Sal is dividing the equation by the mean of x. What happens when the mean of the x's is zero? Is the derivation/formula valid in such cases?
• Likewise, at , how did Sal take the partial derivative of nb^2 and get 2nb (or 2bn)? Also, I thought the object here was to factor out the b?
• I don't understand the part at . xy-bar does not equal x-bar * y-bar. How is he dividing out an x-bar to obtain this y-bar?
• At , how did he get 2bn? Shouldn't it be just bn, by taking out one b?
• Where can I find more information regarding the surface that Sal drew at the beginning?
• Why is the derivative used? What does that mean?
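On the question about the partial derivative of nb^2 being 2nb: the power rule gives d/db (n·b²) = 2nb, because n is a constant with respect to b. A quick numeric sanity check in pure Python, using a central finite difference at arbitrary values (n = 5, b = 3 are just illustrative):

```python
# Check that d/db (n * b^2) = 2*n*b, treating n as a constant.
def f(b, n):
    return n * b ** 2

n, b, h = 5, 3.0, 1e-6
numeric  = (f(b + h, n) - f(b - h, n)) / (2 * h)  # central difference
analytic = 2 * n * b                              # power rule: 2nb = 30

print(numeric, analytic)  # the two agree to within rounding error
```

The same pattern is why every squared term in the expanded SSE contributes a doubled coefficient when Sal differentiates: d/db (b²) = 2b, and the constant multiplier n just rides along.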