
# Least squares approximation

The least squares approximation gives a best-fit solution for otherwise unsolvable equations. Created by Sal Khan.

## Want to join the conversation?

• I understand how Sal arrived at the final equation A^T A x* = A^T b. Later in the video, Sal says that Ax = b does not have the same solution as A^T A x = A^T b. I know that the former equation does not have a solution while the latter one does. However, isn't A^T A x = A^T b the same as Ax = b, since we could just remove A^T from both sides of the equation?

• The premise here is that A^(-1) does not exist (otherwise, the solution would simply be x = A^(-1) b). So (A^T)^(-1) doesn't exist either, because (A^T)^(-1) = (A^(-1))^T. So it isn't possible to left-multiply both sides of A^T A x* = A^T b by (A^T)^(-1) to get back to A x* = b.

Basically, the effect of A is to map vectors in the original N-dimensional space into a smaller K-dimensional subspace. That's obviously a many-to-one operation, with no unambiguous way to reverse it; information is lost. The "trick" here is to squash the vector on the right-hand side of the equation down to the same smaller subspace spanned by the columns of A, and then solve this smaller, fully determined problem.

In a nutshell, it's a bit like having the equations "1x + 0y = 1; 0x + 0y = 2", which cannot be solved, and then multiplying both sides by A^T, leaving us with "1x + 0y = 1; 0x + 0y = 0". Voila! The troublesome equation is reduced to the trivial "0 = 0", and we can find x = 1. However, this destructive operation is obviously irreversible: there's no getting back "0x + 0y = 2" from "0 = 0".
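The idea above can be made concrete with a small numeric sketch. The 3×2 matrix A and vector b below are made up for illustration: the three points don't lie on one line, so Ax = b has no exact solution, but the normal equations A^T A x* = A^T b do, and plain Python fractions let us solve them exactly.

```python
from fractions import Fraction as F

# Made-up overdetermined system: fit a line c + m*t through the
# points (1,1), (2,2), (3,2). They are not collinear, so Ax = b
# has no exact solution.
A = [[F(1), F(1)],
     [F(1), F(2)],
     [F(1), F(3)]]
b = [F(1), F(2), F(2)]

# Normal equations: (A^T A) x* = A^T b
AtA = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]
Atb = [sum(A[k][i] * b[k] for k in range(3)) for i in range(2)]

# AtA is 2x2 and invertible here, so solve by Cramer's rule.
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x_star = [(Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1]) / det,
          (AtA[0][0] * Atb[1] - Atb[0] * AtA[1][0]) / det]

print(x_star)  # [Fraction(2, 3), Fraction(1, 2)]
```

So the best-fit line is y = 2/3 + t/2, even though no line passes through all three points exactly.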
• I still don't get why this method is minimizing the squares of the distance. Isn't it just minimizing the distance without squares? Sal says in the video that we want to minimize ||b - Ax||, so why do I need to square it?

• Where does the orthogonal complement of C(A) being N(A^T) come from?

• How come A^T multiplied by (Ax - b) equals the zero vector? This statement comes after the part where you state that Ax - b belongs to the null space of A^T.

• What if instead of trying to minimize ||b - Ax*|| we tried to minimize b - Ax* itself? I mean, if b and Ax* were equal, both expressions would be zero, so it should not matter which one we minimize. Maybe ||b - Ax*|| is easier to minimize than b - Ax*, but I am not sure.

• If we were to minimize b - Ax*, that would mean minimizing a vector instead of the vector's magnitude. But what does it mean to minimize a vector? Make it as close to the zero vector as possible, I presume. Therefore, it makes more sense to minimize in terms of the distance from the zero vector (the origin): which happens to be ||b - Ax*||.
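On the question of squares versus no squares: since the norm is nonnegative and squaring is strictly increasing on [0, ∞), minimizing ||b - Ax|| and minimizing ||b - Ax||² pick out the same x. A minimal sketch, with a made-up one-column A and b, comparing the two over the same grid of candidates:

```python
import math

# Made-up one-column A and b, so the unknown x is a single number.
a = [1.0, 2.0]
b = [2.0, 1.0]

def residual_norm(x):
    r0, r1 = b[0] - a[0] * x, b[1] - a[1] * x
    return math.hypot(r0, r1)  # ||b - a*x||

# Minimize the norm and the squared norm over the same grid.
xs = [i / 100 for i in range(-200, 201)]
best_norm = min(xs, key=residual_norm)
best_sq = min(xs, key=lambda x: residual_norm(x) ** 2)

print(best_norm, best_sq)  # 0.8 0.8  (matches a^T a x = a^T b, i.e. 5x = 4)
```

Both searches land on the same minimizer; the square is preferred in practice only because it is differentiable everywhere and avoids the square root.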
• Does this imply that if there is no solution, A^T isn't invertible? Otherwise, if you left-multiplied by the inverse of A^T, you would get Ax = b.

• At one point, he refers to the null space of the transpose of A. Could someone tell me which videos introduce this concept?

• At one point, Sal subtracts b from both sides of the equation Ax* = (projection of b onto C(A)). This makes sense algebraically, but I don't understand how it makes sense spatially. It seems to me that b - Ax* would be equal to the vertical component of b (which would also be orthogonal to C(A), and is the thing we are trying to minimize). It doesn't make sense to me that, instead, subtracting a vector from its horizontal component (in other words, Ax* - b) would be equal to its vertical component.

• Vertical and horizontal are a bit relative, but I think I get what you're saying.

It might help to think of Ax* - b as being -(b - Ax*). So b - Ax* can be seen as the vector with its tail at the head of Ax* and its head at the head of b, while Ax* - b points in the opposite direction but is otherwise the same vector.

You can always think of vector subtraction as connecting the tip of one vector to the tip of the other. So with Ax* - b we go from the tip of b to the tip of Ax*, and it is still orthogonal to Ax*.

If that doesn't make sense, let me know; maybe a general example can help.

Say you have two vectors in R2: A = <a1, a2> and B = <b1, b2>. Now let's subtract one from the other. First, A - B, which gives <a1 - b1, a2 - b2>. Now, what if we add B + (A - B)? You don't even have to do vector addition component by component: B + (A - B) = A. That means if you start at the tip of B and travel the distance and direction of the vector A - B, you arrive at the tip of A. Similarly, you can get the same vector in the opposite direction: B - A goes from the tip of A to the tip of B, since A + (B - A) = B.

I hope this all helped.
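The tip-to-tip arithmetic above can be sketched in a few lines of Python (the specific vectors are arbitrary, chosen just for illustration):

```python
# Two made-up vectors in R2.
A = (3.0, 1.0)
B = (1.0, 2.0)

# A - B is the vector running from the tip of B to the tip of A.
A_minus_B = (A[0] - B[0], A[1] - B[1])

# Adding it back to B recovers A: B + (A - B) = A.
recovered = (B[0] + A_minus_B[0], B[1] + A_minus_B[1])
print(recovered == A)  # True
```

The same check with the operands swapped shows A + (B - A) = B, the same displacement traversed in the opposite direction.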
• Is this linear regression by least squares approximation in the case of R2?

• I'm a little confused about the logic of this...
Ax* = (projection of b onto the column space of A)
Multiplying both sides by A^T yields A^T A x* = A^T (projection of b onto the column space of A).
But we also know from the video that A^T A x* = A^T b.
That means that A^T b = A^T (projection of b onto the column space of A).
Is this right?
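For what it's worth, that conclusion is correct: b minus its projection onto C(A) lies in N(A^T), so A^T sends b and the projection Ax* to the same vector. A small sketch checking this on a made-up example system (chosen for illustration, with exact fractions):

```python
from fractions import Fraction as F

# Made-up tall system where b is not in C(A).
A = [[F(1), F(1)],
     [F(1), F(2)],
     [F(1), F(3)]]
b = [F(1), F(2), F(2)]

# Least squares solution of A^T A x* = A^T b (worked out by hand for this A, b).
x_star = [F(2, 3), F(1, 2)]

# The projection of b onto C(A) is A x*.
proj = [A[k][0] * x_star[0] + A[k][1] * x_star[1] for k in range(3)]

# b - proj lies in N(A^T), so A^T b and A^T proj must agree.
Atb = [sum(A[k][i] * b[k] for k in range(3)) for i in range(2)]
Atp = [sum(A[k][i] * proj[k] for k in range(3)) for i in range(2)]
print(Atb == Atp)  # True
```

The equality A^T b = A^T (proj of b) is exactly why multiplying by A^T turns the unsolvable equation into the solvable normal equations.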