
## Linear algebra

© 2023 Khan Academy

# Least squares approximation

The least squares approximation for otherwise unsolvable equations. Created by Sal Khan.

## Want to join the conversation?

- I understand how Sal arrived at the final equation A(t)Ax* = A(t)b. Then, at 14:35 Sal says that Ax = b does not have the same solution as A(t)Ax = A(t)b. I know that the former equation does not have a solution while the latter one does. However, isn't A(t)Ax = A(t)b the same as Ax = b, as we could just remove A(t) from both sides of the equation?(8 votes)
- The premise here is that A(-1) does not exist (otherwise, the solution would simply be x = A(-1) b). So, (A(T))(-1) doesn't exist, either; because, (A(T))(-1) == (A(-1))(T). So, it isn't possible to left-multiply both sides of "A(T) A x* = A(T) b" by (A(T))(-1) to get back to "A x* = b".

Basically, the effect of A is to map vectors in the original N-dimensional space to a smaller K-dimensional subspace. That's obviously a many-to-one operation, with no unambiguous way to reverse; information is lost. The "trick" here is to squash the vector on the right-hand side of the equation down to the **same** smaller subspace spanned by matrix A, and then solve this smaller, fully-determined problem.

In a nutshell, it's a bit like having equations "1x + 0y = 1; 0x + 0y = 2;", which cannot be solved, and then multiplying both sides by A(T), leaving us with: "1x + 0y = 1; 0x + 0y = 0;". Voila! Troublesome equation reduced to trivial "0 = 0", and we can find x=1. However, this destructive operation is obviously irreversible. There's no getting back "0x + 0y = 2" from "0 = 0".(18 votes)
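The toy system in this answer can be checked directly. Below is a small sketch in plain Python, with A and b read off from the two equations "1x + 0y = 1; 0x + 0y = 2". The helper names `transpose`, `matmul`, and `squared_error` are illustrative only and do not come from the video.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Coefficients and right-hand side of: 1x + 0y = 1, 0x + 0y = 2
A = [[1, 0],
     [0, 0]]
b = [[1], [2]]

At = transpose(A)
AtA = matmul(At, A)   # left side of A^T A x = A^T b
Atb = matmul(At, b)   # right side: the unsolvable "= 2" row becomes "= 0"

def squared_error(x):
    """Squared length of b - A x for a column vector x."""
    Ax = matmul(A, x)
    return sum((bi[0] - axi[0]) ** 2 for bi, axi in zip(b, Ax))

x_star = [[1], [0]]   # x = 1; y is free here, 0 is one arbitrary choice

print(AtA, Atb)
print(matmul(AtA, x_star) == Atb)   # x_star satisfies the new system
print(squared_error(x_star))        # error that remains in A x = b
```

The remaining error is not zero: the second equation can never be satisfied, so the best achievable result is to match the first equation exactly.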

- I still don't get why this method is minimizing the squares of the distance. Isn't it just minimizing the distance without squares? Sal says in the video, we want to minimize ||b-Ax||, so why do I need to square it?(4 votes)
- Sal squares it because `|a|² = a•a`. Since dot products are easy to calculate and have nice properties which we already understand, the magnitude squared is quicker and easier to find than the actual magnitude.(4 votes)
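One step worth spelling out is why squaring does not change which x wins. A short sketch in LaTeX, writing v = b - Ax (a label assumed here for brevity):

```latex
\|v\|^2 = v \cdot v = v_1^2 + v_2^2 + \cdots + v_n^2, \qquad v = b - Ax .
% Lengths are nonnegative and squaring is increasing on nonnegative numbers, so
\|b - Ax^*\| \le \|b - Ax\|
\quad\Longleftrightarrow\quad
\|b - Ax^*\|^2 \le \|b - Ax\|^2 .
```

So the x-star that minimizes the length also minimizes the sum of squares, and the other way around.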

- Where does the orthogonal complement of C(A) = N(A) transpose come from?(4 votes)
- This is derived in the first video on orthogonal complements, https://www.khanacademy.org/math/linear-algebra/alternate_bases/othogonal_complements/v/linear-algebra-orthogonal-complements

If you already know that the nullspace of A: N(A) = the orthogonal complement of the row space C(A^T), then just take A = B^T and substitute for B.(3 votes)
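The substitution described in this answer can be written out as follows. This is only a sketch, and B is just a placeholder name for an arbitrary matrix:

```latex
% Known fact, for any matrix B: the null space is the orthogonal
% complement of the row space.
N(B) = \bigl(C(B^{T})\bigr)^{\perp}
% Substitute B = A^{T} and use (A^{T})^{T} = A:
N(A^{T}) = \bigl(C((A^{T})^{T})\bigr)^{\perp} = \bigl(C(A)\bigr)^{\perp}
```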

- How come A-transpose multiplied by (Ax-b) equal to zero vector? This statement comes after the part where you state that Ax-b belongs to Null space of A-transpose.(3 votes)
- Because each row of A-transpose is a column of A, and so is a vector in Col(A). Since (Ax-b) is orthogonal to Col(A), it must be orthogonal to every vector in Col(A), including each of those columns. Each entry of A-transpose times (Ax-b) is the dot product of one column of A with (Ax-b), so every entry is 0. When you apply the dot product to two vectors, you are really multiplying Vector1-transpose by Vector2's column vector representation.(2 votes)
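Written out entry by entry, with a_1 through a_k denoting the columns of A (the same labels the transcript uses when it expands A), the product looks like this:

```latex
A^{T}(Ax^* - b)
= \begin{bmatrix} a_1^{T} \\ a_2^{T} \\ \vdots \\ a_k^{T} \end{bmatrix} (Ax^* - b)
= \begin{bmatrix} a_1 \cdot (Ax^* - b) \\ a_2 \cdot (Ax^* - b) \\ \vdots \\ a_k \cdot (Ax^* - b) \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
```

Each dot product is zero because Ax* - b is orthogonal to every vector in the column space, and each a_i is in the column space.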

- What if instead of trying to minimize || b-Ax* || we tried to minimize b-Ax* itself? I mean, if b and Ax* were to be equal, both expressions would be zero, so it should not matter which one we minimize. Maybe || b-Ax* || is easier to minimize than b-Ax*, but I am not sure.(2 votes)
- If we were to minimize b-Ax*, that would mean minimizing a vector instead of the vector magnitude. But what does it mean to minimize a vector? Make it as close as possible to the zero vector, I presume. Therefore, it would make more sense to minimize in terms of the distance from the zero vector (the origin), which happens to be ||b-Ax*||(3 votes)

- Does this imply that if there is no solution, A' isn't invertible? Otherwise, if you left-multiplied by the inverse of A', you would get Ax = b.(2 votes)
- At 10:27, he refers to the nullspace of the transpose of A. Could someone tell me which videos introduce this concept?(2 votes)
- At 8:25 Sal subtracts b from both sides of the equation Ax* = proj of b onto C(A). This makes sense algebraically but I don't understand how it makes sense spatially. It seems to me that b - Ax* would be equal to the vertical component of b (which would also be orthogonal to C(A) and is the thing we are trying to minimize). It doesn't make sense to me that instead, subtracting a vector from its horizontal component (in other words Ax* - b) would be equal to its vertical component.(2 votes)
- Vertical and horizontal are a bit relative, but I think I get what you're saying.

It might help to think of Ax*-b as being -(b-Ax*). So b-Ax* can be seen as the vector with its tail at the head of Ax* that points to the head of b, while Ax*-b points in the opposite direction but is the same vector otherwise.

You can always think of vector subtraction as connecting the tip of one to the other. So in Ax*-b we go from the tip of b to the tip of Ax*. It is still orthogonal to Ax*.

If that doesn't make sense let me know, maybe a general example can help.

Say you have two vectors in R2, A = <a1, a2> and B = <b1, b2>. Now let's subtract one from the other. First, A-B. This gets us <a1-b1, a2-b2>. Now what if we add B + (A-B)? You kind of don't even have to do vector addition if you just add the variables: B+A-B = A. So that means if you start from the tip of B and go the distance and direction of vector A-B, then you get to the tip of vector A. Similarly, you can get the same vector but in the opposite direction: A + (B-A) = B, so B-A goes from the tip of A to the tip of B.

I hope this all helped.(1 vote)

- Is this linear regression by least square approximation in the case of R2?(2 votes)
- I'm a little confused about the logic of this...

Ax*=(projection of b onto the column space of A)

Multiplying both sides by A(t) yields A(t)Ax*=A(t)(projection of b onto the column space of A)

But we also know from the video that A(t)Ax*=A(t)b

That means that A(t)b=A(t)(projection of b onto the column space of A)

Is this right?(2 votes)
- We're trying to get the least distance, which we know is achieved by the projection. Since the projection onto a subspace is by definition in the subspace, there HAS to be a solution to Ax*=projection onto C(A) of b. Essentially, we know which vector in C(A) is closest to b, so we replace b with that. And yes, A^T*b=A^T*(projection onto C(A) of b): the difference between b and its projection is orthogonal to C(A), so it lies in N(A^T), and A^T sends it to the zero vector. This does not imply that b=projection onto C(A) of b, because A^T cannot be cancelled from both sides. If b were equal to its projection, Ax=b would HAVE to have a solution and this whole process would be unnecessary.(1 vote)
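The identity asked about in this thread can be checked on a small case. The sketch below uses made-up values for A, b, and x_star (none of them come from the video) and plain Python helpers with illustrative names. It confirms that p = A x_star is the projection of b by checking that b - p is orthogonal to both columns of A, and then compares A^T b with A^T p.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Hypothetical example data: 3 equations, 2 unknowns.
A = [[1, 0],
     [1, 1],
     [1, 2]]
b = [[6], [0], [0]]

x_star = [[5], [-3]]     # candidate least squares solution for this A and b
p = matmul(A, x_star)    # p = A x*, which is in C(A) by construction

At = transpose(A)
difference = [[bi[0] - pi[0]] for bi, pi in zip(b, p)]

print(matmul(At, difference))  # zero vector: b - p is orthogonal to C(A)
print(matmul(At, b))           # A^T b
print(matmul(At, p))           # A^T p, the same vector as A^T b
print(p == b)                  # False: equal after A^T, yet b is not p
```

In this example A^T b and A^T p agree even though b and p differ, which is consistent with b - p lying in the null space of A^T.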

## Video transcript

Let's say I have
some matrix A. Let's say it's an n-by-k
matrix, and I have the equation Ax is equal to b. So in this case, x would have to
be a member of Rk, because we have k columns here, and
b is a member of Rn. Now, let's say that it just so
happens that there is no solution to Ax is equal to b. What does that mean? Let's just expand out A. I think you already know
what that means. If I write A like this, a1, a2,
if I just write it as its column vectors right there,
all the way through ak, and then I multiply it times x1,
x2, all the way through xk, this is the same thing as
that equation there. I just kind of wrote out
the two matrices. Now, this is the same thing as
x1 times a1 plus x2 times a2, all the way to plus xk times ak
is equal to the vector b. Now, if this has no solution,
then that means that there's no set of weights here on the
column vectors of A, where we can get to b. Or another way to say it is, no
linear combination of the column vectors of A will
be equal to b. Or an even further way of saying
it is that b is not in the column space of A. No linear combination of these
guys can be equal to that. So let's see if we can
visualize it a bit. So let me draw the column
space of A. So maybe the column space of
A looks something like this right here. I'll just assume it's
a plane in Rn. It doesn't have to be a plane. Things can be very general, but
let's say that this is the column space. This is the column space of A. Now, if that's the column space
and b is not in the column space, maybe we
can draw b like this. Maybe b, let's say this is the
origin right there, and b just pops out right there. So this is the 0 vector. This is my vector b, clearly
not in my column space, clearly not in this plane. Now, up until now, we would
get an equation like that. We would make an augmented
matrix, put it in reduced row echelon form, and get a line
that said 0 equals 1, and we'd say, no solution, nothing
we can do here. But what if we can do better? You know, we clearly can't
find a solution to this. But what if we can find
a solution that gets us close to this? So what if I want to find some
x, I'll call it x-star for now, where-- so I want to find
some x-star, where A times x-star is-- and this is
a vector-- as close as possible-- let me write this--
as close to b as possible. Or another way to view it, when
I say close, I'm talking about length, so I want to
minimize the length of-- let me write this down. I want to minimize the length
of b minus A times x-star. Now, some of you all
might already know where this is going. But when you take the difference
between two vectors and then take its length, what
does that look like? Let me just call Ax. Ax is going to be a member
of my column space. Let me just call that v. Ax is equal to v. You multiply any vector in Rk
times your matrix A, you're going to get a member of
your column space. So any Ax is going to be
in your column space. And maybe that is the vector v
is equal to A times x-star. And we want this vector to get
as close as possible to this as long as it stays--
I mean, it has to be in my column space. But we want the distance between
this vector and this vector to be minimized. Now, I just want to show you
where the terminology for this will come from. I haven't given it its
proper title yet. If you were to take this
vector-- let's just call this vector v for simplicity-- that
this is equivalent to the length of the vector. You take the difference between
each of the elements. So b1 minus v1, b2 minus v2,
all the way to bn minus vn. And if you take the length of
this vector, this is the same thing as this. This is going to be equal
to the square root. Let me take the length
squared, actually. The length squared of this is
just going to be b1 minus v1 squared plus b2 minus v2 squared
plus all the way to bn minus vn squared. And I want to minimize this. So I want to make this value the
least value that it can possibly be, or I want to get the
least squares estimate here. And that's why, this last minute
or two when I was just explaining this, that was just
to give you the motivation for why this right here is called
the least squares estimate, or the least squares solution,
or the least squares approximation for the equation
Ax equals b. There is no solution to this,
but maybe we can find some x-star, where if I multiply A
times x-star, this is clearly going to be in my column space
and I want to get this vector to be as close to
b as possible. Now, we've already seen in
several videos, what is the closest vector in any
subspace to a vector that's not in my subspace? Well, the closest vector to
it is the projection. The closest vector to b, that's
in my subspace, is going to be the projection of
b onto my column space. That is the closest
vector there. So if I want to minimize this,
I want to figure out my x-star, where Ax-star is equal
to the projection of my vector b onto my subspace or onto
the column space of A. Remember what we're
doing here. We said Ax equals b has no solution, but
maybe we can find some x that gets us as close
as possible. So I'm calling that my least
squares solution or my least squares approximation. And this guy right here is
clearly going to be in my column space, because you take
A times some vector x, that's going to be a linear combination
of these column vectors, so it's going to
be in the column space. And I want this guy to be as
close as possible to this guy. Well, the closest vector in my
column space to that guy is the projection. So Ax needs to be equal
to the projection of b on my column space. It needs to be equal to that. But this is still pretty
hard to find. You saw how, you know, you took
A times the inverse of A transpose A times A transpose. It's hard to find that
transformation matrix. So let's see if we can find an
easier way to figure out the least squares solution, or kind
of our best solution. It's not THE solution. It's our BEST solution
to this right here. That's why we call it the least
squares solution or approximation. Let's just subtract b from
both sides of this and we might get something
interesting. So what happens if we take Ax
minus the vector b on both sides of this equation? I'll do it up here
on the right. On the left-hand side we
get A times x-star. It's hard to write the x and
then the star because they're very similar. And we subtract b from it. We subtract our vector b. That's going to be equal to the
projection of b onto our column space minus b. All I did is I subtracted
b from both sides of this equation. Now, what is the projection
of b minus our vector b? If we draw it right here, it's
going to be this vector right-- let me do it in
this orange color. It's going to be this
right here. It's going to be that vector
right there, right? If I take the projection of b,
which is that, minus b, I'm going to get this vector.
We could say b plus this vector is equal to
my projection of b onto my subspace. So this vector right
here is orthogonal. It's actually part of the
definition of a projection that this guy is going to be
orthogonal to my subspace or to my column space. And so this guy is orthogonal
to my column space. So I can write Ax-star minus
b, it's orthogonal to my column space, or we could
say it's a member of the orthogonal complement
of my column space. The orthogonal complement is
just the set of everything, all of the vectors that are
orthogonal to everything in your subspace, in your column
space right here. So this vector right here
that's kind of pointing straight down onto my plane
is clearly a member of the orthogonal complement
of my column space. Now, this might look familiar
to you already. What is the orthogonal
complement of my column space? The orthogonal complement of
my column space is equal to the null space of A transpose,
or the left null space of A. We've done this in many,
many videos. So we can say that A times my
least squares estimate of the equation Ax is equal to
b-- I wrote that. So x-star is my least squares
solution to Ax is equal to b. So A times that minus
b is a member of the null space of A transpose. Now, what does that mean? Well, that means that if I
multiply A transpose times this guy right here, times
Ax-star-- and let me, no I don't want to lose the vector
signs there on the x. This is a vector. I don't want to forget that. Ax-star minus b. So if I multiply A transpose
times this right there, that is the same thing as that,
what am I going to get? Well, this is a member of the
null space of A transpose, so this times A transpose has
got to be equal to 0. It is a solution to A transpose
times something is equal to the 0 vector. Now, let's see if we can simplify
this a little bit. We get A transpose A times
x-star minus A transpose b is equal to 0, and then if we add
this term to both sides of the equation, we are left with A
transpose A times the least squares solution to Ax
equal to b is equal to A transpose b. That's what we get. Now, why did we do
all of this work? Remember what we started with. We said we're trying to find a
solution to Ax is equal to b, but there was no solution. So we said, well, let's find
at least an x-star that minimizes b, that minimizes
the distance between b and Ax-star. And we call this the least
squares solution. We call it the least squares
solution because, when you actually take the length, or
when you're minimizing the length, you're minimizing the
squares of the differences right there. So it's the least squares
solution. Now, to find this, we know
that this has to be the closest vector in our
subspace to b. And we know that the closest
vector in our subspace to b is the projection of b onto our
subspace, onto our column space of A. And so, we know that A--
let me switch colors. We know that A times our least
squares solution should be equal to the projection of b
onto the column space of A. If we can find some x in Rk that
satisfies this, that is our least squares solution. But we've seen before that
the projection of b is easier said than done. You know, there's a
lot of work to it. So maybe we can do
it a simpler way. And this is our simpler way. If we're looking for this,
alternately, we can just find a solution to this equation. So you give me an Ax equal to
b, there is no solution. Well, what I'm going to do is
I'm just going to multiply both sides of this equation
times A transpose. If I multiply both sides of this
equation by A transpose, I get A transpose times Ax is
equal to A transpose-- and I want to do that in the same
blue-- A-- no, that's not the same blue-- A transpose b. All I did is I multiplied
both sides of this. Now, the solution to this
equation will not be the same as the solution to
this equation. This right here will always
have a solution, and this right here is our least
squares solution. So this right here is our
least squares solution. And notice, this is some matrix,
and then this right here is some vector. This right here is
some vector. So long as we can find a
solution here, we've given our best shot at finding a solution
to Ax equal to b. We've minimized the error. We're going to get Ax-star,
and the difference between Ax-star and b is going
to be minimized. It's going to be our least
squares solution. It's all a little bit abstract
right now in this video, but hopefully, in the next video,
we'll realize that it's actually a very, very
useful concept.
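To make the recipe from the transcript concrete, here is a minimal sketch in plain Python. The matrix A and vector b are made-up example values, and the helpers `transpose`, `matmul`, `solve_2x2`, and `squared_error` are hypothetical names introduced only for this illustration. The sketch assumes two unknowns and an invertible A transpose A; it is not a general solver.

```python
from fractions import Fraction

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve_2x2(m, rhs):
    """Solve m x = rhs for a 2x2 matrix m by Cramer's rule (needs det != 0)."""
    (a, b_), (c, d) = m
    (e,), (f,) = rhs
    det = a * d - b_ * c
    if det == 0:
        raise ValueError("A^T A is singular; this sketch does not handle that case")
    return [[Fraction(e * d - b_ * f, det)], [Fraction(a * f - c * e, det)]]

def squared_error(A, x, b):
    """Squared length of b - A x."""
    Ax = matmul(A, x)
    return sum((bi[0] - axi[0]) ** 2 for bi, axi in zip(b, Ax))

# Made-up overdetermined system: 3 equations, 2 unknowns, no exact solution.
A = [[2, -1],
     [1,  2],
     [1,  1]]
b = [[2], [1], [4]]

At = transpose(A)
AtA = matmul(At, A)            # A^T A
Atb = matmul(At, b)            # A^T b
x_star = solve_2x2(AtA, Atb)   # solves A^T A x* = A^T b

# b - A x* should be orthogonal to every column of A, so A^T times it is zero.
residual = [[bi[0] - axi[0]] for bi, axi in zip(b, matmul(A, x_star))]

print(x_star)
print(matmul(At, residual))
print(squared_error(A, x_star, b))
```

Exact fractions are used instead of floating point so that the orthogonality check comes out as exactly zero rather than a tiny rounding error.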