
## Linear algebra


© 2023 Khan Academy

# Another least squares example

Using least squares approximation to fit a line to points. Created by Sal Khan.

## Want to join the conversation?

- At 12:00 it is y = (2/5)x + (4/5), since m* = (2/5) and b* = (4/5). (19 votes)
- So, if I am correct, this is how a calculator takes a set of data points and creates a linear regression? Because it seems as if the "closest solution" to the line is easily translatable to "best fit line". If so, would any other type of regression just be non-linear transformations? (9 votes)
- No, most calculators actually use the statistics approach, not the linear algebra approach. The way that works is it first calculates the correlation coefficient by averaging the products of the x-coordinate's z-score and the y-coordinate's z-score. It then multiplies the correlation coefficient by the standard deviation of the y's and divides by the standard deviation of the x's. That number is the slope. From there, it is trivial to find the y-intercept given the fact that the line must pass through the grand mean, the point whose x-coordinate is the x-mean and whose y-coordinate is the y-mean.

The reason calculators don't use the linear algebra method is that it entails finding the inverse of a potentially very large matrix. That takes a long time, even for a calculator. Of course, it does still give the same answer. (11 votes)
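As a rough illustration of the statistics approach described in this reply, here is a minimal Python sketch applied to the four points from the video. The function name `fit_line_stats` is made up for this example, and the use of population standard deviations (dividing by n) is an assumption chosen so that averaging the z-score products yields the correlation coefficient.

```python
from statistics import mean, pstdev

# The four points from the video.
xs = [-1, 0, 1, 2]
ys = [0, 1, 2, 1]

def fit_line_stats(xs, ys):
    """Slope and intercept via the correlation-coefficient route (hypothetical helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Population standard deviations (divide by n), so that the plain average
    # of the z-score products below is the correlation coefficient r.
    s_x, s_y = pstdev(xs), pstdev(ys)
    r = mean([((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)])
    slope = r * s_y / s_x
    # The fitted line passes through the point of means (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = fit_line_stats(xs, ys)
print(slope, intercept)  # approximately 0.4 and 0.8, i.e. 2/5 and 4/5
```

Up to floating-point rounding, this agrees with the m* = 2/5 and b* = 4/5 found in the video.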

- This is linear regression, right? (10 votes)
- Why did he write 2/5 for b and graph it when [m_star, b_star] = [2/5, 4/5]? He starts writing it at the 12:05 mark. I do not know what I am missing here, please help me! (6 votes)
- You didn't miss a thing. In fact, you caught something Sal missed. I understand that he doesn't want to remake the video, but they should add an annotation stating that the LSE fit line should be y = (2/5)x + 4/5. (5 votes)

- So basically, this means that for the calculated straight line f*(x) = m*x + b*, the total sum of the distances of each given point to f*(x) is the smallest for m* and b*. Am I correct? Would've been a nice geometrical and algebraic interpretation of the result :)

Correct me if I'm wrong. (5 votes)
- Close. The sum of the *squares* of the distances of each given point is minimized. That's why it's called a "least-squares" approximation. And also note that if you're interpreting it geometrically, the distance you're considering isn't the straight-line distance given by scalar projection, it's the vertical distance f*(x) − f(x). Check out the statistics trail for more stuff on linear regressions. (5 votes)
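To make the "sum of squared vertical distances" concrete, the sketch below evaluates that sum for the line from the video and for a few nearby lines. The helper name `sum_squared_vertical_errors` and the 0.1 step size are illustrative choices, not anything from the video.

```python
# The four points from the video.
xs = [-1, 0, 1, 2]
ys = [0, 1, 2, 1]

def sum_squared_vertical_errors(m, b):
    """Sum over all points of (y - (m*x + b))**2 (hypothetical helper)."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Least squares line from the video: m* = 2/5, b* = 4/5.
best = sum_squared_vertical_errors(2 / 5, 4 / 5)

# The line as drawn at the end of the video, with intercept 2/5 instead of 4/5.
misdrawn = sum_squared_vertical_errors(2 / 5, 2 / 5)

# Lines obtained by nudging the slope and/or intercept by 0.1.
nudged = [
    sum_squared_vertical_errors(2 / 5 + dm, 4 / 5 + db)
    for dm in (-0.1, 0, 0.1)
    for db in (-0.1, 0, 0.1)
    if (dm, db) != (0, 0)
]

print(best, misdrawn)                 # about 1.2 and 1.84
print(all(best < v for v in nudged))  # True
```

Every nudged line, and the line with intercept 2/5, has a larger sum of squared vertical errors than y = (2/5)x + 4/5.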

- In the video "Regression Line Example" you use the least squares method with equations for m and b. Couldn't you use that strategy for this example too? Are the least squares solutions the same as they are in the regression line? Mainly, my question is, what is the difference between this video and the "Regression Line Example" video? Thanks. (3 votes)
- You are right. (1 vote)

- I understand the technique Sal used, but I cannot understand something more fundamental: shouldn't the solution set of Ax=b remain the same (i.e. no solution) when we multiply both sides by A_transpose? It seems like multiplying both sides by the same quantity somehow produces a new solution. (3 votes)
- Sal explains how he got that one clip earlier. (1 vote)
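One way to see what is going on: multiplying both sides by A transpose is not a reversible step here, because A transpose is a 2 by 4 matrix and has no inverse. Every solution of Ax = b would also solve A^T A x = A^T b, but not the other way around. The new equation only asks that the leftover b − Ax be orthogonal to the columns of A, not that it be zero. A small sketch with the numbers from the video (the variable names are just for illustration):

```python
from fractions import Fraction as F

# A and b from the video; x_star is the least squares solution (m*, b*).
A = [[-1, 1], [0, 1], [1, 1], [2, 1]]
b = [0, 1, 2, 1]
x_star = [F(2, 5), F(4, 5)]

# A times x_star: this is NOT equal to b, so x_star does not solve Ax = b.
Ax = [row[0] * x_star[0] + row[1] * x_star[1] for row in A]

# The leftover (residual) b - A x_star.
residual = [bi - axi for bi, axi in zip(b, Ax)]

# A transpose times the residual: each column of A dotted with the residual.
At_residual = [sum(A[i][j] * residual[i] for i in range(4)) for j in range(2)]

print(Ax == b)                          # False
print([str(r) for r in residual])       # ['-2/5', '1/5', '4/5', '-3/5']
print([str(v) for v in At_residual])    # ['0', '0']
```

So A^T(b − Ax*) = 0, which is the same statement as A^T A x* = A^T b, even though Ax* ≠ b.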

- Why can't we take the line equation as "ax+by=1" instead of "y=mx+b" and find the values of a and b using the least squares method? I tried doing so, but I am arriving at a different answer. (3 votes)
- In the previous video, Sal did it the way you suggest (ax+by=c, for the three lines of the triangle). However, this doesn't optimize for the distance y, it optimizes for c, which has dubious value.

By first converting the lines to mx+b=y, we can now optimize (via least-squares distance) for y. (1 vote)
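For anyone who wants to check the "different answer", here is a hedged sketch that applies the same normal-equations recipe to ax + by = 1 for the four points in this video. The helper `solve_2x2` (Cramer's rule) and the variable names are made up for this example.

```python
from fractions import Fraction as F

# The four points from the video.
points = [(-1, 0), (0, 1), (1, 2), (2, 1)]

def solve_2x2(m11, m12, m21, m22, r1, r2):
    """Solve [[m11, m12], [m21, m22]] [u, v]^T = [r1, r2]^T by Cramer's rule."""
    det = m11 * m22 - m12 * m21
    return F(r1 * m22 - m12 * r2, det), F(m11 * r2 - m21 * r1, det)

# Each point gives one equation a*x + b*y = 1, so the matrix has rows (x, y)
# and the right-hand side is all 1's. Form the normal equations M^T M [a, b]^T = M^T 1.
sxx = sum(x * x for x, _ in points)
sxy = sum(x * y for x, y in points)
syy = sum(y * y for _, y in points)
sx = sum(x for x, _ in points)
sy = sum(y for _, y in points)

a_coef, b_coef = solve_2x2(sxx, sxy, sxy, syy, sx, sy)

# Rewrite a*x + b*y = 1 as y = slope*x + intercept.
slope = -a_coef / b_coef
intercept = 1 / b_coef

print(a_coef, b_coef)    # -1/5 4/5
print(slope, intercept)  # 1/4 5/4
```

This version minimizes the sum of (ax + by − 1)², which is a different error from the vertical distance in y, so it lands on y = (1/4)x + 5/4 rather than y = (2/5)x + 4/5.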

- Is this the Gauss-Markov theorem? (2 votes)
- If Ax = b and b is not in C(A) and A^T*A is a singular matrix, how can we find the least squares solution of Ax = b? (1 vote)
- That is not possible. You can never have a vector b which is equal to Ax and yet not in C(A). The C(A) is, by definition, the space of all vectors b such that Ax = b. (2 votes)

## Video transcript

So I've got four Cartesian
coordinates here. This first one is minus 1, 0. I tried to draw them
ahead of time. So minus 1, 0 is this
point right there. Doing this in these
new colors. The next point is
a 0, 1, which is that point right there. Then the next point is
1, 2, which is that point right up there. And then the last point is 2, 1,
which is that point there. Now my goal in this video is to
find some line, y equals mx plus b, that goes through
these points. Now the first thing I'd say
is, hey Sal, there is not going to be any line that goes
through these points, and you can see that immediately. You could find a line that
maybe goes through these points, but it's not
going to go through this point over here. If you try to make a line that
goes through these two points, it's not going to go through
those points there. So you're not going to be able
to find a solution that goes through those points. Let's set up the equation that
we know we can't find the solution to and maybe we can
use our least squares approximation to find a line
that almost goes through all these points. Or it's at least the best
approximation for a line that goes through those points. So this first one, I can
express my line, y equals mx plus b. Let me just express it as f of
x is equal to mx plus b, or y is equal to f of x. We can write it that way. So our first point right there
-- let me do it in that color, that orange -- that tells us
that f of minus 1, which is equal to m times -- let me just
write this way -- minus 1 times m, it's minus m plus
b, that that is going to be equal to 0. That's what that first
equation tells us. The second equation tells us
that f of 0, which is equal to 0 times m, which is just
0 plus b is equal to 1. f of 0 is 1. This is f of x. The next one -- let me do it in
this yellow color -- tells us that f of 1, which is equal
to 1 times m, or just m, plus b, is going to be equal to 2. And then this last one down
here tells us that f of 2, which is of course 2 times m
plus b, that that is going to be equal to 1. These are the constraints. If we assume that our line can
go through all of these points, then all of these
things must be true. Now you could immediately, if
you wish, try to solve this equation, but you'll find that
you won't find a solution. We want to find some m's
and b's that satisfy all of these equations. Or another way of writing this
-- We want to write it as a matrix vector or a
matrix equation. We could write it like this. Minus 1, 1, 0, 1, 1, 1, 2, 1,
times the vector m, b has got to be equal to the vector
0, 1, 2, 1. These two systems, this system
and this system right here, are equivalent statements,
right? Minus 1 times m plus 1 times b
has got to be equal to that 0. 0 times m plus 1 times b has
got to be equal to that 1. That's equivalent to that
statement right here. And this isn't going
to have a solution. The solution would have to go
through all of those points. So let's at least try to find
a least squares solution. So if we call this a, if we call
that x, and let's call this b, there is no solution
to ax is equal to b. Now maybe we can find a least
-- Well, we can definitely find a least squares solution. So let's find our least squares
solution such that a transpose a times our least
squares solution is equal to a transpose times b. Our least squares solution
is the one that satisfies this equation. We proved it two videos ago. So let's figure out what a
transpose a is and what a transpose b is, and
then we can solve. So a transpose will
look like this. b minus 1, 1, 0, 1, 1,
1, and then 2, 1. This first column becomes this
first row; this second column becomes this second row. So we're going to take the
product of a transpose and then a-- a is that thing right
there --minus 1, 0, 1, 2, and we just get a bunch of 1's. So what does this equal to? We have a 2 by 4
times a 4 by 2. So we're going to have
a 2 by 2 matrix. So this is going to be --
Let's do it this way. Well, we're going to have minus
1 times minus 1, which is 1, plus 0 times 0, which is
0 -- so we're at 1 right now --plus 1 times 1. So that's 1 plus the other
1 up there, so that's 2, plus 2 times 2. 2 times 2 is 4, so we get 6. That's that row, dotted with
that column, was equal to 6. Now let's take this row dotted
with this column. So it's going to be negative 1
times 1, plus 0 times 1, so all of these guys times
1 plus each other. So minus 1 plus 0 plus 1 --
that's all 0's --plus 2. So it's going to get a 2. I just dotted that guy
with that guy. Now I need to take the dot of
this guy with this column. So it's just going to be 1 times
minus 1 plus 1 times 0 plus 1 times 1 plus 1 times 2. Well, these are all 1 times
everything, so it's minus 1 plus 0 plus 1, which
is 0 plus 2. It's going to be 2. And then finally -- Well. I mean, I think you see
some symmetry here. We're going to have to take the
dot of this guy and this guy over here. So what is that? That's 1 times 1, which is 1,
plus 1 times 1, which is 2, plus 1 times 1. So we're going to have 1
plus itself four times. So we're going to get that
it's equal to 4. So this is a transpose a. And let's figure out what a
transpose b looks like. Scroll down a little bit. So a transpose is this matrix
again-- let me switch colors --minus 1, 0, 1, 2. We get all of our 1's
just like that. And then the matrix
b is 0, 1, 2, 1. We have a 2 by 4 times a 4 by 1,
so we're just going to get a 2 by 1 matrix. So this is going to be equal
to a 2 by 1 matrix. We have here, let's see, minus 1
times 0 is 0, plus 0 times 1 is still 0. Plus 1 times 2, which
is 2, plus 2 times 1, which is 4, right? This is 2 plus 2, so it's going
to be 4 right there. And then we have 1 times 0, plus
1 times 2, plus-- So 1 times all of these
guys added up. So 0 plus 1 is 1, 1 plus
2 is 3, 3 plus 1 is 4. So this right here
is a transpose b. So just like that, we know
that the least squares solution will be the solution
to this system. 6, 2, 2, 4, times our least
squares solution, is going to be equal to 4, 4. Or we could write it this way. We could write it 6, 2, 2, 4,
times our least squares solution, which I'll write--
Remember, the first entry was m. I'll write it as m star. That's our least squares m, and
this is our least squares b, is equal to 4, 4. And I can do this as an
augmented matrix or I could just write this as a system
of two unknowns, which is actually probably easier. So let's do it that way. So this, if I were to write it
as a system of equations, is 6 times m star plus 2 times
b star, is equal to 4. And then I get 2 times m star
plus 4 times b star is equal to this 4. So let me solve for my m
stars and my b stars. So let's multiply this second
equation, actually let's multiply that top
equation by 2. This is just straight
Algebra 1. So times 2, what do we get? We get 12m star plus 4b
star is equal to 8. We just multiplied that
top guy by 2. Now let's multiply this magenta
one by negative 1. So this becomes a minus, this
becomes a minus, that becomes a minus, and now we can add
these two equations. So we get minus 2m star plus 12m
star, that's 10m star. And then the minus 4b and the 4b
cancel out, is equal to 4, or m star is equal to 4 over
10, which is equal to 2/5. Now we can just go and
back-substitute into this. We can say 6 times m
star-- This is just straight Algebra 1. So 6 times our m star, so 6
times 2 over 5, plus 2 times our b star is equal to 4. Enough white, let
me use yellow. So we get 12 over 5 plus 2b
star is equal to 4, or we could say 2b star-- let me
scroll down a little bit --2b star is equal to 4, which is the same thing as 20
over 5, minus 12 over 5, which is equal to-- I'm just
subtracting the 12 over 5 from both sides --which is
equal to 8 over 5. And you divide both sides of the
equation by 2, you get b star is equal to 4/5. And just like that, we got our
m star and our b star. Our least squares solution
is equal to 2/5 and 4/5. So m is equal to 2/5 and
b is equal to 4/5. And remember, the whole point
of this was to find an equation of the line. y is equal to mx plus b. Now we can't find a line that
went through all of those points up there, but this
is going to be our least squares solution. This is the one that minimizes
the distance between a times our vector and b. No vector, when you multiply
times that matrix a-- that's not a, that's transpose a --no
other solution is going to give us a closer solution to
b than when we put our newly-found x star into
this equation. This is going to give us
our best solution. It's going to minimize
the distance to b. So let's write it out. y is equal to mx plus b. So y is equal to
2/5 x plus 2/5 [sic: b star is 4/5, so the line is y = 2/5 x + 4/5]. Let's graph that out. y is equal to 2/5 x plus 2/5. So its y-intercept is 2/5,
which is about there. This is at 1. 2/5 is right about there. And then its slope is 2/5. Let's think of it this way: for
every 2 and 1/2 you go to the right, you're going
to go up 1. So if you go 1, 2 and 1/2,
we're going to go up 1. We're going to go
up 1 like that. So our line-- and obviously this
isn't precise --but our line is going to look
something like this. I want to do my best shot
at drawing it because this is the fun part. It's going to look something
like that. And that right there is my least
squares estimate for a line that goes through
all of those points. And you're not going to find a
line that minimizes the error in a better way, at least when
you measure the error as the distance between this vector
and the vector a times our least squares estimate. Anyway, thought you would
find that neat.
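
The computation in the transcript can be reproduced with a short Python sketch. This is only an illustration under a few assumptions: it uses exact fractions from the standard library rather than a linear algebra package, the helper names `transpose` and `matmul` are made up here, and the 2 by 2 system is solved with Cramer's rule instead of the elimination steps used in the video.

```python
from fractions import Fraction

# A and b as set up in the video: each row of A is (x, 1), b holds the y values.
A = [[-1, 1], [0, 1], [1, 1], [2, 1]]
b = [0, 1, 2, 1]

def transpose(M):
    """Rows become columns (hypothetical helper)."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Plain matrix product of two lists of rows (hypothetical helper)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

At = transpose(A)
AtA = matmul(At, A)                                        # 2 by 2 matrix
Atb = [sum(a * bi for a, bi in zip(row, b)) for row in At]  # 2 entries

# Solve AtA [m*, b*]^T = Atb with Cramer's rule.
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
m_star = Fraction(Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1], det)
b_star = Fraction(AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0], det)

print(AtA)             # [[6, 2], [2, 4]]
print(Atb)             # [4, 4]
print(m_star, b_star)  # 2/5 4/5
```

The result matches the least squares solution worked out above, m* = 2/5 and b* = 4/5, so the fitted line is y = (2/5)x + 4/5.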