If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

# Least squares examples

An example using the least squares solution to an unsolvable system. Created by Sal Khan.

## Want to join the conversation?

• Is this least squares solution equivalent to the orthocenter of the triangle?
If the shortest distance is the projection, then the solution point should be on the perpendicular to each line.
...or did I miss something? • Nope, it's not the orthocenter. The least squares solution is always inside the triangle whereas the orthocenter can be outside. Nor does the least squares solution minimize the sum of the perpendicular distances to each line as some commenters claim, nor the distances to the corners of the triangle which was my second guess. In fact, the least squares solution has no geometric significance in R^2 because it's not unique! If you take the equation for one of the lines and multiply through by a constant then the least squares solution changes, even though it's the exact same line. It turns out every point in the triangle is a valid least squares solution if you use the right coefficients.

The way to see what's going on geometrically in this problem is to look at the codomain in R^3 and the solution vector b = [2, 1, 4]. Applying transformation A to the original triangle gets you a new triangle in R^3 whose corners are [2, 1, 1] [2, 6, 4] and [17, 1, 4]. If we subtract b from each corner vector we get [0, 0, -3] [0, 5, 0] and [15, 0, 0]. And now, if we multiply the original linear equations by the constants c1, c2, and c3 respectively then those difference vectors become [0, 0, -3*c1] [0, 5*c2, 0] and [15*c3, 0, 0]. So without changing the original triangle we can stretch the transformed triangle relative to b along all 3 axes at will. And if we stretch it the right way we can move the shortest vector from b to the transformed triangle wherever we like, and thus we can move the least squares point of the original triangle wherever we like.
• I'm having trouble figuring out why I'm getting two different answers. The second line written in slope intercept form has a slope of -1/2 and an intercept of 1/2. So I would think the line could also be written as (1/2)x + y = 1/2. However, when I then go through the process of finding the approximate intersection using these values in my matrices (using for the second rows of A and b [1/2 1] and [1/2] instead of [1 2] and  as they are in the video) I end up with a different approximate intersection (x: 52/31 y: 69/62). I've gone back through the math a couple times and don't seem to be finding a place where I'm making an error. Am I missing something? Or is such an approximate solution not unique and I'll find different ones depending on which numbers I use? Thanks. • At around 9 minutes, when he swaps the rows (1,6 and 6,1), why does it not make the matrix negative? • At around 9 minutes, when he swaps the rows (1,6 and 6,1), why does not swap x1 and x2, so the result it [3/7; 10/7]; if you don swap x1 and x2, [6 1; 1 6]*[10/7; 3/7] is not equal [9; 4] • If he didn't swap the rows, he would've gotten the same answer:
`[6 1 | 9][1 6 | 4]`
divide R1 by 6:
`[1 1/6 | 3/2][1 6 | 4]`
R2-R1:
`[1 1/6 | 3/2][0 35/6 | 5/2]`(4-3/2 = 8/2-3/2 = 5/2)
R2*6/35:
`[1 1/6 | 3/2][0 1 | 3/7]`(5/2*6/35 = 30/70 = 3/7)
R1-R2/6:
`[1 0 | 10/7]`(3/2-(3/7)/6 = 3/2-3/42 = 21/14-1/14 = 20/14 = 10/7)
`[0 1 | 3/7]`
This is the same thing he got in the video, but I never swapped the 2 rows.
• Shouldn't it be x ~= [(A^T * A)^-1] * [A^T * b]?
(the inverse of A^T * A) • That's basically what he did, except he solved the system by reducing to reduced row echelon form instead of multiplying by the inverse.

1st he multiplied A^T*A to get a matrix, we'll call it C
2nd he multiplied A^T*b to get a vectord we'll call it d
Lastly he solved Cx=d
which you could solve by reducing to reduce row echelon form or you could solve by C^-1*d=x
which is the same as (A^T*A)^-1*(A^T*b)=x

Hope this makes sense
• Is there a way to associate a confidence with the final intersection point in this least squares example? I've been looking at confidence videos and lessons and not finding a direct way to apply that to this example. • In the previous video this formula was derived with the idea of vectors in R3 and its projection onto some C(A). How does this link to finding a point closest to 3 lines? • I can't really visualise how this fits in with the idea of projections (like the example in the last video did). Would someone be able to share an intuition of how this example works in terms of a projection of b onto C(A)?
(1 vote) • How does this work in a more general setting? Say we have A*f(x) + B*f(y) = C ? Being more specific/example, something like having an equation like y=Ae^(Bt) + C. How would we regress a sample of data points if we have such a model? I tried working this out using a linear model as a template but the B coefficient remained. I am now guessing there is no general way to regress coefficients using reduced row echelon form.
(1 vote) • i love this lesson because it feels like the entire class up to this point is just exposition for this extremely useful application
(1 vote) ## Video transcript

So I've got three lines in R2, and I want to find their intersection. So the first one is 2x minus y is equal to 2, the second one is x plus 2y is equal to 1, and the third one is x plus y is equal to 4. So let's first just graph these, just to have a visual representation of what we're trying to do. So I like writing my lines in y equals mx plus b form. So this top line becomes what? Minus y is equal to minus 2x, minus 2x plus 2. I just subtracted 2x from both sides. Or we could write that y is equal to 2x minus 2. That's the first line here. The second line-- I'm doing it in green --we could write this as 2y is equal to minus x plus 1, or we could write that y is equal to minus 1/2 x plus 1/2. I just divided both sides by 2. And this last line right here, we could write this as y is equal to minus x plus 4. Go straight here. y is equal to minus x plus 4. Now let me graph these. Let me draw an axis, that is my y-axis. I can call it my y-axis since we're actually dealing with x and y's now. I'll do it slightly lighter. I'll do it in this grey as well. I'd say that is my x-axis. Just like that. And this first guy is going to be 2x minus 2. So it's y-intercept is going to be at minus 2. It's going to have a slope of 2, so it's going to be a pretty steep line, just like that. So that's that first line, just like that. 2x minus 2. The next line is 1/2, minus 1/2, plus 1/2. So if we go plus 1/2, that's right there, and then the slope is minus 1/2. So for every 2 we go over, we go down 1. So it's going to be like that. It's actually going to be orthogonal, right? Because it's the negative inverse of this guy. So it's going to look something like that. We draw it like that, just like that. And this guy is minus x plus 4. This is 1, 2, 3, 4, and you go minus x, so for every 1 you go over, you go down 1. So this other line is going to look something like this. This other line, this last line, is going to look something like that. Just like that. Now, I said at the beginning of this video that I want to find the intersection of these three lines. But notice, there is no intersection of these three lines. They all intersect the other two, but they don't all intersect each other in one point. We can kind of call the system as being overdetermined. We've overconstrained it. There is no intersection of all three of these points. So if I were to actually try to solve this system, I would find no solution. And to say that this has no solution is equivalent to saying that this matrix, or this equation has no solutions. Let me write this. I'm just going to rewrite the system like this. This is equivalent to the matrix, let me make sure I get this right, the matrix times the vector xy is equal to 2, 1, and 4. And so, this first equation is 2 times x, minus 1 times y. So it's 2 and minus 1. 2 times x minus 1 times y is equal to 2. That's that first equation over there. The second equation-- actually I could even, well I won't color code it, that'll take forever --that's 1 times x plus 2 times y is equal to 1. And then we have x plus y is equal to 4. This system and this equation, this system right here, these are equivalent. Now, this isn't going to have any solution. You could try to find a solution to this. You could create an augmented matrix, put it in reduced row echelon form. But there is no intersection to the three things. So you're not going to find a solution to A times some vector-- we could call this some vector x --is equal to B. Or another way to say it, is that B is not in the column space of this matrix right here. Now, we learned in the last video that sure, we can't find a solution to Ax equals B. Ax equals B has no solution. We see it graphically here; these lines don't intersect with each other. And you could prove it for yourself algebraically by trying to find a solution here. You'll end up with a 0 equals 1. But we can almost get there by finding a least squares solution. And we find a least squares solution if we multiply both sides by A transpose. We know that A transpose times A times our least squares solution is going to be equal to A transpose times B. So at least we can find the closest fit for our solution. So let's find the vector x this is our least squares solution. So what is A transpose times A? A transpose looks like this, you'll have 2 minus 1, 1, 2, 1, 1. That is A transpose. And then of course A is just this thing: 2 minus 1, 1, 1, 1, 1. So A transpose A is going to be equal to-- We have a 2 by 3 times a 3 by 2 matrix, so it's going to be a 2 by 2 matrix. So it's going to be a 2 by 2 matrix. So what do we get, we get 2 times 2 which is 4, plus 1 times 1, plus 1 times 1. So it's 4 plus 1 plus 1. So that's equal to 6. And then we have 2 times minus 1, which is minus 2, plus 1 times 2, so those cancel out. Minus 2 plus 2 is 0, plus 1 times 1. That's just going to be 1. And then we get minus 1 times 2, which is minus 2, plus 2 times 1, which is 2. So the minus 2 plus 2 is 0, plus 1 times 1, so we get a 1. And then finally, we get minus 1 times minus 1, which is positive 1. Plus 2 times 2, which is 4, so we're now at 5. Plus 1 times 1, so this is going to be 6. So this is A transpose A. Now, what is A transpose times B? A transpose is 2, 1, 1, minus 1, 2, 1. And then B is just the 3 by 1 vector, the vector that's a member of R3, 2, 1, 4. So what is this going to be equal to? This is going to be equal to-- We get a 3. Sorry, this is a 2 by 3 times a 3 by 1. We're going to get a 2 by 1 vector. You get a 2 by 1 vector here. So 2 times 2 is 4, plus 1 times 1, so that's plus 1, so that's 5. Plus-- let me actually write it down, I'm going to make a careless mistake --2 times 2, which is 4, plus 1 times 1, which is 1, plus 1 times 4, which is 4. Then here, you have minus 1 times 2, which is minus 2, 2 times 1, which is 2, plus 1 times 4, which is 4. So A transpose times B is equal to 9, and this is going to be 4. So we can rewrite this guy right here as the matrix A transpose A. Which is just 6, 1, 1, 6 times my least squares solution-- so this is actually going to be in the column space of A --is equal to A transpose times B, which is just the vector 9 4. And this'll be a little bit more straightforward to find a solution for. In fact, there will be a solution. We proved it in the last video. So to find a solution, let's create our little augmented matrix: 6, 1, augmented with a 9. You have the 4 there, you get a 1 and a 6. Just like that. Let's put the left hand side in reduced row echelon form. Actually, first I'm going to swap these two rows. That's my first row operation that I choose to do, just because I like to have that 1 there. It's a nice pivot entry. So then it goes to 1, 6, 4, and 6, 1, 9. And then let me replace my second row with the second row minus 6 times the first row. So I'm going to keep my first row the same. So I have 1, 6, and 4. And then my second row, I'm going to replace my second row with a second row minus 6 times the first row. So 6 minus 6 is 0. 1 minus 6 times 6, that's 1 minus 36, so that's minus 35. And then a 9 minus 6 times 4, is 9 minus 24. This one always gets me in trouble. So 9 minus 24, that's the negative of 24 minus 9, so that is minus 15. Let me make sure I didn't make a careless mistake. 1 minus 36 is minus 35, 9 minus 24, is minus 15. So that's what I get right there. And then let me go to the right now, so let me let me divide this row right here, let me divide it by minus 35. So I'm going to keep my first row the same. 1, 6, and 4. And then this guy right here, I'm going to divide by minus 35. So I'm going to get 0, 1, and then minus 15 over minus 35. That's 15 over 35, or that's 3 over 7, so that is equal to 3/7. And then let me just put this in complete reduced row echelon form. That'll be nice. Let me keep my second row the same. So my second row is 0, 1, and 3/7. And then my first row, I'm going to replace it with my first row minus 6 times my second row. So 1 minus 6 times 0 is 1, 6 minus 6 times 1 is 0, and then we have 4 minus 6 times 3/7. So 4 is 28/7. Let me write it up here. So we have 4, which is 28/7, minus 6 times 3/7, so minus 18/7, right? That's 6 times 3/7. So this is going to be equal to 10/7. And just like that, I've solved this new equation. So you could say that-- Let me write it this way. We could write that x star-- This is going to the first entry of x star, which we could call x, is going to be 10/7. Let me write this. x star, the solution is going to be 10/7 and 3/7. So I'm saying that if you take x is equal to 10/7 and y is equal to 3/7, you're going to get as close to a solution as possible. Let's see what that looks like visually? What is 10/7? Let me write this down. x star is equal to 10/7 and 3/7. We're saying the closest-- Our least squares solution is x is equal to 10/7, so x is a little over one. And then y is going to be 3/7, a little less than 1/2. So our least squares solution is going to be this one, right there. And so this, when you put this value for x, when you put x is equal to 10/7 and y is equal to 3/7, you're going to minimize the collective squares of the distances between all of these guys. I drew this a little bit too small to show that. But let's actually figure out what our least, what our minimized difference is. Remember, the whole point of the is to minimize the distance between Ax star, is to minimize the distance between Ax star and B. Or between B and Ax star. Now, what was Ax star equal to? Ax star was equal to 9/4. So this right here. Sorry, Ax star, that's not equal to 9/4, that's A transpose Ax star is equal to 9/4. Ax star is our original matrix A, which is this one right here, so 2-- Let me write it down here. I know it's off the page right now. So our original matrix A was 2, minus 1, 1, 2, 1, 1. That was our original matrix A right there. And then our x star, we were able to determine, is 10/7 and 3/7. So Ax star is going to be this product, so what does it equal to? It is equal to, it is going to be a 3 by 1 matrix. So we got 2 times 10/7, which is 20/7, minus 1 times 3/7, so minus 3/7. And then we get, let me see, yep, that's 10 minus 3/7, and we have 10/7, minus 2 times 3/7, so minus 6/7. Or plus, sorry, this is a plus. Right? 2 times 10/7 is 20/7 minus 1 times 3/7, then we have 1 times 10/7 plus 2 times 3/7. And then we have 10/7 plus 3/7. So Ax, so this is A and x star, our least squares approximation for x, is equal to what is this? This is 17/7, this is 16/7, and this is 13/7. We want to find out with this minimum distance is. So, let's see, this is going to be this thing. So 17/7, 16/7, and 13/7 minus our original B. Our original B was 2, 1, and 4. So I'm claiming that my solution that we just found, this minimizes this distance. Because this is the projection of B onto the column space of A. We saw that before. So our B we see all the way up here is 2, 1, 4, just like that. So if we take the length of this-- let me switch colors --this is equal to the length. Let me write all of us in sevenths-- I'll just do it in my head. I don't want to waste too much time. So this is 17/7 minus 14/7, right? 2 is 14/7, so this is going to be 3/7. Then it's 16/7 minus 7/7, so that's 9/7. And then we have 13/7 minus 28/7, so that is minus 15 over 7. So this is the vector that separates the B that was not in my column space of A from the projection of B. And if we find it's length, it's length is going to be equal to-- Let's find the square of it's length first. The square of its length is going to be 3/7 squared, so that is 9/49, plus 9/7 squared, which is 81/49, plus minus 15/7 squared. What's 15 squared? 15 squared is 225, I think. Let me make sure. I'm prone to careless mistakes. 5 times 5 is 25, 1 times 5, this is 75, and then I have a 150. Yep, 225. So plus 225/49. Which is equal to-- So 9 plus 81 is 90, and then so 225 plus 90, to get a 5, was a 315. So this is equal to 315/49. Or, if we actually wanted the difference, that's going to square root of this. So if we take just the regular distance, that's equal to the square root of that. So it's equal to the square root of 315 over 7. And square root of 315, it looks like that is simplifiable. Does 9 go into it? Looks like 9 goes into it, maybe 35 five times? So it would be what, 3 square roots of 35 over 7. So that is just a measure. Let me put it this way, you're not going to be able to find any member of R2, any values of x and y, that's going to give you a smaller value than this when you find the distance between it's solution and the solution you were trying to get to. So this, based on our least squares solution, is the best estimate you're going to get. x is equal to 10/7, y is equal to 3/7. A little bit right, just like that. Anyway, hopefully you found that useful, and you're starting to appreciate that the least squares solution is pretty useful.