Linear transformations as matrix vector products

Showing how ANY linear transformation can be represented as a matrix vector product. Created by Sal Khan.

Want to join the conversation?

  • doug.hawkes
    I believe I have watched all videos up to this point and this is the first one that has confused me. It seems to jump straight into transforming a matrix whilst the overall subject has been transforming vectors. Whilst I understand that the matrix can be considered as a collection of column (or row) vectors, it doesn't explain the apparent jump to matrix transformation in the usual thorough way. So whilst we started off transforming a vector, we appear to have transformed a collection of vectors and used the result to transform the vector! What does the transformed matrix of basis vectors (the transformed I matrix) represent?
    (10 votes)
  • Lenny Leonard
    If any matrix-vector multiplication is a linear transformation, then how can I interpret the general linear regression equation, y = Xβ?
    X is the design matrix, β is a vector of the model's coefficients (one for each variable), and y is the vector of predicted outputs for each object.
    Let's say X is a 100x2 matrix and β is a 2x1 vector. Then y is a 100x1 vector.

    The concept is clear, but from a linear transformation point of view, what does it mean?

    I take a vector of coefficients in R^2 and, through X, I transform it into a
    vector in R^100. I can't visualize it logically...🤔
    (4 votes)
    • kubleeka
      A 100x2 matrix is a transformation from 2-dimensional space to 100-dimensional space. So the image/range of the function will be a plane (a 2D space) embedded in 100-dimensional space. So each vector in the original plane will now also be embedded in 100-dimensional space, and hence be expressed as a 100-dimensional vector. (See the short numerical sketch after this comment section.)
      (5 votes)
  • doug.hawkes
    I'm confused as to what is being taught here. Is the lesson saying that the transformation of a vector is equivalent to transforming the original basis and then using the result to transform the vector?
    (5 votes)
  • José N. Olmos
    So, can I just arrange the linear 'instructions' in ascending order of the components of vector x, take the coefficients of each term, and plug them into the matrix that's to be multiplied by the x vector? Seems like a pretty legit shortcut now that I have an intuitive understanding of it.
    (4 votes)
  • Philip Tonaczew
    What do you call that matrix? Is it the standard matrix?
    (3 votes)
  • Araoluwa Filani
    In the statement from the previous discussion, I think "the transformation of the sum equals the sum of the transformations: x1T(e1) + x2T(e2) + ... + xnT(en)"
    should be written this way: e1T(x1) + e2T(x2) + ... + enT(xn). Can you explain why you put it that way and not the way I thought it ought to be written? Thanks
    (3 votes)
    • MBCory
      Remember that the transformation acts on vectors, not scalars. The terms x_1, x_2, etc. are all scalars, so by definition, T(x_1) = x_1 always. However, a transformation CHANGES (i.e., transforms) vectors, so T(e_1) ≠ e_1 necessarily. So when he applies T to the sum, he has T(x_1e_1) + ... + T(x_ne_n) = x_1T(e_1) + ... + x_nT(e_n), since you can't simplify it the other way. Hope this helps!
      (1 vote)
  • Erika
    This series has been helping me a lot and I am thankful for it, but this one made me so confused that I got a little desperate.
    (3 votes)
  • sixhundredandsixtysix
    At why does Sal want to avoid using L(x)?
    (3 votes)
    • MBCory
      I think it is less that he wants to avoid the notation L(x) and more that since he is talking about an arbitrary linear transformation, it makes sense to use the notation T(x) which is standard when writing/describing a transformation. Hope this helps!
      (1 vote)
  • henrik.edman
    At and forward, what was the purpose of the last example? I don't think I understand what it's supposed to prove.
    (2 votes)
    • siddhantsaboo
      Well, as the video's name says, it just proves that any linear transformation can be represented by a matrix vector product.
      Which basically says that any transformation from n dimensions to m dimensions, if properly defined, could be computed by a matrix vector product (Ax = b), as we have seen in the previous videos.
      (3 votes)
  • Kelly
    Is there going to be further explanation of the matrix multiplication? Because it was shown very quickly at the end of this video and it leaves me a bit puzzled.
    (2 votes)
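
To make kubleeka's answer above concrete, here is a minimal NumPy sketch (not part of the original lesson; the 100x2 design matrix is shrunk to 6x2 and the values are made up for illustration):

```python
import numpy as np

# A tall design matrix X acts as a linear transformation from R^2
# into R^6 (standing in for R^100). The values are arbitrary.
X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.0],
              [1.0, 3.5],
              [1.0, 4.0],
              [1.0, 5.5]])

beta = np.array([2.0, 0.7])  # coefficient vector in R^2

y = X @ beta                 # predicted outputs: a vector in R^6
print(y.shape)               # (6,)

# The image of the map is only 2-dimensional: every reachable y is a
# linear combination of X's two columns, a plane sitting inside R^6.
print(np.linalg.matrix_rank(X))  # 2
```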

Video transcript

Let's say I have an n-by-n matrix that looks like this. So let me just see if I can do it in general terms. In the first row and first column, that entry has a 1, and then everything else, the rest of the n minus 1 rows in that first column are all going to be zeroes. So it's going to be zeroes all the way down to the nth term. And then the second column, we have a 0 in the first component, but then a 1 in the second component. And then it goes zeroes all the way down. And you keep doing this. In the third row, or let me say third column, although it would've applied to the third row as well, the 1 shows up in the third component, and then it's zeroes all the way down. Essentially, you have the ones filling up the diagonal of this matrix right here. So if you go all the way to the nth column or the nth column vector, you have a bunch of zeroes until you get-- you have n minus 1 zeroes, and then the very last component, the nth component there will be a 1. So you have, essentially, a matrix with ones down the diagonal. Now, this matrix has a bunch of neat properties and we'll explore it more in the future. But I'm just exposing you to this because it has one very neat property relative to linear transformations. But I'm going to call this the identity matrix and I'll call this I sub n, and I called that sub n because it's an n-by-n identity matrix. I sub 2 would be equal to a 2-by-2 identity matrix, so it would look like that. And I sub 3 would look like this: 1 0 0, 0 1 0, 0 0 1. I think you get the point. Now, the neat thing about this identity matrix becomes evident when you multiply it times any vector. We can multiply this guy times the n-component vector, a member of Rn. So let's do that. So if we multiply this matrix times-- let's call this vector x. This is x1, x2, all the way down to xn, what is this going to be equal to? So this is vector x right here. So if I multiply matrix I, my identity matrix, I sub n, and I multiply it times my vector x, where x is a member of Rn, has n components, what am I going to get? Well, I'm going to get 1 times x1 plus 0 times x2 plus 0 times x3 plus 0 times x4, all of that. So essentially, I'm going to have-- you can kind of view it as this row dotted with the vector. So the only nonzero term is going to be the 1 times the x1. So it's going to be x1-- sorry, let me do it like this. So you're going to get another vector in Rn like that. And so the first term is that row essentially being dotted with that column, and so you just get x1. And then the next entry is going to be this row, or you could view it as the transpose of this row dotted with that column, so 0 times x1 plus 1 times x2 plus 0 times everything else. So the only nonzero term is the 1 times x2, so you get an x2 there. And then you keep doing that, and what are you going to get? You're going to get an x3 because the only nonzero term here is the third one and you're going to go all the way down until you get an xn. But what is this thing equal to? This is just equal to x. So the neat thing about this identity matrix that we've created is that when you multiply it times any vector, you get the vector again. The identity matrix times any vector in Rn-- it's only defined for vectors in Rn-- is equal to that vector again. And actually, the columns of the identity matrix have a special-- I guess the set of columns has a special name.
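
The identity-matrix property described above is easy to check numerically. A minimal sketch, assuming NumPy (the vector is arbitrary):

```python
import numpy as np

n = 5
I = np.eye(n)  # the n-by-n identity matrix I_n: ones down the diagonal

x = np.array([3.0, -1.0, 4.0, 1.0, 5.0])  # any vector in R^n

# Each row of I_n dotted with x picks out exactly one component of x,
# so multiplying I_n by x gives back x itself.
print(np.allclose(I @ x, x))  # True
```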
So if we call this first column e1 and this second column e2 and the third column e3 and we go all the way to en, these vectors, these column vectors here, the set of these-- so let's say e1, e2, all the way to en-- this is called the standard basis for Rn. So why is it called that? Well, the word basis is there, so two things must be true. These things must span Rn and they must be linearly independent. It's pretty obvious from inspection that they're linearly independent. If this guy has a 1 here and no one else has a 1 there, there's no way you can construct that 1 with some combination of the rest of the guys. And you can make that same argument for each of the ones in each of the components. So it's clearly linearly independent. And then to see that they span, that you can construct any vector with a linear combination of these guys, you just really have to-- you know, whatever vector you want to construct, if you want to construct x1-- let me put it this way. If you want to construct this vector-- let me write it this way. Let me pick a different one. Let's say you want to construct the vector a1, a2, a3, all the way down to an. So this is some member of Rn, and you want to construct this vector. Well, the linear combination that would get you this is literally a1 times e1 plus a2 times e2 plus all the way to an times en. This scalar times this first column vector will essentially just get you-- what will this look like? This will look like a1 and then you'd have a bunch of zeroes. You'd have n minus 1 zeroes plus 0 and you'd have an a2 and then you'd have a bunch of zeroes. And then you'd keep doing that, and then you would have a bunch of zeroes, and then you would have an an. Obviously, by our definition of vector addition, if you add all these things up, you get this guy right here. And it's kind of obvious, because this right here is the same thing as our identity matrix times this vector. I just wanted to expose you to that idea. Now, let's apply what we already know about linear transformations to what we've just learned about this identity matrix. I just told you that I can represent any vector like this. Let me rewrite it, maybe in terms of x. I can write any vector x as a linear combination of the standard basis, which is really just the set of columns of the identity matrix. I can write that as x1 times e1 plus x2 times e2, all the way to xn times en. And remember, each of these column vectors right here, like e1, is just 1 in the first entry and then all the rest are zeroes. e2 is a 1 in the second entry and everything else is 0. e5 is a 1 in the fifth entry and everything else is 0. And this I just showed you, and this is a bit obvious from this right here. Now, we know that by definition, a linear transformation of x-- let me put it this way. A linear transformation of x, of our vector x, is the same thing as taking the linear transformation of this whole thing-- let me do it in another color-- is equal to the linear transformation of-- actually, instead of using L, let me use T. I used L by accident because I was thinking linear. But if I were to take the linear transformation of x, because that's the notation we're used to, that's the same thing as taking a linear transformation of this thing. They're equivalent. So x1 times e1 plus x2 times e2, all the way to plus xn times en. They're equivalent statements. Now, from the definition of linear transformations, we know that this is the same thing, that the transformation of the sum is equal to the sum of the transformations.
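
The decomposition just described, writing any vector as a linear combination of the standard basis vectors, checked numerically (a sketch; the sample vector is made up):

```python
import numpy as np

n = 4
a = np.array([7.0, 2.0, -3.0, 5.0])  # some member of R^n

# The columns of the identity matrix are the standard basis e_1, ..., e_n.
E = np.eye(n)

# Rebuild the vector as a_1*e_1 + a_2*e_2 + ... + a_n*e_n.
rebuilt = sum(a[i] * E[:, i] for i in range(n))
print(np.allclose(rebuilt, a))  # True
```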
So this is equal to the transformation of x1 e1 plus the transformation of x2 e2, where this is just any linear transformation. Let me make that very clear. This is any linear transformation. By definition, linear transformations have to satisfy these properties. So the transformation of x2 e2, all the way to the transformation of this last entry, the scalar xn times my standard basis vector en. And we know from the other property of linear transformations that the transformation of a vector multiplied by a scalar is the same thing as the scalar multiplied by the transformation of the vector. That's just from our definition of linear transformations. So this becomes x1 times the transformation of e1, plus x2 times the transformation of e2, plus all the way to xn times the transformation of en. Now, what is this? I could rewrite this, so everything I've done so far, so the transformation of x is equal to that, which, just using our properties of linear transformations-- all linear transformations, this has to be true for them-- I get to this, and this is equivalent. This is equal to-- if we view each of these as a column vector, this is equal to what? This is equal to the matrix where this is the first column, T of e1. And then the second column is T of e2. And then we go all the way to T of en, times-- let me put it this way-- x1, x2, all the way to xn. We've seen this multiple, multiple times. Now what's really, really, really neat about this is I just started with an arbitrary transformation. And I just showed that an arbitrary linear transformation of x can be rewritten as a product of a matrix, where I'm taking that same linear transformation of each of our standard basis vectors, and I can construct that matrix, and multiplying that matrix times my x vector is the same thing as this transformation. So this is essentially showing you that all transformations-- let me be careful. All linear transformations can be written as a matrix vector product. Not only did I show you that you can do it, but it's actually a fairly straightforward thing to do. This is actually a pretty simple operation to do. Let me show you an example. I don't know, I think this is super neat. Let's say that I just-- I'm just going to make up some transformation. Let's say I have a transformation and it's a mapping between-- let's make it extra interesting-- between R2 and R3. And let's say my transformation, let's say that T of x1, x2 is equal to-- let's say the first entry is x1 plus 3x2, the second entry is 5x2 minus x1, and let's say the third entry is 4x1 plus x2. This is a mapping. I could have written it like this. I could write T of any vector in R2, x1, x2, is equal to-- and maybe this is just redundant, but I think you get the idea. I like this notation better. x1 plus 3x2, 5x2 minus x1, and then 4x1 plus x2. This statement and this statement I just wrote are equivalent. And I like to visualize this a little bit more. Now, I just told you that I can represent this transformation as a matrix vector product. How do I do it? Well, what I do is I take the transformation of this guy. My domain right here is R2, and I produce a vector that's going to be in R3. So what I do is, let's see. So I'm concerned with multiplying things times vectors in R2. So what we're going to do is we're going to start with the identity matrix, identity 2, because that's my domain, and it just looks like this: 1, 0, 0, 1. I'm just going to start with that. And all I do is I apply my transformation to each of the columns, each of my standard basis vectors. These are the standard basis vectors for R2.
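
The construction derived above, applying T to each standard basis vector and using the results as the columns of a matrix, fits in a few lines. A sketch, assuming NumPy; the helper name matrix_of and the sample map are my own, not from the video:

```python
import numpy as np

def matrix_of(T, n):
    """Matrix of a linear transformation T: R^n -> R^m, built by
    applying T to each standard basis vector of R^n and stacking
    the results as columns, exactly as derived above."""
    return np.column_stack([T(e) for e in np.eye(n)])

# Quick check with a simple linear map on R^2 (made up for the demo):
T = lambda v: np.array([2*v[0] + v[1], v[0] - 3*v[1]])
A = matrix_of(T, 2)

x = np.array([1.0, 2.0])
print(np.allclose(A @ x, T(x)))  # True
```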
I showed you that they form a basis, but how do I know that they're standard? Why are they called the standard basis? And I haven't covered this in a lot of detail just yet, but you could take the dot product of any of these guys with any of the other guys, and you'll see that they're all orthogonal to each other. The dot product of any one of these columns with the other is always zero, so that's a nice clue. And they all have length 1, so that's a nice reason why they're called the standard basis. But anyway, back to our attempt to represent this transformation as a matrix vector product. So we say, look, our domain is in R2, so let's start with I2, or we could call it our 2-by-2 identity matrix. And let's apply the transformation to each of its column vectors, where each of its column vectors is a vector in the standard basis for R2. So I'm going to write it like this. The first column is T of this column, and then the second column is going to be T of 0, 1. And I know I'm getting messier with my handwriting. What is T of the vector 1, 0? Well, we just go here. We construct another vector. So we get 1 plus 3 times 0 is 1. Then we get 5 times 0 minus 1, so that's minus 1. x2 is zero in this case. And then we get 4 times 1 plus 0, so that's just 4. So that's T of 1, 0. And then what is T of 0, 1? T of 0, 1 is equal to-- so we have 0 plus 3 times 1 is 3. Then we have 0 minus 1 is minus 1. Let me make sure I did this one right. What was this? This was 5 times x2 minus x1. Yeah, 5 times 0 minus x1, which is 1. Now, in this case, it's 5 times-- oh, I have to be careful. This is 5 times x2. x2 is 1. So 5 times 1 minus 0, so it's 5. And then I have 4 times 0 plus x2, plus 1. And I just showed you, if I replace each of these standard basis vectors with the transformation of them, what do I get? I get this vector right here. So I already figured out what they are. If I take this guy and evaluate it, it's the vector 1, minus 1, 4. And then this guy is the vector 3, 5, and 1. So what we just did-- and this is-- I don't know. For some reason, I find this to be pretty amazing. We can now rewrite this transformation here as the product of a matrix with any vector. So if we define this to be equal to A, or we could write it this way. We can now write our transformation. Our transformation of x1, x2 can now be rewritten as the product of this matrix. I'll write it in green. The matrix 1, 3, minus 1, 5, 4, 1, times our input vector, x1, x2, which is super cool, because now we just have to do a matrix multiplication. Instead of this, and if we have some processor that does this super fast, we can then use that. I don't know, I think this is especially elegant, because what happens here is we applied the transformation to each of the columns of a 2-by-2 matrix, and we got a 3-by-2 matrix. And we know what happens when you multiply a 3-by-2 matrix times a vector that's in R2. Or you can almost view this as a 2-by-1 matrix. You're going to get a vector that is in R3. Because these guys times that guy are going to be the first term. These guys times these guys are going to be the second term. These guys times those guys are going to be the third term. So by creating this 3-by-2 matrix, we have actually created a mapping from R2 to R3. Anyway, for some reason I find this to be especially neat. Hopefully, at least you find this somewhat instructive.
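
And the specific example from the video, verified end to end (a sketch, assuming NumPy; the construction is the same one shown above, inlined so the snippet is self-contained):

```python
import numpy as np

def T(x):
    """The video's example transformation from R^2 to R^3."""
    x1, x2 = x
    return np.array([x1 + 3*x2,
                     5*x2 - x1,
                     4*x1 + x2])

# Apply T to the columns of I_2 and use the results as columns.
A = np.column_stack([T(e) for e in np.eye(2)])
print(A)
# [[ 1.  3.]
#  [-1.  5.]
#  [ 4.  1.]]

# The 3-by-2 matrix reproduces the transformation on any input in R^2.
x = np.array([2.0, -1.0])
print(np.allclose(A @ x, T(x)))  # True
```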