© 2024 Khan Academy

# Linear transformations as matrix vector products

Showing how ANY linear transformation can be represented as a matrix vector product. Created by Sal Khan.

## Want to join the conversation?

- I believe I have watched all videos up to this point and this is the first one that has confused me. It seems to jump straight into transforming a matrix whilst the overall subject has been transforming vectors. Whilst I understand that the matrix can be considered as a collection of column (or row) vectors, it doesn't explain the apparent jump to matrix transformation in the usual thorough way. So whilst we started off transforming a vector, we appear to have transformed a collection of vectors and used the result to transform the vector! What does the transformed matrix of basis vectors (the transformed I matrix) represent? (10 votes)
- It is not actually the matrix that you transform, nor the column vectors of the matrix; it is the vector that you transform by multiplying it by the matrix. (11 votes)

- If any matrix-vector multiplication is a linear transformation, then how can I interpret the general linear regression equation `y = X β`?

X is the design matrix, β is a vector of the model's coefficients (one for each variable), and y is the vector of predicted outputs for each object.

Let's say X is a 100x2 matrix and β is a 2x1. Then y is a 100x1 matrix.

The concept is clear, but from a linear transformation point of view, what does it mean?

I take a vector of coefficients in R^2 and through X I transform it into a vector in R^100. I can't visualize it logically... 🤔 (4 votes)
- A 100x2 matrix is a transformation from 2-dimensional space to 100-dimensional space. So the image/range of the function will be a plane (2D space) embedded in 100-dimensional space, provided the two columns of X are linearly independent. So each vector in the original plane will now also be embedded in 100-dimensional space, and hence be expressed as a 100-dimensional vector. (5 votes)

- I'm confused as to what is being taught here. Is the lesson saying that the transformation of a vector is equivalent to transforming the original basis and then using the result to transform the vector? (5 votes)
- Yes, the standard basis for R^2 is transformed into the columns of a 3x2 matrix using the equations above. (3 votes)

- so, can i just arrange the linear 'instructions' in ascending order of the components of vector x, take the coefficients of each term and plug them into the matrix that's to be multiplied by the x vector? seems like a pretty legit shortcut now that i have an intuitive understanding of it (4 votes)
- I agree. The use of the identity matrix is unnecessary. The coefficients of the matrix are directly taken from the transformation specification. (3 votes)

- At 15:55, what do you call that matrix? Is it the standard matrix? (3 votes)
- That two column one? There's no proper name really, but it represents a transformation matrix, since you could multiply a vector by it for the transformation shown in the video. Let me know if that didn't answer your question. (3 votes)

- At 8:09, the statement from the previous discussion, "the sum equal to the sum of their transformation": I think x1T(e1) + x2T(e2) + ... + xnT(en) should be written this way: e1T(x1) + e2T(x2) + ... + enT(xn). Can you explain why you put it that way and not the way I thought it ought to be written? Thanks (3 votes)
- Remember that the transformation acts on vectors, not scalars. The terms x_1, x_2, etc. are all scalars, so T(x_1) is not defined in general; a scalar can only be pulled out in front of T. However, a transformation CHANGES (i.e. transforms) vectors, so T(e_1) ≠ e_1 in general. So when he applies T to the sum, he has T(x_1e_1) + ... + T(x_ne_n) = x_1T(e_1) + ... + x_nT(e_n), since you **can't** simplify it the other way. Hope this helps! (1 vote)

- This series has been helping me a lot and I am thankful for it, but this one made me so confused to the point I got a little desperate. (3 votes)

- At 7:40, why does Sal want to avoid using L(x)? (3 votes)
- I think it is less that he wants to avoid the notation L(x) and more that, since he is talking about an arbitrary linear transformation, it makes sense to use the notation T(x), which is standard when writing/describing a transformation. Hope this helps! (1 vote)

- At 11:00 and forward, what was the purpose of the last example? I don't think I understand what it's supposed to prove. (2 votes)
- Well, as the video name says, it just shows that any linear transformation can be represented by a matrix vector product.

Which basically is saying that any linear transformation from n dimensions to m dimensions could be computed by a matrix vector product (Ax = b), which we have seen in the previous videos. (3 votes)

- Is there going to be further explanation of the matrix multiplication? Because it was shown very fast at the end of this video and it leaves me a bit puzzled. (2 votes)
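
Several of the questions above come down to the same chain of equalities, so it may help to see it written out in one place. This is a sketch, assuming T maps R^n to R^m and is linear, and that e_1, ..., e_n are the columns of the n-by-n identity matrix; the name A is only a label for the resulting matrix.

```latex
\begin{aligned}
T(\mathbf{x})
  &= T(x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \cdots + x_n\mathbf{e}_n) \\
  &= T(x_1\mathbf{e}_1) + T(x_2\mathbf{e}_2) + \cdots + T(x_n\mathbf{e}_n)
     && \text{(T of a sum is the sum of the T's)} \\
  &= x_1 T(\mathbf{e}_1) + x_2 T(\mathbf{e}_2) + \cdots + x_n T(\mathbf{e}_n)
     && \text{(scalars factor out of T)} \\
  &= \begin{bmatrix} T(\mathbf{e}_1) & T(\mathbf{e}_2) & \cdots & T(\mathbf{e}_n) \end{bmatrix}
     \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
   = A\mathbf{x}
\end{aligned}
```

Read this way, the identity matrix is never itself "transformed" as a matrix. Its columns are just a convenient list of the n input vectors whose outputs T(e_1), ..., T(e_n) become the columns of A.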

## Video transcript

Let's say I have an n-by-n
matrix that looks like this. So let me just see if I can
do it in general terms. In the first row and first
column, that entry has a 1, and then everything else, the
rest of the n minus 1 rows in that first column are all
going to be zeroes. So it's going to be zeroes all
the way down to the nth term. And then the second column,
we have a 0 in the first component, but then a 1 in
the second component. And then it goes zeroes
all the way down. And you keep doing this. In the third row, or let me say
third column, although it would've applied to the third
row as well, the 1 shows up in the third component, and then
it's zeroes all the way down. Essentially, you have the ones
filling up the diagonal of this matrix right here. So if you go all the way to
the nth column or the nth column vector, you have a bunch
of zeroes until you get-- you have n minus 1 zeroes,
and then the very last component, the nth component
there will be a 1. So you have essentially,
a matrix with ones down the diagonal. Now, this matrix has a bunch of
neat properties and we'll explore it more in the future. But I'm just exposing you to
this because it has one very neat property relative to
linear transformations. But I'm going to call this the
identity matrix and I'll call this I sub n, and I called that
sub n because it's an n-by-n identity matrix. I sub 2 would be equal to a
2-by-2 identity matrix, so it would look like that. And I sub 3 would look like
this: 1 0 0, 0 1 0, 0 0 1. I think you get the point. Now, the neat thing about this
identity matrix becomes evident when you multiply
it times any vector. We can multiply this guy times
the n-component vector, a member of Rn. So let's do that. So if we multiply this
matrix times-- let's call this vector x. This is x1, x2, all the way down
to xn, what is this going to be equal to? So this is vector
x right here. So if I multiply matrix I, my
identity matrix, I sub n, and I multiply it times my vector x,
where x is a member of Rn, has n components, what
am I going to get? Well, I'm going to get 1 times
x1 plus 0 times x2 plus 0 times x3 plus 0 times
x4, all of that. So essentially, I'm going to
have-- you can kind of view it as this row dotted
with the vector. So the only nonzero term
is going to be the 1 times the x1. So it's going to be x1-- sorry,
let me do it like this. So you're going to get another
vector in Rn like that. And so the first term is that
row essentially being dotted with that column, and
so you just get x1. And then the next entry is going
to be this row, or you could view it as the transpose
of this row dotted with that column, so 0 times x1 plus
1 times x2 plus 0 times everything else. So the only nonzero term
is the 1 times x2, so you get an x2 there. And then you keep doing that,
and what are you going to get? You're going to get an x3
because the only nonzero term here is the third one and you're
going to go all the way down until you get an xn. But what is this
thing equal to? This is just equal to x. So the neat thing about this
identity matrix that we've created is that when you
multiply it times any vector, you get the vector again. The identity matrix times any
vector in Rn-- it's only defined for vectors in Rn-- is
equal to that vector again. And actually, the columns of
the identity matrix have a special-- I guess the set of
columns has a special name. So if we call this first column
e1 and this second column e2 and the third column
e3 and we go all the way to en, these vectors, these column
vectors here, the set of these-- so let's say e1, e2,
all the way to en-- this is called the standard
basis for Rn. So why is it called that? Well, the word basis is there,
so two things must be true. These things must span Rn and
they must be linearly independent. It's pretty obvious from
inspection they're linearly independent. If this guy has a 1 here and
no one else has a 1 there, there's no way you can construct
that 1 with some combination of the
rest of the guys. And you can make that same
argument for each of the ones in each of the components. So it's clearly linearly
independent. And then to see that you can
span, that you can construct any vector with a linear
combination of these guys, you just really have to-- you know,
whatever vector you want to construct, if you want
to construct x1-- let me put it this way. If you want to construct
this vector-- let me write it this way. Let me pick a different one. Let's say you want to construct
the vector a1, a2, a3 all the way down to an. So this is some member
of Rn, you want to construct this vector. Well, the linear combination
that would get you this is literally a1 times e1 plus a2
times e2 plus all the way to an times en. This scalar times this first
column vector will essentially just get you-- what will
this look like? This will look like
a1 and then you'd have a bunch of zeroes. You'd have n minus 1 zeroes plus
0 and you'd have an a2 and then you'd have
a bunch of zeroes. And then you'd keep doing that,
and then you would have a bunch of zeroes, and then
you would have an an. Obviously, by our definition
of vector addition, you add all these things up, you get
this guy right here. And it's kind of obvious,
because this right here is the same thing as our identity
matrix times the vector a. I just wanted to expose
you to that idea. Now, let's apply what we already
know about linear transformations to what we've
just learned about this identity matrix. I just told you that
I can represent any vector like this. Let me rewrite it in
maybe terms of x. I can write any vector x as a
linear combination of the standard basis, which are really
just the columns of the identity matrix. I can write that as x1 times e1
plus x2 times e2, all the way to xn times en. And remember, each of these
column vectors right here, like for e1, is just 1 in the
first entry and then all the rest are zeroes. e2 is a 1
in the second entry and everything else is 0. e5 is a 1 in the fifth entry
and everything else is 0. And this I just showed you, and
this is a bit obvious from this right here. Now, we know that by definition,
a linear transformation of x-- let
me put it this way. A linear transformation of x,
of our vector x, is the same thing as taking the linear
transformation of this whole thing-- let me do it in another
color-- is equal to the linear transformation of--
actually, instead of using L, let me use T. I used L by accident because
I was thinking linear. But if I were to take the linear
transformation of x, because that's the notation we're used
to, that's the same thing as taking a linear transformation
of this thing. They're equivalent. So x1 times e1 plus x2 times
e2, all the way to plus xn times en. They're equivalent statements. Now, from the definition of
linear transformations, we know that this is the same
thing, that the transformation of the sum is equal to the sum
of the transformation. So this is equal to the
transformation of x1 e1 plus the transformation of x2 e2
where this is just any linear transformation. Let me make that very clear. This is any linear
transformation. By definition, linear
transformations have to satisfy these properties. So the transformation of
x2 e2, all the way to the transformation of this last
term, the scalar xn times my standard basis vector en. And we know from the other
property of linear transformations that the
transformation of a vector multiplied by the scalar is the
same thing as the scalar multiplied by the transformation
of the vector. That's just from our definition
of linear transformations. So this is equal to x1 times the
transformation of e1 plus x2 times the
transformation of e2 plus all the way to xn times the
transformation of en. Now, what is this? I could rewrite this, so
everything I've done so far, so the transformation of x is
equal to that, which just using our properties of linear
transformations, all linear transformations, this has
to be true for them. I get to this and this
is equivalent. This is equal to-- if we view
each of these as a column vector, this is equal to what? This is equal to the matrix
where this is the first column, T e1. And then the second
column is T e2. And then we go all the way to
T en times-- let me put it this way-- x1, x2, all
the way to xn. We've seen this multiple,
multiple times. Now what's really, really,
really neat about this is I just started with an arbitrary
transformation. And I just showed that
an arbitrary linear transformation of x can be
rewritten as a product of a matrix where I'm taking that
same linear transformation of each of our standard basis
vectors, and I can construct that matrix, and multiplying
that matrix times my x vector is the same thing as this
transformation. So this is essentially showing
you that all transformations-- let me be careful. All linear transformations can
be represented as a matrix vector product. Not only did I show you that
you can do it, but it's actually a fairly
straightforward thing to do. This is actually a pretty
simple operation to do. Let me show you an example. I don't know, I think
this is super neat. Let's say that I just-- I'm
just going to make up some transformation. Let's say I have a
transformation and it's a mapping between-- let's make
it extra interesting-- between R2 and R3. And let's say my transformation,
let's say that T of x1 x2 is equal to-- let's
say the first entry is x1 plus 3x2, the second entry is 5x2
minus x1, and let's say the third entry is 4x1 plus x2. This is a mapping. I could have written
it like this. I could write T of any vector
in R2, x1, x2, is equal to-- and maybe this is just
redundant, but I think you get the idea. I like this notation better. x1
plus 3x2, 5x2 minus x1, and then 4x1 plus x2. This statement and this
statement I just wrote are equivalent. And I like to visualize this
a little bit more. Now, I just told you that
I can represent this transformation as a matrix
vector product. How do I do it? Well, what I do is I take the
transformation of this guy. My domain right here is R2, and
I produce a vector that's going to be in R3. So what I do is, let's see. So I'm concerned with
multiplying things times vectors in R2. So what we're going to do is
we're going to start with the identity matrix, identity 2
because that's my domain and it just looks like
this: 1, 0, 0, 1. I'm just going to
start with that. And all I do is I apply my
transformation to each of the columns, each of my
standard basis vectors. These are the standard
basis for R2. I showed you that they're a basis,
how do I know that they're standard? Why are they called the
standard basis? And I haven't covered this in a
lot of detail right yet, but you could take the dot product
of any of these guys with any of the other guys, and you'll
see that they're all orthogonal to each other. The dot product of any one of
these columns with the other is always zero, so that's
a nice clue. And they all have length of 1,
so that's a nice reason why they're called the
standard basis. But anyway, back to our attempt
to represent this transformation as a matrix
vector product. So we say look, our domain is in
R2, so let's start with I2, or we could call it our 2-by-2
identity matrix. And let's apply the
transformation to each of its column vectors where each of
its column vectors are a vector in the standard
basis for R2. So I'm going to write
it like this. The first column is T of this
column, and then the second column is going to
be T of 0, 1. And I know I'm getting messier
with my handwriting. What is T of the vector 1, 0? Well, we just go here. We construct another vector. So we get 1 plus
3 times 0 is 1. Then we get 5 times 0 minus
1, so that's minus 1. x2 is zero in this case. And then we get 4 times 1 plus
0, so that's just 4. So that's T of 1, 0. And then what is T of 0, 1? T of 0, 1 is equal to-- so we
have 0 plus 3 times 1 is 3. Then for the second entry, let me make sure I do
this one right. For T of 1, 0 it was 5 times 0 minus
x1, which is 1, so minus 1. Now, in this case, it's 5 times--
I have to be careful. This is 5 times x2, and x2 is 1. So 5 times 1 minus
0, so it's 5. And then I have 4 times
0 plus x2, which is 1. And I just showed you if I
replace each of these standard basis vectors with the
transformation of them, what do I get? I get this vector right here. So I already figured
out what they are. If I take this guy and evaluate
it, it's the vector 1, minus 1, 4. And then this guy is the
vector 3, 5, and 1. So what we just did and this
is-- I don't know. For some reason, I find this
to be pretty amazing. We can now rewrite this
transformation here as a matrix times any vector. So if we define this matrix to be equal
to A, or we could write it this way. We can now write our
transformation. Our transformation of x1, x2
can now be rewritten as the product of this matrix. I'll write it in green. The matrix with rows 1, 3, then minus 1, 5, then 4,
1 times our input vector, x1, x2, which is super cool
because now we just have to do a matrix multiplication. Instead of this, and if we have
some processor that does this super fast, we
can then use that. I don't know, I think this is
especially elegant, because what happens here is we applied
the transformations to each of the columns of a 2-by-2
matrix, and we got a 3-by-2 matrix. And we know what happens when
you multiply a 3-by-2 matrix times a vector that's in R2. Or you can almost view this
as a 2-by-1 matrix. You're going to get a vector
that is in R3. Because you're going to have
these guys times that guy's going to be the first term. These guys times these
guys are going to be the second term. These guys times those
guys are going to be the third term. So by kind of creating this
3-by-2 matrix, we have actually created a mapping
from R2 to R3. Anyway, for some reason I find
this to be especially neat. Hopefully, at least you find
this somewhat instructive.
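
The worked example in the transcript can be checked mechanically. The following is a minimal sketch in plain Python, not anything shown in the video: only the formula for T comes from the transcript, while the helper names `standard_basis`, `matrix_of`, and `mat_vec` are made up here for illustration. It builds the matrix column by column from T(e_1) and T(e_2), then compares the matrix vector product with applying T directly.

```python
def T(x):
    """The example map from R2 to R3: (x1, x2) -> (x1 + 3*x2, 5*x2 - x1, 4*x1 + x2)."""
    x1, x2 = x
    return [x1 + 3 * x2, 5 * x2 - x1, 4 * x1 + x2]


def standard_basis(n):
    """Return e_1, ..., e_n, the columns of the n-by-n identity matrix."""
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]


def matrix_of(transform, n):
    """Build the matrix whose j-th column is transform(e_j), stored as a list of rows."""
    columns = [transform(e) for e in standard_basis(n)]
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def mat_vec(A, x):
    """Matrix vector product: each output entry is one row of A dotted with x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


A = matrix_of(T, 2)
print(A)  # [[1, 3], [-1, 5], [4, 1]]

x = [2, -7]  # an arbitrary test vector in R2
print(T(x))           # [-19, -37, 1]
print(mat_vec(A, x))  # [-19, -37, 1]
```

Agreement on one test vector does not prove anything by itself; the argument in the transcript is what guarantees the two agree for every x. The sketch only makes the construction concrete: a 3-by-2 matrix whose columns are T(1, 0) and T(0, 1).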