## Multivariable calculus

### Course: Multivariable calculus > Unit 2

Lesson 5: Multivariable chain rule

© 2023 Khan Academy

# Vector form of the multivariable chain rule

The multivariable chain rule is more often expressed in terms of the gradient and a vector-valued derivative. This makes it look very analogous to the single-variable chain rule. Created by Grant Sanderson.
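In symbols, the vector form discussed in this video reads:

```latex
\frac{d}{dt} f(\mathbf{v}(t)) = \nabla f(\mathbf{v}(t)) \cdot \mathbf{v}'(t)
```

which mirrors the single-variable chain rule, $\frac{d}{dt} f(g(t)) = f'(g(t))\, g'(t)$, with the gradient playing the role of the outer derivative and the dot product playing the role of multiplication.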

## Want to join the conversation?

- If (for a vector-valued function) v = [x(t), y(t)] and f(x, y) = xy, is f(x(t), y(t)) = x(t)y(t) = f(v(t))? (4 votes)
- Isn't the gradient a row vector? Otherwise, how can you take the dot product? (2 votes)
  - It kind of depends. We can represent the gradient as a row or column vector, but like you mentioned, we need to always be sure to line up our matrices in the right way so that our matrix multiplication works out! :-) (2 votes)
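The row-versus-column point above can be sketched numerically (the numbers here are made up for illustration): treating the gradient as a 1×2 row matrix and v′(t) as a 2×1 column matrix turns the dot product into an ordinary matrix product.

```python
import numpy as np

# Illustrative values, not from the video.
grad_f = np.array([[3.0, 2.0]])     # gradient as a row vector, shape (1, 2)
v_prime = np.array([[5.0], [7.0]])  # v'(t) as a column vector, shape (2, 1)

# (1, 2) @ (2, 1) -> (1, 1): the shapes line up, giving a single number.
product = grad_f @ v_prime
print(product[0, 0])  # 3*5 + 2*7 = 29.0
```

Swapping the shapes (column times row) would instead give a 2×2 outer product, which is why the alignment matters.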

- At 2:15, how come when we take the derivative of the vector-valued function on the left side we get a vector of the respective derivatives of the variables, but when we take the derivative of the parametric equations on the right side we get a dot product of the gradient with the vector of the derivatives of the variables? I thought the vector-valued function and the parametric equations were just the same thing in different notation? I'm missing something somewhere! (2 votes)
  - Hi Jordan, as you say, the vector-valued function v(t) = x(t) i + y(t) j and the parametric equations x(t), y(t) are the same thing: both define the position of a point in the xy-plane.

    But on the right-hand side, **there is a newly introduced function f whose inputs are points in the xy-plane**. So in the video, Grant first writes the chain rule using the parametric equations for the x and y coordinates, and then rewrites it with the vector-valued function as the input. (2 votes)

- I think Grant has to be careful with putting a vector into the gradient, since it's quite confusing to take the partial derivative of a vector. (2 votes)
- Can you write all parametric functions as a vector function? How? (1 vote)
  - Yes!

    Let's say that you have a parametric equation `p(t) = {h(t) = 2t, L(t) = 3.5t + 4, b(t) = 7}`.

    You'd define the vector function to be `r(t) = <h(t), L(t), b(t)>`.

    You take the 1st through nth functions in the parametric equation and make them the corresponding 1st through nth components of the vector function.

    I hope this helps! (1 vote)
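The packaging described above can be sketched in code, using the component functions h, L, b from the example (the names are illustrative):

```python
import numpy as np

# Component functions of the parametric curve from the example above.
def h(t): return 2 * t
def L(t): return 3.5 * t + 4
def b(t): return 7.0

# The vector-valued function r(t) = <h(t), L(t), b(t)>: each parametric
# component becomes the corresponding component of the output vector.
def r(t):
    return np.array([h(t), L(t), b(t)])

print(r(2.0))  # the point on the curve at t = 2
```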

- Is the gradient of f evaluated at v(t) (where v is a vector-valued function) the same as the gradient of f evaluated at a point whose x, y, z, etc. coordinates are just the corresponding components of v(t)? (1 vote)
- Why wouldn't one just substitute v into f, creating a scalar-valued single-variable function, and then take the derivative normally? (1 vote)
  - I asked this exact same question in my Calc III class. The answer is yes, of course you could, and it would make your life easier, but that doesn't show you the importance of learning how to do the multivariable chain rule. It's important to learn because you aren't always given the definitions of the other functions. Sometimes you are given one multivariable function and are asked to find the partial of one variable with respect to another, and this is a case where you don't simply have one function to plug into another (a fun example: the partial of pressure with respect to volume in the van der Waals equation). (1 vote)
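The van der Waals example mentioned above can be sketched numerically: P(V, T) = nRT/(V − nb) − an²/V², and you differentiate P with respect to V directly, with no inner function to substitute. The constants below are arbitrary illustrative numbers, not real gas parameters.

```python
# Van der Waals pressure as a function of volume (T held fixed).
# n, a, b are arbitrary illustrative constants, not real gas data.
n, R, T = 1.0, 8.314, 300.0
a, b = 0.5, 0.03

def P(V):
    return n * R * T / (V - n * b) - a * n**2 / V**2

# Analytic partial derivative dP/dV at fixed T:
#   d/dV [nRT/(V - nb)] = -nRT/(V - nb)^2
#   d/dV [-a n^2 / V^2] = +2 a n^2 / V^3
def dP_dV(V):
    return -n * R * T / (V - n * b)**2 + 2 * a * n**2 / V**3

# Check against a central finite difference.
V, h = 2.0, 1e-6
numeric = (P(V + h) - P(V - h)) / (2 * h)
print(dP_dV(V), numeric)  # the two should agree closely
```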

- At 2:30, instead of writing v'(t), can you write ∇v(t)? (0 votes)
  - I have the same question. I think we could, but I'm not sure. Did you find an answer in the last ~7 months? If you did, please let me know! (1 vote)

## Video transcript

- [Voiceover] So in the last couple of videos, I talked about the multivariable chain rule, which I have up here, and if you haven't seen those, go take a look. Here I want to write it out in vector notation, and this helps us generalize it a little bit when the intermediary space is a little bit higher dimensional.

So, instead of writing x of t and y of t as separate functions, and just trying to emphasize, "oh, they have the same input space, and whatever x takes in, that's the same number y takes in," it's better and a little bit cleaner if we say there's a vector-valued function that takes in a single number t and then outputs some kind of vector. In this case you could say the components of v are x of t and y of t, and that's fine. But I want to talk about what this looks like if we start writing everything in vector notation, and since we see dx/dt here and dy/dt here, you might start thinking, "oh, we should take the derivative of that vector-valued function," the derivative of v with respect to t. And when we compute this, it's nothing more than taking the derivatives of each component. So in this case, the derivative of x, so you'd write dx/dt, and the derivative of y, dy/dt. This is the vector-valued derivative.

And now you might start to notice something here. Okay, so we've got one of those components multiplied by a certain value and another component multiplied by a certain value; you might recognize this as a dot product. This would be the dot product between the vector that contains the partial derivatives, partial of f with respect to y, partial of f with respect to x... oh, whoops, don't know why I wrote it that way, but up here that's with respect to x, and then here with respect to y. So this whole thing, we're taking the dot product with the vector that contains the ordinary derivative dx/dt and the ordinary derivative dy/dt.

And of course both of these are special vectors; they're not just random. The left one, that's the gradient of f, and the right vector here, that's what we just wrote, the derivative of v with respect to t. Just for being quick, I'm gonna write that as v prime of t; that's saying completely the same thing as dv/dt. And this right here is another way to write the multivariable chain rule, and maybe if you were being a little bit more exact, you would emphasize that when you take the gradient of f, the thing that you input into it is the output of that vector-valued function. You know, you're throwing in x of t and y of t, so you might emphasize that you take that in as an input, and then you multiply it by the derivative, the vector-valued derivative, of v of t. And when I say multiply, I mean dot product, right? These are vectors, and you're taking the dot product.

It should seem very familiar to, you know, the single-variable chain rule. And just to remind us, I'll throw it up here: if you take the derivative of a composition of two single-variable functions, f of g, you take the derivative of the outside, f prime, and throw in g, throw in what was the interior function, and you multiply it by the derivative of that interior function, g prime of t. And this is super helpful in single-variable calculus for computing a lot of derivatives. And over here it has a very similar form, right? The gradient really serves the function of the true extension of the derivative for multivariable functions, for scalar-valued multivariable functions at least. You take that derivative and throw in the inner function, which just happens to be a vector-valued function. You throw it in there, and then you multiply it by the derivative of that, but multiplying vectors in this context means taking the dot product of the two.

And this could mean, if you have a function with a whole bunch of different variables, let's say you have some f of x... or not f of x, f of x1 and x2, and it takes in a whole bunch of variables that go out to x100. And then what you throw into it is the vector-valued function that takes in a single variable, and in order to be able to compose them, it's gonna have a whole bunch of intermediary functions. You can write it as x1, x2, x3, all the way up to x100, and these are all functions at this point; these are component functions of your vector-valued v. This expression still makes sense, right? You can still take the gradient of f; it's gonna have 100 components. You can plug in any vector, any set of 100 different numbers, and in particular the output of a vector-valued function with 100 different components is gonna work. And then you take the dot product with the derivative of this. That's the more general version of the multivariable chain rule.

And the cool thing about writing it like this is that you can interpret it in terms of the directional derivative, and I think I'll do that in the next video. So, that's a certain way to interpret this with a directional derivative.
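The identity from the transcript, d/dt f(v(t)) = ∇f(v(t)) · v'(t), can be checked numerically. This is a minimal sketch assuming NumPy and an illustrative choice of f and v (f(x, y) = x²y and v(t) = (cos t, sin t)), neither of which comes from the video:

```python
import numpy as np

def f(p):
    """Scalar-valued multivariable function f(x, y) = x^2 * y."""
    x, y = p
    return x**2 * y

def grad_f(p):
    """Gradient of f: (df/dx, df/dy) = (2xy, x^2)."""
    x, y = p
    return np.array([2 * x * y, x**2])

def v(t):
    """Vector-valued function v(t) = (cos t, sin t)."""
    return np.array([np.cos(t), np.sin(t)])

def v_prime(t):
    """Component-wise derivative v'(t) = (-sin t, cos t)."""
    return np.array([-np.sin(t), np.cos(t)])

t = 0.7

# Right-hand side: gradient evaluated at v(t), dotted with v'(t).
chain_rule = grad_f(v(t)) @ v_prime(t)

# Left-hand side: central finite difference of the composition f(v(t)).
h = 1e-6
finite_diff = (f(v(t + h)) - f(v(t - h))) / (2 * h)

print(chain_rule, finite_diff)  # the two values should agree closely
```

The dot product of the gradient with the vector-valued derivative matches the ordinary derivative of the composed single-variable function, which is exactly what the vector form of the chain rule asserts.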