### Course: Multivariable calculus > Unit 2 > Lesson 5: Multivariable chain rule

© 2023 Khan Academy

# Multivariable chain rule

This is the simplest case of taking the derivative of a composition involving multivariable functions. Created by Grant Sanderson.
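For reference, the formula the video builds up to, for a function f(x, y) composed with two single-variable functions x(t) and y(t), can be written as:

```latex
\frac{d}{dt} f\big(x(t), y(t)\big)
  = \frac{\partial f}{\partial x}\,\frac{dx}{dt}
  + \frac{\partial f}{\partial y}\,\frac{dy}{dt}
```

In the video's example, f(x, y) = x²y with x(t) = cos(t) and y(t) = sin(t), this gives 2xy·(−sin t) + x²·(cos t).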

## Want to join the conversation?

- Is this the 3Blue1Brown guy? Sounds really really similar! (25 votes)
- Yes, looks like he is. When I google his name, the first result is 3blue1brown. (20 votes)

- Just want to clarify: this IS the total differential? I thought of this not as the multivariable chain rule but as the product rule instead (since the chain rule is usually implied). Is that a different, but acceptable, understanding of it? (9 votes)
- Grant has introduced the multivariable chain rule by finding the derivative of a simple multivariable function by applying the single-variable chain and product rules. He then rewrites the formula he has used in a manner equivalent to the multivariable chain rule, to demonstrate that the multivariable chain rule is equivalent to applying rules that we already know to work. (7 votes)

- I'm surprised by how often the dot product comes up in multivariable calc. Your essence of linear algebra series was really helpful! (7 votes)
- dx/dx = 1, but dx/∂x = ? (3 votes)
- In calculus, "dx" represents an infinitesimal change in the variable "x," and it's often used in the context of finding derivatives. When you write "dx/dx = 1," it means the derivative of "x" with respect to "x" is equal to 1, which is a tautological statement. Essentially, it's saying that a change in "x" with respect to "x" is always 1, which is true because it's a straightforward change in the same variable.

However, when you write "dx/∂x," you are mixing two different notations: "d" belongs to ordinary single-variable derivatives and differentials, while the partial derivative symbol (∂) is used in multivariable calculus to indicate that you are taking a derivative with respect to one variable while keeping the other variables constant.

So, "dx/∂x" doesn't have a straightforward interpretation without context. The result would depend on the specific function you are differentiating with respect to "x" (∂x) and how it depends on other variables.

In general, "dx/∂x" is a notation that isn't commonly used because it's somewhat ambiguous. You would typically see "∂f/∂x" to represent the partial derivative of a function "f" with respect to "x."

Let's consider a simple example of a function of two variables, say, "f(x, y) = x^2 + 2xy + y^2." We can find the partial derivative of this function with respect to "x," denoted as ∂f/∂x:

f(x, y) = x^2 + 2xy + y^2

∂f/∂x is found by treating "y" as a constant and taking the derivative of "f" with respect to "x." The derivative of "x^2" with respect to "x" is "2x," the derivative of "2xy" with respect to "x" is "2y," and the derivative of "y^2" with respect to "x" is 0 because "y" is a constant with respect to "x." So, we have:

∂f/∂x = 2x + 2y

Now, if you ask what "dx/∂x" equals for this function, there is no meaningful answer, because the expression is not well-formed: "∂x" never appears on its own as a quantity you can divide by; it only has meaning as part of a symbol like ∂f/∂x.

If what was actually meant is ∂x/∂x, the partial derivative of x with respect to itself, then, just as with dx/dx, the answer is 1: a change in "x" with respect to "x" is always 1, no matter what other variables are present. (1 vote)
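The correct part of the computation above, ∂f/∂x = 2x + 2y, is easy to sanity-check numerically: hold y fixed and take a central difference in x, which is exactly what a partial derivative does. A minimal sketch (the function and the sample point are just illustrative choices from the example):

```python
def f(x, y):
    # the example function from the answer above
    return x**2 + 2*x*y + y**2

def dfdx_numeric(x, y, h=1e-6):
    # central difference in x with y held constant,
    # approximating the partial derivative ∂f/∂x
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

x, y = 1.5, 2.0
print(dfdx_numeric(x, y))  # close to 2*x + 2*y = 7.0
```

Because f is quadratic, the central difference here agrees with 2x + 2y up to floating-point rounding.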

- Why is f[x(t), y(t)] considered a function of 1 variable? (2 votes)
- It's 1 input going to 2 intermediate values, then 2 intermediate values going to 1 output. The 2 intermediate values, x(t) and y(t), are part of the process; overall there's only 1 input, t, and 1 output. (3 votes)

- Grant is backkk! (2 votes)
- To visualize f(x(t), y(t)) in 3D space, would t be the length of the curve? (2 votes)
- Hey, so, quick question. At the very end you write out the multivariable chain rule with the "x" term leading. However, your example throughout the video ends up with the "y" term in front. Would this not be a contradiction, since the placement of a negative within this rule influences the result? For example, look at -sin(t). This value makes the right side of the addition negative, so now you are essentially subtracting. If you swap the terms, as you would have to based on your "complete" formula at the end, the negative would now be in front of the addition, and now you are adding a positive to a negative.

Isn't this wrong, or am I just off my rocker? (0 votes)
- Isn't adding a positive to a negative the same as adding a negative to a positive?

a + (-b) = (-b) + a (4 votes)

- I can kinda understand why it's a single-variable function at 1:24. So you'd have to use partial derivatives if you differentiated f w.r.t. cos(t) or sin(t), right? (1 vote)

## Video transcript

- [Voiceover] So I've written here three different functions. The first one is a multivariable function: it has a two-variable input, x, y, and a single-variable output, x squared times y, which is just a number. And then the other two functions are each just regular old single-variable functions. What I want to do is start thinking about the composition of them. So, I'm going to take, as the first component, the value of the function x of t, so you pump t through that, and then you make that the first component of f. And the second component will be the value of the function y of t.

So, the image that you might have in your head for something like this is: you can think of t as just living on a number line of some kind; then you have x and y, which is just the plane, your x-coordinate, your y-coordinate, two-dimensional space; and then you have your output, which is just whatever the value of f is. For this whole composition of functions, you're thinking of x of t, y of t as taking a single point in t and kind of moving it over to two-dimensional space somewhere, and then from there, our multivariable function takes that back down. So, this is just a single-variable function; nothing too fancy going on in terms of where you start and where you end up, it's just what's happening in the middle.

And what I want to know is: what's the derivative of this function? If I take this, it's just an ordinary derivative, not a partial derivative, because this is a single-variable function, one variable input, one variable output. How do you take its derivative? There's a special rule for this, called the chain rule, the multivariable chain rule, but you don't actually need it. So, let's actually walk through this, showing that you don't need it. It's not that you'll never need it; it's just that for computations like this, you could go without it. It's a very useful theoretical tool, a very useful model to have in mind for what function composition looks like and implies for derivatives in the multivariable world.

So, let's just start plugging things in here. If I have x of t and y of t, the first thing I might do is write: okay, f, and instead of x of t, just write in cosine of t, since that's the function that I have for x of t, and then replace y with sine of t. And of course, I'm hoping to take the derivative of this. Then from there, we can go to the definition of f: f of x, y equals x squared times y, which means we take that first component, cosine of t, and square it, and then multiply it by the second component, sine of t. And again, we're just taking this derivative.

You might be wondering, okay, why am I doing this? You're just showing me how to take a first derivative, an ordinary derivative. But the pattern that we'll see is going to lead us to the multivariable chain rule, and it's actually kind of surprising when you see it in this context, because it pops out in a way that you might not expect things to pop out.

So, continuing our chugging along: when you take the derivative of this, you do the product rule, left d-right plus right d-left. In this case, the left is cosine squared of t; we just leave that as it is and multiply it by the derivative of the right, d-right, so that's going to be cosine of t. And then we add to that: keep that right side, sine of t, unchanged, and multiply it by the derivative of the left. For that we use the chain rule, the single-variable chain rule, where you think of taking the derivative of the outside, so you bring the two down, like you're taking the derivative of x squared to get two x, but you're just writing in cosine instead of x: two cosine of t. And then you multiply that by the derivative of the inside (that's a tongue twister), which is negative sine of t.

And I'm afraid I'm gonna run off the edge here, certainly with the many, many parentheses that I need, so I'll go ahead and rewrite this. I'm gonna rewrite it anyway, because there's a certain pattern that I hope to make clear; you might be wondering why, but it'll become clear in just a moment. In this case, I'm gonna write this as two times cosine of t, times sine of t, all of it multiplied by negative sine of t. So this is the derivative of the composition of functions that ultimately was a single-variable function, but it kind of winds through two different variables.

And I just want to make an observation in terms of the partial derivatives of f. So, let me just make a copy of this guy, give ourselves a little bit of room down here, and paste that over here. Let's look at the partial derivatives of f for a second. If I take the partial derivative with respect to x, partial x, then y is treated as a constant. So I take the derivative of x squared to get two x, and then multiply it by that constant, which is just y. And if I also do it with respect to y, to get all of them in there: now y looks like a variable, x looks like a constant, so x squared also looks like a constant, and the derivative of a constant times a variable is just that constant, x squared.

The pattern of these two comes up in the ultimate result that we got, and this is the whole reason that I rewrote it. If you look at this two x y, you can see it over here, where cosine corresponds to x and sine corresponds to y, based on our original functions; and the x squared here corresponds with squaring the x that we put in there. Then, if we take the derivatives of our two intermediary functions, the ordinary derivative of x with respect to t, that's the derivative of cosine, negative sine of t; and similarly, the derivative of y, just the ordinary derivative, no partials going on here, with respect to t, that's equal to cosine of t, since the derivative of sine is cosine. And these guys show up, right: you see negative sine over here, and you see cosine show up over here.

And we can generalize this. We can write it down and say that, at least for this specific example, it looks like the derivative of the composition is this part, which is the partial of f with respect to y (that's kind of what it looks like here, once we've plugged in the intermediary functions), multiplied by this guy, the ordinary derivative of y with respect to t. And then, very similarly, this guy was the partial of f with respect to x, partial x, and we're multiplying it by the ordinary derivative of x of t with respect to t. And of course, when I write this partial f, partial y, what I really mean is you plug in for x and y the two coordinate functions, x of t, y of t. So if I say partial f, partial y over here, what I really mean is you take that x squared and then plug in x of t and square it to get cosine squared. Same deal over here; you're always plugging things in, so you ultimately have a function of t.

But this right here has a name: this is the multivariable chain rule. And it's important enough that I'll just write it out all on its own here. If we take the ordinary derivative, with respect to t, of a composition of a multivariable function, in this case just two variables, where we're plugging in two intermediary functions, x of t, y of t, each of which is just single-variable, the result is that we take the partial derivative with respect to x and multiply it by the derivative of x with respect to t, and then we add to that the partial derivative with respect to y, multiplied by the derivative of y with respect to t.

So, this entire expression is what you might call the simple version of the multivariable chain rule. There's a more general version, and we'll kind of build up to it, but this is the simplest example you can think of, where you start with one dimension, then you move over to two dimensions somehow, and then you move from those two dimensions down to one. So, this is that, and in the next video I'm gonna talk about the intuition for why this is true. You know, here I just went through an example and showed: oh, it just happens to be true, it fits this pattern. But there's a very nice line of reasoning for where this comes about. I'll also talk about a more generalized form, where we start using vector notation; it makes things look very clean. And I might even get around to a more formal argument for why this is true. So, we'll see you in the next video.
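The identity the video derives, that d/dt of f(x(t), y(t)) equals ∂f/∂x · dx/dt + ∂f/∂y · dy/dt, can be checked numerically for the video's example, f(x, y) = x²y with x = cos(t), y = sin(t): the chain-rule expression should match an ordinary numerical derivative of the composition. A small sketch (the sample point t = 0.7 is an arbitrary choice):

```python
import math

def g(t):
    # the composition f(x(t), y(t)) = cos(t)^2 * sin(t) from the video
    return math.cos(t)**2 * math.sin(t)

def chain_rule(t):
    # right-hand side of the multivariable chain rule for f(x, y) = x^2 * y
    x, y = math.cos(t), math.sin(t)
    dfdx, dfdy = 2 * x * y, x**2            # partial derivatives of f
    dxdt, dydt = -math.sin(t), math.cos(t)  # ordinary derivatives of x(t), y(t)
    return dfdx * dxdt + dfdy * dydt

def numeric_derivative(func, t, h=1e-6):
    # ordinary central-difference derivative of a single-variable function
    return (func(t + h) - func(t - h)) / (2 * h)

t = 0.7
print(abs(chain_rule(t) - numeric_derivative(g, t)))  # tiny, roundoff-level
```

The two values agree to many decimal places, which is exactly the point of the lesson: the multivariable chain rule repackages the ordinary product and chain rules applied to the composition.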