Multivariable chain rule

This is the simplest case of taking the derivative of a composition involving multivariable functions. Created by Grant Sanderson.

Want to join the conversation?

  • Aryan Chouhan
    Is this the 3Blue1Brown guy? Sounds really really similar!
    (25 votes)
  • steve
    Just want to clarify that this IS the Total Differential? I thought of this not as the Multivariable Chain Rule but as the product rule (since the chain rule is usually implied). Is that a different, but acceptable, understanding of it?
    (9 votes)
    • Still No Sheep
      Grant has introduced the multivariable chain rule by finding the derivative of a simple multivariable function using the single-variable chain and product rules. He then rewrites the result in a form equivalent to the multivariable chain rule, to demonstrate that the multivariable chain rule agrees with rules we already know to work.
      (7 votes)
  • White
    I'm surprised by how much the dot product comes up very often in multivar calc. Your essence of linear algebra series was really helpful!
    (7 votes)
  • sangitasharma7nov
    grant is baccc
    (5 votes)
  • {Rayeed}^3
    dx/dx =1 But dx/∂x= ?
    (3 votes)
    • Tanzz
      In calculus, "dx" represents an infinitesimal change in the variable "x," and it's often used in the context of finding derivatives. When you write "dx/dx = 1," it means the derivative of "x" with respect to "x" is equal to 1, which is a tautological statement. Essentially, it's saying that a change in "x" with respect to "x" is always 1, which is true because it's a straightforward change in the same variable.

      However, when you write "dx/∂x," you are taking the derivative with respect to a partial derivative (∂x), which typically implies that you are dealing with a function of multiple variables. The partial derivative symbol (∂) is used in multivariable calculus to indicate that you are taking a derivative with respect to one variable while keeping other variables constant.

      So, "dx/∂x" doesn't have a straightforward interpretation without context. The result would depend on the specific function you are differentiating with respect to "x" (∂x) and how it depends on other variables.

      In general, "dx/∂x" is a notation that isn't commonly used because it's somewhat ambiguous. You would typically see "∂f/∂x" to represent the partial derivative of a function "f" with respect to "x."

      Let's consider a simple example of a function of two variables, say, "f(x, y) = x^2 + 2xy + y^2." We can find the partial derivative of this function with respect to "x," denoted as ∂f/∂x:

      f(x, y) = x^2 + 2xy + y^2

      ∂f/∂x is found by treating "y" as a constant and taking the derivative of "f" with respect to "x." The derivative of "x^2" with respect to "x" is "2x," the derivative of "2xy" with respect to "x" is "2y," and the derivative of "y^2" with respect to "x" is 0 because "y" is a constant with respect to "x." So, we have:

      ∂f/∂x = 2x + 2y

      Now, "dx/∂x" cannot be obtained from this function by taking the reciprocal of ∂f/∂x. The quantity 1 / (∂f/∂x) = 1 / (2x + 2y) describes how "x" changes as "f" changes (with "y" held fixed), not how "x" changes with respect to itself.

      If "x" is one of the independent variables, the only sensible reading of the mixed notation is ∂x/∂x, the partial derivative of "x" with respect to "x" with "y" held constant, and that equals 1, just like dx/dx.

      So the practical answer is: write either dx/dx or ∂x/∂x, both of which equal 1, and reserve "∂" for functions of several variables, as in ∂f/∂x above.
      (1 vote)
  • diamantidisno3
    why f[x(t),y(t)] is considered function of 1 variable ?
    (2 votes)
  • Fabian the Panda
    Grant is backkk!
    (2 votes)
  • Noah Schwartz
    To visualize f(x(t), y(t)) in 3D space, would t be the length of the curve?
    (2 votes)
  • Blackout119
    Hey so quick question. At the very end you write out the Multivariate Chain Rule with the factor "x" leading. However in your example throughout the video ends up with the factor "y" being in front. Would this not be a contradiction since the placement of a negative within this rule influences the result. For example look at -sin(t). This value makes the right side of the addition side negative, so now you are subtracting essentially. If you change this, as you would have to based on your "complete" formula at the end, the negative would now be in front of the addition and now you are adding a positive to a negative.
    Isn't this wrong or am I just off my rocker?
    (0 votes)
  • neamesis
    I can kinda understand why it's a single variable function of t, so you'd have to use partial derivatives if you differentiated f w.r.t. cos(t) or sin(t), right?
    (1 vote)

Video transcript

- [Voiceover] So I've written here three different functions. The first one is a multivariable function: it has a two-variable input, x, y, and a single-variable output, x squared times y, which is just a number. The other two functions are each just regular old single-variable functions. And what I want to do is start thinking about the composition of them. So, I'm going to take, as the first component, the value of the function x of t, so you pump t through that, and then you make that the first component of f. And the second component will be the value of the function y of t.
So, the image that you might have in your head for something like this is you can think of t as just living on a number line of some kind, then you have x and y, which is just the plane, so that will be, you know, your x-coordinate, your y-coordinate, two-dimensional space, and then you have your output, which is just whatever the value of f is. And for this whole composition of functions, you're thinking of x of t, y of t as taking a single point in t and kind of moving it over to two-dimensional space somewhere, and then from there, our multivariable function takes that back down to a single number. So, this is just a single-variable function, nothing too fancy going on in terms of where you start and where you end up; it's just what's happening in the middle.
And what I want to know is: what's the derivative of this function? It's just an ordinary derivative, not a partial derivative, because this is just a single-variable function, one variable input, one variable output. So how do you take its derivative? There's a special rule for this, it's called the chain rule, the multivariable chain rule, but you don't actually need it. So, let's actually walk through this, showing that you don't need it. It's not that you'll never need it, it's just that for computations like this you could go without it.
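
As a rough illustration of the setup described above, here is a minimal Python sketch of the three functions and their composition. The names `x_of_t`, `y_of_t`, and `composition` are made up for this sketch, and the choices x(t) = cos(t), y(t) = sin(t) are taken from what is said later in the transcript.

```python
import math

def f(x, y):
    # Multivariable function: two-variable input, single-number output.
    return x**2 * y

def x_of_t(t):
    # First single-variable function (cosine, as used in the transcript).
    return math.cos(t)

def y_of_t(t):
    # Second single-variable function (sine, as used in the transcript).
    return math.sin(t)

def composition(t):
    # t (number line) -> (x(t), y(t)) (plane) -> f(x(t), y(t)) (number line).
    return f(x_of_t(t), y_of_t(t))
```

Note that `composition` takes one number in and gives one number out, which is why its derivative is an ordinary derivative rather than a partial one.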
The multivariable chain rule is a very useful theoretical tool, a very useful model to have in mind for what function composition looks like and implies for derivatives in the multivariable world. So, let's just start plugging things in here. If I have f of x of t, y of t, the first thing I might do is write f, and instead of x of t, just write in cosine of t, since that's the function that I have for x of t, and then for y of t we write sine of t, and of course I'm hoping to take the derivative of this. And then from there, we can go to the definition of f, f of x, y equals x squared times y, which means we take that first component squared. So we'll take that first component, cosine of t, and square it, and then we'll multiply it by the second component, sine of t, and again we're just taking this derivative.
And you might be wondering, okay, why am I doing this, you're just showing me how to take a first derivative, an ordinary derivative? But the pattern that we'll see is gonna lead us to the multivariable chain rule. And it's actually kind of surprising when you see it in this context, because it pops out in a way that you might not expect things to pop out.
So, continuing to chug along, when you take the derivative of this, you use the product rule, left d right plus right d left. In this case, the left is cosine squared of t, we just leave that as it is, and multiply it by the derivative of the right, d right, so that's going to be cosine of t. And then we add to that the right, sine of t, kept unchanged, multiplied by the derivative of the left, and for that we use the chain rule, the single-variable chain rule, where you think of taking the derivative of the outside, so you bring the two down, like you're taking the derivative of x squared to get two x, but you're just writing in cosine instead of x. So that's two cosine of t, and then you multiply that by the derivative of the inside, that's a tongue twister, which is negative sine of t.
And I'm afraid I'm gonna run off the edge here, certainly with the many, many parentheses that I need. I'm gonna rewrite this anyway, because there's a certain pattern that I hope to make clear. So, let me just rewrite this side, let's copy that down here. You might be wondering why, but it'll become clear in just a moment why I want to do this. So, in this case, I'm gonna write this as two times cosine of t, times sine of t, and then all of that multiplied by negative sine of t. So this is the derivative of the composition of functions that ultimately was a single-variable function, but it kind of wound through two different variables.
And I just want to make an observation in terms of the partial derivatives of f. So, let me just make a copy of this guy, give ourselves a little bit of room down here, paste that over here. So let's look at the partial derivatives of f for a second here. If I take the partial derivative with respect to x, partial x, that means y is treated as a constant. So I take the derivative of x squared to get two x, and then multiply it by that constant, which is just y, giving two x y. And if I also do it with respect to y, now y looks like a variable, x looks like a constant, so x squared also looks like a constant, and for a constant times a variable, the derivative is just that constant, x squared. These two patterns come up in the ultimate result that we got. And this is the whole reason that I rewrote it. If you look at this two x y, you can see it over here, where cosine corresponds to x and sine corresponds to y, based on our original functions, and the x squared here corresponds to the cosine squared, squaring the x that we put in there.
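
For readers following without the video, the computation described so far can be written out as below. This is a reconstruction from the spoken steps, not a copy of what is on screen.

```latex
\begin{align*}
\frac{d}{dt}\, f(\cos t, \sin t)
  &= \frac{d}{dt}\left[\cos^2(t)\,\sin(t)\right] \\
  &= \cos^2(t)\cdot\cos(t) + \sin(t)\cdot 2\cos(t)\cdot\bigl(-\sin(t)\bigr) \\
  &= \cos^2(t)\cdot\cos(t) + 2\cos(t)\sin(t)\cdot\bigl(-\sin(t)\bigr), \\[4pt]
\frac{\partial f}{\partial x} &= 2xy, \qquad
\frac{\partial f}{\partial y} = x^2 .
\end{align*}
```

Substituting x = cos(t) and y = sin(t) turns 2xy into 2cos(t)sin(t) and x^2 into cos^2(t), which are the two factors visible in the last line of the derivative.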
Then if we take the derivative of our two intermediary functions, the ordinary derivative of x with respect to t, that's the derivative of cosine, negative sine of t, and then similarly the derivative of y, just the ordinary derivative, no partials going on here, with respect to t, that's equal to cosine of t, since the derivative of sine is cosine. And these guys show up, right, you see negative sine over here, and you see cosine show up over here.
And we can generalize this, we can write it down and say that at least for this specific example, it looks like the derivative of the composition is this part, which is the partial of f with respect to y, right, that's kind of what it looks like here once we've plugged in the intermediary functions, multiplied by this guy, which was the ordinary derivative of y with respect to t. And then very similarly, this guy was the partial of f with respect to x, and we're multiplying it by the ordinary derivative of x with respect to t.
And of course, when I write this partial f, partial y, what I really mean is you plug in for x and y the two coordinate functions, x of t, y of t. So, if I say partial f, partial y over here, what I really mean is you take that x squared and then you plug in x of t to get cosine squared. And same deal over here, you're always plugging things in, so you ultimately have a function of t. But this right here has a name, this is the multivariable chain rule. And it's important enough, I'll just write it out all on its own here.
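
The pattern observed above for this specific example can be sanity-checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the finite-difference step `h` is an arbitrary choice. It compares the derivative found by the product rule, the "partial times ordinary derivative" pattern, and a central-difference estimate.

```python
import math

def f(x, y):
    return x**2 * y

def df_dx(x, y):
    # Partial derivative of f with respect to x (y held constant).
    return 2 * x * y

def df_dy(x, y):
    # Partial derivative of f with respect to y (x held constant).
    return x**2

def direct_derivative(t):
    # Product rule plus single-variable chain rule on cos^2(t) * sin(t).
    return (math.cos(t)**2 * math.cos(t)
            + math.sin(t) * 2 * math.cos(t) * (-math.sin(t)))

def chain_rule_derivative(t):
    # Partials evaluated at (x(t), y(t)), times ordinary derivatives in t.
    x, y = math.cos(t), math.sin(t)
    dx_dt, dy_dt = -math.sin(t), math.cos(t)
    return df_dx(x, y) * dx_dt + df_dy(x, y) * dy_dt

def numerical_derivative(t, h=1e-6):
    # Central-difference estimate of d/dt f(cos t, sin t).
    g = lambda s: f(math.cos(s), math.sin(s))
    return (g(t + h) - g(t - h)) / (2 * h)
```

For any t, `direct_derivative(t)` and `chain_rule_derivative(t)` should agree up to floating-point rounding, and `numerical_derivative(t)` should be close to both.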
If we take the ordinary derivative, with respect to t, of a composition of a multivariable function f, in this case of just two variables, where we're plugging in two intermediary functions, x of t and y of t, each of which is just single-variable, the result is that we take the partial derivative of f with respect to x, and we multiply it by the derivative of x with respect to t, and then we add to that the partial derivative of f with respect to y, multiplied by the derivative of y with respect to t. So, this entire expression here is what you might call the simple version of the multivariable chain rule.
There's a more general version, and we'll kind of build up to it, but this is the simplest example you can think of, where you start with one dimension, and then you move over to two dimensions somehow, and then you move from those two dimensions down to one. So, this is that, and in the next video I'm gonna talk about the intuition for why this is true. You know, here I just went through an example and showed, oh, it just happens to be true, it fits this pattern. But there's a very nice line of reasoning for where this comes from, and I'll also talk about a more generalized form, where you'll see that when we start using vector notation, it makes things look very clean, and I might even get around to a more formal argument for why this is true. So, see you in the next video.
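
For reference, the rule stated in words above can be written as follows, with both partial derivatives understood to be evaluated at the point (x(t), y(t)):

```latex
\[
\frac{d}{dt}\, f\bigl(x(t),\, y(t)\bigr)
  = \frac{\partial f}{\partial x}\,\frac{dx}{dt}
  + \frac{\partial f}{\partial y}\,\frac{dy}{dt}
\]
```

In the example from this video, with f(x, y) = x^2 y, x(t) = cos(t), and y(t) = sin(t), the right-hand side becomes 2cos(t)sin(t) · (−sin(t)) + cos^2(t) · cos(t).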