
# Chain rule proof

Here we use the formal properties of continuity and differentiability to see why the chain rule is true.

## Want to join the conversation?

- "lim_{Δx->0} (Δy/Δu) * (Δu/Δx)"

But if Δu = 0 (even when Δx ≠ 0), you'd be dividing by zero? This reasoning suggests that the chain rule is true, but I don't think it's rigorous enough.(22 votes)
- You are correct. This is the "intuitive" proof. There is a rigorous proof, and the chain rule is sound.

To prove the Chain Rule correctly you need to show that if f(u) is a differentiable function of u and u = g(x) is a differentiable function of x, then the composite y=f(g(x)) is a differentiable function of x. Since a function is differentiable if and only if it has a derivative at each point in its domain, it must be shown that whenever g is differentiable at xₒ and f is differentiable at g(xₒ) , then the composite is differentiable at xₒ and the derivative of the composite satisfies the equation:

dy/dx = f'(g(xₒ))·g'(xₒ) (when x = xₒ)

Good eye!(35 votes)
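
One standard way to make the argument in this thread rigorous is to replace the quotient Δy/Δu with an auxiliary function that is also defined when Δu = 0. The sketch below assumes the notation y = f(u), u = g(x), and u₀ = g(x₀). The symbol φ is a name introduced here for illustration, and this is only one of several possible proofs.

```latex
% Assumptions: g is differentiable at x_0, f is differentiable at u_0 = g(x_0).
% \varphi is an auxiliary function introduced for this sketch.
\[
\varphi(u) =
\begin{cases}
\dfrac{f(u) - f(u_0)}{u - u_0}, & u \neq u_0,\\[2ex]
f'(u_0), & u = u_0.
\end{cases}
\]
% \varphi is continuous at u_0 because f is differentiable at u_0.
% For every x \neq x_0, including those x with g(x) = u_0
% (where both sides are zero):
\[
\frac{f(g(x)) - f(g(x_0))}{x - x_0}
= \varphi(g(x)) \cdot \frac{g(x) - g(x_0)}{x - x_0}.
\]
% As x \to x_0, g(x) \to u_0 because g is continuous at x_0,
% so \varphi(g(x)) \to f'(u_0), and the second factor tends to g'(x_0):
\[
(f \circ g)'(x_0) = f'(g(x_0)) \cdot g'(x_0).
\]
```

No step divides by g(x) − g(x₀) without first checking that it is nonzero, which is exactly the gap raised in the question.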

- Is it true that every type of derivative is actually also a chain rule on top of whatever other type it is? I've always had the sneaking suspicion that this is true and I haven't yet found a counterexample, but in math you need formal proofs. Let me provide an example of what I am talking about. When I take the derivative of f(x) = 3x^2, I get d/dx(3x^2) = 2*3x^(2-1) = 6x. Isn't that still a chain rule, where I just didn't have to write out the second part because the derivative of the inside x is just 1? The complete way to take the derivative using the chain rule and power rule would be d/dx(3x^2) = [2*3x^(2-1)] * (d/dx(x)) = 6x * 1 = 6x. I get the same result, but it shows that the chain rule still holds for different types of derivatives besides just standard chain rule problems. But does this discovery hold for every derivative?(14 votes)
- Yes, the chain rule applies to all derivatives (at least all of the derivatives of the type you deal with in an introductory course such as this). However, as you point out, we often get trivial results from the chain rule that we don't need to show explicitly.(11 votes)

- Should not the function y be differentiable at u(x) and not x?(11 votes)
- Can't you cancel out du directly in dy/du * du/dx = dy/dx, like you would cancel out the 2 in 2/3 * 1/2 = 1/3?(2 votes)
- At 0:57, Sal says d/dx [y(u(x))] = (dy/du) * (du/dx). Shouldn't this be (d [y(u(x))]/du) * (du/dx)? Because Sal is implying that d/dx [f(g(x))] = (d [f(x)]/d [g(x)]) * (d [g(x)])/dx.(3 votes)
- So you wrote: "Shouldn't this be (d [y(u(x))]/du) * (du/dx)?"

Try this instead: ( d[y(u(x))] / d[u(x)] ) * ( d[u(x)] / dx )

Notice I included the whole "u(x)" in place of the lone "u" that you sometimes wrote. By writing just "u", you used a shorthand notation for "u(x)". This is what Sal is doing by writing (dy/du) * (du/dx) instead of ( d[y(u(x))] / d[u(x)] ) * ( d[u(x)] / dx ). Either way is fine, if you know that y is short for y(u(x)).(4 votes)

- Why does f'(g(x)) equal dy/du? Can someone please explain? Thanks a lot.☻(2 votes)
- At 0:08, Sal calls the chain rule infamous. Just asking, but why?(3 votes)
- Some people find it hard to remember, or maybe just don't like it for some weird reason.(3 votes)

- By the way, what does dy/du mean? It doesn't make sense to me because u is a function not a variable...(3 votes)
- Exesssr is incorrect, Sal is talking about differentiating y(u(x)) with respect to u(x). A function is a dependent variable with only one value for any given value of the independent variable. Therefore, y(u(x)) is a variable dependent on u(x), which in turn is a variable dependent on x. u(x) is not necessarily equal to x, however, so dy/du ≠ dy/dx in general. In fact, dy/du * du/dx = dy/dx, so dy/du only equals dy/dx when du/dx equals 1 (du/dx can only be constantly equal to 1 if u = x + C for some constant C).(2 votes)
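
A small worked example may make the notation dy/du concrete. The functions below are chosen purely for illustration and do not come from the video: take y = u³ with u = x² + 1, so that u is treated as the input variable of y and as an output of x.

```latex
% Illustrative choice (an assumption, not from the video):
% y = u^3, u = x^2 + 1, so y(u(x)) = (x^2 + 1)^3.
\[
\frac{dy}{du} = 3u^2, \qquad \frac{du}{dx} = 2x,
\]
\[
\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}
= 3u^2 \cdot 2x = 6x\,(x^2 + 1)^2 .
\]
% Here dy/du = 3(x^2 + 1)^2 differs from dy/dx = 6x(x^2 + 1)^2,
% because du/dx = 2x is not identically 1.
```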

- Why can't I just say that dx, dy and du are infinitesimal changes and hence directly prove the chain rule by multiplication and division of du?

dy/dx = dy/du . du/dx.

Even if 'd' corresponds to a very, very small change, still it is a change in variable and I should be able to do algebraic manipulation?(2 votes)
- Derivatives can be defined in two ways, using limits or using infinitesimals. We cannot assume that the infinitesimal change du in dy/du is equal to the infinitesimal change du in du/dx. However, assuming both dy/du and du/dx are differentiable, the standard part of dy/du and du/dx must be constant for ANY infinitesimal du or dx. In order to be differentiable, the standard part of dy/du and du/dx must also be defined, and therefore du in du/dx must be infinitesimal when dx is infinitesimal. Therefore, we can define du in dy/du to be the infinitesimal change of du for a given dx, knowing that it will be the same for any other du. From there, we can algebraically solve dy/du * du/dx to get dy/dx. Sal essentially did the same thing that I'm doing here, except he used limits instead of infinitesimals and worked in reverse order.(2 votes)

- I came up with this alternative proof:

We know that: df/dt=f'(t) <=> df=f'(t)*dt

Now, if t itself is a function of another variable x then we have that: t=t(x)=g(x). Also dt=dg (that is, an infinitesimal change in t results in an infinitesimal change in g)

if we plug this into the first equation we have that: df=f'(g(x))*dg

Then we divide both sides by dx: df/dx = f'(g(x))*dg/dx = f'(g(x))*g'(x)(2 votes)
- That's a good "intuitive" reason for why the chain rule should be true; however it fails complete mathematical rigor, as what happens if dx = 0? The complete proof is a slight modification of yours, creating a piecewise function (called fudge function) for this case.(1 vote)

## Video transcript

- What I hope to do in this video is a proof of the famous and useful and somewhat elegant and sometimes infamous chain rule. And, if you've been following some of the videos on "differentiability implies continuity", and what happens to a continuous function as our change in x, if x is our independent variable, as that approaches zero, how the change in our function approaches zero, then this proof is actually surprisingly straightforward, so let's just get to it, and this is just one of many proofs of the chain rule.

So the chain rule tells us that if y is a function of u, which is a function of x, and we want to figure out the derivative of this, so we want to differentiate this with respect to x, so we're gonna differentiate this with respect to x, we could write this as the derivative of y with respect to x, which is going to be equal to the derivative of y with respect to u, times the derivative of u with respect to x. This is what the chain rule tells us.

But how do we actually
go about proving it? Well we just have to remind ourselves that the derivative of y with respect to x... the derivative of y with respect to x, is equal to the limit as delta x approaches zero of change in y over change in x.

Now we can do a little bit of algebraic manipulation here to introduce a change in u, so let's do that. So this is going to be the same thing as the limit as delta x approaches zero, and I'm gonna rewrite this part right over here. I'm gonna essentially divide and multiply by a change in u. So I could rewrite this as delta y over delta u times delta u, whoops... times delta u over delta x. Change in y over change in u, times change in u over change in x. And you can see, these are just going to be numbers here, so our change in u, this would cancel with that, and you'd be left with change in y over change in x, which is exactly what we had here. So nothing earth-shattering just yet.

But what's this going to be equal to? What's this going to be equal to? Well the limit of the product is the same thing as the
product of the limit, so this is going to be the same thing as the limit as delta x approaches zero of, and I'll color-code it, of this stuff, of delta y over delta u, times-- maybe I'll put parentheses around it, times the limit... the limit as delta x approaches zero, delta x approaches zero, of this business. So let me put some parentheses around it. Delta u over delta x.

So what does this simplify to? Well this right over here, this is the definition, and if we're assuming, in order for this to even be true, we have to assume that u and y are differentiable at x. So we assume, in order for this to be true, we're assuming... we're assuming y comma u are differentiable... are differentiable at x. And remember also, if they're differentiable at x, that means they're continuous at x. But if u is differentiable at x, then this limit exists, and this is the derivative of... this is u prime of x, or du/dx, so this right over here... we can rewrite as du/dx, I think you see where this is going.

Now this right over here, just looking at it the way
it's written out right here, we can't quite yet call this dy/du, because this is the limit as delta x approaches zero, not the limit as delta u approaches zero. But we just have to remind ourselves of the results from, probably, the previous video depending on how you're watching it, which is, if we have a function u that is continuous at a point, then, as delta x approaches zero, delta u approaches zero. So we can actually rewrite this... we can rewrite this right over here, instead of saying delta x approaches zero, that's just going to have the effect, because u is differentiable at x, which means it's continuous at x, that means that delta u is going to approach zero. As our change in x gets smaller and smaller and smaller, our change in u is going to get smaller and smaller and smaller. So we can rewrite this, as our change in u approaches zero, and when we rewrite it like that, well then this is just dy/du. This is just dy, the derivative of y, with respect to u.

So just like that, if we assume y and u are differentiable at x, or you could say that y is a function of u, which is a function of x, we've just shown, in fairly simple algebra here, and using some assumptions about differentiability and continuity, that it is indeed the case that the derivative of y with respect to x is equal to the derivative of y with respect to u times the derivative of u with respect to x. Hopefully you find that convincing.
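
The algebraic step in the transcript, writing Δy/Δx as (Δy/Δu)·(Δu/Δx) and then letting Δx shrink, can be checked numerically. The following is a minimal sketch, not part of the video: the functions u(x) = x² and y(u) = sin(u) are assumptions chosen for illustration, and `difference_quotients` is a hypothetical helper name. The helper refuses to divide when Δu is exactly zero, which is the case the simple argument does not cover.

```python
import math

def u(x):
    # Inner function, chosen for illustration (an assumption).
    return x * x

def y(v):
    # Outer function, chosen for illustration (an assumption).
    return math.sin(v)

def difference_quotients(x, dx):
    """Return (Δy/Δu, Δu/Δx, Δy/Δx) for a finite, nonzero step dx."""
    du = u(x + dx) - u(x)
    dy = y(u(x + dx)) - y(u(x))
    if du == 0:
        # The quotient Δy/Δu is undefined here; the simple proof skips this case.
        raise ZeroDivisionError("Δu is zero, so Δy/Δu is undefined")
    return dy / du, du / dx, dy / dx

x0 = 1.0
# dy/du * du/dx worked out by hand: cos(u(x0)) * 2*x0.
exact = math.cos(u(x0)) * 2 * x0

for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    dy_du, du_dx, dy_dx = difference_quotients(x0, dx)
    print(f"dx={dx:g}  dy/du≈{dy_du:.6f}  du/dx≈{du_dx:.6f}  "
          f"product≈{dy_du * du_dx:.6f}  dy/dx≈{dy_dx:.6f}")

print(f"hand-computed dy/du * du/dx = {exact:.6f}")
```

For each step size the product of the two quotients should agree with Δy/Δx up to rounding, and both should approach the hand-computed value as dx shrinks. Calling `difference_quotients(-0.5, 1.0)` raises the error, because u(0.5) = u(-0.5) gives Δu = 0 even though Δx ≠ 0.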