© 2023 Khan Academy

# Why the gradient is the direction of steepest ascent

The way we compute the gradient seems unrelated to its interpretation as the direction of steepest ascent. Here you can see how the two relate. Created by Grant Sanderson.

## Want to join the conversation?

- I did not get the logic in this proof.

Consider two vectors (a) and (b).

(Imagine you still don't know that (a).(b) gives a directional derivative.)

Now (a).(b) is maximum when (b) is in the direction of (a). I completely agree.

But how can you then say (a) must be the direction of steepest ascent?

This is all I understood from this video.

It seems as if he proved the gradient points in the direction of steepest ascent by assuming it in the first place.(101 votes)
- I know this question was asked a while ago, but I wanted to give it a shot.

The question we should start by asking is not "Why the gradient is the direction of steepest ascent" but instead "What unit vector gives the direction of steepest ascent at a given point?" So, what unit vector gives the direction of steepest ascent at a given point? First off, how do we measure steepness? With slope, which in this context is given by the directional derivative at a point. This means we're looking for the unit vector that maximizes the directional derivative. So, how do we calculate the directional derivative? It's the dot product of the gradient and that unit vector.

A point of confusion that I had initially was mixing up the gradient and the directional derivative, and seeing the directional derivative as the magnitude of the gradient. This is not correct at all. Visualizing a plane, a single point has just one gradient vector corresponding to it. However, depending on the direction you are turned (left, right, down, or up), the directional derivative is completely different.

Going back to the problem, we're now looking for a vector that would maximize (gradient) dot (vector) at a specific point. Since we are looking at a single point, the gradient part of it is constant. The vector is the only variable. As you have stated, the maximum value would occur if the vector was in the direction of the gradient.

There you have it. At a given point, the direction of steepest ascent is in the same direction as the gradient. Or, another way of putting it, the gradient is the direction of steepest ascent.(40 votes)
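
The maximization step in this reply can be written out as a short derivation. This is only a sketch: it assumes f is differentiable at the point (a, b), that the gradient there is not the zero vector, and that θ (a symbol introduced here, not used in the reply) is the angle between the gradient and the unit vector v.

$$
D_{\vec v} f(a,b) \;=\; \nabla f(a,b) \cdot \vec v \;=\; \lVert \nabla f(a,b) \rVert \, \lVert \vec v \rVert \cos\theta \;=\; \lVert \nabla f(a,b) \rVert \cos\theta
$$

At a fixed point the factor $\lVert \nabla f(a,b) \rVert$ does not change, so the only thing the choice of $\vec v$ controls is $\cos\theta$. That is largest (equal to 1) when $\theta = 0$, meaning $\vec v = \nabla f(a,b) / \lVert \nabla f(a,b) \rVert$, and the resulting slope is $\lVert \nabla f(a,b) \rVert$.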

- I found this explanation a bit backwards; this is the way I see it.

By taking partial derivatives in [1,0] and [0,1] (two perpendicular vectors, so everything is covered in the 2D plane) we find out how much the function will nudge when x and y increase a little. If a nudge in the y direction increased the function 4 times and a nudge in the x direction 1 time, it's pretty easy to figure out that the best way to "climb" the fastest is to move in a ratio of 4 to 1 in the y direction relative to x.

That's all the gradient is: the ratio of all possible input/output changes, which we interpret as vector components.

The directional derivative proves nothing to me but that the dot product is biggest when the angle is smallest. The gradient is the direction of steepest ascent because of the nature of ratios of change.

If I want the magnitude of the biggest change, I just take the absolute value of the gradient. If I want the unit vector in the direction of steepest ascent (directional derivative), I would divide the gradient components by its absolute value.(22 votes)

- In which direction should you walk to descend the fastest? My homework said it's the negative of the gradient vector, but my textbook says that moving in the opposite direction of the gradient vector results in a minimum rate of change in the direction you're walking, not the maximum.(8 votes)
- Both are correct, but your textbook put it in a way that seems a bit confusing. Moving in the direction of the gradient will give you the greatest rate of **increase**, and thus going in the opposite direction will give you the greatest rate of **decrease**. And the greatest rate of decrease **is** the minimum rate of change, because that is when the rate of change is **most negative**.

As an example, let's say you are hiking up a mountain. Imagine the top of the mountain is to the north, so the gradient points north, and imagine it has a magnitude of 0.5, meaning that for each meter you move north, you will rise 0.5 meters. So if you walk in the opposite direction, the rate of change will be -0.5, and that is the **minimum** of all possible rates of change. If you walk east or west, the rate of change will be 0, which would be the minimum possible **magnitude** for the rate of change. But -0.5 is less than 0.(16 votes)
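
The hiking example can be checked with a few lines of Python. This is a minimal sketch, assuming +y means north and +x means east, so the gradient from the example is (0, 0.5). The function name `directional_derivative` and the `rates` dictionary are made up for this illustration.

```python
import math

# Assumed setup from the hiking example: +x is east, +y is north,
# and the gradient points north with magnitude 0.5.
gradient = (0.0, 0.5)

def directional_derivative(grad, direction):
    """Rate of change along `direction`, which is normalized to unit length first."""
    length = math.hypot(direction[0], direction[1])
    unit = (direction[0] / length, direction[1] / length)
    return grad[0] * unit[0] + grad[1] * unit[1]

rates = {
    "north": directional_derivative(gradient, (0.0, 1.0)),
    "south": directional_derivative(gradient, (0.0, -1.0)),
    "east": directional_derivative(gradient, (1.0, 0.0)),
    "west": directional_derivative(gradient, (-1.0, 0.0)),
}

# The minimum rate of change is the most negative one (steepest descent).
steepest_descent = min(rates, key=rates.get)
# The minimum magnitude is a different question: it is where the slope is flattest.
flattest = min(rates, key=lambda name: abs(rates[name]))

print(rates)
print("steepest descent:", steepest_descent)
print("flattest:", flattest)
```

Only four compass directions are compared here, which is enough to show the difference between "minimum rate of change" and "minimum magnitude of the rate of change".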

- I think unfortunately I do not have the intuition of how to maximize the dot product. Where can I find the videos mentioned at 7:04?(5 votes)
- https://www.khanacademy.org/math/linear-algebra/vectors-and-spaces/dot-cross-products/v/vector-dot-product-and-vector-length Under linear algebra and https://www.khanacademy.org/math/precalculus/vectors-precalc/scalar-multiplication/v/understanding-multiplying-vectors-by-scalars under precalculus (also has one under physics, the dot product is used everywhere)(5 votes)

- This video basically says that gradient is direction of steepest ascent BY DEFINITION of the directional derivative (to be clear, I'm referring to the informal definition of the directional derivative, which is the dot product of directional vector v and gradient). Note that slope and directional derivative (with unit vector direction) are synonymous ideas.

The logic is as follows: "Trust me when I say this, slope is the dot product of gradient and direction. We know that dot product is maximized when the vectors are parallel. Therefore, slope is maximized when direction is parallel to gradient."

What isn't exactly clear to me is why the informal definition itself is a correct way to compute the slope of the function in direction v (I guess it kind of makes sense as it measures a weighted sum of how much I'm taking advantage of going in the x direction and how much I'm taking advantage of going in the y direction... but it isn't mathematically clear how this weighted sum is a reliable measure of the actual graph's slope in that direction). I DO however understand how the formal limit definition (which was explained in a previous video) is a valid way of computing the slope in a particular direction.

My question is: how is the dot product definition of the directional derivative equivalent to the limit definition of the directional derivative?(7 votes)

- I read that the gradient is orthogonal/normal to the tangent plane. How is that possible if it points in the direction of steepest ascent?(5 votes)
- If you look at the method to find a tangent plane, and then the method to find a normal vector to a plane in general you'll see the link. The gradient isn't directly normal, but if you have it in the form <df(A)/dx, df(A)/dy, -1> you get the normal vector. A here is whatever point you are measuring from on the surface. Here is a video from the linear algebra playlist on finding the normal vector from a plane. (Pretend my derivatives are partial derivatives)

https://www.khanacademy.org/math/linear-algebra/vectors-and-spaces/dot-cross-products/v/normal-vector-from-plane-equation

Start at around 7:00 to be at the point where he has a general plane equation, then just keep in mind what A, B and C are. I hope this helped.(3 votes)
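
One way to see how the two statements fit together is sketched below. It assumes the surface is the graph z = f(x, y) of a differentiable function, and the helper function F is introduced only for this illustration.

$$
F(x, y, z) = f(x, y) - z, \qquad \nabla F = \left( \frac{\partial f}{\partial x},\; \frac{\partial f}{\partial y},\; -1 \right)
$$

The graph is the set where $F = 0$, and $\nabla F$ (a vector with three components) is normal to that surface, which is where the form with the $-1$ comes from. The gradient of f itself, $\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$, has only two components and lives in the x,y input plane, where it gives the direction of steepest ascent. So the vector that is normal to the tangent plane and the vector that points uphill are gradients of two different functions.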

- I get the first two of the lectures, but I'm sorry, the rest of the lectures under the gradient & directional derivatives topic are not intuitive to me... Quite confusing...(3 votes)
- If you need help understanding it, look up this link; this guy does well:

https://www.youtube.com/watch?v=tDPp5uWSIiU&t=6536s (7 votes)

- The fact that the gradient is in the direction of steepest ascent is inherent to its very definition, so to prove it by reference to the rule for computing directional derivatives seems to me a bit like begging the question. Surely, the gradient points in the direction of steepest ascent because the partial derivatives provide the maximum increases to the value of the function at a point and summing them means advancing in both of their specific directions at the same time.(4 votes)
- The gradient is (slope along purely x, slope along purely y). When we represent it on a graph, we can see that the vector moves toward the x axis as "slope along purely x" increases (i.e. the vector points along the resultant of the balanced slopes along x and y, which will be steepest). Am I correct?(4 votes)
- If the gradient is the direction of the steepest ascent:

>> gradient(x, y) = [ derivative_f_x(x, y), derivative_f_y(x, y) ]

Then it really confuses me, because when calculating the normal line perpendicular to the tangent plane, the formula would be:

>> normal line = (derivative_f_x(x, y), derivative_f_y(x, y), z),

But both derivative_f_x(x,y) & derivative_f_y(x,y) are gradient (the slope of the tangent plane). I don't think the steepest ascent/descent is the slope of the normal line perpendicular to the tangent plane!

For example

Find a vector function for the line normal to x^2 + 2y^2 + 4z^2 = 26 at (2, -3, -1).

Answer: (2 + 4t, -3 -12t, -1 - 8t).

Anyone care to give it a shot and show me the steps?

Any information would be much appreciated.

Thanks.(3 votes)
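
One possible route to the quoted answer is sketched below. It assumes the surface is treated as the level set F(x, y, z) = 26 of F(x, y, z) = x^2 + 2y^2 + 4z^2, so that the three-component gradient of F, worked out by hand as (2x, 4y, 8z), is normal to the surface at the point. The names `grad_F` and `normal_line` are made up for this sketch.

```python
def F(x, y, z):
    """Left-hand side of the surface equation x^2 + 2y^2 + 4z^2 = 26."""
    return x**2 + 2 * y**2 + 4 * z**2

def grad_F(x, y, z):
    """Gradient of F, computed by hand: (dF/dx, dF/dy, dF/dz) = (2x, 4y, 8z)."""
    return (2 * x, 4 * y, 8 * z)

point = (2, -3, -1)

# Sanity check that the point really lies on the surface.
on_surface = F(*point) == 26

# The gradient of F at the point is used as the direction of the normal line.
direction = grad_F(*point)

def normal_line(t):
    """Point on the line through `point` with direction `direction` at parameter t."""
    return tuple(p + t * d for p, d in zip(point, direction))

print(on_surface)       # whether (2, -3, -1) satisfies the equation
print(direction)        # direction vector of the normal line
print(normal_line(1))   # one sample point on the line
```

With direction (4, -12, -8), the line is (2 + 4t, -3 - 12t, -1 - 8t), which agrees with the answer given in the question. Note that this uses the gradient of the three-variable function F, not the two-component gradient of a function f(x, y) discussed in the video.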

## Video transcript

- [Voiceover] So far,
when I've talked about the gradient of a function, and let's think about this as a multi-variable function
with just two inputs. Those are the easiest to think about. So maybe it's something like
x squared plus y squared, a very friendly function. When I've talked about the gradient, I've left open a mystery. We have the way of computing it, and the way that you
think about computing it is you just take this vector, and you just throw the
partial derivatives in there. Partial with respect to x, and the partial with respect to y, and if it was a higher dimensional input, then the output would have as
many variables as you need. If it was f of x,y,z, you'd have partial x,
partial y, partial z. And this is the way to compute it. But then I gave you a graphical intuition. I said that it points in the
direction of steepest ascent, and maybe the way you think about that is you have your input space, which in this case is the x,y plane, and you think of it as
somehow mapping over to the number line, to your output space, and if you have a given point somewhere, the question is, of all the possible directions that you can move away from this point, all those different
directions you could go, which one of them-- this point will land somewhere on the function, and as you move in the various directions maybe one of them nudges
your output a little bit, one of them nudges it a lot, one of them slides it negative, one of them slides it negative a lot. Which one of these directions results in the greatest
increase to your function? And this was the loose intuition. If you want to think in terms of graphs, we could look over at the
graph of f, which is x squared plus y squared, and this is the gradient field. All of these vectors in the x,y plane are the gradients. As you kind of look from below, you can maybe see why each one of these points in
the direction you should move to walk uphill on that graph as fast as you can. If you're a mountain climber, and you want to get to the
top as quickly as possible, these tell you the direction
that you should move to go as quickly. This is why you call it
direction of steepest ascent. So back over here, I don't see the connection immediately, or at least when I was
first learning about it, it wasn't clear why this combination
of partial derivatives has anything to do with
choosing the best direction. And now that we've learned about
the directional derivative, I can give you a little
bit of an intuition. So let's say instead of thinking about all the possible directions, and all of the possible changes to the output that they have, so I'll fill in my line there. Let's say you just have, you've got your point where you're evaluating things, and then you just have a single vector, and let's actually make it a unit vector. Let's make it the case that this guy has a length of one. So I'll go over here, and I'll just think of that guy as being V, and say that V has a length of one, so this is our vector. We know now, having learned about the
directional derivative, that you can tell the rate at which the function changes as you move in this direction by taking a directional derivative of your function, and let's say this point, I don't know, what's a
good name for this point? Just like, a,b. a,b is this point. When you evaluate this at a,b, and the way that you do that is just dotting the gradient of f. I should say dotting it,
evaluate it at that point, 'cause gradient is a
vector valued function, and we just want a specific vector here, so, evaluating that at your point, a,b, together with whatever the vector is, whatever that value is, and in this case we're
thinking of V as a unit vector. So this, this is how you tell the rate of change, and when I originally introduced
the directional derivative, I gave kind of an indication why. If you imagine dotting this together with, let's say it was a vector
that's like one two, really you're thinking
this vector represents one step in the x direction, two steps in the y direction, so the amount that it
changes things should be one times the change caused by a pure step in the x direction, plus two times a change caused by a pure step in the y direction. So that was kind of the loose intuition. You can see the directional
derivative video if you want a little bit
more discussion on that. And this is the formula that you have. But this starts to give us the key for how we could choose the
direction of steepest ascent, 'cause now, what we're really asking, when we say which one of
these changes things the most, maybe when you move in that direction it changes f a little bit negatively, and we want to know, does another vector W, is the change caused by
that gonna be positive? Is it gonna be as big as possible? What we're doing is we're saying find the maximum for all unit vectors, so for all vectors V that satisfy the property
that their length is one, find the maximum of the dot product between the gradient of f, evaluated at whatever point we care about, and V. Find that maximum. Well, let's just think about what the dot product represents. So let's say we go over here, and let's say we evaluate
the gradient vector and it turns out that the gradient points in this direction, and maybe, it doesn't
have to be a unit vector, it might be something very long like that. So if you imagine some vector V, some unit vector V, let's say it was taking
off in this direction. The way that you interpret
this dot product, the dot product between the gradient f and this new vector V, is you would project that vector directly, kind of a
perpendicular projection onto your gradient vector, and you'd say what's that length? What's that length right there? And just as an example,
it would be something a little bit less than one, right? 'Cause this is a unit vector. So as an example, let's say that was 0.7. And then you'd multiply that by the length of the gradient itself, of the vector against
which you're dotting, and maybe that guy, maybe the length of the entire gradient vector, just, again, as an
example, maybe that's two. It doesn't have to be,
it could be anything. But the way that you interpret
this whole dot product then is to take the product of those two. You would take 0.7, the length of your projection, times the length of the original vector. And the question is when is this maximized? What unit vector maximizes this? And if you start to imagine maybe swinging that unit vector around, so if, instead of that guy, you were to use one that pointed a little bit more closely in the direction, then its projection would be a little bit longer. Maybe that projection would
be like 0.75 or something. If you take the unit vector that points directly in the same direction as that full vector, then the length of its projection is just the length of the vector itself. It would be one, because projecting it doesn't
change what it is at all. So it shouldn't be too
hard to convince yourself, and if you have shaky
intuitions on the dot product, I'd suggest finding the videos we have on Khan Academy for those. Sal does a great job
giving that deep intuition. It should kind of make sense why the unit vector that points
in the same direction as your gradient is gonna
be what maximizes it, so the answer here, the answer to what vector maximizes this is gonna be, well, it's
the gradient itself, right? It is that gradient vector evaluated at the point we care about, except you'd normalize it, right? Because we're only
considering unit vectors, so to do that, you just divide it by whatever its magnitude is. If its magnitude was
already one, it stays one. If its magnitude was two, you're dividing it down by a half. So this is your answer. This is the direction of steepest ascent. So, I think one thing to notice here is the most fundamental fact is that the gradient is this tool for computing directional derivatives. You can think of that vector as something that you really want to dot against, and that's actually a
pretty powerful thought, is that the gradient,
it's not just a vector, it's a vector that loves to be dotted together with other things. That's the fundamental. And as a consequence of that, the direction of steepest ascent is that vector itself because anything, if you're saying what maximizes the dot
product with that thing, it's, well, the vector that points in the same direction as that thing. And this can also give
us an interpretation for the length of the gradient. We know the direction is the direction of steepest ascent, but what does the length mean? So, let's give this guy a name. Let's give this normalized
version of it a name. I'm just gonna call it W. So W will be the unit vector that points in the
direction of the gradient. If you take the directional derivative in the direction of W of f, what that means is the gradient of f dotted with that W. And if you kind of spell out what W means here, that means you're taking
the gradient of f dotted with itself, but because it's W and not the gradient, we're normalizing. We're dividing that,
not by magnitude of f, that doesn't really make sense, but by the magnitude of the gradient, and all of these, I'm just writing gradient of f, but maybe you should be thinking about gradient of f evaluated at a,b, but I'm just being kind of lazy, and just writing gradient of f. And the top, when you take
the dot product with itself, what that means is the square of its magnitude. But the whole thing is
divided by the magnitude. So you can kind of cancel that out. You could say this
doesn't need to be there, that exponent doesn't need to be there, and basically, the directional derivative in the direction of the gradient itself has a value equal to the
magnitude of the gradient. So this tells you when you're
moving in that direction, in the direction of the gradient, the rate at which the function changes is given by the magnitude of the gradient. So it's this really magical vector. It does a lot of things. It's the tool that lets you
dot against other vectors to tell you the directional derivative. As a consequence, it's the
direction of steepest ascent, and its magnitude tells
you the rate at which things change while you're moving in that direction of steepest ascent. It's just really a core part of scalar valued
multi-variable functions, and it is the extension of the derivative in every sense that you could
want a derivative to extend.
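
The two conclusions of the transcript (the best unit vector is the normalized gradient, and the slope in that direction is the gradient's magnitude) can be checked numerically. The sketch below is not from the video: it assumes the example function f(x, y) = x^2 + y^2, an arbitrarily chosen sample point (1, 2), and a brute-force search over unit vectors. The names `grad_f` and `steepest_direction_by_search` are made up for this illustration.

```python
import math

def grad_f(x, y):
    """Gradient of f(x, y) = x^2 + y^2, computed by hand: (2x, 2y)."""
    return (2.0 * x, 2.0 * y)

def steepest_direction_by_search(grad, steps=3600):
    """Try unit vectors (cos t, sin t) and keep the one with the largest dot product."""
    best_rate, best_dir = -math.inf, None
    for k in range(steps):
        t = 2.0 * math.pi * k / steps
        v = (math.cos(t), math.sin(t))
        rate = grad[0] * v[0] + grad[1] * v[1]  # directional derivative along v
        if rate > best_rate:
            best_rate, best_dir = rate, v
    return best_dir, best_rate

a, b = 1.0, 2.0                      # sample point, chosen arbitrarily
g = grad_f(a, b)                     # gradient at (a, b)
g_len = math.hypot(g[0], g[1])       # magnitude of the gradient
w = (g[0] / g_len, g[1] / g_len)     # normalized gradient, the W of the transcript

best_dir, best_rate = steepest_direction_by_search(g)

print("search result:      ", best_dir, best_rate)
print("normalized gradient:", w, g_len)
```

Up to the resolution of the search, `best_dir` lines up with `w` and `best_rate` matches `g_len`. A finer grid of angles only tightens the agreement.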