Directional derivatives (introduction)

How does the value of a multivariable function change as you nudge the input in a specific direction?

What we're building to

  • If you have some multivariable function, f(x,y) and some vector in the function's input space, v, the directional derivative of f along v tells you the rate at which f will change while the input moves with velocity vector v.
  • The notation here is $\nabla_{\vec{v}} f$, and it is computed by taking the dot product between the gradient of f and the vector v, that is, $\nabla f \cdot \vec{v}$.
  • When the directional derivative is used to compute slope, be sure to normalize the vector v first.

Generalizing partial derivatives

Consider some multivariable function:
$$f(x, y) = x^2 - xy$$
We know that the partial derivatives with respect to x and y tell us the rate of change of f as we nudge the input either in the x or y direction.
The question now is what happens when we nudge the input of f in a direction which is not parallel to the x or y axes.
For example, the image below shows the graph of f along with a small step along a vector v in the input space, meaning the xy-plane in this case. Is there an operation which tells us how the height of the graph above the tip of v compares to the height of the graph above its tail?
As you have probably guessed, there is a new type of derivative, called the directional derivative, which answers this question.
Just as the partial derivative is taken with respect to some input variable—e.g., x or y—the directional derivative is taken along some vector v in the input space.
One very helpful way to think about this is to picture a point in the input space moving with velocity v. The directional derivative of f along v is the resulting rate of change in the output of the function. So, for example, multiplying the vector v by two would double the value of the directional derivative since all changes would be happening twice as fast.
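The velocity picture can be checked numerically. The following is a minimal sketch, not part of the original lesson: it assumes the example function f(x, y) = x^2 - xy, and the input point, the vector, and the helper name `rate_of_change` are all chosen purely for illustration. It estimates the rate of change with a small central difference and shows that doubling v roughly doubles the result.

```python
def f(x, y):
    """Example function f(x, y) = x^2 - x*y."""
    return x**2 - x * y


def rate_of_change(func, point, v, h=1e-6):
    """Central-difference estimate of how fast func changes as the input
    leaves `point` with velocity `v` (h plays the role of a tiny time step)."""
    x, y = point
    vx, vy = v
    forward = func(x + h * vx, y + h * vy)
    backward = func(x - h * vx, y - h * vy)
    return (forward - backward) / (2 * h)


point = (1.0, 2.0)   # arbitrary input point (an assumption for this sketch)
v = (1.0, 1.0)       # arbitrary velocity vector (also an assumption)

slow = rate_of_change(f, point, v)
fast = rate_of_change(f, point, (2 * v[0], 2 * v[1]))
print(slow, fast)    # the second value is about twice the first
```

A finite difference only approximates the derivative, so the two printed values agree with the exact ones up to small rounding error.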

Notation

There are quite a few different notations for this one concept:
  • $\nabla_{\vec{v}} f$
  • $\dfrac{\partial f}{\partial \vec{v}}$
  • $f'_{\vec{v}}$
  • $D_{\vec{v}} f$
  • $\partial_{\vec{v}} f$
All of these represent the same thing: the rate of change of f as you nudge the input along the direction of v. We'll use the $\nabla_{\vec{v}} f$ notation, just because it subtly hints at how you compute the directional derivative using the gradient, which you'll see in a moment.

Example 1: $\vec{v} = \hat{\jmath}$

Before jumping into the general rule for computing $\nabla_{\vec{v}} f$, let's look at how we can rewrite the more familiar notion of a partial derivative as a directional derivative.
For example, the partial derivative $\partial f / \partial y$ tells us the rate at which f changes as we nudge the input in the y direction, in other words, as we nudge it along the vector $\hat{\jmath}$. Therefore, we could equivalently write the partial derivative with respect to y as $\dfrac{\partial f}{\partial y} = \nabla_{\hat{\jmath}} f$.
This is all just fiddling with different notation. What's more important is to have a clear mental image of what all this notation represents.
Reflection Question: Suppose $\vec{v} = \hat{\imath} + \hat{\jmath}$. What is your best guess for $\nabla_{\vec{v}} f$?

How to compute the directional derivative

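Let's say you have a multivariable function f(x,y,z) which takes in three variables (x, y and z) and you want to compute its directional derivative along the following vector:
$$\vec{v} = \begin{bmatrix} 2 \\ 3 \\ -1 \end{bmatrix}$$
The answer, as it turns out, is
$$\nabla_{\vec{v}} f = 2\frac{\partial f}{\partial x} + 3\frac{\partial f}{\partial y} + (-1)\frac{\partial f}{\partial z}$$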
This should make sense because a tiny nudge along v can be broken down into two tiny nudges in the x-direction, three tiny nudges in the y-direction, and a tiny nudge backwards, by 1, in the z-direction. We'll go through the rigorous reasoning behind this much more thoroughly in the next article.
More generally, we can write the vector v abstractly as follows:
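$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$$
The directional derivative looks like this:
$$\nabla_{\vec{v}} f = v_1\frac{\partial f}{\partial x} + v_2\frac{\partial f}{\partial y} + v_3\frac{\partial f}{\partial z}$$
That is, a tiny nudge in the v direction consists of $v_1$ times a tiny nudge in the x-direction, $v_2$ times a tiny nudge in the y-direction, and $v_3$ times a tiny nudge in the z-direction.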
This can be written in a super-pleasing compact way using the dot product and the gradient:
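$$\begin{aligned}
\nabla_{\vec{v}} f(x, y, z) &= v_1\frac{\partial f}{\partial x}(x, y, z) + v_2\frac{\partial f}{\partial y}(x, y, z) + v_3\frac{\partial f}{\partial z}(x, y, z) \\
&= \begin{bmatrix} \dfrac{\partial f}{\partial x}(x, y, z) \\ \dfrac{\partial f}{\partial y}(x, y, z) \\ \dfrac{\partial f}{\partial z}(x, y, z) \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \\
&= \nabla f(x, y, z) \cdot \vec{v}
\end{aligned}$$
This is why the notation $\nabla_{\vec{v}}$ is so suggestive of the way we compute the directional derivative:
$$\nabla_{\vec{v}} f = \nabla f \cdot \vec{v}$$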
Take a moment to delight in the fact that one single operation, the gradient, packs enough information to compute the rate of change of a function in every possible direction! That's so many directions! Left, right, up, down, north-north-east, 34.8° clockwise from the x-axis... Madness!
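In code, the gradient formula is nothing more than a dot product. Here is a minimal sketch under stated assumptions: the helper name `directional_derivative`, the function f(x, y, z) = xy + z^2, and the evaluation point are all hypothetical choices made for illustration, while the vector (2, 3, -1) is the one used in the text above.

```python
def directional_derivative(gradient, v):
    """Dot product of a gradient vector with a vector v."""
    if len(gradient) != len(v):
        raise ValueError("gradient and v must have the same number of components")
    return sum(g_i * v_i for g_i, v_i in zip(gradient, v))


def grad_f(x, y, z):
    """Gradient of the hypothetical function f(x, y, z) = x*y + z**2."""
    return (y, x, 2 * z)


v = (2, 3, -1)                        # the vector used in the text
g = grad_f(1.0, 2.0, 3.0)             # gradient at an arbitrary point: (2, 1, 6)
result = directional_derivative(g, v)
print(result)                         # 2*2 + 3*1 + (-1)*6 = 1.0
```

Once the gradient at a point is known, the same function answers the question for any direction v without recomputing any derivatives.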

Example 2:

Problem: Take a look at the following function.
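$$f(x, y) = x^2 - xy$$
What is the directional derivative of f at the point (2, -3) along the vector $\vec{v} = 0.6\hat{\imath} + 0.8\hat{\jmath}$?
Solution: You can think of the directional derivative either as a weighted sum of partial derivatives, as below:
$$\nabla_{\vec{v}} f = 0.6\frac{\partial f}{\partial x} + 0.8\frac{\partial f}{\partial y}$$
Or, you can think of it as a dot product with the gradient, as you see here:
$$\nabla_{\vec{v}} f = \nabla f \cdot \vec{v}$$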
The first is faster, but just for practice, let's see how the gradient interpretation unfolds. We start by computing the gradient itself:
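$$\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial}{\partial x}(x^2 - xy) \\ \dfrac{\partial}{\partial y}(x^2 - xy) \end{bmatrix} = \begin{bmatrix} 2x - y \\ -x \end{bmatrix}$$
Next, plug in the point (x, y) = (2, -3) since this is the point the question asks us about.
$$\nabla f(2, -3) = \begin{bmatrix} 2(2) - (-3) \\ -(2) \end{bmatrix} = \begin{bmatrix} 7 \\ -2 \end{bmatrix}$$
To get the desired directional derivative, we take the dot product between this gradient and v:
$$\nabla_{\vec{v}} f(2, -3) = \nabla f(2, -3) \cdot (0.6\hat{\imath} + 0.8\hat{\jmath}) = \begin{bmatrix} 7 \\ -2 \end{bmatrix} \cdot \begin{bmatrix} 0.6 \\ 0.8 \end{bmatrix} = 7(0.6) + (-2)(0.8) = 2.6$$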
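As a sanity check on this example, the short sketch below recomputes the answer two ways: with the gradient formula, and with a central finite difference that never uses the gradient. It assumes f(x, y) = x^2 - xy evaluated at (2, -3), which is the reading consistent with the result 2.6. The step size h and the variable names are illustrative choices.

```python
def f(x, y):
    """f(x, y) = x^2 - x*y."""
    return x**2 - x * y


def grad_f(x, y):
    """Gradient of f, computed by hand: (2x - y, -x)."""
    return (2 * x - y, -x)


point = (2.0, -3.0)
v = (0.6, 0.8)

# Gradient formula: grad f . v
gx, gy = grad_f(*point)                    # (7.0, -2.0)
via_gradient = gx * v[0] + gy * v[1]

# Independent check: central finite difference along v
h = 1e-6
forward = f(point[0] + h * v[0], point[1] + h * v[1])
backward = f(point[0] - h * v[0], point[1] - h * v[1])
via_difference = (forward - backward) / (2 * h)

print(round(via_gradient, 6), round(via_difference, 6))
```

Both numbers should come out at about 2.6, matching the hand computation.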

Finding slope

How do you find the slope of a graph intersected with a plane that is not parallel to the x or y axes?
You can use the directional derivative, but there is one important thing to remember:
If the directional derivative is used to compute slope, either v must be a unit vector or you must remember to divide by ||v|| at the end.
In the definition and computation above, doubling the length of v would double the value of the directional derivative. In terms of the computation, this is because $\nabla f \cdot (2\vec{v}) = 2(\nabla f \cdot \vec{v})$.
However, this might not always be what you want. The slope of a graph in the direction of v, for example, depends only on the direction of v, not the magnitude ||v||. Let's see why.
How can we imagine this slope? Slice the graph of f with a vertical plane that cuts the xy-plane in the direction of v. The slope in question is that of a line tangent to the resulting curve. As with any slope, we look for the rise over run.
In this case, the run will be the distance of a small nudge in the direction of v. We can express such a nudge as an addition of $h\vec{v}$ to an input point $\mathbf{x}_0 = (x_0, y_0)$, where h is thought of as some small number. The magnitude of this nudge is $h\|\vec{v}\|$.
The resulting change in the output of f can be approximated by multiplying this little value h by the directional derivative:
$$h\nabla_{\vec{v}} f(x_0, y_0)$$
In fact, the rise of the tangent line (as opposed to the graph of the function) is precisely $h\nabla_{\vec{v}} f(x_0, y_0)$ due to this run of size $h\|\vec{v}\|$. For full details on why this is true, see the formal definition of the directional derivative in the next article.
Therefore, the rise-over-run slope of our graph is
$$\frac{h\nabla_{\vec{v}} f(x_0, y_0)}{h\|\vec{v}\|} = \frac{\nabla_{\vec{v}} f(x_0, y_0)}{\|\vec{v}\|}$$
Notice, if v is a unit vector, meaning ||v||=1, then the directional derivative does give the slope of a graph along that direction. Otherwise, it is important to remember to divide out by the magnitude of v.
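The difference between the directional derivative and the slope is easy to see in a few lines of code. This is a sketch only: the helper names `directional_derivative` and `slope_along` are made up for illustration, the gradient (7, -2) is that of x^2 - xy at (2, -3), and the vector (3, 4) is an arbitrary non-unit vector pointing the same way as 0.6i + 0.8j.

```python
import math


def directional_derivative(gradient, v):
    """grad f . v, which scales with the length of v."""
    return sum(g * c for g, c in zip(gradient, v))


def slope_along(gradient, v):
    """Directional derivative divided by ||v||, which depends only on direction."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0:
        raise ValueError("the zero vector has no direction")
    return directional_derivative(gradient, v) / norm


gradient = (7.0, -2.0)   # gradient of x^2 - x*y at (2, -3)
v = (3.0, 4.0)           # non-unit vector, ||v|| = 5 (illustrative choice)
doubled = (6.0, 8.0)     # same direction, twice the length

print(directional_derivative(gradient, v), directional_derivative(gradient, doubled))
print(slope_along(gradient, v), slope_along(gradient, doubled))
```

The first line prints two different values (13.0 and 26.0), while the second prints the same slope twice, since dividing by the magnitude removes the dependence on how long v is.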
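Some authors even go so far as to include normalization in the definition of $\nabla_{\vec{v}} f$.
Alternate definition of directional derivative:
$$\nabla_{\vec{v}} f(\mathbf{x}) = \lim_{h \to 0} \frac{f(\mathbf{x} + h\vec{v}) - f(\mathbf{x})}{h\|\vec{v}\|}$$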
Personally, I think this definition puts too much emphasis on the particular use case of finding slope, so I prefer to use the original definition and normalize v when necessary.
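To get a feel for the normalized definition, one can evaluate its difference quotient for a small fixed h instead of taking the limit. The sketch below does that for the assumed function f(x, y) = x^2 - xy at (2, -3). The helper name `normalized_difference_quotient`, the step size, and the test vectors are illustrative assumptions.

```python
import math


def f(x, y):
    """f(x, y) = x^2 - x*y."""
    return x**2 - x * y


def normalized_difference_quotient(func, point, v, h=1e-6):
    """(f(x + h*v) - f(x)) / (h * ||v||) for a small fixed h.

    This approximates the normalized (slope-style) definition; the true
    definition takes the limit as h goes to 0.
    """
    x, y = point
    vx, vy = v
    norm = math.hypot(vx, vy)
    return (func(x + h * vx, y + h * vy) - func(x, y)) / (h * norm)


point = (2.0, -3.0)
q1 = normalized_difference_quotient(f, point, (3.0, 4.0))
q2 = normalized_difference_quotient(f, point, (6.0, 8.0))
print(q1, q2)   # both are close to 2.6
```

Because the quotient divides by h times the magnitude of v, rescaling v leaves the result essentially unchanged, which is exactly the behavior wanted for slope.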

Example 3: Slope

Problem: On the stage for this problem we have three players.
Player 1, the function:
f(x,y)=sin(xy)
Player 2, the point:
$$(x_0, y_0) = \left(\frac{\pi}{3}, \frac{1}{2}\right)$$
Player 3, the vector:
$$\vec{v} = 2\hat{\imath} + 3\hat{\jmath}$$
What is the slope of the graph of f at the point (x0,y0) along the vector v?
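Answer: Since we are finding slope, we must first normalize the vector in question. The magnitude $\|\vec{v}\|$ is $\sqrt{2^2 + 3^2} = \sqrt{13}$, so we divide each term by $\sqrt{13}$ to get the resulting unit vector $\hat{u}$ in the direction of v:
$$\hat{u} = \frac{2}{\sqrt{13}}\hat{\imath} + \frac{3}{\sqrt{13}}\hat{\jmath}$$
Next, find the gradient of f:
$$\nabla f(x, y) = \begin{bmatrix} \dfrac{\partial}{\partial x}\sin(xy) \\ \dfrac{\partial}{\partial y}\sin(xy) \end{bmatrix} = \begin{bmatrix} y\cos(xy) \\ x\cos(xy) \end{bmatrix}$$
Plug the point $(x_0, y_0) = (\pi/3, 1/2)$ into this gradient, using $x_0 y_0 = \pi/6$ and $\cos(\pi/6) = \sqrt{3}/2$:
$$\nabla f\left(\frac{\pi}{3}, \frac{1}{2}\right) = \begin{bmatrix} \frac{1}{2}\cdot\frac{\sqrt{3}}{2} \\ \frac{\pi}{3}\cdot\frac{\sqrt{3}}{2} \end{bmatrix} = \begin{bmatrix} \dfrac{\sqrt{3}}{4} \\ \dfrac{\pi\sqrt{3}}{6} \end{bmatrix}$$
Finally, take the dot product between $\hat{u}$ and $\nabla f(\pi/3, 1/2)$:
$$\hat{u} \cdot \nabla f\left(\frac{\pi}{3}, \frac{1}{2}\right) = \frac{2}{\sqrt{13}}\cdot\frac{\sqrt{3}}{4} + \frac{3}{\sqrt{13}}\cdot\frac{\pi\sqrt{3}}{6} = \frac{\sqrt{3}\,(1 + \pi)}{2\sqrt{13}} \approx 0.995$$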
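The arithmetic in this example is easy to get wrong by hand, so here is a short numerical sketch of the same steps: normalize v, evaluate the gradient of sin(xy) at the point, and take the dot product. The variable names are illustrative, and the closed form used for comparison follows from cos(π/6) = √3/2.

```python
import math


def grad_f(x, y):
    """Gradient of f(x, y) = sin(x*y): (y*cos(x*y), x*cos(x*y))."""
    c = math.cos(x * y)
    return (y * c, x * c)


x0, y0 = math.pi / 3, 0.5
v = (2.0, 3.0)

norm = math.hypot(*v)                      # sqrt(13)
u = (v[0] / norm, v[1] / norm)             # unit vector in the direction of v
gx, gy = grad_f(x0, y0)                    # (sqrt(3)/4, pi*sqrt(3)/6)
slope = gx * u[0] + gy * u[1]

# Closed form for comparison: sqrt(3) * (1 + pi) / (2 * sqrt(13))
exact = math.sqrt(3) * (1 + math.pi) / (2 * math.sqrt(13))
print(round(slope, 4), round(exact, 4))    # 0.9948 0.9948
```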

Summary

  • If you have some multivariable function, f(x,y) and some vector in the function's input space, v, the directional derivative of f along v tells you the rate at which f will change while the input moves with velocity vector v.
  • The notation here is $\nabla_{\vec{v}} f$, and it is computed by taking the dot product between the gradient of f and the vector v, that is, $\nabla f \cdot \vec{v}$.
  • When the directional derivative is used to compute slope, be sure to normalize the vector v first.

Want to join the conversation?

  • harrysonghurst1
    In example 3, is there an error?
    cos((1/2) * (pi/3)) =/= 1/2

    SqRt(3)/2 is what I get. I think you have taken the sin of pi/6 instead of cos.
    (23 votes)
  • Chris
    I'm still not sure why you have to normalize vector v when computing the directional derivative for slope. Isn't the directional derivative just computing the rate at which f will change while the input moves along v, which is a lengthier way of describing the slope?
    (12 votes)
    • shayanaminnjad.sa
      The derivative means instantaneous rate of change. If you move along a vector, the bigger the magnitude of the vector, the faster you travel, so at each instant you have a bigger instantaneous rate of change. But the slope is something different: you only care about the rise over run. Two vectors with different magnitudes have the same rise over run if they point in the same direction. So if we are using the derivative as a means to get to the slope, we ignore the magnitude, because we only care about the direction. Hope my answer is clear.
      (25 votes)
  • gschex1112
    In example 3: slope, the magnitude of v should be sqrt(2^2+3^2) = sqrt(13). sqrt(4^2+3^2) = sqrt(25) = 5, not sqrt(13), and 4 is not part of the vector v.

    On a side note, I'm glad to see I'm not the only one who works through an operation and then puts the result back in to the operation, as if it still needs to be solved. I've gotten a few KA exercises wrong that way, usually involving simple arithmetic after having taken care of the calculus.
    (5 votes)
  • Jorge Luis Borges Vázquez
    Why does he say the vector "v" is the velocity vector? I think in this context it is the displacement vector. We are not considering time, just space.

    If we just consider the graph (independently of whether the real function represents, say, the output of benefits from two inputs of production and investment in an economics problem), we obtain the increment of Z distance (the increment of the function) for an increment of a combination of distances in X and Y. Talking about velocity makes no sense here.
    (4 votes)
    • Taras.Pokalchuk
      It's helpful to think about v as a velocity vector because if you move along v twice as fast, the resulting rate of change has to be twice as large (and it is, if you double the direction vector). But if you think of it as a distance, it will not be intuitive that doubling the distance traveled will double the output.
      (2 votes)
  • Taras.Pokalchuk
    If h is an infinitesimal, why does the magnitude of v matter? Even if it did matter, wouldn't it be better to have the vector's magnitude approach zero too?
    (4 votes)
    • Alexander Wu
      Rate of change with h approaching zero is equivalent to the slope of a tangent. If you are using the rate of change on the original graph, h must be tiny. If you are using a tangent line, then h can be whatever, since the slope is constant.

      v can't possibly be zero since the zero vector has no direction. It has to have some length to retain its directional information. We decide 1 is the best choice because it's the most general number other than 0.

      Slope is defined as rise/run, so it is also the rise when run = 1: rise/run = rise/1 = rise. We could of course have defined it as 2rise/run or run/rise, which would still retain all the useful information about how steep the graph is, but we defined it as rise/run, and so we have to use ||v|| = 1.
      (0 votes)
  • Richard
    So if I compute the directional derivative with a unit vector as my direction, I get the slope of the surface, right? If I don't use a unit vector, what do I get? I'm asking for a physical interpretation.
    thanks!
    (2 votes)
  • Steve Wallace
    Example 2 calculates the directional derivative using the dot product of the gradient with the vector components, yet Example 3, in calculating slope, converts the vector to a unit vector before the dot product. What's the difference between directional derivative and slope?
    (2 votes)
    • Tejas
      There is no difference. Whenever you calculate either, you need to make the vector specifying direction a unit vector. In example 2, the 0.6î+0.8ĵ is already a unit vector, so there was no need to convert anything.
      (1 vote)
  • Radu Marin
    In example 1, reflection question, since v = i + j, why isn't the gradient along v, sqrt(2)/2*df/dx+ sqrt(2)/2*df/dy, since we have to normalize it? I'm a bit confused with having two definitions with different meaning for the gradient... some physical examples on when you use one or another?
    (2 votes)
  • Gadzookie2
    For example 3, shouldn't it be root(2 squared + 3 squared) and then 3/root(13) j?
    (2 votes)
  • Michel Balamou
    When you asked to calculate the directional derivative with respect to vector v=i+j you didn't normalize the vector v since the length of i+j is actually sqrt(2) (following pythagorean sqrt(1+1)), so the directional derivative should be sqrt(2)*(df/dx)+sqrt(2)*(df/dy).

    Am I correct?
    (1 vote)