Multivariable calculus
Lesson 3: Partial derivative and gradient (articles)
The gradient
The gradient stores all the partial derivative information of a multivariable function. But it's more than a mere storage device; it has several wonderful interpretations and many, many uses.
What you need to be familiar with before starting this lesson:
- Partial derivatives
- Vector fields
- Contour maps—only necessary for one section of this lesson.
What we're building toward
- The gradient of a scalar-valued multivariable function $f(x, y, \dots)$, denoted $\nabla f$, packages all its partial derivative information into a vector: $\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \\ \vdots \end{bmatrix}$. In particular, this means $\nabla f$ is a vector-valued function.
- If you imagine standing at a point $(x_0, y_0, \dots)$ in the input space of $f$, the vector $\nabla f(x_0, y_0, \dots)$ tells you which direction you should travel to increase the value of $f$ most rapidly.
- These gradient vectors, $\nabla f(x_0, y_0, \dots)$, are also perpendicular to the contour lines of $f$.
Definition
After learning that functions with a multidimensional input have partial derivatives, you might wonder what the full derivative of such a function is. In the case of scalar-valued multivariable functions, meaning those with a multidimensional input but a one-dimensional output, the answer is the gradient.
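The gradient of a function $f$, denoted as $\nabla f$, is the collection of all its partial derivatives into a vector: $\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \\ \vdots \end{bmatrix}$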
This is most easily understood with an example.
Example 1: Two dimensions
If $f(x, y) = x^2 - xy$, which of the following represents $\nabla f$?
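Working this out directly from the definition (a short worked check that uses nothing beyond the formula for $f$ given above): differentiate with respect to each input while treating the other input as a constant, then stack the results.

```latex
\frac{\partial f}{\partial x} = \frac{\partial}{\partial x}\left(x^2 - xy\right) = 2x - y,
\qquad
\frac{\partial f}{\partial y} = \frac{\partial}{\partial y}\left(x^2 - xy\right) = -x
\quad\Longrightarrow\quad
\nabla f(x, y) = \begin{bmatrix} 2x - y \\ -x \end{bmatrix}
```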
Notice, $\nabla f$ is a vector-valued function, specifically one with a two-dimensional input and a two-dimensional output. This means it can be nicely visualized with a vector field. That vector field lives in the input space of $f$, which is the $xy$-plane.
This vector field is often called the gradient field of f.
Reflection question: Why are the vectors in this vector field so small along the upward diagonal stripe in the middle of the $xy$-plane?
Example 2: Three dimensions
What is the gradient of $f(x, y, z) = x - xy + z^2$?
$\nabla f$ is a function with a three-dimensional input and a three-dimensional output. As such, it is nicely visualized with a vector field in three-dimensional space.
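As a rough numerical check of Example 2, the sketch below approximates each partial derivative with a central difference and compares the result with the hand-computed gradient $[1 - y,\ -x,\ 2z]$. The helper names (`numerical_gradient`, `analytic_gradient`), the sample point, and the step size `h` are illustrative assumptions, not part of the lesson.

```python
def f(x, y, z):
    # The function from Example 2
    return x - x * y + z ** 2

def numerical_gradient(func, point, h=1e-6):
    """Approximate each partial derivative with a central difference."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h    # nudge only the i-th input up
        backward[i] -= h   # and down, holding the others fixed
        grad.append((func(*forward) - func(*backward)) / (2 * h))
    return grad

def analytic_gradient(x, y, z):
    # Partial derivatives worked out by hand: [1 - y, -x, 2z]
    return [1 - y, -x, 2 * z]

point = (2.0, 3.0, -1.0)  # arbitrary sample point
approx = numerical_gradient(f, point)
exact = analytic_gradient(*point)
print("numerical:", approx)
print("analytic: ", exact)
```

The two lists should agree to several decimal places; the input and the output both have three components, which is why the gradient field lives in three-dimensional space.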
Interpreting the gradient
In each example above, we pictured $\nabla f$ as a vector field, but how do we interpret these vector fields?
More concretely, let's think about the case where the input of $f$ is two-dimensional. The gradient turns each input point $(x_0, y_0)$ into the vector $\begin{bmatrix} \dfrac{\partial f}{\partial x}(x_0, y_0) \\ \dfrac{\partial f}{\partial y}(x_0, y_0) \end{bmatrix}$.
What does that vector tell us about the behavior of the function around the point $(x_0, y_0)$?
Think of the graph of $f$ as a hilly terrain. If you are standing on the part of the graph directly above (or below) the point $(x_0, y_0)$, the slope of the hill depends on which direction you walk. For example, if you step straight in the positive $x$-direction, the slope is $\frac{\partial f}{\partial x}$; if you step straight in the positive $y$-direction, the slope is $\frac{\partial f}{\partial y}$. But most directions are some combination of the two.
The most important thing to remember about the gradient: The gradient of $f$, if evaluated at an input $(x_0, y_0)$, points in the direction of steepest ascent.
So, if you walk in the direction of the gradient, you will be going straight up the hill. Similarly, the magnitude of the vector $\nabla f(x_0, y_0)$ tells you what the slope of the hill is in that direction.
It is not immediately clear why putting the partial derivatives into a vector gives you the direction of steepest ascent, but this will be explained once we get to directional derivatives.
When the inputs of a function f live in more than two dimensions, we can no longer comfortably picture its graph as hilly terrain. That said, the same underlying idea holds. Whether the input space of f is two-dimensional, three-dimensional, or 1,000,000-dimensional: the gradient of f gives a vector in that input space that points in the direction that makes the function f increase the fastest.
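One way to see the steepest-ascent claim numerically (a rough sketch, not a proof) is to sample many unit directions at a point, estimate the slope in each with a small finite-difference step, and compare the steepest one with the gradient. The function is the one from Example 1; the helper `slope_in_direction`, the sample point, the number of sampled angles, and the step size `h` are assumptions chosen for illustration.

```python
import math

def f(x, y):
    return x ** 2 - x * y  # function from Example 1

def gradient(x, y):
    # Hand-computed partial derivatives of x^2 - xy
    return (2 * x - y, -x)

def slope_in_direction(x, y, angle, h=1e-6):
    """Slope of f at (x, y) when stepping along the unit vector at `angle`."""
    dx, dy = math.cos(angle), math.sin(angle)
    return (f(x + h * dx, y + h * dy) - f(x - h * dx, y - h * dy)) / (2 * h)

x0, y0 = 1.0, -1.0  # arbitrary sample point

# Try 3600 evenly spaced directions and keep the steepest one.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_angle = max(angles, key=lambda a: slope_in_direction(x0, y0, a))
best_slope = slope_in_direction(x0, y0, best_angle)

gx, gy = gradient(x0, y0)
grad_angle = math.atan2(gy, gx) % (2 * math.pi)
grad_length = math.hypot(gx, gy)

print("steepest sampled direction (radians):", best_angle)
print("direction of the gradient (radians): ", grad_angle)
print("slope in that direction:", best_slope)
print("length of the gradient: ", grad_length)
```

Up to the resolution of the angle grid, the steepest sampled direction matches the direction of the gradient, and the slope in that direction matches the gradient's magnitude.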
Example 3: What local maxima look like
Consider the function $f(x, y) = -x^4 + 4(x^2 - y^2) - 3$. What is its gradient?
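For reference, here is the computation done by hand, taking one partial derivative per input (a worked check based only on the formula for $f$ above):

```latex
\frac{\partial f}{\partial x} = -4x^3 + 8x,
\qquad
\frac{\partial f}{\partial y} = -8y
\quad\Longrightarrow\quad
\nabla f(x, y) = \begin{bmatrix} -4x^3 + 8x \\ -8y \end{bmatrix}
```

Both components vanish at $(\pm\sqrt{2}, 0)$, where $f = 1$, and also at the origin.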
Here's what the graph of f looks like:
Notice that it has two peaks. Here's what the vector field for $\nabla f$ looks like (vectors colored more red should be understood to be longer, and vectors colored more blue should be understood to be shorter):
The two input points corresponding with the peaks in the graph of f are surrounded by arrows directed towards those points. Why?
This is because near the top of a hill, the direction of steepest ascent always points towards the peak.
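To make "arrows directed towards the peaks" concrete, the sketch below repeatedly takes a small step along the gradient of the Example 3 function and reports where it ends up. This is only an illustration: the `climb` helper, the step size, the iteration count, and the starting points are assumptions, and the gradient formula is the hand-computed $[-4x^3 + 8x,\ -8y]$.

```python
import math

def f(x, y):
    return -x ** 4 + 4 * (x ** 2 - y ** 2) - 3

def gradient(x, y):
    # Hand-computed partial derivatives of f
    return (-4 * x ** 3 + 8 * x, -8 * y)

def climb(x, y, step=0.01, iterations=2000):
    """Follow the gradient arrows uphill with small fixed-size multiples."""
    for _ in range(iterations):
        gx, gy = gradient(x, y)
        x, y = x + step * gx, y + step * gy
    return x, y

right_peak = climb(0.5, 1.0)    # start to the right of the y-axis
left_peak = climb(-0.5, 1.0)    # start to the left of the y-axis

print("from (0.5, 1):  ", right_peak, "height", f(*right_peak))
print("from (-0.5, 1): ", left_peak, "height", f(*left_peak))
print("sqrt(2) =", math.sqrt(2))
```

Starting on either side of the $y$-axis, the walk settles at the peak on that side, near $(\pm\sqrt{2}, 0)$, where the gradient is the zero vector.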
Reflection question: What would the gradient field of a function look like near the local minimum of that function?
The gradient is perpendicular to contour lines
Like vector fields, contour maps are also drawn on a function's input space, so we might ask what happens if the vector field of $\nabla f$ sits on top of the contour map corresponding to $f$.
For example, let's take the function $f(x, y) = xy$:
Looking at the image above, you might notice something interesting: Each vector is perpendicular to the contour line it touches.
To see why this is true, take a particular contour line, say the one representing the output two, and zoom in to a point on that line. We know that the gradient $\nabla f$ points in the direction which increases the value of $f$ most quickly. There are two ways to think about this direction:
- Choose a fixed step size, and find the direction such that a step of that size increases f the most.
- Choose a fixed increase in f, and find the direction such that it takes the shortest step to increase f by that amount.
Either way, you're trying to maximize the rise over run, either by maximizing the rise, or minimizing the run.
Contour maps provide a good illustration of what this second perspective might look like. In Figure 2 above, there is a second contour line representing 2.1, which is slightly greater than the value 2 represented by the initial line. The gradient of f should point in the direction that will get to this second line with as short a step as possible.
The more we zoom in, the more these lines will look like straight, parallel lines. The shortest path from one line to another that is parallel to it is always perpendicular to both lines, so the gradient will look perpendicular to the contour line.
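The perpendicularity can also be checked numerically for $f(x, y) = xy$. The sketch below picks points on the contour line $xy = 2$, takes a tiny step along that line, and measures the cosine of the angle between the step and the gradient $(y, x)$. The helper names, the sample points, and the step size `h` are illustrative assumptions.

```python
import math

def gradient(x, y):
    # f(x, y) = x * y, so the partial derivatives are (y, x)
    return (y, x)

def point_on_contour(x, level=2.0):
    """Point on the contour line x * y = level with the given x-coordinate."""
    return (x, level / x)

h = 1e-4  # size of the small step along the contour
cosines = []
for x in (0.5, 1.0, 2.0, 4.0):
    px, py = point_on_contour(x)
    qx, qy = point_on_contour(x + h)
    step = (qx - px, qy - py)  # short move that stays on the contour
    g = gradient(px, py)
    dot = g[0] * step[0] + g[1] * step[1]
    cosine = dot / (math.hypot(*g) * math.hypot(*step))
    cosines.append(cosine)
    print(f"point ({px:.2f}, {py:.2f}): cos(angle) = {cosine:.2e}")
```

Each cosine comes out very close to zero, and it shrinks further as `h` shrinks, which matches the zooming-in argument: the closer you look, the more exactly the gradient is perpendicular to the contour line.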
The del operator
In multivariable calculus—and beyond—the word operator comes up a lot. This might sound fancy, but for the most part, you can think of operator as meaning "thing which turns a function into another function".
The derivative is one example of an operator since it turns a function $f$ into a new function $f'$. Differential operators are all operators that extend the idea of a derivative to a different context.
Example differential operators
Name | Symbol | Example
Derivative | $\frac{d}{dx}$ | $\frac{d}{dx}(x^2) = 2x$
Partial derivative | $\frac{\partial}{\partial x}$ | $\frac{\partial}{\partial x}(x^2 - xy) = 2x - y$
Gradient | $\nabla$ | $\nabla(x^2 - xy) = \begin{bmatrix} 2x - y \\ -x \end{bmatrix}$
This symbol $\nabla$ is referred to either as nabla or del. Typically nabla refers to the symbol itself while del refers to the operator it represents. This can be confusing since del can also refer to the symbol $\partial$, but hey, when has math terminology ever been reasonable?
Whatever you want to call it, the operator $\nabla$ can be loosely thought of as a vector of partial derivative operators: $\nabla = \begin{bmatrix} \dfrac{\partial}{\partial x} \\ \dfrac{\partial}{\partial y} \\ \vdots \end{bmatrix}$
This isn't quite a real definition. For one thing, the dimension of this vector is not defined since it depends on how many inputs there are in the function $\nabla$ is applied to. Furthermore, it's playing things pretty fast and loose to make a vector out of operators. But, because in practice the meaning is usually clear, people rarely worry about it.
Imagine "multiplying" this vector by a scalar-valued function: $\nabla f = \begin{bmatrix} \dfrac{\partial}{\partial x} \\ \dfrac{\partial}{\partial y} \\ \vdots \end{bmatrix} f = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \\ \vdots \end{bmatrix}$
Of course, this is not multiplication; you are really just evaluating each partial derivative operator on the function. Nevertheless, this is a super helpful way to think about $\nabla$ since it comes up again in the context of several more operators we will learn about later: divergence, curl, and the Laplacian.
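The "operator" idea maps fairly naturally onto functions that take a function and return a function. The sketch below is one loose way to mirror it in code, using finite differences in place of exact derivatives. The names `partial`, `del_operator`, and `apply_del` are hypothetical, and the number of inputs has to be passed explicitly, which echoes the point that the dimension of the operator depends on the function it is applied to.

```python
def partial(f, index, h=1e-6):
    """Operator: turns f into an approximation of its partial derivative
    with respect to the input at position `index`."""
    def df(*point):
        forward = list(point)
        backward = list(point)
        forward[index] += h
        backward[index] -= h
        return (f(*forward) - f(*backward)) / (2 * h)
    return df

def del_operator(num_inputs):
    """A 'vector' (here, a list) of partial derivative operators."""
    return [lambda f, i=i: partial(f, i) for i in range(num_inputs)]

def apply_del(f, num_inputs):
    """'Multiply' the vector of operators by f: apply each operator to f."""
    components = [op(f) for op in del_operator(num_inputs)]
    def grad_f(*point):
        return [component(*point) for component in components]
    return grad_f  # a new, vector-valued function

def f(x, y):
    return x ** 2 - x * y

grad_f = apply_del(f, 2)
result = grad_f(3.0, 1.0)
print(result)  # compare with the exact gradient [2x - y, -x] = [5, -3]
```

Note that `apply_del` returns a function rather than a number or a list of numbers, in keeping with the description of an operator as a thing which turns a function into another function.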
Summary
- The gradient of a scalar-valued multivariable function $f(x, y, \dots)$, denoted $\nabla f$, packages all its partial derivative information into a vector: $\nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \\ \vdots \end{bmatrix}$. In particular, this means $\nabla f$ is a vector-valued function.
- If you imagine standing at a point $(x_0, y_0, \dots)$ in the input space of $f$, the vector $\nabla f(x_0, y_0, \dots)$ tells you which direction you should travel to increase the value of $f$ most rapidly.
- These gradient vectors $\nabla f(x_0, y_0, \dots)$ are also perpendicular to the contour lines of $f$.
Want to join the conversation?
- How would I be able to type in the symbol "nabla" on a mac keyboard?(13 votes)
- I know this is a bit late, but hopefully still helpful for someone. On recent versions of OS X, you can get to the "Character Viewer" from any application by going to Edit > Emoji and Symbols. If you don't see the symbol you are looking for in the list (many math symbols aren't there), click the button on the upper right corner to expand the list, giving you the full character viewer. From there, the "Math Symbols" section includes many helpful symbols, including ∇!(31 votes)
- Does anyone know of a good online resource that has Multivariable practice questions?(9 votes)
- I just saw http://www.leadinglesson.com/, not sure how good they are but the professor seems to have some chops in this area.(7 votes)
- I have a question about the gradient: after watching linear algebra videos on linear transformations, we've learned that a transformation T from R^n to R^m is represented by a matrix with m rows and n columns. Now, being aware of this fact, let's assume a function f(x,y) = x^2 - xy, where f: R^2 → R. Why is the gradient represented as a 2x1 matrix (2 rows and 1 column) and not as a 1x2 matrix (1 row and 2 columns)? The gradient here is represented as a vector field, not as a scalar field, but why?(7 votes)
- Whether you represent the gradient as a 2x1 or as a 1x2 matrix (column vector vs. row vector) does not really matter, as they can be transformed to each other by matrix transposition. If a is a point in R², we have, by definition, that the gradient of ƒ at a is given by the vector ∇ƒ(a) = (∂ƒ/∂x(a), ∂ƒ/∂y(a)), provided the partial derivatives ∂ƒ/∂x and ∂ƒ/∂y of ƒ exist at a. Note that ∇ƒ(a) is a vector. Thus ∇ƒ maps a vector a in R² to the vector ∇ƒ(a) in R², so that ∇ƒ: R² ➝ R² is a vector field (and not a scalar field).
Edit
Going slightly on a tangent here: the gradient ∇ƒ is closely related to the (total) derivative of ƒ. The total derivative of ƒ at a (if it exists) is the unique linear transformation ƒ'(a): R² ➝ R such that |ƒ(x) - ƒ(a) - ƒ'(a)(x - a)| / ‖x - a‖ ➝ 0 as x ➝ a. In this case, the matrix of ƒ'(a) (that is, the matrix representation of the linear transformation ƒ'(a)) is given by the 1x2 matrix Dƒ(a) = [∂ƒ/∂x(a) ∂ƒ/∂y(a)].(10 votes)
- Where does the gradient point if there are two equally steep directions to go in? For that matter, there could be any number n of such directions. How does the gradient decide which one to point in if they are equal?
An example I can think of is the origin in the graph z = x^2 - y^2. If you go in either direction along the x axis, the curve increases quadratically (and equally) on both sides. What does the gradient vector do in such cases? (In the case of the origin of x^2 - y^2, I believe it gives the 0 vector, as if we're at a local maximum, which makes sense along the y direction but not along the x direction...)(7 votes)
- If you actually take the gradient, it becomes [2x, -2y]. So on the x-axis, put y = 0, and the gradient becomes [2x, 0]. Now if you are at x = 0, then the gradient is [0,0], which does not tell you to go anywhere, i.e. it does not point in any direction. But as you deviate slightly, say to [h,0] or [-h,0], the gradient starts pointing in a specific direction, which is the direction of steepest ascent.(6 votes)
- hi all,
I am relatively new to Khan Academy and I like it a lot!
I just started with “multivariable calculus” and I was curious whether I could be of some help (and get some help!) on this forum. (it’s almost 50 years ago that I was taught this stuff; it’s a trip down memory lane for me; I have to refresh it all and that will take me some time).
Khan Academy makes it very clear that it hopes (rather: expects) that we are teaching each other.
Unfortunately I see very few well-formulated clear questions; some posts are not even questions at all! No wonder that you (we!) get little or no response. Let’s try to change that! Hope you don’t find me presumptuous. Do you agree with me that we all should make better use of this site and its possibilities? In the near future I hope to comment on some of your questions (please don’t take offense!) and I will probably pose some questions myself. See You!!
Let me emphasize my question: Do you agree with me that we all should make better use of this site and its possibilities?(7 votes)
- Hi Rene, this is a good idea! (I agree with your question.) Khan Academy can only get better as its community comes together.(1 vote)
- I'm missing exercises on multivariable calculus.
This was a great article though.(3 votes)
- To find the gradient, you find the partial derivatives of the function with respect to each input variable. Then you make a vector with ∂f/∂x as the x-component, ∂f/∂y as the y-component, and so on...(4 votes)
- My question is: suppose we are standing in the xy-plane now.
1. The gradient of the function shows us the direction of the steepest ascent?
OR
2. The gradient of the function shows us its value or length, by which we can see in which way its length is minimum, and by that we can get the steepest ascent of the function?
I am actually confused about the direction of the gradient: whether it's parallel to the xy-plane or in the direction of the steepest ascent.(2 votes)
- The gradient gives us a vector, specifically a 2D vector. This vector is going to be parallel to the xy-plane.
Now, you will have a point (x,y,z) on the graph of f(x,y). The gradient says that at the point (x,y,z), if you rotate yourself to face the vector you get from the gradient at that point and then proceed forward, the rate of change in the z direction will be the greatest in that direction. I will do an example.
Let's just use f(x,y) = x^2 + y^2
gradient = <2x, 2y>
Let's use the point (2, 3, 13). Here the gradient is <2*2, 2*3> = <4, 6>. If you need to, you can find its angle in the xy-plane.
Now, at the point (2,3,13), if you imagine yourself standing there holding a compass, you would use the compass to make sure you are facing in the direction of the gradient. Once you are doing that, walking forward gives you the fastest path up the "hill" you are standing on.
It's worth saying that the negative gradient is the steepest path down the hill.
So the vector that actually points up the hill, which is to some degree not parallel to the xy-plane, is a different vector.
Let me know if that didn't help(4 votes)
- "If you imagine standing at a point (x0,y0,…) in the input space of f, the vector ∇f(x0,y0,…) tells you which direction you should travel to increase the value of f most rapidly."
So this means that the gradient does not point towards the top of a mountain, but to the steepest point, correct?
If I have a tilted plane, what would the gradient be? Zero?(2 votes)
- Might be a little late, but I'll answer in case someone finds it useful...
The gradient points along the steepest path, like the text you quoted says. It does not point to the steepest point. If you were trying to climb a mountain as quickly as possible, you could use the gradient as a "compass" that would always tell you the steepest way up from where you are standing (without considering physical restrictions, of course).
It's easy to see with a plane example, let's say: f(x,y) = x + y
What's the gradient? [1,1]
What does it mean? It means that, no matter where you are, the steepest direction is always that one ([1,1]).
To get a gradient that is always zero, your function would have to be constant.(4 votes)
- This is regarding the question "Why are the vectors in this vector field so small along the upward diagonal stripe?"
I understood why the vector along the line y=2x is going to be 0 but how can you deduce that the vectors close to that line have small horizontal component? Can't they just jump around? Is it because the functions are linear? What is the intuition behind this? Thanks.(3 votes)
- I’m not sure why I am doing this: answering a 2-year old question by someone who probably already got on with his life. Kunjaan unfortunately does not refer explicitly to the “reflection question” under example 1 of the gradient article.
In the answer that you can find by clicking “show answer”, the author explains (quite lucidly in my opinion) that the gradient (of which the x-component = 2x - y and the y-component = -x) is smallest in the vicinity of the line y=2x and close to the y-axis. In that region the gradient approaches (0,0), which gives us small vectors. Just read the text closely.
If you formulate your questions precisely, you are more likely to get a timely answer.(2 votes)
- When defining the gradient of f as a vector, there is a typo: it says partial derivative of "0" but should be partial derivative of "y"(3 votes)