
© 2024 Khan Academy

# Multivariable maxima and minima

A description of maxima and minima of multivariable functions, what they look like, and a little bit about how to find them. Created by Grant Sanderson.

## Want to join the conversation?

- Can we optimize multivariable functions without graphing them?(6 votes)
- Yes, you can. For more complex multivariable functions you would use algorithms like steepest descent/ascent, conjugate gradient, or the Newton-Raphson method. These methods are generally referred to as optimisation algorithms. Simplistically speaking, the gradient-based ones work as follows (shown here for maximizing):

1) What direction should I move in to increase my value the fastest? The gradient.

2) Take a small step in that direction.

3) Go to step 1, unless somebody tells me to stop or the gradient is zero.

This will not guarantee that you reach THE global maximum, only that you will reach A maximum (most likely a local one).(9 votes)
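
The three steps above can be sketched in a few lines of Python. This is only a minimal illustration of steepest ascent, not a production optimizer: the function name `gradient_ascent`, the step size, the tolerance, and the example function are all assumptions chosen for the demonstration.

```python
def gradient_ascent(grad, start, step=0.1, tol=1e-8, max_iters=10_000):
    """Follow the gradient uphill until it is (numerically) the zero vector."""
    point = list(start)
    for _ in range(max_iters):
        g = grad(point)
        # Step 3: stop once the gradient is essentially zero.
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        # Steps 1 and 2: take a small step in the direction of the gradient.
        point = [p + step * gi for p, gi in zip(point, g)]
    return point


# Hypothetical example: f(x, y) = -(x - 1)^2 - (y + 2)^2,
# whose only maximum sits at (1, -2).
def grad_f(p):
    x, y = p
    return [-2 * (x - 1), -2 * (y + 2)]


peak = gradient_ascent(grad_f, start=[0.0, 0.0])
print(peak)  # a point very close to (1, -2)
```

On a function with several peaks, the same loop would stop at whichever local maximum is uphill from the starting point, which is the caveat made above.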

- Show me how to find maxima of three variables(6 votes)
- I believe that the process for finding maxima and minima with 3 variables is exactly the same; you would just add another component to the gradient vector. However, it's not really possible to visualize this many dimensions, so they can't 'show' you, per se.(10 votes)
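
As a sketch of what "another component" means, the condition for a function of three variables would read as follows. The specific function used afterwards is a made-up example, not one from the video.

$$
\nabla f(x, y, z) =
\begin{bmatrix}
\partial f / \partial x \\
\partial f / \partial y \\
\partial f / \partial z
\end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
$$

For instance, with $f(x, y, z) = -(x - 1)^2 - (y + 2)^2 - z^2$:

$$
\nabla f =
\begin{bmatrix}
-2(x - 1) \\
-2(y + 2) \\
-2z
\end{bmatrix}
= \mathbf{0}
\quad\Longrightarrow\quad
(x, y, z) = (1, -2, 0)
$$

Here the single critical point happens to be the maximum, since $f$ is zero there and negative everywhere else.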

- Is it possible to find maxima and minima from the divergence of the gradient? Maxima have very negative divergence and minima have very positive divergence?(5 votes)
- I guess, but if you want to do that, you'll need to find the maximum of the divergence of the gradient of the function. How do you find the maximum of the divergence of the gradient of the function? You can find the maximum of the divergence of the gradient of the divergence of the gradient of the function. Um, er...

You were trying to find the maximum of something, and you do that by finding the maximum of something else? Okay.

It's much easier to just let the gradient be **0**. Once you've found a point where the gradient is zero, can you use the divergence of the gradient to determine whether it is a maximum or minimum?

Kind of, but there are saddle points. Saddle points can have nonzero divergence of the gradient. So you need to apply the second derivative test first, with the Hessian matrix's determinant. But after applying that test, you can find if it's a max or min just by using one second partial derivative, so there's no need for the divergence anymore.

The divergence of the gradient is the trace of the Hessian matrix, which is related to its determinant but not quite the same (trace is the sum of the diagonal entries of a matrix).(5 votes)
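
To make the difference between the determinant and the trace concrete, here is a small Python sketch of the second partial derivative test for a two-variable function. The helper name `classify_critical_point` and the two example functions are assumptions made for illustration; the second partial derivatives are entered by hand rather than computed.

```python
def classify_critical_point(fxx, fyy, fxy):
    """Second partial derivative test at a point where the gradient is zero.

    fxx, fyy, fxy are the second partial derivatives evaluated at that point.
    """
    det = fxx * fyy - fxy ** 2   # determinant of the Hessian
    trace = fxx + fyy            # trace of the Hessian (divergence of the gradient)
    if det < 0:
        kind = "saddle point"
    elif det > 0:
        # With a positive determinant, the sign of one second partial decides.
        kind = "local minimum" if fxx > 0 else "local maximum"
    else:
        kind = "inconclusive"
    return kind, det, trace


# Hypothetical example 1: f(x, y) = x^2 - 2y^2 at the origin.
saddle = classify_critical_point(fxx=2, fyy=-4, fxy=0)
# Hypothetical example 2: f(x, y) = -x^2 - y^2 at the origin.
peak = classify_critical_point(fxx=-2, fyy=-2, fxy=0)

print(saddle)  # ('saddle point', -8, -2)
print(peak)    # ('local maximum', 4, -4)
```

Both points have a negative trace, yet only the second is a maximum, which is why the sign of the divergence of the gradient alone cannot settle the question.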

- What about absolute maxima and absolute minima? Is this explained somewhere?(4 votes)
- Can't we use Laplacian(f)(x_0,y_0) < 0 for optimization?(3 votes)
- Well, the Laplacian is sort of like a 2nd derivative thing, while for optimization one tends to use the first derivative.(1 vote)

- I wouldn't say it's just a matter of notational convenience to say Del f = 0 -- it makes a lot of sense, that none of the directions show the direction of steepest ascent, and that the slope in each of these directions is zero.(2 votes)
- Of course, it's often possible there are no solutions to them both being zero even when there are solutions to either of them individually being zero.(2 votes)
- What does the extreme value theorem say for multivariable functions, and how do we find absolute maxima and minima on a closed bounded set?(2 votes)
- Based on the notion that the gradient vectors always point in the direction of steepest ascent, I'm finding it hard to register why the gradient at a minimum point would also be the zero vector. I mean, when you talk about the partial derivatives and all it makes sense that it should be the zero vector, but at the same time it doesn't feel right that there's *no direction* to travel in starting from a minimum point that would increase the value of the function the fastest. In other words, by definition of a minimum point, if you walk away from that minimum point in any direction you would have to be increasing the value of the function, so it's just a matter of which direction you should walk in to increase the function value the most efficiently, and so I find it hard to see why the gradient would all of a sudden be the zero vector (which says there's no direction to walk in that would result in the steepest ascent) when you're at a minimum point.(2 votes)
- If there is a direction of descent, there must be a direction of *steepest* **descent**, and therefore there *must* be a direction of *steepest* **ascent** (i.e., the opposite direction).

Using finite steps, you'll never find a true maximum or minimum for the reason you give (and extended as I describe).

However, the derivative finds the limiting value as our steps get ever smaller, and at both maximum and minimum points, as our step approaches infinitesimal magnitude (in whatever direction) the rate of ascent/descent approaches the limiting value of zero.

Therefore, the gradient of the function is zero at both minima and maxima.(1 vote)
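
A quick numerical check can make this limiting argument tangible. The sketch below uses an assumed example, f(x, y) = x² + y², whose minimum is at the origin, and measures rise over run for ever smaller steps in one arbitrarily chosen direction.

```python
import math


def f(x, y):
    # Assumed example function with its minimum at the origin.
    return x * x + y * y


def slope_from_minimum(angle, h):
    """Rise over run for a step of length h from the origin in direction `angle`."""
    dx, dy = h * math.cos(angle), h * math.sin(angle)
    return (f(dx, dy) - f(0.0, 0.0)) / h


steps = [1.0, 0.1, 0.01, 0.001]
slopes = [slope_from_minimum(math.pi / 3, h) for h in steps]
for h, s in zip(steps, slopes):
    print(f"h = {h:<6} slope = {s:.6f}")
```

Every finite step does go uphill, so each slope is positive, but the slopes shrink in proportion to the step size. The derivative is the limit of these values, which is zero in every direction.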

- Wouldn't the partial derivative with respect to z be infinity, as the z-axis is perpendicular to the plane? Then, the gradient probably wouldn't be equal to the zero vector, right?(1 vote)
- The function in this video is actually z, z(x,y). Unless you're dealing with f(x,y,z), a 4D graph, then no, the partial of z would not be infinity. At maxima points (in 3D, z(x,y)), the partial of z would actually probably be 0 because the partials of x and y are 0 at these points. If you have almost no change in x or y, you would have almost no change in z as well (at least at maxima points).(2 votes)

## Video transcript

- [Voiceover] When you have a multivariable function, something that takes in multiple different input values and let's say it's just outputting a single number, a very common thing you wanna do with an animal like this is maximize it. Maximize it, and what this means is you're looking for the input points, the values of x and y and all of its other inputs, such that the output, f, is as great as it possibly can be. Now this actually comes up all the time in practice 'cause usually when you're dealing with a multivariable function, it's not just for fun and for dealing with abstract symbols, it's 'cause it actually represents something, so maybe it represents profits of a company, maybe this is a function where you're considering all the choices you can make, like the wages you give your employees or the prices of your goods, or the amount of debt that you raise for capital, all sorts of choices that you might make, and you wanna know what values should you give to those choices such that you maximize profits, you maximize the thing, and if you have a function that models these relationships, there are techniques, which I'm about to teach you, that you can use to maximize this.

Another very common setting, more and more important these days, is that of machine learning and artificial intelligence, where often what you do is you assign something called a cost function to a task, so maybe you're trying to teach a computer how to understand audio or how to read handwritten text. What you do is you find a function that basically tells it how wrong it is when it makes a guess, and if you do a good job designing that function, you just need to tell the computer to minimize, so that's kind of the flip side, right? Instead of finding the maximum, to minimize a certain function, and if it minimizes this cost function, that means that it's doing a really good job at whatever task you've assigned it, so a lot of the art and science of machine learning and artificial intelligence comes down to, well, one, finding this cost function and actually describing difficult tasks in terms of a function, but then applying the techniques that I'm about to teach you to have the computer minimize that, and a lot of time and research has gone into figuring out ways to basically apply these techniques, but really quickly and efficiently.

So, first of all, on a conceptual level, let's just think about what it means to be finding the maximum of a multivariable function. So I have here the graph of a two-variable function. It's something that has a two-variable input that we're thinking of as the xy-plane, and then its output is the height of this graph, and if you're looking to maximize it, basically, what you're finding is this peak, kind of the tallest mountain in the entire area, and you're looking for the input value, the point on the xy-plane directly below that peak, 'cause that tells you the values of the inputs that you should put in to maximize your function, so how do you go about finding that?

Well, this is perhaps the core observation in, well, calculus, not just multivariable calculus. This is similar in the single variable world, and there are similarities in other settings, but the core observation is that if you take a tangent plane at that peak, so let's just draw in a tangent plane at that peak, it's gonna be completely flat, but let's say you did this at a different point, right? 'Cause if you tried to find the tangent plane, not at that point, but you kind of moved it about a bit to somewhere that's not quite a maximum, if the tangent plane has any kind of slope to it, what that's telling you is that if you take very small steps, kind of in the direction of that upward slope, you can increase the value of your function, so if there's any slope to the tangent plane, you know that you can walk in some direction to increase it, but if there's no slope to it, if it's flat, then that's a sign that no matter which direction you walk, you're not gonna be significantly increasing the value of your function.

So what does this mean in terms of formulas? Well, if you kind of think back to how we compute tangent planes, and if you're not very comfortable with that, now would be a good time to take another look at those videos about tangent planes, the slope of the plane in each direction, so this would be the slope in the x direction, and then if you look at it from another perspective, this would be the slope in the y direction, each one of those has to be zero, and that, in terms of partial derivatives, means the partial derivative of your function with respect to x, at whatever point you're dealing with, right? So I'll call it x naught, y naught, as the point where you're inputting this, has to be zero, and then similarly, the partial derivative with respect to the other variable, with respect to y, at that same point, has to be zero, and both of these have to be true because let's just take a look, I don't know, let's slide it over a little bit here, this tangent plane, if you look at the slope, you imagine walking in the y direction, you're not increasing your value at all. The slope in the y direction would actually be zero, so that would mean the partial derivative with respect to y would be zero, but with respect to x, when you're moving in the x direction here, the slope is clearly negative, because as you take positive steps in the x direction, the height of your tangent plane is decreasing, which corresponds to if you take tiny steps on your graph, then the height will decrease in a manner proportional to the size of those tiny steps.

So what this gives you here is gonna be a system of equations where you're solving for the value of x naught and y naught that satisfies both of these equations, and in future videos, I'll go through specific examples of this. For now, I just wanna give a good conceptual understanding, but one very important thing to notice is that just because this condition is satisfied, meaning your tangent plane is flat, just because that's satisfied, doesn't necessarily mean that you've found the maximum. That's just one requirement that it has to satisfy, but for one thing, if you found the tangent plane at other little peaks, like this guy here or this guy here, or all of the little bumps that go up, those tangent planes would also be flat, and those little bumps actually have a name because this comes up a lot. They're called local minima, or local maxima, sorry, so those guys are called local maxima. Maxima is just the plural of maximum, and local means that it's relative to a single point, so it's basically, if you walk in any direction, when you're on that little peak, you'll go downhill, so relative to the neighbors of that little point, it is a maximum, but relative to the entire function, these guys are the shorter mountains next to Mount Everest.

But there's also another circumstance where you might find a flat tangent plane, and that's at the minima points, right? If you have the global minimum, the absolute smallest, or also just the local minima, these inverted peaks, you'll also find flat tangent planes. So what that means, first of all, is that when you're minimizing a function, you also have to look for this requirement, where all the partial derivatives are zero, but it mainly just means that your job isn't done once you've done this. You have to do more tests to check whether or not what you found is a local maximum or a local minimum, or a global maximum.

And these requirements, by the way, often you'll see them written in a more succinct form, where instead of saying all the partial derivatives have to be zero, which is what you need to find, they'll write it in a different form where you say that the gradient of your function, f, which, of course, is just the vector that contains all those partial derivatives. Its first component is the partial derivative with respect to the first variable, its second component is the partial derivative with respect to the second variable, and if there's more variables, you would keep going, you'd say that this whole thing has to equal the zero vector, the vector that has nothing but zeroes as its components, and it's kind of a common abuse of notation. People will just call that zero vector, zero, and maybe they'll emphasize it by making it bold, because the number zero is not a vector and often making things bold emphasizes that you want to be referring to a vector, but this gives a very succinct way of describing the requirement. You're just looking for where the gradient of your function is equal to the zero vector, and that way, you can just write it on one line, but in practice, every time that you're expanding that out, what that means is you find all of the different partial derivatives, so this is really just a matter of notational convenience and using less space on a blackboard.

But whenever you see this, that the gradient equals zero, what you should be thinking of is the idea that the tangent plane, the tangent plane is completely flat, and as I just said, that's not enough because you might also be finding local maxima or minima points, but in multivariable calculus, there's also another possibility, a place where the tangent plane is flat, but what you're looking at is neither a local maximum nor a local minimum, and this is the idea of a saddle point, which is new to multivariable calculus, and that's what I'll be talking about in the next video, so I will see you then.
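
The transcript leaves worked examples to later videos. As a rough sketch of what the system of equations for $x_0$ and $y_0$ looks like in practice, consider a made-up function (not one used in the video), $f(x, y) = -x^2 - y^2 + 2x + 4y$:

$$
\frac{\partial f}{\partial x}(x_0, y_0) = -2x_0 + 2 = 0,
\qquad
\frac{\partial f}{\partial y}(x_0, y_0) = -2y_0 + 4 = 0
$$

$$
\nabla f(x_0, y_0) = \mathbf{0}
\quad\Longrightarrow\quad
(x_0, y_0) = (1, 2),
\qquad
f(1, 2) = 5
$$

In this particular case the flat tangent plane at $(1, 2)$ does mark the global maximum, because $f(x, y) = 5 - (x - 1)^2 - (y - 2)^2$. In general, as the transcript stresses, a zero gradient only produces candidates that still need further testing.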