Main content
Multivariable calculus
Course: Multivariable calculus > Unit 3
Lesson 4: Optimizing multivariable functions (articles)Maxima, minima, and saddle points
Learn what local maxima/minima look like for multivariable function.
Background
What we're building to
- Intuitively, when you're thinking in terms of graphs, local maxima of multivariable functions are peaks, just as they are with single variable functions.
- The gradient of a multivariable function at a maximum point will be the zero vector, which corresponds to the graph having a flat tangent plane.
- Formally speaking, a local maximum point is a point in the input space such that all other inputs in a small region near that point produce smaller values when pumped through the multivariable function f.
Optimizing in higher dimensions
One of the most important applications of calculus is its ability to sniff out the maximum or the minimum of a function.
- Perhaps you find yourself running a company, and you've come up with some function to model how much money you can expect to make based on a number of parameters, such as employee salaries, cost of raw materials, etc., and you want to find the right combination of resources that will maximize your revenues.
- Maybe you are designing a car, hoping to make it more aerodynamic, and you've come up with a function modelling the total wind resistance as a function of many parameters that define the shape of your car, and you want to find the shape that will minimize the total resistance.
- In machine learning and artificial intelligence, the way a computer "learns" how to do something is commonly to minimize some "cost function" that the programmer has specified.
Local maxima and minima, visually
Let's start by thinking about those multivariable functions which we can graph: Those with a two-dimensional input, and a scalar output, like this:
I chose this function because it has lots of nice little bumps and peaks. We call one of these peaks a local maximum, and the plural is local maxima.
- The point left parenthesis, x, start subscript, 0, end subscript, comma, y, start subscript, 0, end subscript, right parenthesis underneath a peak in the input space (which in this case means the x, y-plane) is called a local maximum point.
- The output of a function at a local maximum point, which you can visualize as the height of the graph above that point, is the local maximum itself.
The word "local" is used to distinguish these from the global maximum of the function, which is the single greatest value that the function can achieve. If you are on the peak of a mountain, it's a local maximum, but unless that mountain is Mt. Everest, it is not a global peak.
I'll give you the formal definition of a local maximum point at the end of this article. Intuitively, it is a special point in the input space where taking a small step in any direction can only decrease the value of the function.
Similarly, if the graph has an inverted peak at a point, we say the function has a local minimum point at the value left parenthesis, x, comma, y, right parenthesis above/below this point on the x, y-plane, and the value of the function at this point is a local minimum. Intuitively, these are points where stepping in any direction can only increase the value of the function.
Stable points in one variable (review)
You may remember the idea of local maxima/minima from single-variable calculus, where you see many problems like this:
Concept check: For what value x is the function f, left parenthesis, x, right parenthesis, equals, minus, left parenthesis, x, minus, 2, right parenthesis, squared, plus, 5 the greatest? What is the maximum value?
In general, local maxima and minima of a function f are studied by looking for input values a where f, prime, left parenthesis, a, right parenthesis, equals, 0. This is because as long as the function is continuous and differentiable, the tangent line at peaks and valleys will flatten out, in that it will have a slope of 0.
Such a point a has various names:
- Stable point
- Critical point
- Stationary point
All of these mean the same thing: f, prime, left parenthesis, a, right parenthesis, equals, 0
The requirement that f be continuous and differentiable is important, for if it was not continuous, a lone point of discontinuity could be a local maximum:
And if f is continuous but not differentiable, a local maximum could look like this:
In either case, talking about tangent lines at these maximum points doesn't really make sense, does it?
However, even when f is continuous and differentiable, it is not enough for the derivative to be 0, since this also happens at inflection points:
This means finding stable points is a good way to start the search for a maximum, but it is not necessarily the end.
Stable points in two variables
The story is very similar for multivariable functions. When the function is continuous and differentiable, all the partial derivatives will be 0 at a local maximum or minimum point.
With respect to the graph of a function, this means its tangent plane will be flat at a local maximum or minimum. For instance, here is a graph with many local extrema and flat tangent planes on each one:
Saying that all the partial derivatives are zero at a point is the same as saying the gradient at that point is the zero vector:
People often write this more compactly like this:
The convention is that bold variable are vectors. So start bold text, x, end bold text, start subscript, 0, end subscript is a vector of the input values left parenthesis, x, start subscript, 0, end subscript, comma, y, start subscript, 0, end subscript, comma, dots, right parenthesis and start bold text, 0, end bold text is the vector with all zeros.
Such an input start bold text, x, end bold text, start subscript, 0, end subscript goes by the same various names as in the single-variable case:
- Stable point
- Stationary point
- Critical point
The thinking behind the words "stable" and "stationary" is that when you move around slightly near this input, the value of the function doesn't change significantly. The word "critical" always seemed a bit over dramatic to me, as if the function is about to die near those points.
As with single variable functions, it is not enough for the gradient to be zero to ensure that a point is a local maximum or minimum. For one thing, you can still have something similar to an inflection point:
But there is also an entirely new possibility, unique to multivariable functions.
Saddle points
Consider the function f, left parenthesis, x, comma, y, right parenthesis, equals, x, squared, minus, y, squared. Let's make a few observations about what goes on around the origin left parenthesis, 0, comma, 0, right parenthesis
- Both partial derivatives are 0 at this point:
Therefore left parenthesis, start color #0c7f99, 0, end color #0c7f99, comma, start color #bc2612, 0, end color #bc2612, right parenthesis is a stable point.
- When you just move in the x direction around this point, the function looks like f, left parenthesis, x, comma, 0, right parenthesis, equals, x, squared, minus, 0, squared, equals, x, squared. The single-variable function f, left parenthesis, x, right parenthesis, equals, x, squared has a local minimum at x, equals, 0.
- When you just move in the y direction around this point, meaning the function looks like f, left parenthesis, 0, comma, y, right parenthesis, equals, 0, squared, minus, y, squared, equals, minus, y, squared. The single-variable function f, left parenthesis, y, right parenthesis, equals, minus, y, squared has a local maximum at y, equals, 0.
In other words, the x and y directions disagree over whether this input should be a maximum or a minimum point. So even though left parenthesis, 0, comma, 0, right parenthesis is a stable point, and is not an inflection point, it cannot be a local maximum or local minimum!
Here's a video of this graph rotating in space:
Doesn't the region around left parenthesis, 0, comma, 0, comma, 0, right parenthesis kind of have the shape of a horse's saddle?
Well, mathematicians thought so, and they had one of those rare moments of deciding on a good name for something: Saddle points. By definition, these are stable points where the function has a local maximum in one direction, but a local minimum in another direction.
Testing maximality/minimality
"Alright,"
I hear you saying,
"so it's not enough for the gradient to be 0 since you might have an inflection point or a saddle point. But how can you tell if a stable point is a local maximum or minimum?"
I'm glad you asked! This is the topic of the next article on the second partial derivative test. For now, let's finish things off with a formal definition of a local maximum.
Formal definition
I've said this before, but the reason to learn formal definitions, even when you already have an intuition, is to expose yourself to how intuitive mathematical ideas are captured precisely. It's good practice for thinking clearly, and it can also help to understand those times when intuition differs from reality.
In defining a local maximum, let's use vector notation for our input, writing it as start bold text, x, end bold text.
Formal definition of a local maximum: A scalar-valued function f has a local maximum at start bold text, x, end bold text, start subscript, 0, end subscript if there exists some positive number r, is greater than, 0, thought of as a radius, such that the following statement is true:
f, left parenthesis, start bold text, x, end bold text, right parenthesis, is less than or equal to, f, left parenthesis, start bold text, x, end bold text, start subscript, 0, end subscript, right parenthesis for all start bold text, x, end bold text such that vertical bar, vertical bar, start bold text, x, end bold text, minus, start bold text, x, end bold text, start subscript, 0, end subscript, vertical bar, vertical bar, is less than, r
That's a bit of a mouthful, so let's break it down:
Saying "vertical bar, vertical bar, start bold text, x, end bold text, minus, start bold text, x, end bold text, start subscript, 0, end subscript, vertical bar, vertical bar, is less than, r" means the variable start bold text, x, end bold text is within a distance r of the maximum point start bold text, x, end bold text, start subscript, 0, end subscript. When start bold text, x, end bold text is two-dimensional this is the same as saying start bold text, x, end bold text lies inside a circle of radius r centered at the point start bold text, x, end bold text, start subscript, 0, end subscript.
More generally, if start bold text, x, end bold text is n-dimensional, the set of all start bold text, x, end bold text such that vertical bar, vertical bar, start bold text, x, end bold text, minus, start bold text, x, end bold text, start subscript, 0, end subscript, vertical bar, vertical bar, is less than, r forms an n-dimensional ball with radius r centered at start bold text, x, end bold text, start subscript, 0, end subscript.
We can then translate this definition from math-speak to something more closely resembling English as follows:
- start bold text, x, end bold text, start subscript, 0, end subscript is a maximum point of f if there is some small (ball-shaped) region in the input space around the point start bold text, x, end bold text, start subscript, 0, end subscript such that the highest possible value you can get for f evaluated on points in that region is achieved at the point start bold text, x, end bold text, start subscript, 0, end subscript.
Test your understanding: Write the formal definition for a local minimum, and think about what each component means as you write it down. (Resist the temptation to just copy down the words in the definition above.)
Summary
- Intuitively, when you're thinking in terms of graphs, local maxima of multivariable functions are peaks, just as they are with single variable functions.
- The gradient of a multivariable function at a maximum point will be the zero vector, which corresponds to the graph having a flat tangent plane.
- Formally speaking, a local maximum point is a point in the input space such that all other inputs in a small region near that point produce smaller values when pumped through the multivariable function f.
Want to join the conversation?
- When reading this article I noticed the "Subject: Prometheus" button up at the top just to the right of the KA homesite link. Anyone else notice this?(6 votes)
- Nope. I have a "Subject: Multivariable Calculus" button. Glitch?(4 votes)
- what R should be? any value? does the limit of R tends to zero?(3 votes)
- The specific value of r is situational, depending on how "local" you want your max/min to be.(2 votes)
- f(x)≥f(x0) why it is allowed to be greater or EQUAL ? isn't it just greater?(3 votes)
- Well think about what happens if we do what you are suggesting. We cant have the point x = x0 then yet when we say for all x we mean for the entire domain of the function. x0 thus must be part of the domain if we are able to evaluate it in the function. So what happens when x does equal x0? If the definition was just > and not >= then we would find that the condition is not true and thus the point x0 would not be a maximum which is not what we want.(2 votes)
- Saddle point and inflection point are two things I have a hard time differentiating! They are not the same, graphically, but what are the key mathematical differences between them both?(1 vote)
- Saddle points are used for multivariable functions, while inflection points are found in the case of single variable functions. So, that's one point of difference.
As for what they are, saddle points are extremum points which are neither local minima or maxima (How are they extrema then? Well, they show up when you try to find extrema, but once you use the second derivative test on them, they return 0)
Inflection points are points where functions change concavity (go from increasing to decreasing or vice versa).(3 votes)
- In machine learning and artificial intelligence, the way a computer "learns" how to do something is commonly to minimize some "cost function" that the programmer has specified.
I think it should be loss function--not cost function?(1 vote)- Well, if doing A costs B, then by doing A you lose B.
So, whether you call it cost function or loss function is just semantics – the meaning is the same.(3 votes)
- Is the following true when identifying if a critical point is an inflection point? A critical point of function F (the gradient of F is the 0 vector at this point) is an inflection point if both the F_xx (partial of F with respect to x twice)=0 and F_yy (partial of F with respect to y twice)=0 and of course the Hessian must be >0 to avoid being a saddle point or inconclusive.(1 vote)
- It is inaccurate to say that "this [the derivative being 0] also happens at inflection points." The function f(x)=sin(x) has an inflection point at x=0, but the derivative is not 0 there. I think what you mean to say is simply that a function's derivative can equal 0 at a point without having an extremum at that point, which is related to the fact that the second derivative at that point is 0, i.e. neither positive nor negative (i.e. the point is an inflection point). Maybe you meant that "this also can happen at inflection points."(1 vote)
- When talking about Saddle point in this article. It says 'The single-variable function f(x) = x^2 has a local minimum at x=0, and
The single-variable function f(y) = -y^2 has a local maximum at y = 0.
Can someone please explain why x is minimum and y is maximum and not vice versa?(1 vote)- Imagine the respect graphs of the two, and observe the concavity.(1 vote)
- questions of triple derivative(1 vote)
- "Saying that all the partial derivatives are zero at a point is the same as saying the gradient at that point is the zero vector." It's not true.
There exists a function such that all the partial derivatives do exist in a point P, but the function is not differentiable at P, that is the gradient vector does not exists.(0 votes)- An assumption made in the article actually states the importance of how the function must be continuous and differentiable.(2 votes)