Lagrange multipliers, introduction
The "Lagrange multipliers" technique is a way to solve constrained optimization problems. Super useful!
What we're building to:
- The Lagrange multiplier technique lets you find the maximum or minimum of a multivariable function when there is some constraint on the input values you are allowed to use.
- This technique only applies to constraints that look something like this: g(x, y, …) = c. Here, g is another multivariable function with the same input space as f, and c is some constant.
- The core idea is to look for points where the contour lines of f and g are tangent to each other.
- This is the same as finding points where the gradient vectors of f and g are parallel to each other.
- The entire process can be boiled down into setting the gradient of a certain function, called the Lagrangian, equal to the zero vector.
Suppose you want to maximize this function:
f(x, y) = 2x + y
Plot of the function
But let's also say you limited yourself to inputs (x, y) which satisfy the following equation:
x² + y² = 1
In other words, for which point (x, y) on the unit circle x² + y² = 1 is the value f(x, y) biggest?
This is what's known as a constrained optimization problem. The restriction to points where x² + y² = 1 is called a "constraint", and f(x, y) is the function that needs to be optimized.
Here's one way to visualize this: First draw the graph of f, which looks like a slanted plane since f is linear. Next, project the circle x² + y² = 1 from the xy-plane vertically onto the graph of f. The maximum we are seeking corresponds with the highest point of this projected circle on the graph.
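Before any calculus, you can sanity-check the answer numerically: sample many points on the circle and see which one makes f biggest. Here's a minimal Python sketch of that idea (my own addition, using the f and constraint from above):

```python
# Sample the unit circle x^2 + y^2 = 1 as (cos t, sin t) and scan f = 2x + y.
import numpy as np

t = np.linspace(0, 2 * np.pi, 100_000)
x, y = np.cos(t), np.sin(t)
f = 2 * x + y

i = np.argmax(f)
print(f"max f ≈ {f[i]:.5f} at (x, y) ≈ ({x[i]:.5f}, {y[i]:.5f})")
# Prints roughly 2.23607 at (0.89443, 0.44721), i.e. √5 at (2/√5, 1/√5),
# which is exactly what the Lagrange multiplier technique will find below.
```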
More general form
In general, constrained optimization problems involve maximizing/minimizing a multivariable function whose input has any number of dimensions:
f(x, y, z, …)
Its output will always be one-dimensional, though, since there's not a clear notion of "maximum" with vector-valued outputs.
The type of constraints that the Lagrange multiplier technique applies to must take the form of some other multivariable function g being set equal to a constant c:
g(x, y, z, …) = c
Since this is meant to be a constraint on the input of f, the number of dimensions in the input of g is the same as that of f. For example, the example outlined above fits this general form as follows:
f(x, y) = 2x + y
g(x, y) = x² + y²
c = 1
Using contour maps
Reasoning about this problem becomes easier if we visualize f not with a graph, but with its contour lines.
As a reminder, a contour line of f is the set of all points where f(x, y) = k for some constant k. The following interactive tool shows how this line (drawn in blue) changes as the constant k changes. The circle x² + y² = 1 is also shown (in red). Try to make k as big/small as possible while still allowing the contour line of f to intersect the circle.
Concept check: What does it mean if, for a particular value of k, the blue line representing f(x, y) = k does not intersect the red circle representing x² + y² = 1?
Notice, the circle where x² + y² = 1 can be thought of as a particular contour line of the function g(x, y) = x² + y². So with that, here's the clever way to think about constrained optimization problems:
Key observation: The maximum and minimum values of f, subject to the constraint g(x, y) = 1, correspond with contour lines of f that are tangent to the contour representing g(x, y) = 1.
At constrained extrema, the contour lines of f and g are tangent.
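If you want to draw this picture for yourself, here's a rough matplotlib sketch (my own addition, using the same f and g as above): contour lines of f in blue, the constraint circle in red.

```python
# Contours of f(x, y) = 2x + y (blue) against the constraint circle (red).
import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(-2, 2, 400)
X, Y = np.meshgrid(xs, xs)

plt.contour(X, Y, 2 * X + Y, levels=np.arange(-3, 4), colors="blue")
plt.contour(X, Y, X**2 + Y**2, levels=[1], colors="red")
plt.gca().set_aspect("equal")
plt.show()
# The biggest blue level that still touches the red circle is f = √5,
# and at that level the blue line is tangent to the circle.
```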
If f were a different function, its contours might not always be straight lines. This is unique to our example, since our f is linear. For example, take a look at this function:
f(x, y) = 2x² + √(5y)
Its contour lines look like this:
That said, the key observation still holds, and is worth repeating: when (x₀, y₀) is a maximum or minimum of f subject to the constraint, the contour line for f(x₀, y₀) will be tangent to the contour representing g(x, y) = 1.
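The contour image itself isn't reproduced here, but you can regenerate something like it. A rough sketch, assuming (as the discussion at the bottom of the page suggests) that the function in question is f(x, y) = 2x² + √(5y):

```python
# Curved contour lines of a nonlinear function (note sqrt(5y) needs y >= 0).
import numpy as np
import matplotlib.pyplot as plt

xs = np.linspace(-2, 2, 400)
ys = np.linspace(0, 3, 400)
X, Y = np.meshgrid(xs, ys)

plt.contour(X, Y, 2 * X**2 + np.sqrt(5 * Y), levels=np.arange(1, 10))
plt.gca().set_aspect("equal")
plt.show()
```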
Where the gradient comes into play
How do you put the idea of two contour lines being tangent into a formula you can solve?
To answer this, we turn to our loyal friend the gradient. There are many ways to interpret ∇f: the direction of steepest ascent, a tool for computing directional derivatives, etc. But for our purposes here, the property we care about is that the gradient of f evaluated at a point (x₀, y₀) always gives a vector perpendicular to the contour line passing through that point.
Gradient vectors are perpendicular to contour lines.
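Here's a tiny numeric spot-check of that claim (my own addition), using g(x, y) = x² + y² from our example: at any point (cos t, sin t) of the unit circle, ∇g should be perpendicular to the circle's tangent direction.

```python
import numpy as np

t = 0.7                                             # any angle works here
grad_g = np.array([2 * np.cos(t), 2 * np.sin(t)])   # ∇g = (2x, 2y)
tangent = np.array([-np.sin(t), np.cos(t)])         # direction along the contour

print(np.dot(grad_g, tangent))                      # ≈ 0.0, i.e. perpendicular
```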
This means when the contour lines of two functions f and g are tangent, their gradient vectors are parallel. Here's what that might look like for arbitrary functions f and g:
Wikipedia image of tangent contour lines
The fact that the contour lines are tangent tells us nothing about the magnitude of each of these gradient vectors, but that's okay. When two vectors point in the same direction, it means we can multiply one by some constant to get the other. Specifically, let (x₀, y₀) represent a particular point where the contour lines of f and g are tangent (writing x and y with 0 subscripts just indicates that we are considering constant values, and hence a specific point). Since this tangency means their gradient vectors align, here's what you might write down:
∇f(x₀, y₀) = λ∇g(x₀, y₀)
Here, λ (the Greek letter "lambda") represents some constant. Some authors use a negative constant, −λ, but I personally prefer a positive constant, as it gives a cleaner interpretation of λ down the road.
Let's see what this looks like in our example, where f(x, y) = 2x + y and g(x, y) = x² + y². The gradient of f is
∇f(x, y) = [∂f/∂x, ∂f/∂y] = [2, 1]
and the gradient of g is
∇g(x, y) = [∂g/∂x, ∂g/∂y] = [2x, 2y]
Therefore, the tangency condition ends up looking like this:
[2, 1] = λ[2x₀, 2y₀]
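If you'd rather have a computer take these derivatives, here's a small SymPy sketch of the same computation (using the f and g from above):

```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)
f = 2 * x + y
g = x**2 + y**2

grad_f = [sp.diff(f, v) for v in (x, y)]   # [2, 1]
grad_g = [sp.diff(g, v) for v in (x, y)]   # [2*x, 2*y]

# The tangency condition grad f = lambda * grad g, component by component:
tangency = [sp.Eq(grad_f[i], lam * grad_g[i]) for i in range(2)]
print(tangency)   # [Eq(2, 2*lambda*x), Eq(1, 2*lambda*y)]
```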
Solving the problem in the specific case
To sum up where we are so far, we are looking for input points (x₀, y₀) with the following properties:
- g(x₀, y₀) = c, which for our example means x₀² + y₀² = 1
- ∇f(x₀, y₀) = λ∇g(x₀, y₀) for some constant λ, which for our example means
2 = 2λx₀
1 = 2λy₀
There are 3 equations and 3 unknowns (x₀, y₀, and λ), so this is a perfectly solvable situation.
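Indeed it is. Handing those three equations to SymPy (a sketch of mine, not part of the original article) produces both the maximum and the minimum:

```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)
solutions = sp.solve(
    [
        sp.Eq(x**2 + y**2, 1),    # the constraint
        sp.Eq(2, 2 * lam * x),    # tangency, x-component
        sp.Eq(1, 2 * lam * y),    # tangency, y-component
    ],
    [x, y, lam],
    dict=True,
)
for s in solutions:
    print(s, " f =", 2 * s[x] + s[y])
# (2/√5, 1/√5) gives f = √5 (the max); (-2/√5, -1/√5) gives f = -√5 (the min).
```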
The Lagrangian function
Picture of Lagrange
In the 1700's, our buddy Joseph Louis Lagrange studied constrained optimization problems of this kind, and he found a clever way to package all of our conditions into a single equation.
You can write these conditions generally by saying we are looking for constants x₀, y₀, and λ that satisfy the following conditions:
- The constraint: g(x₀, y₀) = c
- The tangency condition: ∇f(x₀, y₀) = λ∇g(x₀, y₀). This can be broken into its components as follows:
∂f/∂x(x₀, y₀) = λ ∂g/∂x(x₀, y₀)
∂f/∂y(x₀, y₀) = λ ∂g/∂y(x₀, y₀)
Lagrange wrote down a special new function which takes in all the same input variables as f and g, along with the new kid in town λ, thought of now as a variable rather than a constant:
ℒ(x, y, λ) = f(x, y) − λ(g(x, y) − c)
To see this in action, consider our example from above, where f(x, y) = 2x + y, g(x, y) = x² + y², and c = 1.
Here's how this new function would look:
ℒ(x, y, λ) = 2x + y − λ(x² + y² − 1)
Notice, the partial derivative of ℒ with respect to λ is −(g(x, y) − c):
∂ℒ/∂λ(x, y, λ) = −(g(x, y) − c)
So we can translate the condition g(x, y) = c as
∂ℒ/∂λ(x, y, λ) = 0
What's more, look at what we get when we set one of the other partial derivatives equal to 0:
∂ℒ/∂x = ∂f/∂x − λ ∂g/∂x = 0, meaning ∂f/∂x = λ ∂g/∂x
That just so happens to be another one of our conditions! Almost identically, the condition ∂ℒ/∂y = 0 unravels to become
∂f/∂y = λ ∂g/∂y
Together, these conditions are the same as saying ∇f = λ∇g.
Therefore, the three conditions we need to solve to find x, y, and λ come down to the various partial derivatives of ℒ being equal to 0. This can be written extremely compactly by setting the gradient of ℒ equal to the zero vector:
∇ℒ(x, y, λ) = 0
For example, using our specific functions from above, we see how this encodes the system of equations we need to solve:
∂ℒ/∂x = 2 − 2λx = 0
∂ℒ/∂y = 1 − 2λy = 0
∂ℒ/∂λ = −(x² + y² − 1) = 0
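Here's that compact formulation carried out in SymPy (my sketch, with the same f, g, and c): build ℒ, take its gradient, and set it to the zero vector.

```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)
f = 2 * x + y
g = x**2 + y**2
c = 1

L = f - lam * (g - c)                         # the Lagrangian
grad_L = [sp.diff(L, v) for v in (x, y, lam)]
print(grad_L)   # the three partials: 2 - 2λx, 1 - 2λy, and 1 - x² - y²

print(sp.solve(grad_L, [x, y, lam], dict=True))   # the same two points as before
```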
As a tribute to ol' Joey Lou, we call this function ℒ the "Lagrangian", and the new variable λ that we introduce is called a "Lagrange multiplier". Imagine if someone added "-ian" to the end of your last name and made it the name of a function everybody uses. Pretty sweet!
Warning: Some authors use a convention where the sign of λ is reversed:
ℒ(x, y, λ) = f(x, y) + λ(g(x, y) − c)
This doesn't make any difference when it comes to solving the problem, but you should keep it in mind in case the course you are taking or the text you are reading follows this convention.
When you want to maximize (or minimize) a multivariable function f(x, y, …) subject to the constraint that another multivariable function equals a constant, g(x, y, …) = c, follow these steps:
- Step 1: Introduce a new variable λ, and define a new function ℒ as follows:
ℒ(x, y, …, λ) = f(x, y, …) − λ(g(x, y, …) − c)
This function ℒ is called the "Lagrangian", and the new variable λ is referred to as a "Lagrange multiplier".
- Step 2: Set the gradient of ℒ equal to the zero vector:
∇ℒ(x, y, …, λ) = 0
In other words, find the critical points of ℒ.
- Step 3: Consider each solution, which will look something like (x₀, y₀, …, λ₀). Plug each one into f. Or rather, first remove the λ₀ component, then plug the rest into f, since f does not have λ as an input. Whichever one gives the greatest (or smallest) value is the maximum (or minimum) point you are seeking. (The sketch just below walks through these three steps in code.)
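Here are the three steps rolled into one runnable SymPy sketch. (The helper name lagrange_extrema is my own invention for this illustration, not a standard library function.)

```python
import sympy as sp

def lagrange_extrema(f, g, c, variables):
    """Candidate extrema of f subject to g = c, via the Lagrangian."""
    lam = sp.symbols("lambda")
    L = f - lam * (g - c)                                        # Step 1
    grad_L = [sp.diff(L, v) for v in (*variables, lam)]
    candidates = sp.solve(grad_L, [*variables, lam], dict=True)  # Step 2
    # Step 3: evaluate f at each candidate (lambda simply doesn't appear in f).
    return [(s, f.subs(s)) for s in candidates]

x, y = sp.symbols("x y", real=True)
for point, value in lagrange_extrema(2 * x + y, x**2 + y**2, 1, (x, y)):
    print(point, "->", value)
# The largest printed value is the constrained maximum, the smallest the minimum.
```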
Want to join the conversation?
- In the final "side note" example graph, the equation for the red diagonal line should be x − y = 0, not x + y = 0.
- Why is it that the maximum or minimum value of f lies at a point where the contour lines of f and g are tangent? How would you prove it more rigorously?
- I won't prove this, but imagine them not being tangent; that would mean the function's contours cross the constraint at some point. If they cross, however, it means we can always shift the contour to a higher (or lower, depending on whether we're maximising or minimising) level and still be crossing the constraint. The time when you cannot shift it any higher is precisely at the level where they're only touching (tangent), not crossing. This is all assuming we have well-behaved functions/constraints.
- “There's a slight twist to this story, best illustrated with an example.” is repeated.
- Use the method of Lagrange multipliers to determine the maximum and minimum of f(x, y, z) = x + y + z subject to the two conditions g(x, y, z) = x² + y² − 2 = 0 and h(x, y, z) = x + z − 1 = 0.
- We haven't learned about multiple constraints yet. The Lagrangian Function for multiple constraints is ℒ(x1,x2,...,xn,λ1,λ2,...,λM) = f(x1,x2,...,xn) - Σ λkgk(x1,x2,...,xn), where f(x) is the function to be maximized and g1, g2, ..., gM are the constraints (the Σ sum is from k = 1 to M). Note that here we use gk(x) = 0 instead of gk(x) = c.
We take the three gradients and get ∇f = i + j + k, ∇g = 2x i + 2y j, ∇h = i + k. Since ∇f = λ1∇g + λ2∇h, and g = h = 0, we get: 1 = 2xλ1 + λ2, 1 = 2yλ1, 1 = λ2, x² + y² = 2, and x + z − 1 = 0. We see that λ2 = 1, so 2xλ1 = 0. Because λ1 = 1/(2y), we have x/y = 0, so x must be 0. Then y = ±√2, λ1 = ±√2/4, and z = 1.
So we get two solutions: (0,√2,1) and (0,-√2,1). I think you can figure out which is the maximum.
If you can graph this, try it, and you'll see how stupid our computations were. After all this work, the answers are on the y-z plane!
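For anyone who wants to verify that computation, a minimal SymPy sketch (same f, g, and h as in the question, one multiplier per constraint):

```python
import sympy as sp

x, y, z, l1, l2 = sp.symbols("x y z lambda1 lambda2", real=True)
f = x + y + z
g = x**2 + y**2 - 2      # first constraint, set to 0
h = x + z - 1            # second constraint, set to 0

# grad f = l1 * grad g + l2 * grad h, plus the two constraints themselves:
eqs = [sp.diff(f, v) - l1 * sp.diff(g, v) - l2 * sp.diff(h, v) for v in (x, y, z)]
eqs += [g, h]

for s in sp.solve(eqs, [x, y, z, l1, l2], dict=True):
    print(s, " f =", f.subs(s))
# (0, √2, 1) gives f = 1 + √2 (the max); (0, -√2, 1) gives f = 1 - √2 (the min).
```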
- Hi, thanks for the article. Question: how does a parallel gradient tell us that the two contour lines are tangent to each other? Parallel gradients should be possible even if the two functions don't touch, right? Thank you.
- Actually, parallel gradients tell us that the contour lines are parallel. Contour lines are tangent when they are both touching and parallel at the place they're touching. Of course, we had to answer the question "Where are they parallel?" The directions of gradients and contour lines are different at every point.
Therefore, the gradients of the two functions need to be parallel at the same point. ∇f(0,1) being parallel to ∇g(1,0) wouldn't be much use. ∇f(0,1) must be parallel to ∇g(0,1).
Of course, this says nothing about where the functions f and g are, only what their gradients (slants or tilts) are. In fact, it doesn't matter whether f(0,1) = g(0,1). After all, we are restricting g to g(x,y) = c, and c may be arbitrary. If the constraint is x + y = 4, we can let g(x,y) = x + y and c = 4, or we can let g(x,y) = x + y - 4 and c = 0, or g(x,y) = x + y + 100 and c = 104. So how "high" g is doesn't matter at all; it's just its gradient (slant or tilt) that we need.
Of course, sometimes you might be required to let c = 0, but there's no difference. The "height" of g doesn't matter.
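A quick SymPy sketch of exactly this point (the objective f(x, y) = xy is a throwaway choice of mine, just to have something to optimize): all three ways of writing the constraint give the same critical point and the same λ.

```python
import sympy as sp

x, y, lam = sp.symbols("x y lambda", real=True)
f = x * y

for g, c in [(x + y, 4), (x + y - 4, 0), (x + y + 100, 104)]:
    L = f - lam * (g - c)
    sol = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
    print(sol)   # every version prints x = 2, y = 2, lambda = 2
```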
- What does it mean when you have more than one constraint (g and h), so you have 5 equations and 5 unknowns, but the answers you get for the Lagrange multipliers don't come out as nice scalars? For example, λ ends up being something that depends on x and y.
- In the section explaining where the gradient comes into play, we insert the constraint into the g(x, y) formula (we have x² + y² − 1 instead of x² + y²). Why do we insert the −1 from the constraint when computing the gradient?
- At that phase in the lesson, I think there is no difference (constants do not affect the gradient there). The problem is that the later phases of the lesson use that same g-function multiplied by a Lagrange multiplier: we want the partial derivative of the λ(g − c) term with respect to λ to be g − c, so that setting that partial derivative to zero returns the constraint g = c. That makes the constant a bit more useful in the later phases.
- The contour graph for f(x, y) = 2x² + √(5y) doesn't seem right. For example, check the point (0, 2). It should equal √10, yet in the contour graph it has two values, √10 and −√10. And f(2, 3) should be 8 + √15, about 12, but on the plot it seems it also has a value at about 4.
I think there is a bug in the program that displays this plot. Once you square everything and solve the equation, you get roots that weren't there before. For example, if you square both sides of x = √2, you get x² = 2, so x = ±√2. Though I'm not sure if that's what's wrong with the program.
I graphed the function with my calculator and there was only one value each for (0, 2) and (2, 3). But then, f is a function, so there can only be one output for each pair of inputs. The contour plot above doesn't seem right.
- Hi, in the section "More general form", it says:
"Its output will always be one-dimensional, though, since there's not a clear notion of "maximum" with vector-valued outputs."
But we can have a notion of maximum for vectors by taking norms! Then why is the Lagrangian defined only for 1-D output?
- What if there is more than one constraint equation? How do we extremize a given function then?
- You would take ∇f = λ1∇g + λ2∇h and solve the system of equations carefully to find λ1 and λ2, and ultimately x, y, z.