- Quadratic approximation
Quadratic approximations extend the notion of a local linearization, giving an even closer approximation of a function.
What we're building to
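The goal, as with a local linearization, is to approximate a potentially complicated multivariable function $f$ near some input, which I'll write as the vector $\mathbf{x}_0$. A quadratic approximation does this more tightly than a local linearization, using the information given by second partial derivatives.
In the specific case where the input of $f$ is two-dimensional, and you are approximating near a point $(x_0, y_0)$, you will see below that the quadratic approximation ends up looking like this:
$$
\begin{aligned}
Q_f(x, y) = {} & f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0) \\
& + \tfrac{1}{2} f_{xx}(x_0, y_0)(x - x_0)^2 + f_{xy}(x_0, y_0)(x - x_0)(y - y_0) + \tfrac{1}{2} f_{yy}(x_0, y_0)(y - y_0)^2
\end{aligned}
$$
In general, for a scalar-valued function $f$ with any kind of multidimensional input, the approximation looks like this:
$$Q_f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$$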
I know it looks a bit complicated, but I'll step through it piece by piece later on. Here's a brief outline of each term.
- $f$ is a function with multi-dimensional input and a scalar output.
- $\nabla f(\mathbf{x}_0)$ is the gradient of $f$ evaluated at $\mathbf{x}_0$.
- $\mathbf{H}_f(\mathbf{x}_0)$ is the Hessian matrix of $f$ evaluated at $\mathbf{x}_0$.
- The vector $\mathbf{x}_0$ is a specific input, the one we are approximating near.
- The vector $\mathbf{x}$ represents the variable input.
- The approximation function, $Q_f$, has the same value as $f$ at the point $\mathbf{x}_0$, all its partial derivatives have the same value as those of $f$ at this point, and all its second partial derivatives have the same value as those of $f$ at this point.
Tighter and tighter approximations
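Imagine you are given some function $f(x, y)$ with two inputs and one output, such as
$$f(x, y) = \sin(x)\cos(y)$$
The goal is to find a simpler function that approximates $f(x, y)$ near some particular point $(x_0, y_0)$. For example,
$$(x_0, y_0) = \left(\frac{\pi}{3}, \frac{\pi}{6}\right)$$
The most naive approximation would be a constant function which equals the value of $f$ at $(x_0, y_0)$ everywhere. We call this a "$0$-order approximation".
In the example:
$$C(x, y) = \sin\left(\frac{\pi}{3}\right)\cos\left(\frac{\pi}{6}\right) = \frac{3}{4}$$
Written in the abstract:
$$C(x, y) = f(x_0, y_0)$$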
The graph of this constant approximation is a flat plane passing through the graph of our function at the point $(x_0, y_0, f(x_0, y_0))$. Below is a video showing how this approximation changes as we move the point $(x_0, y_0)$ around.
The graph of $f$ is pictured in blue, the graph of the approximation is white, and the point $(x_0, y_0, f(x_0, y_0))$ is pictured as a red dot.
The constant function zero-order approximation is pretty lousy. Sure, it is guaranteed to equal $f$ at the point $(x_0, y_0)$, but that's about it. One step better is to use a local linearization, also known as a "first-order approximation".
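In the example:
$$L_f(x, y) = \frac{3}{4} + \frac{\sqrt{3}}{4}\left(x - \frac{\pi}{3}\right) - \frac{\sqrt{3}}{4}\left(y - \frac{\pi}{6}\right)$$
Written in the abstract:
$$L_f(x, y) = f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0)$$
Here $f_x$ and $f_y$ denote the partial derivatives of $f$.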
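To make the comparison between the two approximations so far concrete, here is a minimal Python sketch. It assumes the example function $f(x, y) = \sin(x)\cos(y)$ and, as an assumed base point, $(x_0, y_0) = (\pi/3, \pi/6)$; the helper names `constant_approx` and `linear_approx` are hypothetical, chosen only for this illustration.

```python
import math

def f(x, y):
    return math.sin(x) * math.cos(y)

def f_x(x, y):
    # Partial derivative of sin(x)cos(y) with respect to x
    return math.cos(x) * math.cos(y)

def f_y(x, y):
    # Partial derivative of sin(x)cos(y) with respect to y
    return -math.sin(x) * math.sin(y)

X0, Y0 = math.pi / 3, math.pi / 6  # assumed base point

def constant_approx(x, y):
    """Zero-order approximation: the value of f at the base point, everywhere."""
    return f(X0, Y0)

def linear_approx(x, y):
    """First-order approximation (local linearization) about (X0, Y0)."""
    return f(X0, Y0) + f_x(X0, Y0) * (x - X0) + f_y(X0, Y0) * (y - Y0)

# Compare the two approximations at a point a short distance from the base point
x, y = X0 + 0.1, Y0 - 0.1
err_constant = abs(f(x, y) - constant_approx(x, y))
err_linear = abs(f(x, y) - linear_approx(x, y))
print("constant error:", err_constant)
print("linear error:  ", err_linear)
```

Both approximations agree with $f$ exactly at the base point; a short step away, the linear one should stay noticeably closer.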
The graph of a local linearization is the plane tangent to the graph of $f$ at the point $(x_0, y_0, f(x_0, y_0))$. Here is a video showing how this approximation changes as we move around the point $(x_0, y_0)$:
Better still is a quadratic approximation, also called a "second-order approximation".
The remainder of this article is devoted to finding and understanding the analytic form of such an approximation, but before diving in, let's see what such approximations look like graphically. You can think of these approximations as nestling into the curves of the graph at the point $(x_0, y_0, f(x_0, y_0))$, giving it a sort of mathematical hug.
"Quadratic" means product of two variables
In single variable functions, the word "quadratic" refers to any situation where a variable is squared, as in the term $x^2$. With multiple variables, "quadratic" refers not only to square terms, like $x^2$ and $y^2$, but also to terms that involve the product of two separate variables, such as $xy$.
In general, the "order" of a term which is the product of several things, such as $3x^2y^3$, is the total number of variables multiplied into that term. In this case, the order would be $5$: two $x$'s, three $y$'s, and the constant doesn't matter.
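As a tiny illustration of this counting rule, the sketch below represents a term by the exponent on each variable and adds the exponents up. The function name `term_order` and the dictionary representation are assumptions made for this example only.

```python
def term_order(exponents):
    """Order of a product term: the sum of the exponents on its variables.

    The constant coefficient is not part of `exponents`, since it does
    not affect the order.
    """
    return sum(exponents.values())

# A term with two factors of x and three factors of y
print(term_order({"x": 2, "y": 3}))  # → 5

# The mixed term xy counts as quadratic, just like x^2
print(term_order({"x": 1, "y": 1}))  # → 2
```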
Graphs of quadratic functions
One way to think of quadratic functions is in terms of their concavity, which might depend on which direction you are moving in.
If the function has an upward concavity, as is the case, for example, with $f(x, y) = x^2 + y^2$, the graph will look something like this:
This shape, which is a three-dimensional parabola, goes by the name paraboloid.
If the function is concave up in one direction and linear in another, the graph looks like a parabolic curve has been dragged through space to trace out a surface. For example, this happens in the case of $f(x, y) = x^2 + y$.
Finally, if the graph is concave up when traveling in one direction, but concave down when traveling in another direction, as is the case for $f(x, y) = x^2 - y^2$, the graph looks a bit like a saddle. Here's what such a graph looks like:
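The idea that concavity can depend on direction can also be checked numerically. The sketch below is one possible way to do it, not a standard routine: it estimates concavity at the origin along a chosen direction with a central second difference. The three sample functions ($x^2 + y^2$, $x^2 + y$, and $x^2 - y^2$) and the helper name `second_difference` are assumptions chosen to match the three shapes described above.

```python
def second_difference(g, direction, h=1e-3):
    """Estimate the concavity of g at the origin along `direction`.

    Uses a central second difference: a positive result means concave up,
    a negative result means concave down, and a result near zero means
    the function is linear in that direction.
    """
    dx, dy = direction
    return (g(h * dx, h * dy) - 2 * g(0.0, 0.0) + g(-h * dx, -h * dy)) / h**2

bowl = lambda x, y: x**2 + y**2    # concave up in every direction
trough = lambda x, y: x**2 + y     # concave up along x, linear along y
saddle = lambda x, y: x**2 - y**2  # concave up along x, concave down along y

for name, g in [("bowl", bowl), ("trough", trough), ("saddle", saddle)]:
    along_x = second_difference(g, (1.0, 0.0))
    along_y = second_difference(g, (0.0, 1.0))
    print(name, round(along_x, 6), round(along_y, 6))
```

The sign pattern of the two numbers printed for each function is what distinguishes the bowl, the dragged parabola, and the saddle.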
Reminder on the local linearization recipe
To actually write down a quadratic approximation of a function $f$ near the point $(x_0, y_0)$, we build up from the local linearization:
$$L_f(x, y) = f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0)$$
It's worth walking through the recipe for finding the local linearization one more time since the recipe for finding a quadratic approximation is very similar.
- Start with the constant term $f(x_0, y_0)$, so that our approximation at least matches $f$ at the point $(x_0, y_0)$.
- Add on linear terms $f_x(x_0, y_0)(x - x_0)$ and $f_y(x_0, y_0)(y - y_0)$.
- Use the constants $f_x(x_0, y_0)$ and $f_y(x_0, y_0)$ to ensure that our approximation has the same partial derivatives as $f$ at the point $(x_0, y_0)$.
- Use the terms $(x - x_0)$ and $(y - y_0)$ instead of simply $x$ and $y$ so that we don't mess up the fact that our approximation equals $f(x_0, y_0)$ at the point $(x_0, y_0)$.
Finding the quadratic approximation
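For the quadratic approximation, we add on the quadratic terms $(x - x_0)^2$, $(x - x_0)(y - y_0)$, and $(y - y_0)^2$, and for now we write their coefficients as the constants $a$, $b$ and $c$, which we will solve for in a moment:
$$
\begin{aligned}
Q_f(x, y) = {} & f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0) \\
& + a(x - x_0)^2 + b(x - x_0)(y - y_0) + c(y - y_0)^2
\end{aligned}
$$
In the same way that we made sure that the local linearization has the same partial derivatives as $f$ at $(x_0, y_0)$, we want the quadratic approximation to have the same second partial derivatives as $f$ at this point.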
The really nice thing about the way I wrote $Q_f$ above is that the second partial derivative $\dfrac{\partial^2 Q_f}{\partial x^2}$ depends only on the $a(x - x_0)^2$ term.
- Try it! Take the second partial derivative with respect to $x$ of every term in the expression of $Q_f(x, y)$ above, and notice that they all go to zero except for the $a(x - x_0)^2$ term.
Did you really try it? I'm serious, take a moment to reason through it. It really helps in understanding why $Q_f$ is expressed the way it is.
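This fact is nice because rather than taking the second partial derivative of the entire monstrous expression, you can view it like this:
$$\frac{\partial^2 Q_f}{\partial x^2}(x, y) = \frac{\partial^2}{\partial x^2}\Big(a(x - x_0)^2\Big) = 2a$$
Since the goal is for this to match $f_{xx}(x, y)$ at the point $(x_0, y_0)$, you can solve for $a$ like this:
$$a = \frac{1}{2} f_{xx}(x_0, y_0)$$
Test yourself: Use similar reasoning to figure out what the constants $b$ and $c$ should be.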
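Here is one way that reasoning might go, sketched in LaTeX. It assumes the quadratic terms are written as $a(x - x_0)^2$, $b(x - x_0)(y - y_0)$ and $c(y - y_0)^2$: only the $b$ term survives the mixed partial derivative, and only the $c$ term survives two derivatives with respect to $y$. Matching each result to the corresponding second partial derivative of $f$ at $(x_0, y_0)$ gives the constants.

```latex
\begin{aligned}
\frac{\partial^2 Q_f}{\partial x\,\partial y} &= b
  &&\Longrightarrow\quad b = f_{xy}(x_0, y_0) \\
\frac{\partial^2 Q_f}{\partial y^2} &= 2c
  &&\Longrightarrow\quad c = \tfrac{1}{2} f_{yy}(x_0, y_0)
\end{aligned}
```

Note that $b$ picks up no factor of $\tfrac{1}{2}$, because differentiating $(x - x_0)(y - y_0)$ once in each variable does not bring down a $2$.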
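We can now write our final quadratic approximation, with all six of its terms working in harmony to mimic the behavior of $f$ at $(x_0, y_0)$:
$$
\begin{aligned}
Q_f(x, y) = {} & f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0) \\
& + \tfrac{1}{2} f_{xx}(x_0, y_0)(x - x_0)^2 + f_{xy}(x_0, y_0)(x - x_0)(y - y_0) + \tfrac{1}{2} f_{yy}(x_0, y_0)(y - y_0)^2
\end{aligned}
$$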
To see this beast in action, let's try it out on the function from the introduction.
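Problem: Find the quadratic approximation of
$$f(x, y) = \sin(x)\cos(y)$$
about the point $(x, y) = \left(\frac{\pi}{3}, \frac{\pi}{6}\right)$.
To collect all the necessary information, you need to evaluate $f(x, y) = \sin(x)\cos(y)$ and all of its partial derivatives and all of its second partial derivatives at the point $\left(\frac{\pi}{3}, \frac{\pi}{6}\right)$.
$$
\begin{aligned}
f(x, y) &= \sin(x)\cos(y) & f\left(\tfrac{\pi}{3}, \tfrac{\pi}{6}\right) &= \tfrac{\sqrt{3}}{2} \cdot \tfrac{\sqrt{3}}{2} = \tfrac{3}{4} \\
f_x(x, y) &= \cos(x)\cos(y) & f_x\left(\tfrac{\pi}{3}, \tfrac{\pi}{6}\right) &= \tfrac{1}{2} \cdot \tfrac{\sqrt{3}}{2} = \tfrac{\sqrt{3}}{4} \\
f_y(x, y) &= -\sin(x)\sin(y) & f_y\left(\tfrac{\pi}{3}, \tfrac{\pi}{6}\right) &= -\tfrac{\sqrt{3}}{2} \cdot \tfrac{1}{2} = -\tfrac{\sqrt{3}}{4} \\
f_{xx}(x, y) &= -\sin(x)\cos(y) & f_{xx}\left(\tfrac{\pi}{3}, \tfrac{\pi}{6}\right) &= -\tfrac{\sqrt{3}}{2} \cdot \tfrac{\sqrt{3}}{2} = -\tfrac{3}{4} \\
f_{xy}(x, y) &= -\cos(x)\sin(y) & f_{xy}\left(\tfrac{\pi}{3}, \tfrac{\pi}{6}\right) &= -\tfrac{1}{2} \cdot \tfrac{1}{2} = -\tfrac{1}{4} \\
f_{yy}(x, y) &= -\sin(x)\cos(y) & f_{yy}\left(\tfrac{\pi}{3}, \tfrac{\pi}{6}\right) &= -\tfrac{\sqrt{3}}{2} \cdot \tfrac{\sqrt{3}}{2} = -\tfrac{3}{4}
\end{aligned}
$$
Almost there! As a final step, apply all these values to the formula for a quadratic approximation.
$$
\begin{aligned}
Q_f(x, y) = {} & \frac{3}{4} + \frac{\sqrt{3}}{4}\left(x - \frac{\pi}{3}\right) - \frac{\sqrt{3}}{4}\left(y - \frac{\pi}{6}\right) \\
& - \frac{3}{8}\left(x - \frac{\pi}{3}\right)^2 - \frac{1}{4}\left(x - \frac{\pi}{3}\right)\left(y - \frac{\pi}{6}\right) - \frac{3}{8}\left(y - \frac{\pi}{6}\right)^2
\end{aligned}
$$
So for example, to generate the animation of quadratic approximations, this is the formula I had to plug into the graphing software.
$$
\begin{aligned}
Q_f(x, y) = {} & \sin(x_0)\cos(y_0) + \cos(x_0)\cos(y_0)(x - x_0) - \sin(x_0)\sin(y_0)(y - y_0) \\
& - \tfrac{1}{2}\sin(x_0)\cos(y_0)(x - x_0)^2 - \cos(x_0)\sin(y_0)(x - x_0)(y - y_0) - \tfrac{1}{2}\sin(x_0)\cos(y_0)(y - y_0)^2
\end{aligned}
$$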
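As a rough check on this kind of formula, the following Python sketch evaluates the quadratic approximation of $\sin(x)\cos(y)$ about a base point and compares it with the true function at a few nearby inputs. The base point $(\pi/3, \pi/6)$ and the function name `quadratic_approx` are assumptions made for illustration; the partial derivatives are written out by hand rather than computed automatically.

```python
import math

def f(x, y):
    return math.sin(x) * math.cos(y)

def quadratic_approx(x, y, x0, y0):
    """Second-order approximation of sin(x)cos(y) about (x0, y0)."""
    s, c = math.sin, math.cos
    dx, dy = x - x0, y - y0
    # Value of f and its first and second partial derivatives at (x0, y0)
    f0 = s(x0) * c(y0)
    fx = c(x0) * c(y0)
    fy = -s(x0) * s(y0)
    fxx = -s(x0) * c(y0)
    fxy = -c(x0) * s(y0)
    fyy = -s(x0) * c(y0)
    return (f0 + fx * dx + fy * dy
            + 0.5 * fxx * dx**2 + fxy * dx * dy + 0.5 * fyy * dy**2)

x0, y0 = math.pi / 3, math.pi / 6  # assumed base point

# The error should shrink quickly as the input moves closer to the base point
errors = []
for step in (0.2, 0.1, 0.05):
    x, y = x0 + step, y0 - step
    errors.append(abs(f(x, y) - quadratic_approx(x, y, x0, y0)))
    print(step, errors[-1])
```

Writing the derivatives in terms of `x0` and `y0`, rather than as fixed numbers, is what lets the same function be reused as the base point moves around.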
Vector notation using the Hessian
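Perhaps it goes without saying that the expression for the quadratic approximation is long. Now imagine if $f$ had three inputs, $x$, $y$ and $z$. In principle you can imagine how this might go, adding terms involving $f_z$, $f_{xz}$, $f_{zz}$, on and on with all the partial derivatives and all the second partial derivatives. But this would be a total nightmare!
Now imagine you were writing a program to find the quadratic approximation of a function with many more inputs than that.
It actually doesn't have to be that bad. When something is not that complicated in principle, it shouldn't be that complicated in notation. Quadratic approximations are a little complicated, sure, but they're not absurd. Using vector and matrix notation, the quadratic approximation of a scalar-valued function with multidimensional input can be written like this:
$$Q_f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$$
Let's break this down: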
- The boldfaced $\mathbf{x}$ represents the input variable(s) as a vector, $\mathbf{x} = \begin{bmatrix} x \\ y \\ \vdots \end{bmatrix}$. Moreover, $\mathbf{x}_0$ is a particular vector in the input space. If this has two components, this formula for $Q_f$ is just a different way to write the one we derived before, but it could also represent a vector with any other dimension.
- The dot product $\nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0)$ will expand into the sum of all terms of the form $f_x(\mathbf{x}_0)(x - x_0)$, $f_y(\mathbf{x}_0)(y - y_0)$, etc. If this is not familiar from the vector notation for local linearization, work it out for yourself in a small number of dimensions to see!
- The little superscript $T$ in the expression $(\mathbf{x} - \mathbf{x}_0)^T$ indicates "transpose". This means you take the initial vector $(\mathbf{x} - \mathbf{x}_0)$, which looks something like this: $\begin{bmatrix} x - x_0 \\ y - y_0 \\ \vdots \end{bmatrix}$. Then you flip it, to get something like this: $\begin{bmatrix} x - x_0 & y - y_0 & \cdots \end{bmatrix}$.
- $\mathbf{H}_f(\mathbf{x}_0)$ is the Hessian of $f$.
- The expression $(\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$ might seem complicated if you have never come across something like it before. This way of expressing quadratic terms is actually quite common in vector calculus and linear algebra, so it's worth expanding an expression like this at least a few times in your life. For example, try working it out in the case where $\mathbf{x}$ is two-dimensional to see what it looks like. You should find that it is exactly $2$ times the quadratic portion of the non-vectorized formula we derived above.
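The vector form translates into code fairly directly. Below is a minimal sketch in plain Python, assuming the caller already knows the value, gradient, and Hessian of $f$ at the base point; the function name `quadratic_approx` and its argument layout are hypothetical. The sample numbers are the ones that arise for $f(x, y) = \sin(x)\cos(y)$ at an assumed base point of $(\pi/3, \pi/6)$.

```python
import math

def quadratic_approx(f0, grad, hessian, x0, x):
    """Evaluate f0 + grad . (x - x0) + 1/2 (x - x0)^T H (x - x0).

    f0      : value of f at x0
    grad    : gradient of f at x0, as a list of n numbers
    hessian : Hessian of f at x0, as an n-by-n list of lists
    x0, x   : base point and input point, as lists of n numbers
    """
    d = [xi - x0i for xi, x0i in zip(x, x0)]
    linear = sum(g * di for g, di in zip(grad, d))
    # H d is a vector; dotting it with d gives the quadratic form d^T H d
    h_times_d = [sum(hij * dj for hij, dj in zip(row, d)) for row in hessian]
    quadratic = sum(di * hdi for di, hdi in zip(d, h_times_d))
    return f0 + linear + 0.5 * quadratic

# Sample data: sin(x)cos(y) at the assumed base point (pi/3, pi/6)
x0 = [math.pi / 3, math.pi / 6]
f0 = 0.75
grad = [math.sqrt(3) / 4, -math.sqrt(3) / 4]
hessian = [[-0.75, -0.25],
           [-0.25, -0.75]]

x = [x0[0] + 0.1, x0[1] - 0.1]
approx = quadratic_approx(f0, grad, hessian, x0, x)
exact = math.sin(x[0]) * math.cos(x[1])
print(approx, exact)
```

Nothing in `quadratic_approx` depends on the input being two-dimensional, which is the practical payoff of the vector notation: the same few lines handle any number of inputs.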
What's the point?
In truth, it is a real pain to compute a quadratic approximation by hand, and it requires staying very organized to do so without making a little mistake. In practice, people rarely work through a quadratic approximation like the example above, but knowing how they work is useful for at least two broad reasons:
- Computation: Even if you never have to write out a quadratic approximation, you may one day need to program a computer to do it for a particular function. Or even if you are relying on someone else's program, you may need to analyze how and why the approximation is failing in some circumstance.
- Theory: Being able to reference a second-order approximation helps us to reason about the behavior of general functions near a point. This will be useful later in figuring out if a point is a local maximum or minimum.
Want to join the conversation?
- In the worked example (approximating sin(x)cos(y)) the very last term in the solution (fyy) is written in brown as 3/4. This is missing a minus sign.
- In the example using sin(x)cos(y), the second derivative with respect to y (the last one) is sin(x)cos(y), but shouldn't it be -sin(x)cos(y)? If you have the first partial as -sin(x)sin(y), and take the partial of that with respect to y, you get the derivative of sin(y) = cos(y), not -cos(y), right? Why did the sign change again?
- During the last part ("Vector notation using the Hessian") I do not understand why it is necessary to transpose that vector in the quadratic term. You can expand the quadratic term in exactly the same manner without transposing that vector, right? As it is done in the exercise, you end up with 2 vectors, so why would you need to have the vector on the left transposed?
- So, could these sorts of things be used to generalise the Taylor series to higher dimensions?
- Yep, it is a generalisation. Higher order terms consist of tensor-like operations (3rd order: fijk(x1,x2)*xi*xj*xk), while 2nd order terms can be written as a matrix multiplication.
- What about cubic approximations? Would we need a cubical "Hessian matrix" analogue? And how would we define the multiplication?
- What is the formula (not in the vector/matrix form) for a quadratic approximation when z is added to the input of the function f, making it f(x,y,z)?
- At the top, in your definition of Qf(x), I think the partial derivatives of Q are not the same as the partial derivatives of f, due to the presence of the quadratic term. Only the second partials match. I suppose we could modify the "coefficients" on the first-order term to include the negative of the value of the partial derivatives of the quadratic term. Would this improve the approximation? Hmm.
- When you evaluate at the particular point (x_0, y_0), the partial derivatives of the quadratic term go to zero.
- For the solution of finding the b constant, finding the first partial derivative with respect to y does not make c(y - y0)^2 zero. It would actually be 2c(y - y0). Nevertheless, this has no effect on the final answer, as applying the partial derivative with respect to x makes that term zero.
- Would it be possible to find f given Q and the input vector? So, like finding a best fit for a particular set of data?