
Quadratic approximation

Quadratic approximations extend the notion of a local linearization, giving an even closer approximation of a function.

What we're building to

The goal, as with a local linearization, is to approximate a potentially complicated multivariable function $f$ near some input, which I'll write as the vector $\mathbf{x}_0$. A quadratic approximation does this more tightly than a local linearization, using the information given by second partial derivatives.
Non-vector form
In the specific case where the input of $f$ is two-dimensional, and you are approximating near a point $(x_0, y_0)$, you will see below that the quadratic approximation ends up looking like this:

$$Q_f(x, y) = f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0) + \tfrac{1}{2}f_{xx}(x_0, y_0)(x - x_0)^2 + f_{xy}(x_0, y_0)(x - x_0)(y - y_0) + \tfrac{1}{2}f_{yy}(x_0, y_0)(y - y_0)^2$$
Vector form
For a scalar-valued function $f$ with any kind of multidimensional input, the general form of the approximation looks like this:

$$Q_f(\mathbf{x}) = \underbrace{f(\mathbf{x}_0)}_{\text{Constant}} + \underbrace{\nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0)}_{\text{Linear term}} + \underbrace{\tfrac{1}{2}(\mathbf{x} - \mathbf{x}_0)^{\mathsf{T}} \, H_f(\mathbf{x}_0) \, (\mathbf{x} - \mathbf{x}_0)}_{\text{Quadratic term}}$$
I know it looks a bit complicated, but I'll step through it piece by piece later on. Here's a brief outline of each term.
  • $f$ is a function with multi-dimensional input and a scalar output.
  • $\nabla f(\mathbf{x}_0)$ is the gradient of $f$ evaluated at $\mathbf{x}_0$.
  • $H_f(\mathbf{x}_0)$ is the Hessian matrix of $f$ evaluated at $\mathbf{x}_0$.
  • The vector $\mathbf{x}_0$ is a specific input, the one we are approximating near.
  • The vector $\mathbf{x}$ represents the variable input.
  • The approximation function $Q_f$ matches $f$ at the point $\mathbf{x}_0$: it has the same value, the same partial derivatives, and the same second partial derivatives there.

Tighter and tighter approximations

Imagine you are given some function $f(x, y)$ with two inputs and one output, such as
$$f(x, y) = \sin(x)\cos(y)$$
The goal is to find a simpler function that approximates $f(x, y)$ near some particular point $(x_0, y_0)$. For example, $(x_0, y_0) = \left(\frac{\pi}{3}, \frac{\pi}{6}\right)$.

Zero-order approximation

The most naive approximation would be a constant function which equals the value of f at (x0,y0) everywhere. We call this a "0-order approximation".
In the example:
$$C(x, y) = \sin\left(\tfrac{\pi}{3}\right)\cos\left(\tfrac{\pi}{6}\right) = \tfrac{3}{4}$$
Written in the abstract:
$$C(x, y) = \underbrace{f(x_0, y_0)}_{\text{Constant function}}$$
The graph of this approximation function $C(x, y)$ is a flat plane passing through the graph of our function at the point $(x_0, y_0, f(x_0, y_0))$. Below is a video showing how this approximation changes as we move the point $(x_0, y_0)$ around.
The graph of $f$ is pictured in blue, the graph of the approximation is white, and the point $(x_0, y_0, f(x_0, y_0))$ is pictured as a red dot.

First-order approximation

The constant-function zero-order approximation is pretty lousy. Sure, it is guaranteed to equal $f(x, y)$ at the point $(x_0, y_0)$, but that's about it. One step better is to use a local linearization, also known as a "first-order approximation".
In the example:
$$L(x, y) = \frac{3}{4} + \frac{\sqrt{3}}{4}\left(x - \frac{\pi}{3}\right) - \frac{\sqrt{3}}{4}\left(y - \frac{\pi}{6}\right)$$
Written in the abstract:
$$L_f(x, y) = f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0)$$
Here, $f_x$ and $f_y$ denote the partial derivatives of $f$.
The graph of a local linearization is the plane tangent to the graph of $f$ at the point $(x_0, y_0, f(x_0, y_0))$. Here is a video showing how this approximation changes as we move around the point $(x_0, y_0)$:

Second-order approximation

Better still is a quadratic approximation, also called a "second-order approximation".
The remainder of this article is devoted to finding and understanding the analytic form of such an approximation, but before diving in, let's see what these approximations look like graphically. You can think of them as nestling into the curves of the graph at the point $(x_0, y_0, f(x_0, y_0))$, giving it a sort of mathematical hug.

"Quadratic" means product of two variables

In single-variable functions, the word "quadratic" refers to any situation where a variable is squared, as in the term $x^2$. With multiple variables, "quadratic" refers not only to square terms, like $x^2$ and $y^2$, but also to terms that involve the product of two separate variables, such as $xy$.
In general, the "order" of a term which is the product of several things, such as $3x^2y^3$, is the total number of variables multiplied into that term. In this case, the order would be $5$: two $x$'s, three $y$'s, and the constant doesn't matter.

Graphs of quadratic functions

One way to think of quadratic functions is in terms of their concavity, which might depend on which direction you are moving in.
If the function has an upward concavity, as is the case, for example, with $f(x, y) = x^2 + y^2$, the graph will look something like this:
This shape, which is a three-dimensional parabola, goes by the name paraboloid.
If the function is concave up in one direction and linear in another, the graph looks as if a parabolic curve has been dragged through space to trace out a surface. For example, this happens in the case of $f(x, y) = x^2 + y$:
Parabola dragged through space
Finally, if the graph is concave up when traveling in one direction, but concave down when traveling in another direction, as is the case for $f(x, y) = x^2 - y^2$, the graph looks a bit like a saddle. Here's what such a graph looks like:

Reminder on the local linearization recipe

To actually write down a quadratic approximation of a function $f$ near the point $(x_0, y_0)$, we build up from the local linearization:
$$L_f(x, y) = \underbrace{f(x_0, y_0)}_{\text{Constant term}} + \underbrace{f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0)}_{\text{Linear terms}}$$
It's worth walking through the recipe for finding the local linearization one more time since the recipe for finding a quadratic approximation is very similar.
  • Start with the constant term $f(x_0, y_0)$, so that our approximation at least matches $f$ at the point $(x_0, y_0)$.
  • Add on the linear terms $f_x(x_0, y_0)(x - x_0)$ and $f_y(x_0, y_0)(y - y_0)$.
  • Use the constants $f_x(x_0, y_0)$ and $f_y(x_0, y_0)$ to ensure that our approximation has the same partial derivatives as $f$ at the point $(x_0, y_0)$.
  • Use the terms $(x - x_0)$ and $(y - y_0)$ instead of simply $x$ and $y$ so that we don't mess up the fact that our approximation equals $f(x_0, y_0)$ at the point $(x_0, y_0)$.
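To make the recipe concrete, here is a minimal Python sketch (my own illustration, not part of the original article) that builds the local linearization of a two-variable function. The function name `local_linearization` and the step size `h` are illustrative choices; the partial derivatives are estimated with central finite differences rather than computed analytically:

```python
import math

def local_linearization(f, x0, y0, h=1e-5):
    """Build L(x, y) = f(x0, y0) + fx*(x - x0) + fy*(y - y0),
    estimating the partial derivatives with central differences."""
    fx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)  # approx. f_x(x0, y0)
    fy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)  # approx. f_y(x0, y0)
    f0 = f(x0, y0)                                   # constant term
    return lambda x, y: f0 + fx * (x - x0) + fy * (y - y0)

f = lambda x, y: math.sin(x) * math.cos(y)
L = local_linearization(f, math.pi / 3, math.pi / 6)

# L matches f at the point itself...
print(abs(L(math.pi / 3, math.pi / 6) - f(math.pi / 3, math.pi / 6)))
# ...and stays close nearby (the error is second-order in the distance).
print(abs(L(math.pi / 3 + 0.1, math.pi / 6 + 0.1) - f(math.pi / 3 + 0.1, math.pi / 6 + 0.1)))
```

The closure captures `f0`, `fx`, and `fy` once, so evaluating `L` afterward costs only a few multiplications.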

Finding the quadratic approximation

For the quadratic approximation, we add on the quadratic terms $(x - x_0)^2$, $(x - x_0)(y - y_0)$, and $(y - y_0)^2$, and for now we write their coefficients as the constants $a$, $b$, and $c$, which we will solve for in a moment:
$$Q_f(x, y) = \underbrace{f(x_0, y_0)}_{\text{Order 0 part}} + \underbrace{f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0)}_{\text{Order 1 part}} + \underbrace{a(x - x_0)^2 + b(x - x_0)(y - y_0) + c(y - y_0)^2}_{\text{Quadratic part}}$$
In the same way that we made sure that the local linearization has the same partial derivatives as $f$ at $(x_0, y_0)$, we want the quadratic approximation to have the same second partial derivatives as $f$ at this point.
The really nice thing about the way I wrote $Q_f$ above is that the second partial derivative $\frac{\partial^2 Q_f}{\partial x^2}$ depends only on the $a(x - x_0)^2$ term.
  • Try it! Take the second partial derivative with respect to $x$ of every term in the expression of $Q_f(x, y)$ above, and notice that they all go to zero except for the $a(x - x_0)^2$ term.
Did you really try it? I'm serious, take a moment to reason through it. It really helps in understanding why $Q_f$ is expressed the way it is.
This fact is nice because rather than taking the second partial derivative of the entire monstrous expression, you can view it like this:
$$\frac{\partial^2 Q_f}{\partial x^2}(x, y) = (\text{A bunch of 0's}) + \frac{\partial^2}{\partial x^2} a(x - x_0)^2 + (\text{more 0's}) = \frac{\partial}{\partial x} 2a(x - x_0) = 2a$$
Since the goal is for this to match $f_{xx}$ at the point $(x_0, y_0)$, you can solve for $a$ like this:
$$2a = f_{xx}(x_0, y_0) \quad \Longrightarrow \quad a = \frac{1}{2} f_{xx}(x_0, y_0)$$
Test yourself: Use similar reasoning to figure out what the constants b and c should be.
We can now write our final quadratic approximation, with all six of its terms working in harmony to mimic the behavior of $f$ at $(x_0, y_0)$:
$$Q_f(x, y) = f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0) + \frac{1}{2}f_{xx}(x_0, y_0)(x - x_0)^2 + f_{xy}(x_0, y_0)(x - x_0)(y - y_0) + \frac{1}{2}f_{yy}(x_0, y_0)(y - y_0)^2$$
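The six-term formula can be transcribed almost verbatim into code. Here is a small Python sketch (the function name and interface are my own); the caller supplies the value of $f$ and its first and second partial derivatives at the point:

```python
def quadratic_approximation(f0, fx, fy, fxx, fxy, fyy, x0, y0):
    """Build Q_f(x, y) from the values of f and its first and second
    partial derivatives at (x0, y0)."""
    def Q(x, y):
        dx, dy = x - x0, y - y0
        return (f0                       # constant term
                + fx * dx + fy * dy      # linear terms
                + 0.5 * fxx * dx ** 2    # a = (1/2) f_xx
                + fxy * dx * dy          # b = f_xy
                + 0.5 * fyy * dy ** 2)   # c = (1/2) f_yy
    return Q

# Sanity check: for f(x, y) = x^2 + y^2 about the origin, the quadratic
# approximation reproduces f exactly (f0 = fx = fy = 0, fxx = fyy = 2, fxy = 0).
Q_paraboloid = quadratic_approximation(0, 0, 0, 2, 0, 2, 0, 0)
print(Q_paraboloid(1.5, -2.0))  # 6.25, which is exactly 1.5^2 + 2.0^2
```

The paraboloid check is a useful sanity test: a quadratic approximation of a function that is already quadratic should be the function itself.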

Example: Approximating sin(x)cos(y)

To see this beast in action, let's try it out on the function from the introduction.

Problem: Find the quadratic approximation of
$$f(x, y) = \sin(x)\cos(y)$$
about the point $(x, y) = \left(\frac{\pi}{3}, \frac{\pi}{6}\right)$.

To collect all the necessary information, you need to evaluate $f(x, y) = \sin(x)\cos(y)$ and all of its partial derivatives and all of its second partial derivatives at the point $\left(\frac{\pi}{3}, \frac{\pi}{6}\right)$.
$$f\left(\tfrac{\pi}{3}, \tfrac{\pi}{6}\right) = \sin\left(\tfrac{\pi}{3}\right)\cos\left(\tfrac{\pi}{6}\right) = \tfrac{\sqrt{3}}{2} \cdot \tfrac{\sqrt{3}}{2} = \tfrac{3}{4}$$
$$f_x(x, y) = \cos(x)\cos(y) \;\Longrightarrow\; f_x\left(\tfrac{\pi}{3}, \tfrac{\pi}{6}\right) = \tfrac{1}{2} \cdot \tfrac{\sqrt{3}}{2} = \tfrac{\sqrt{3}}{4}$$
$$f_y(x, y) = -\sin(x)\sin(y) \;\Longrightarrow\; f_y\left(\tfrac{\pi}{3}, \tfrac{\pi}{6}\right) = -\tfrac{\sqrt{3}}{2} \cdot \tfrac{1}{2} = -\tfrac{\sqrt{3}}{4}$$
$$f_{xx}(x, y) = -\sin(x)\cos(y) \;\Longrightarrow\; f_{xx}\left(\tfrac{\pi}{3}, \tfrac{\pi}{6}\right) = -\tfrac{3}{4}$$
$$f_{xy}(x, y) = -\cos(x)\sin(y) \;\Longrightarrow\; f_{xy}\left(\tfrac{\pi}{3}, \tfrac{\pi}{6}\right) = -\tfrac{1}{2} \cdot \tfrac{1}{2} = -\tfrac{1}{4}$$
$$f_{yy}(x, y) = -\sin(x)\cos(y) \;\Longrightarrow\; f_{yy}\left(\tfrac{\pi}{3}, \tfrac{\pi}{6}\right) = -\tfrac{3}{4}$$
Almost there! As a final step, apply all these values to the formula for a quadratic approximation:
$$Q_f(x, y) = \frac{3}{4} + \frac{\sqrt{3}}{4}\left(x - \frac{\pi}{3}\right) - \frac{\sqrt{3}}{4}\left(y - \frac{\pi}{6}\right) - \frac{3}{8}\left(x - \frac{\pi}{3}\right)^2 - \frac{1}{4}\left(x - \frac{\pi}{3}\right)\left(y - \frac{\pi}{6}\right) - \frac{3}{8}\left(y - \frac{\pi}{6}\right)^2$$
This, for example, is the formula I had to plug into the graphing software to generate the animation of quadratic approximations.
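As a numerical sanity check on the worked example (a sketch I've added, using the values computed above for $f(x, y) = \sin(x)\cos(y)$), the following Python compares the quadratic approximation to the function itself near $\left(\frac{\pi}{3}, \frac{\pi}{6}\right)$:

```python
import math

x0, y0 = math.pi / 3, math.pi / 6
f = lambda x, y: math.sin(x) * math.cos(y)

# The six values computed in the example:
f0,  fx,  fy  = 3 / 4,  math.sqrt(3) / 4, -math.sqrt(3) / 4
fxx, fxy, fyy = -3 / 4, -1 / 4,           -3 / 4

def Q(x, y):
    dx, dy = x - x0, y - y0
    return (f0 + fx * dx + fy * dy
            + 0.5 * fxx * dx ** 2 + fxy * dx * dy + 0.5 * fyy * dy ** 2)

print(abs(Q(x0, y0) - f(x0, y0)))                           # essentially zero
print(abs(Q(x0 + 0.1, y0 + 0.1) - f(x0 + 0.1, y0 + 0.1)))  # small: third-order in the distance
```

Note how the error away from the point shrinks like the cube of the distance, which is exactly the improvement a second-order approximation buys over a tangent plane.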

Vector notation using the Hessian

Perhaps it goes without saying that the expression for the quadratic approximation is long. Now imagine if $f$ had three inputs, $x$, $y$ and $z$. In principle you can imagine how this might go, adding terms involving $f_z$, $f_{xz}$, $f_{zz}$, and so on, with all $3$ partial derivatives and all $9$ second partial derivatives. But this would be a total nightmare!
Now imagine you were writing a program to find the quadratic approximation of a function with 100 inputs. Madness!
It actually doesn't have to be that bad. When something is not that complicated in principle, it shouldn't be that complicated in notation. Quadratic approximations are a little complicated, sure, but they're not absurd.
Using vectors and matrices, specifically the gradient and Hessian of f, we can write the quadratic approximation Qf as follows:
$$Q_f(\mathbf{x}) = \underbrace{f(\mathbf{x}_0)}_{\text{Constant}} + \underbrace{\nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0)}_{\text{Linear term}} + \underbrace{\tfrac{1}{2}(\mathbf{x} - \mathbf{x}_0)^{\mathsf{T}} \, H_f(\mathbf{x}_0) \, (\mathbf{x} - \mathbf{x}_0)}_{\text{Quadratic term}}$$
Let's break this down:
  • The boldfaced $\mathbf{x}$ represents the input variable(s) as a vector.
    Moreover, $\mathbf{x}_0$ is a particular vector in the input space. If this vector has two components, this formula for $Q_f$ is just a different way to write the one we derived before, but it can also represent a vector of any other dimension.
  • The dot product $\nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0)$ expands into the sum of all terms of the form $f_x(\mathbf{x}_0)(x - x_0)$, $f_y(\mathbf{x}_0)(y - y_0)$, etc. If this is not familiar from the vector notation for local linearization, work it out for yourself in the case of two dimensions to see!
  • The little superscript $\mathsf{T}$ in the expression $(\mathbf{x} - \mathbf{x}_0)^{\mathsf{T}}$ indicates "transpose". This means you take the initial vector $(\mathbf{x} - \mathbf{x}_0)$, which looks something like this:
    $$\begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix}$$
    Then you flip it, to get something like this:
    $$\begin{bmatrix} x - x_0 & y - y_0 \end{bmatrix}$$
  • $H_f(\mathbf{x}_0)$ is the Hessian of $f$.
  • The expression $(\mathbf{x} - \mathbf{x}_0)^{\mathsf{T}} H_f(\mathbf{x}_0) (\mathbf{x} - \mathbf{x}_0)$ might seem complicated if you have never come across something like it before. This way of expressing quadratic terms is actually quite common in vector calculus and linear algebra, so it's worth expanding an expression like this at least a few times in your life. For example, try working it out in the case where $\mathbf{x}$ is two-dimensional to see what it looks like.
    You should find that it is exactly $2$ times the quadratic portion of the non-vectorized formula we derived above, which is why the vector form carries the factor of $\frac{1}{2}$.
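In code, the vector form is pleasantly compact no matter how many inputs $f$ has. Here is an illustrative NumPy sketch (the function name and interface are my own) that builds $Q_f$ from a value, a gradient, and a Hessian, checked against the $\sin(x)\cos(y)$ example:

```python
import numpy as np

def quadratic_approx(f0, grad, hess, x0):
    """Q_f(x) = f(x0) + grad . (x - x0) + (1/2)(x - x0)^T H (x - x0)."""
    def Q(x):
        d = np.asarray(x, dtype=float) - x0
        return f0 + grad @ d + 0.5 * d @ hess @ d
    return Q

# Values for f(x, y) = sin(x)cos(y) at (pi/3, pi/6), computed analytically:
x0 = np.array([np.pi / 3, np.pi / 6])
grad = np.array([np.sqrt(3) / 4, -np.sqrt(3) / 4])  # (f_x, f_y)
hess = np.array([[-3 / 4, -1 / 4],                  # [[f_xx, f_xy],
                 [-1 / 4, -3 / 4]])                 #  [f_yx, f_yy]]
Q = quadratic_approx(3 / 4, grad, hess, x0)

x = x0 + np.array([0.1, -0.05])
print(abs(Q(x) - np.sin(x[0]) * np.cos(x[1])))  # small: error is third-order
```

Because the Hessian is symmetric ($f_{xy} = f_{yx}$), the off-diagonal entries each contribute half of the cross term $b(x - x_0)(y - y_0)$, and the $\frac{1}{2}$ out front restores the coefficients $a = \frac{1}{2}f_{xx}$, $b = f_{xy}$, $c = \frac{1}{2}f_{yy}$ from the non-vector derivation. Swapping in a 100-dimensional gradient and Hessian requires no changes at all.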

What's the point?

In truth, it is a real pain to compute a quadratic approximation by hand, and it requires staying very organized to do so without making a little mistake. In practice, people rarely work through a quadratic approximation like the example above, but knowing how they work is useful for at least two broad reasons:
  • Computation: Even if you never have to write out a quadratic approximation, you may one day need to program a computer to do it for a particular function. Or even if you are relying on someone else's program, you may need to analyze how and why the approximation is failing in some circumstance.
  • Theory: Being able to reference a second-order approximation helps us to reason about the behavior of general functions near a point. This will be useful later in figuring out if a point is a local maximum or minimum.
