© 2023 Khan Academy

# Quadratic approximation

Quadratic approximations extend the notion of a local linearization, giving an even closer approximation of a function.

## What we're building to

The goal, as with a local linearization, is to approximate a potentially complicated multivariable function $f$ near some input, which I'll write as the vector ${\mathbf{\text{x}}}_{0}$ . A quadratic approximation does this more tightly than a local linearization, using the information given by second partial derivatives.

**Non-vector form**

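In the specific case where the input of $f$ is two-dimensional, and you are approximating near a point $({x}_{0},{y}_{0})$, you will see below that the quadratic approximation ends up looking like this:

$$
\begin{aligned}
Q_f(x,y) = {} & f(x_0,y_0) + f_x(x_0,y_0)(x-x_0) + f_y(x_0,y_0)(y-y_0) \\
& + \frac{1}{2}f_{xx}(x_0,y_0)(x-x_0)^2 + f_{xy}(x_0,y_0)(x-x_0)(y-y_0) + \frac{1}{2}f_{yy}(x_0,y_0)(y-y_0)^2
\end{aligned}
$$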

**Vector form**:

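In general, for a scalar-valued function $f$ with any kind of multidimensional input, here's what that approximation looks like:

$$
Q_f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)\cdot(\mathbf{x}-\mathbf{x}_0) + \frac{1}{2}(\mathbf{x}-\mathbf{x}_0)^{\mathrm{T}}\,\mathbf{H}_f(\mathbf{x}_0)\,(\mathbf{x}-\mathbf{x}_0)
$$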

I know it looks a bit complicated, but I'll step through it piece by piece later on. Here's a brief outline of each term.

- $f$ is a function with multi-dimensional input and a scalar output.
- $\nabla f(\mathbf{x}_0)$ is the gradient of $f$ evaluated at $\mathbf{x}_0$.
- $\mathbf{H}_f(\mathbf{x}_0)$ is the Hessian matrix of $f$ evaluated at $\mathbf{x}_0$.
- The vector $\mathbf{x}_0$ is a specific input, the one we are approximating near.
- The vector $\mathbf{x}$ represents the variable input.
- The approximation function, $Q_f$, has the same value as $f$ at the point $\mathbf{x}_0$, all its partial derivatives have the same value as those of $f$ at this point, and all its *second* partial derivatives have the same value as those of $f$ at this point.

## Tighter and tighter approximations

Imagine you are given some function $f(x,y)$ with two inputs and one output, such as

$$
f(x,y) = \sin(x)\cos(y)
$$

The goal is to find a simpler function that approximates $f(x,y)$ near some particular point $({x}_{0},{y}_{0})$. For example,

$$
(x_0, y_0) = \left(\frac{\pi}{3}, \frac{\pi}{6}\right)
$$

## Zero-order approximation

The most naive approximation would be a constant function which equals the value of $f$ at $({x}_{0},{y}_{0})$ everywhere. We call this a "$0$-order approximation".

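**In the example**:

$$
C(x,y) = \sin\left(\frac{\pi}{3}\right)\cos\left(\frac{\pi}{6}\right) = \frac{\sqrt{3}}{2}\cdot\frac{\sqrt{3}}{2} = \frac{3}{4}
$$

**Written in the abstract**:

$$
C(x,y) = f(x_0,y_0)
$$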

**Graphically**:

The graph of this approximation function $C(x,y)$ is a flat plane passing through the graph of our function at the point $({x}_{0},{y}_{0},f({x}_{0},{y}_{0}))$ . Below is a video showing how this approximation changes as we move the point $({x}_{0},{y}_{0})$ around.

The graph of $f$ is pictured in blue, the graph of the approximation is white, and the point $({x}_{0},{y}_{0},f({x}_{0},{y}_{0}))$ is pictured as a red dot.

## First-order approximation

The constant zero-order approximation is pretty lousy. Sure, it is guaranteed to equal $f(x,y)$ at the point $({x}_{0},{y}_{0})$, but that's about it. One step better is to use a local linearization, also known as a "first-order approximation".

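**In the example**:

$$
L_f(x,y) = \frac{3}{4} + \frac{\sqrt{3}}{4}\left(x-\frac{\pi}{3}\right) - \frac{\sqrt{3}}{4}\left(y-\frac{\pi}{6}\right)
$$

**Written in the abstract**:

$$
L_f(x,y) = f(x_0,y_0) + f_x(x_0,y_0)(x-x_0) + f_y(x_0,y_0)(y-y_0)
$$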

**Graphically**:

The graph of a local linearization is the plane **tangent** to the graph of $f$ at the point $({x}_{0},{y}_{0},f({x}_{0},{y}_{0}))$. Here is a video showing how this approximation changes as we move around the point $({x}_{0},{y}_{0})$:

## Second-order approximation

Better still is a **quadratic approximation**, also called a "second-order approximation".

The remainder of this article is devoted to finding and understanding the analytic form of such an approximation, but before diving in, let's see what such approximations look like graphically. You can think of these approximations as nestling into the curves of the graph at the point $({x}_{0},{y}_{0},f({x}_{0},{y}_{0}))$, giving it a sort of mathematical hug.

## "Quadratic" means product of two variables

In single variable functions, the word "quadratic" refers to any situation where a variable is squared as in the term ${x}^{2}$ . With multiple variables, "quadratic" refers not only to square terms, like ${x}^{2}$ and ${y}^{2}$ , but also terms that involve the product of two separate variables, such as $xy$ .

In general, the "order" of a term which is the product of several things, such as $3{x}^{2}{y}^{3}$, is the total number of *variables* multiplied into that term. In this case, the order would be $5$: two $x$'s, three $y$'s, and the constant doesn't matter.

## Graphs of quadratic functions

One way to think of quadratic functions is in terms of their **concavity**, which might depend on which direction you are moving in.

If the function has an upward concavity, as is the case, for example, with $f(x,y)={x}^{2}+{y}^{2}$, the graph will look something like this:

This shape, which is a three-dimensional parabola, goes by the name **paraboloid**.

If the function is concave up in one direction and linear in another, the graph looks like a parabolic curve has been dragged through space to trace out a surface. For example, this happens in the case of $f(x,y)={x}^{2}+y$:

Finally, if the graph is concave up when traveling in one direction, but concave down when traveling in another direction, as is the case for $f(x,y)={x}^{2}-{y}^{2}$ , the graph looks a bit like a saddle. Here's what such a graph looks like:

## Reminder on the local linearization recipe

To actually write down a quadratic approximation of a function $f$ near the point $({x}_{0},{y}_{0})$, we build up from the local linearization:

$$
L_f(x,y) = f(x_0,y_0) + f_x(x_0,y_0)(x-x_0) + f_y(x_0,y_0)(y-y_0)
$$

It's worth walking through the recipe for finding the local linearization one more time since the recipe for finding a quadratic approximation is very similar.

- Start with the constant term $f({x}_{0},{y}_{0})$, so that our approximation at least matches $f$ at the point $({x}_{0},{y}_{0})$.
- Add on linear terms ${f}_{x}({x}_{0},{y}_{0})(x-{x}_{0})$ and ${f}_{y}({x}_{0},{y}_{0})(y-{y}_{0})$.
- Use the constants ${f}_{x}({x}_{0},{y}_{0})$ and ${f}_{y}({x}_{0},{y}_{0})$ to ensure that our approximation has the same partial derivatives as $f$ at the point $({x}_{0},{y}_{0})$.
- Use the terms $(x-{x}_{0})$ and $(y-{y}_{0})$ instead of simply $x$ and $y$ so that we don't mess up the fact that our approximation equals $f({x}_{0},{y}_{0})$ at the point $({x}_{0},{y}_{0})$.
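The recipe above can be sketched in a few lines of Python. This is only an illustration, assuming the example function $f(x,y)=\sin(x)\cos(y)$ with its partial derivatives worked out by hand; the names `f_x`, `f_y`, and `local_linearization` are chosen here for the sketch and do not come from the article.

```python
import math

def f(x, y):
    return math.sin(x) * math.cos(y)

def f_x(x, y):
    # Partial derivative of f with respect to x (computed by hand).
    return math.cos(x) * math.cos(y)

def f_y(x, y):
    # Partial derivative of f with respect to y (computed by hand).
    return -math.sin(x) * math.sin(y)

def local_linearization(x0, y0):
    """Build L(x, y) around (x0, y0) following the recipe above."""
    constant = f(x0, y0)   # makes L match f at (x0, y0)
    slope_x = f_x(x0, y0)  # makes the x partial derivatives match
    slope_y = f_y(x0, y0)  # makes the y partial derivatives match

    def L(x, y):
        # (x - x0) and (y - y0) vanish at the point, so L(x0, y0) = f(x0, y0).
        return constant + slope_x * (x - x0) + slope_y * (y - y0)

    return L

x0, y0 = math.pi / 3, math.pi / 6
L = local_linearization(x0, y0)
print("f at the point:", f(x0, y0))
print("L at the point:", L(x0, y0))
```

Because `L` is linear, stepping one unit in the $x$ direction changes its value by exactly the stored slope `f_x(x0, y0)`, which is the sense in which its partial derivatives match those of $f$ at the point.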

## Finding the quadratic approximation

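For the quadratic approximation, we add on the quadratic terms $(x-{x}_{0})^{2}$, $(x-{x}_{0})(y-{y}_{0})$, and $(y-{y}_{0})^{2}$, and for now we write their coefficients as the constants ${a}$, ${b}$ and ${c}$, which we will solve for in a moment:

$$
\begin{aligned}
Q_f(x,y) = {} & f(x_0,y_0) + f_x(x_0,y_0)(x-x_0) + f_y(x_0,y_0)(y-y_0) \\
& + a(x-x_0)^2 + b(x-x_0)(y-y_0) + c(y-y_0)^2
\end{aligned}
$$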

In the same way that we made sure that the local linearization has the same partial derivatives as $f$ at $({x}_{0},{y}_{0})$ , we want the quadratic approximation to have the same second partial derivatives as $f$ at this point.

The really nice thing about the way I wrote ${Q}_{f}$ above is that the second partial derivative $\frac{{\partial}^{2}{Q}_{f}}{\partial {x}^{2}}$ depends *only* on the ${a}(x-{x}_{0})^{2}$ term.

**Try it!** Take the second partial derivative with respect to $x$ of every term in the expression of ${Q}_{f}(x,y)$ above, and notice that they all go to zero except for the ${a}(x-{x}_{0})^{2}$ term.

Did you really try it? I'm serious, take a moment to reason through it. It really helps in understanding why ${Q}_{f}$ is expressed the way it is.

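This fact is nice because rather than taking the second partial derivative of the entire monstrous expression, you can view it like this:

$$
\frac{\partial^2 Q_f}{\partial x^2}(x,y) = \frac{\partial^2}{\partial x^2}\Big(a(x-x_0)^2\Big) = \frac{\partial}{\partial x}\Big(2a(x-x_0)\Big) = 2a
$$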

Since the goal is for this to match ${f}_{xx}(x,y)$ at the point $({x}_{0},{y}_{0})$, you can solve for ${a}$ like this:

$$
2a = f_{xx}(x_0,y_0) \quad\Longrightarrow\quad a = \frac{1}{2}f_{xx}(x_0,y_0)
$$

**Test yourself**: Use similar reasoning to figure out what the constants ${b}$ and ${c}$ should be.
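One way to carry out that reasoning for the two remaining coefficients $b$ and $c$ is sketched below. It assumes $Q_f$ has the form constant term, plus two linear terms, plus $a(x-x_0)^2 + b(x-x_0)(y-y_0) + c(y-y_0)^2$. Differentiating with respect to $y$ first leaves

$$
\frac{\partial Q_f}{\partial y}(x,y) = f_y(x_0,y_0) + b(x-x_0) + 2c(y-y_0)
$$

and differentiating once more, with respect to $x$ or with respect to $y$, gives

$$
\frac{\partial^2 Q_f}{\partial x\,\partial y} = b,
\qquad
\frac{\partial^2 Q_f}{\partial y^2} = 2c
$$

Matching these to the second partial derivatives of $f$ at $(x_0,y_0)$ yields

$$
b = f_{xy}(x_0,y_0),
\qquad
c = \frac{1}{2}f_{yy}(x_0,y_0)
$$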

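We can now write our final quadratic approximation, with all six of its terms working in harmony to mimic the behavior of $f$ at $({x}_{0},{y}_{0})$:

$$
\begin{aligned}
Q_f(x,y) = {} & f(x_0,y_0) + f_x(x_0,y_0)(x-x_0) + f_y(x_0,y_0)(y-y_0) \\
& + \frac{1}{2}f_{xx}(x_0,y_0)(x-x_0)^2 + f_{xy}(x_0,y_0)(x-x_0)(y-y_0) + \frac{1}{2}f_{yy}(x_0,y_0)(y-y_0)^2
\end{aligned}
$$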

## Example: Approximating $\mathrm{sin}(x)\mathrm{cos}(y)$

To see this beast in action, let's try it out on the function from the introduction.

**Problem**: Find the quadratic approximation of

$$
f(x,y) = \sin(x)\cos(y)
$$

about the point $(x,y)=\left(\frac{\pi}{3},\frac{\pi}{6}\right)$.

**Solution**:

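To collect all the necessary information, you need to evaluate $f(x,y)=\mathrm{sin}(x)\mathrm{cos}(y)$ and all of its partial derivatives and all of its second partial derivatives at the point $\left(\frac{\pi}{3},\frac{\pi}{6}\right)$.

$$
\begin{aligned}
f(x,y) &= \sin(x)\cos(y) & f\left(\tfrac{\pi}{3},\tfrac{\pi}{6}\right) &= \tfrac{\sqrt{3}}{2}\cdot\tfrac{\sqrt{3}}{2} = \tfrac{3}{4} \\
f_x(x,y) &= \cos(x)\cos(y) & f_x\left(\tfrac{\pi}{3},\tfrac{\pi}{6}\right) &= \tfrac{1}{2}\cdot\tfrac{\sqrt{3}}{2} = \tfrac{\sqrt{3}}{4} \\
f_y(x,y) &= -\sin(x)\sin(y) & f_y\left(\tfrac{\pi}{3},\tfrac{\pi}{6}\right) &= -\tfrac{\sqrt{3}}{2}\cdot\tfrac{1}{2} = -\tfrac{\sqrt{3}}{4} \\
f_{xx}(x,y) &= -\sin(x)\cos(y) & f_{xx}\left(\tfrac{\pi}{3},\tfrac{\pi}{6}\right) &= -\tfrac{\sqrt{3}}{2}\cdot\tfrac{\sqrt{3}}{2} = -\tfrac{3}{4} \\
f_{xy}(x,y) &= -\cos(x)\sin(y) & f_{xy}\left(\tfrac{\pi}{3},\tfrac{\pi}{6}\right) &= -\tfrac{1}{2}\cdot\tfrac{1}{2} = -\tfrac{1}{4} \\
f_{yy}(x,y) &= -\sin(x)\cos(y) & f_{yy}\left(\tfrac{\pi}{3},\tfrac{\pi}{6}\right) &= -\tfrac{\sqrt{3}}{2}\cdot\tfrac{\sqrt{3}}{2} = -\tfrac{3}{4}
\end{aligned}
$$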

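Almost there! As a final step, apply all these values to the formula for a quadratic approximation.

$$
\begin{aligned}
Q_f(x,y) = {} & \frac{3}{4} + \frac{\sqrt{3}}{4}\left(x-\frac{\pi}{3}\right) - \frac{\sqrt{3}}{4}\left(y-\frac{\pi}{6}\right) \\
& - \frac{3}{8}\left(x-\frac{\pi}{3}\right)^2 - \frac{1}{4}\left(x-\frac{\pi}{3}\right)\left(y-\frac{\pi}{6}\right) - \frac{3}{8}\left(y-\frac{\pi}{6}\right)^2
\end{aligned}
$$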

So for example, to generate the animation of quadratic approximations, this is the formula I had to plug into the graphing software.
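As a rough numerical check of this example, the Python sketch below compares the zero-order, first-order, and second-order approximations of $\sin(x)\cos(y)$ at a point a short distance from $\left(\frac{\pi}{3},\frac{\pi}{6}\right)$. The coefficients come from evaluating $f$ and its first and second partial derivatives at that point (noted in the comments); the function names and the offset of $0.1$ are arbitrary choices made for this illustration.

```python
import math

X0, Y0 = math.pi / 3, math.pi / 6
ROOT3 = math.sqrt(3)

def f(x, y):
    return math.sin(x) * math.cos(y)

def constant_approx(x, y):
    # f(X0, Y0) = 3/4
    return 3 / 4

def linear_approx(x, y):
    # f_x(X0, Y0) = sqrt(3)/4 and f_y(X0, Y0) = -sqrt(3)/4
    dx, dy = x - X0, y - Y0
    return 3 / 4 + (ROOT3 / 4) * dx - (ROOT3 / 4) * dy

def quadratic_approx(x, y):
    # f_xx = -3/4, f_xy = -1/4, f_yy = -3/4 at (X0, Y0);
    # the squared terms carry an extra factor of 1/2.
    dx, dy = x - X0, y - Y0
    return (linear_approx(x, y)
            - (3 / 8) * dx ** 2
            - (1 / 4) * dx * dy
            - (3 / 8) * dy ** 2)

# Evaluate all three approximations a little away from the base point.
x, y = X0 + 0.1, Y0 - 0.1
errors = {
    "constant": abs(constant_approx(x, y) - f(x, y)),
    "linear": abs(linear_approx(x, y) - f(x, y)),
    "quadratic": abs(quadratic_approx(x, y) - f(x, y)),
}
for name, err in errors.items():
    print(f"{name:9s} error: {err:.6f}")
```

Each step up in order should shrink the error at this nearby point, which is the "tighter and tighter" behavior described earlier.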

## Vector notation using the Hessian

Perhaps it goes without saying that the expression for the quadratic approximation is long. Now imagine if $f$ had three inputs, $x$, $y$ and $z$. In principle you can imagine how this might go, adding terms involving ${f}_{z}$, ${f}_{xz}$, ${f}_{zz}$, on and on with all $3$ partial derivatives and all $9$ second partial derivatives. But this would be a total nightmare!

Now imagine you were writing a program to find the quadratic approximation of a function with $100$ inputs. Madness!

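It actually doesn't have to be that bad. When something is not that complicated in principle, it shouldn't be that complicated in notation. Quadratic approximations are a *little* complicated, sure, but they're not absurd.

Using vector notation and the Hessian matrix, the approximation can be written like this:

$$
Q_f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)\cdot(\mathbf{x}-\mathbf{x}_0) + \frac{1}{2}(\mathbf{x}-\mathbf{x}_0)^{\mathrm{T}}\,\mathbf{H}_f(\mathbf{x}_0)\,(\mathbf{x}-\mathbf{x}_0)
$$

Let's break this down: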

- The boldfaced $\mathbf{x}$ represents the input variable(s) as a vector, $\mathbf{x} = \begin{bmatrix} x \\ y \\ \vdots \end{bmatrix}$. Moreover, $\mathbf{x}_0$ is a particular vector in the input space. If this has two components, this formula for $Q_f$ is just a different way to write the one we derived before, but it could also represent a vector with any other dimension.
- The dot product $\nabla f(\mathbf{x}_0)\cdot(\mathbf{x}-\mathbf{x}_0)$ will expand into the sum of all terms of the form $f_x(\mathbf{x}_0)(x-x_0)$, $f_y(\mathbf{x}_0)(y-y_0)$, etc. If this is not familiar from the vector notation for local linearization, work it out for yourself in the case of $2$ dimensions to see!
- The little superscript $\mathrm{T}$ in the expression $(\mathbf{x}-\mathbf{x}_0)^{\mathrm{T}}$ indicates "transpose". This means you take the initial vector $(\mathbf{x}-\mathbf{x}_0)$, which looks something like this: $\begin{bmatrix} x-x_0 \\ y-y_0 \\ \vdots \end{bmatrix}$. Then you flip it, to get something like this: $\begin{bmatrix} x-x_0 & y-y_0 & \cdots \end{bmatrix}$.
- The expression $(\mathbf{x}-\mathbf{x}_0)^{\mathrm{T}}\,\mathbf{H}_f(\mathbf{x}_0)\,(\mathbf{x}-\mathbf{x}_0)$ might seem complicated if you have never come across something like it before. This way of expressing quadratic terms is actually quite common in vector calculus and linear algebra, so it's worth expanding an expression like this at least a few times in your life. For example, try working it out in the case where $\mathbf{x}$ is two-dimensional to see what it looks like. You should find that it is exactly $2$ times the quadratic portion of the non-vectorized formula we derived above.
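For reference, here is a sketch of that two-dimensional expansion. It uses the shorthand $\Delta x = x - x_0$ and $\Delta y = y - y_0$ (introduced here only for brevity), writes every second partial derivative as evaluated at $\mathbf{x}_0$, and assumes the mixed partials agree, $f_{xy} = f_{yx}$, which holds when the second partial derivatives are continuous.

$$
\begin{aligned}
\begin{bmatrix} \Delta x & \Delta y \end{bmatrix}
\begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}
&= \begin{bmatrix} \Delta x & \Delta y \end{bmatrix}
\begin{bmatrix} f_{xx}\,\Delta x + f_{xy}\,\Delta y \\ f_{yx}\,\Delta x + f_{yy}\,\Delta y \end{bmatrix} \\
&= f_{xx}\,\Delta x^2 + f_{xy}\,\Delta x\,\Delta y + f_{yx}\,\Delta y\,\Delta x + f_{yy}\,\Delta y^2 \\
&= f_{xx}\,\Delta x^2 + 2f_{xy}\,\Delta x\,\Delta y + f_{yy}\,\Delta y^2
\end{aligned}
$$

Multiplying by the factor of $\frac{1}{2}$ from the vector formula gives $\frac{1}{2}f_{xx}\,\Delta x^2 + f_{xy}\,\Delta x\,\Delta y + \frac{1}{2}f_{yy}\,\Delta y^2$, which is the quadratic portion of the two-variable formula.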

## What's the point?

In truth, it is a real pain to compute a quadratic approximation by hand, and it requires staying *very* organized to do so without making a little mistake. In practice, people rarely work through a quadratic approximation like the example above, but knowing how they work is useful for at least two broad reasons:

- **Computation**: Even if you never have to write out a quadratic approximation, you may one day need to program a computer to do it for a particular function. Or even if you are relying on someone else's program, you may need to analyze how and why the approximation is failing in some circumstance.
- **Theory**: Being able to reference a second-order approximation helps us to reason about the behavior of general functions near a point. This will be useful later in figuring out if a point is a local maximum or minimum.
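To make the computation point concrete, here is one possible sketch of programming the vector form $f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)\cdot(\mathbf{x}-\mathbf{x}_0) + \frac{1}{2}(\mathbf{x}-\mathbf{x}_0)^{\mathrm{T}}\mathbf{H}_f(\mathbf{x}_0)(\mathbf{x}-\mathbf{x}_0)$ for any number of inputs. It estimates the gradient and Hessian with central finite differences, which is an assumption of this sketch rather than something the article prescribes; exact or automatic derivatives would work just as well. The helper names and the step size `h` are illustrative choices.

```python
import math

def gradient(f, x0, h=1e-4):
    """Central-difference estimate of the gradient of f at x0 (a list of floats)."""
    grad = []
    for i in range(len(x0)):
        up, down = list(x0), list(x0)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def hessian(f, x0, h=1e-4):
    """Central-difference estimate of the Hessian matrix of f at x0."""
    n = len(x0)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Evaluate f with coordinate i moved by si*h and coordinate j by sj*h.
                p = list(x0)
                p[i] += si * h
                p[j] += sj * h
                return f(p)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

def quadratic_approximation(f, x0, h=1e-4):
    """Return Q(x) = f(x0) + grad . (x - x0) + 1/2 (x - x0)^T H (x - x0)."""
    n = len(x0)
    f0, g, H = f(x0), gradient(f, x0, h), hessian(f, x0, h)

    def Q(x):
        d = [x[i] - x0[i] for i in range(n)]
        linear = sum(g[i] * d[i] for i in range(n))
        quadratic = sum(d[i] * H[i][j] * d[j] for i in range(n) for j in range(n))
        return f0 + linear + 0.5 * quadratic

    return Q

def f(v):
    x, y = v
    return math.sin(x) * math.cos(y)

x0 = [math.pi / 3, math.pi / 6]
Q = quadratic_approximation(f, x0)
nearby = [x0[0] + 0.1, x0[1] - 0.1]
print("f(nearby) =", f(nearby))
print("Q(nearby) =", Q(nearby))
```

The same `quadratic_approximation` call works unchanged for a function of $3$ or $100$ inputs, which is the practical payoff of the vector notation. Finite differences introduce small numerical errors, so a failure of the approximation in practice may come from the derivative estimates as well as from being too far from $\mathbf{x}_0$.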

## Want to join the conversation?

- In the worked example (Approximating sin(x)cos(y)) the very last term in the solution (fyy) is written in brown as 3/4 - this is missing a minus sign(10 votes)
- In the example using sin(x)cos(y), the second derivative with respect to y (the last one) is sin(x)cos(y), but shouldn't it be -sin(x)cos(y)? If you have the first partial as -sin(x)sin(y), and take the partial of that with respect to y, you get the derivative of sin(y) = cos(y), not -cos(y), right? Why did the sign change again?(9 votes)
- during the last part ("vector notation using the hessian") I do not understand why is it necessary to transpose that vector in the quadratic term. I mean.. You can expand the quadratic term exacly in the same manner without transposing that vector right?? As it is done in the exercise you end up with 2 vectors, why would you need to have the vector on the left transposed??(3 votes)
- The dimensions must be right for matrix multiplication.(6 votes)

- fyy(x,y) = -sin(x)cos(y) not sin(x)cos(y).(4 votes)
- So, could these sorts of things be used to generalise the taylor series to higher dimension?(3 votes)
- Yep it is a generalisation, higher order terms consist of tensorlike operations (3. order fijk(x1,x2)*xi*xj*xk, while 2. order terms can be written as a matrix multiplication).(3 votes)

- What about cubic approximations? Would we need a cubical "Hessian Matrix" analogue? And how would we define the multiplication?(2 votes)
- What is the formula (not in the vector/matrix form) for a quadratic approximation when z is added to the input of the function f, making it f(x,y,z)?(2 votes)
- At the top, in your definition of Qf(x), I think the partial derivatives of Q are not the same as the partial derivatives of f, due to the presence of the quadratic term. Only the second partials match. I suppose we could modify the "coefficients" on the first-order term to include the negative of the value of the partial derivatives of the quadratic term. Would this improve the approximation? Hmm.(1 vote)
- When you evaluate at the particular point (x_0, y_0), the partial derivatives of the quadratic term go to zero.(2 votes)

- For the solution of finding the b constant, finding the first partial derivative with respect to y does not make c(y - y0)^2 zero. It would actually be 2c(y - y0). Nevertheless, this has no effect in the final answer as applying the partial derivative respect to x makes that term zero.(1 vote)
- Would it be possible to find f given Q and the input vector? so like finding a best for a particular set of data(1 vote)