© 2023 Khan Academy

# Quadratic approximation formula, part 2

A continuation from the previous video, leading to the full formula for the quadratic approximation of a two-variable function. Created by Grant Sanderson.

## Want to join the conversation?

- Ok, now I have a serious question... which song did Grant start singing? :P (45 votes)
- He was just singing "line things up right here"; it isn't a real song. (1 vote)

- Okay, I've got a question. At this point I'm not sure if it's just me, but I'm starting to mix this up with the Taylor series. Do these have some kind of connection? Is this some kind of multi-dimensional equivalent of the Taylor series? (23 votes)
- That is an exceptional point. This is in fact very closely related to the Taylor Series.

Just as functions can be multidimensional, so too can the Taylor Series.

How are they related?

Well, the Taylor Series is a means to *represent* some function (can be multidimensional) as a *polynomial*. I.e., of the form a+bx+cx^2+dx^3+...

Now, the Taylor Series can have infinite terms. The more terms the series has, the closer it is to the original function. But, if we cut the Taylor Series short, say, by only including the terms up to x^1, we have ourselves a linear approximation (or a local linearisation) of the function. However, if we include all the terms in the Taylor Series up to x^2, we have ourselves a *quadratic* approximation to the original function.

So, to summarise, *approximations are just the Taylor Series cut short.*

- If you cut it at x, you've got a linear approximation
- If you cut it at x^2, you've got a quadratic approximation
- If you cut it at x^3, you've got a cubic approximation
- If you cut it at x^4, you've got a quartic approximation

...and so on.

Hope this helps. (31 votes)
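A quick numerical sketch of the "Taylor Series cut short" idea from the answer above (my own illustration, not from the video): using f(x) = e^x around x0 = 0, where every derivative equals 1, each extra term we keep shrinks the approximation error.

```python
import math

def taylor_poly(derivs, x0, x):
    """Evaluate the Taylor polynomial sum_k f^(k)(x0) (x - x0)^k / k!,
    given the list of derivative values [f(x0), f'(x0), f''(x0), ...]."""
    return sum(d * (x - x0) ** k / math.factorial(k)
               for k, d in enumerate(derivs))

# f(x) = e^x, whose derivatives at x0 = 0 are all 1.
x = 0.5
linear    = taylor_poly([1, 1], 0, x)           # cut at x^1
quadratic = taylor_poly([1, 1, 1], 0, x)        # cut at x^2
cubic     = taylor_poly([1, 1, 1, 1], 0, x)     # cut at x^3

for name, approx in [("linear", linear), ("quadratic", quadratic), ("cubic", cubic)]:
    print(f"{name:9s} {approx:.6f}  error {abs(math.exp(x) - approx):.6f}")
```

Running this shows the error dropping by roughly an order of magnitude with each truncation level, which is exactly the "cut it at x, x^2, x^3..." hierarchy described above.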

- At 1:13, why do we want the 2nd partial derivative? I feel dull... but I miss the point. (4 votes)
- My understanding of this topic is still growing, but I feel it's sufficient to give you the intuition for an appropriate answer to your question. What I'm trying to say is: I might be wrong, but I think I'm right.

The reason you take the second derivative is that the second derivative tells you which way the curve around that region is going to bend, i.e. positive or negative (think back to single-variable quadratic equations and what second derivatives do there: they tell you whether the point you're looking at is curving up or down, a max or min point). These second-derivative terms act as the control knobs that make your graph curve and therefore hug your original function more closely, allowing for closer approximations. In other words, the second derivative is what turns the flat tangent plane into the curvy sheet that Grant showed at 9:15. If you're wondering why the first derivative wasn't used, it's because the first derivative gives you the tangent line, which is just a straight line, and that's what the linear terms already do. Quadratic terms are necessary for a nonzero second derivative, and they're also what make the second derivative a constant, as mentioned at 5:44.

Hope that helps :) (14 votes)
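As a small numerical illustration of the answer above (my own sketch, not from the video): for a single-variable f, the (1/2) f''(x0) (x - x0)^2 term is exactly the "control knob" that bends the flat tangent line into a parabola that hugs the curve. Here f(x) = cos(x) near x0 = 0, where f(0) = 1, f'(0) = 0, f''(0) = -1:

```python
import math

x0 = 0.0
f = math.cos
f0, df0, ddf0 = 1.0, 0.0, -1.0   # cos(0), -sin(0), -cos(0)

def linear(x):
    # Tangent line: uses only first-derivative information (flat here).
    return f0 + df0 * (x - x0)

def quadratic(x):
    # Adding (1/2) f''(x0) (x - x0)^2 curves the approximation downward,
    # matching the sign of f'' as the answer describes.
    return linear(x) + 0.5 * ddf0 * (x - x0) ** 2

for x in (0.1, 0.5, 1.0):
    print(f"x={x}: linear error {abs(f(x) - linear(x)):.6f}, "
          f"quadratic error {abs(f(x) - quadratic(x)):.6f}")
```

The quadratic error stays far smaller than the tangent-line error as you move away from x0, which is the "hugging" behavior the video shows in 3D.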

- I think it's very similar to Taylor expansion except it's multivariable. (6 votes)
- Why do we still need the linear part of the quadratic approximation function? Why can't we just throw it out and keep the ax^2 + bxy + cy^2 terms? (4 votes)
- Wondering where the article on quadratic approximations (mentioned at 9:03) can be found? (2 votes)
- Aren't you just doing Taylor expansions, except instead of approximating curves you're approximating surfaces? (2 votes)
- They both have some similarities, but we have to use slightly different methods because we are working in 3D.

But yes, the Taylor/Maclaurin expansion creates a quadratic that approximates a 2D graph, and now we are creating a quadratic equation to approximate 3D graphs, so we have the same ideas in mind.

Hope this helps,

- Convenient Colleague (1 vote)

- Why do they call it a quadratic approximation and not a local quadratic approximation? You are still talking about a specific point on the curve. (1 vote)
- I think the linear approximation is only good right around that point, but the quadratic approximation is able to approximate more than just the point. It is still local, but it covers a lot more area, and therefore should be distinguished from the purely local aspect! (1 vote)

- Like Dave already asked, is there a reason we started out trying to create this formula with the linear part of the quadratic approximation function? Why can't we just throw it out and keep the ax^2 + bxy + cy^2 terms?

That is what I would have expected, and Grant doesn't really explain why.

Thanks for your time. :) (1 vote)

- In which practical problems would we use quadratic approximations? (1 vote)

## Video transcript

- [Voiceover] ♫ Line things
up a little bit right here. ♫ All right. So in the last video I
set up the scaffolding for the quadratic approximation which I'm calling Q of a function, an arbitrary two variable
function which I'm calling f, and the form that we have right now looks like quite a lot actually. We have six different terms. Now the first three were
just basically stolen from the local linearization formula and written in their full abstractness. It almost makes it seem a little bit more complicated than it is. And then these next three terms are basically the quadratic parts. We have what is basically X squared. We take it as X minus X naught squared so that we don't mess
with anything previously once we plug in X equals X naught, but basically we think
of this as X squared. And then this here is basically X times Y, but of course we're
matching each one of them with the corresponding X naught Y naught, and then this term is the Y squared. And the question at hand is how do we fill in these constants? The coefficients in front of each one of these quadratic terms to make it so that this guy Q hugs the graph of f as closely as possible. And I showed that in the very first video, kind of what that hugging means. Now in formulas, the goal here, I should probably state, what it is that we want is for the second partial derivatives of Q, so for example if we take
the partial derivative with respect to X twice in a row, we want it to be the case
that if you take that guy and you evaluate it at
the point of interest, the point about which
we are approximating, it should be the same as when you take the second partial derivative of f or the corresponding
second partial derivative I should say since
there's multiple different second partial derivatives, and you evaluate it at that same point. And of course we want this to be true not just with the second
partial derivative with respect to X twice in a row, but if we did it with the other ones. Like for example, let's say we took the partial derivative first with respect to X, and then with respect to Y. This is called the mixed
partial derivative. We want it to be the case
that when we evaluate that at the point of interest
it's the same as taking the mixed partial derivative
of f with respect to X, and then with respect to Y, and we evaluate it at that same point. And remember, for almost all
functions that you deal with, when you take this
second partial derivative where we mix two of the variables, it doesn't matter the order in which you take them, right? You could take it first
with respect to X then Y, or you could take it first with respect to Y, and then with respect to X. Usually these guys are equal. There are some functions
for which this isn't true, but we're going to basically assume that we're dealing with
functions where this is. So, that's the only
mixed partial derivative that we have to take into account. And I'll just kind of get
rid of that guy there. And then, of course, the final one, just to have it on record here, is that we want the partial derivative when we take it with respect to Y two times in a row and we evaluate that at the same point, there's kind of a lot, there's a lot of writing that goes on with these things and that's just kind of par for the course when it comes to multi-variable calculus, but you take the partial derivative with respect to Y of both of them, and you want it to be the
same value at this point. So even though there's
a lot going on here, all I'm basically saying is all the second partial derivative
information should be the same for Q as it is for f. So, let's actually go up and take a look at our function and start thinking about what its partial derivatives are. What its first and second
partial derivatives are. And to do that, let me first just kind of clear up some of the board here just to make it so we can actually start computing what this second
partial derivative is. So let's go ahead and do it. First, this partial derivative
with respect to X twice, what we'll do is I'll
take one of those out and think partial derivative
with respect to X. And then on the inside I'm going to put what the partial derivative
of this entire expression with respect to X is. But we just take it one term at a time. This first term here is a constant, so that goes to zero. The second term here actually
has the variable X in it. And when we take its partial derivative, since this is a linear term, it's just going to be that
constant sitting in front of it. So it will be that
constant which is the value of the partial derivative
of f with respect to X evaluated at the point of interest. And that's just a constant. All right, so that's there. This next term has no Xs in it, so that's just going to go to zero. This term is interesting
because it's got an X in it. So when we take its
derivative with respect to X, that two comes down. So this will be two times a, whatever the constant a is, multiplied by X minus X naught. That's what the derivative
of this component is with respect to X. Then this over here, this also has an X, but it's just showing up
basically as a linear term. And when we treat Y as a constant, since we're taking the partial derivative with respect to X, what that ends up being
is b multiplied by that, what looks like a constant
as far as X is concerned, Y minus Y naught. And then the last term
doesn't have any Xs in it. So that is the first partial derivative with respect to X. And now we do it again. Now we take the partial
derivative with respect to X, and I'll hmm, maybe I should actually clear up even more of this guy. And now when we take
the partial derivative of this expression with respect to X, f sub X of X naught, Y naught, that's just a constant, so that goes to zero. Two times a times X, that's going to, we take the derivative with respect to X and we're just going to get two times a. And this last term
doesn't have any Xs in it, so that also goes to zero. So conveniently, when we take the second partial derivative of Q with respect to X, we just get a constant. It's this constant two a. And since we want it to be the case, we want that this entire
thing is equal to, well what do we want? We want it to be the
second partial derivative of f both times with respect to X. So here I'm going to use
the subscript notation. Over here I'm using the
kind of Leibniz notation, but here just second partial derivative with respect to X, we want it to match
whatever that looks like when we evaluate it at
the point of interest. So what we could do to make that happen, to make sure that two
a is equal to this guy, is we set a equal to one half of that second partial derivative evaluated at the point of interest. Okay. So this is something
we kind of tuck away. We remember this is, we have solved for one of the constants. So now let's start thinking
about another one of them. Well I guess actually I
don't have to scroll off because let's say we just want to take the mixed partial derivative here where if instead of taking it
with respect to X twice, we wanted to, let's see
I'll kind of erase this, we wanted to first do
it with respect to X, and then do it with respect to Y. Then we can kind of just edit what we have over here and we say, "we already took it with respect to X, "so now as our second go we're going to be "taking it with respect to Y." So in that case, instead of getting two a
let's kind of figure out what it is that we get. When we take the derivative
of this whole guy with respect to Y, well this looks like a constant. This here also looks like a constant since we're doing it with respect to Y and no Ys show up. And the partial derivative of this just ends up being b. So again, we just get a constant. This time it's b; previously it was two a, but now it's just b. And this time we want it to equal the mixed partial derivative. So instead of saying f sub XX, I'm going to say f XY, which basically says you take the partial derivative first with respect to X and
then with respect to Y. We want this guy to
equal the value of that mixed partial derivative
evaluated at that point. So that gives us another fact. That means we can just
basically set b equal to that. And this is another fact, another constant that we can record. And now for C, when we're trying to figure
out what that should be, the reasoning is almost identical. It's pretty much symmetric. We did everything that
we did for the case X, and instead we do it for
taking the partial derivative with respect to Y twice in a row, and I encourage you to
do that for yourself. It'll definitely solidify everything that we're doing here because it can seem kind of like a lot and
a lot of computations. But you're going to get
basically the same conclusion you did for the constant a. It's going to be the case
that you have the constant c is equal to one half of the
second partial derivative of f with respect to Y, so you're differentiating
with respect to Y twice evaluated at the point of interest. So this is going to be
kind of the third fact. And the way that you get
to that conclusion again, it's going to be almost
identical to the way that we found this one for X. Now when you plug in these
values for a, b and c, and these are constants, even though we've written them as formulas they are constants, when you plug those in
to this full formula, you're going to get the
quadratic approximation. It'll have six separate terms. One that corresponds to the constant, two that correspond to the linear part, and then three which correspond to the various quadratic terms. And if you wanted to dig into more details and kind of go through an
example or two on this, I do have an article on
quadratic approximations and hopefully you can kind of step through and do some of the computations
yourself as you go. But in all of this, even though there's a
lot of formulas going on, it can be pretty notationally heavy. I want you to think back to that original graphical intuition, here, let me actually pull up the
graphical intuition here. So if you're approximating a function near a specific point, the quadratic approximation looks like this curve where if you were to chop it in any direction it would be a parabola, but it's hugging the graph pretty closely. So it gives us a pretty
close approximation. So even though there's a lot of formulas that go on to get us that, the ultimate visual and I
think the ultimate intuition is actually a pretty sensible one. You're just hoping to find something that hugs the function nice and closely. And with that, I will see you next video.
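To tie the transcript's formulas together, here is a minimal numerical sketch of the full quadratic approximation with the coefficients derived in the video, a = (1/2) f_xx, b = f_xy, c = (1/2) f_yy. The example function f(x, y) = e^x sin(y) and the base point are my own choices for illustration, not from the video:

```python
import math

def f(x, y):
    return math.exp(x) * math.sin(y)   # example function (an assumption, not Grant's)

x0, y0 = 0.0, math.pi / 4              # point of interest

# Partial derivatives of e^x sin(y), evaluated at (x0, y0).
fx  = math.exp(x0) * math.sin(y0)
fy  = math.exp(x0) * math.cos(y0)
fxx = math.exp(x0) * math.sin(y0)
fxy = math.exp(x0) * math.cos(y0)      # mixed partial; order doesn't matter here
fyy = -math.exp(x0) * math.sin(y0)

def Q(x, y):
    """Quadratic approximation: constant + linear terms + quadratic terms
    with a = fxx/2, b = fxy, c = fyy/2, exactly as derived in the video."""
    dx, dy = x - x0, y - y0
    return (f(x0, y0) + fx * dx + fy * dy
            + 0.5 * fxx * dx ** 2 + fxy * dx * dy + 0.5 * fyy * dy ** 2)

# Near (x0, y0), the quadratic hug is much tighter than the tangent plane.
for d in (0.1, 0.3):
    x, y = x0 + d, y0 + d
    tangent_plane = f(x0, y0) + fx * (x - x0) + fy * (y - y0)
    print(f"step {d}: linear error {abs(f(x, y) - tangent_plane):.6f}, "
          f"quadratic error {abs(f(x, y) - Q(x, y)):.6f}")
```

By construction Q matches f's value, both first partials, and all three second partials at (x0, y0), which is exactly the matching condition the transcript sets up.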