# Vector form of multivariable quadratic approximation

This is the more general form of a quadratic approximation for a scalar-valued multivariable function. It is analogous to a quadratic Taylor polynomial in the single-variable world. Created by Grant Sanderson.
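For readers who want the formula itself: the vector form presented in the video can be written as follows, with x₀ the point of expansion, ∇f the gradient, and H_f the Hessian (a sketch of the standard formula, matching the discussion in the comments below):

```latex
Q_f(\mathbf{x}) = f(\mathbf{x}_0)
  + \nabla f(\mathbf{x}_0)^{T}\,(\mathbf{x}-\mathbf{x}_0)
  + \tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}_0)^{T}\,H_f(\mathbf{x}_0)\,(\mathbf{x}-\mathbf{x}_0)
```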

## Want to join the conversation?

• Forgot to transpose at the end
• Hi, I was wondering what the purpose of approximation is.
• Quadratic approximation is useful for machine learning (ML). For example, if you want to measure how bad your program is at recognizing handwritten digits, you can represent the errors with a cost function. The cost function depends on a lot of parameters (so it has a lot of dimensions), which is where representing things with vectors comes in handy. We want to minimize this cost function (because low cost = less bad ML program), and we can find minima of the cost function more easily when we approximate it with a quadratic approximation.
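Building on the ML answer above: minimizing a quadratic approximation is exactly what one step of Newton's method does — replace the cost function by its local quadratic model and jump to that model's minimum. A minimal sketch in plain Python; the cost function and starting point here are made up for illustration:

```python
# One Newton step: minimize a cost function by minimizing its local
# quadratic approximation. This cost function is made up for
# illustration; its true minimum is at (1, -3).

def cost(x, y):
    return (x - 1) ** 2 + 2 * (y + 3) ** 2

def gradient(x, y):
    return [2 * (x - 1), 4 * (y + 3)]

def hessian(x, y):
    # Constant for this quadratic cost: [[f_xx, f_xy], [f_yx, f_yy]]
    return [[2.0, 0.0], [0.0, 4.0]]

def newton_step(x, y):
    gx, gy = gradient(x, y)
    (a, b), (c, d) = hessian(x, y)
    det = a * d - b * c
    # Solve H * step = grad via the 2x2 inverse, then move against the step.
    sx = (d * gx - b * gy) / det
    sy = (-c * gx + a * gy) / det
    return x - sx, y - sy

print(newton_step(0.0, 0.0))  # jumps straight to the minimum (1.0, -3.0)
```

Because this cost function is itself quadratic, its quadratic approximation is exact and one step lands on the minimum; for a general cost function you would repeat the step.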
• Why does the Hessian(-like) matrix have a 1/2 for each term? Why does the whole formula need a 1/2? What would happen without it?

I don't get where those 1/2s came from. • The origin of this 1/2 is the same as in the Taylor series. If you remember, the Taylor series has the following pattern:
f(x) = f(a) + f'(a)(x − a)/1! + f''(a)(x − a)^2/2! + f'''(a)(x − a)^3/3! + ...
You can interpret these factorials as a way to tame the higher powers: as we add terms with higher exponents, the contributions would grow very large, and dividing by the factorial scales them back to normal values.

The factorial itself comes out because we are using derivatives to approximate our function. For example, let's take derivatives of g(x) = x^4:
g'(x) = 4 * x^3
g''(x) = 4 * 3 * x^2
g'''(x) = 4 * 3 * 2 * x
g''''(x) = 4 * 3 * 2 * 1 = 4!
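The factorial pattern above can be checked mechanically. A small sketch in plain Python that differentiates a polynomial (stored as a list of coefficients, lowest power first) and confirms that differentiating x^4 four times leaves the constant 4! = 24:

```python
from math import factorial

def differentiate(coeffs):
    """Derivative of a polynomial given as [c0, c1, c2, ...]
    for c0 + c1*x + c2*x^2 + ... (power rule on each term)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

# g(x) = x^4  ->  coefficients [0, 0, 0, 0, 1]
g = [0, 0, 0, 0, 1]
for _ in range(4):
    g = differentiate(g)

print(g)                     # [24] -- the constant 4 * 3 * 2 * 1
print(g[0] == factorial(4))  # True
```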
• When he wrote the formula for the quadratic-form part of the quadratic approximation, he wrote the Hessian with f_xy in both the top-right and bottom-left corners. By definition, the Hessian has f_yx in the top-right corner. I know that f_xy and f_yx are usually the same value, but shouldn't the definition of the quadratic approximation contain the true Hessian, in case f_xy and f_yx come out to be different?
• What is the general notation for the Hessian of an n-th degree approximation?
• For anyone else wondering why there is only one x at the end:
It is no longer the single variable x; it is the whole input vector x (with x, y, z, etc.), and x0 is (x0, y0, z0, etc.).
• For anyone interested, I derived the full multivariable Taylor expansion using directional derivatives -- it seems to match the quadratic approximation to second degree:

f(x + h) = Σ_{n=0}^{∞} (1/n!) [∇_h^n f](x)
(1 vote) • In the vector form of the quadratic approximation x^{T}·H(f)·x, where ^ stands for superscript and H(f) is the Hessian of f, why should we transpose the vector? Doesn't the dot product turn out the same without transposing?
(1 vote) • As far as I know, it depends on whether you are using matrix multiplication or vector dot products to represent the quadratic form.

For example, if you are only using matrices to represent the quadratic form, then it doesn't make sense to use a dot product, because dot products are usually not defined for matrices. Instead we use only matrix multiplication, and our expression becomes:
x^{T} H(f) x

However, you can also represent the quadratic form using a mix of matrix multiplication and vector dot products, but you should be careful when doing this because it is less clear and less common. In that case our expression can be rewritten as:
x · (H(f) x)

where H(f) x is a matrix-vector product, and x · (H(f) x) is a vector dot product.
(1 vote)
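The two forms discussed above give the same scalar. A small sketch in plain Python (the Hessian entries are made up for illustration), computing the quadratic form both as x^{T}(H x) and as the dot product x · (H x):

```python
def mat_vec(H, v):
    """Matrix-vector product H v for a 2x2 matrix and a 2-vector."""
    return [H[0][0] * v[0] + H[0][1] * v[1],
            H[1][0] * v[0] + H[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

H = [[2.0, 1.0], [1.0, 4.0]]  # a symmetric "Hessian" (values made up)
x = [3.0, -1.0]

# Form 1: x^T H x -- row vector times (matrix times column vector)
quad_matmul = dot(mat_vec(H, x), x)

# Form 2: x · (H x) -- a dot product wrapped around a matrix-vector product
quad_dot = dot(x, mat_vec(H, x))

print(quad_matmul, quad_dot)  # 16.0 16.0 -- the same scalar both ways
```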
• Hey, how can we know from the vector form that the X − X0 after the Hessian matrix is a column vector?
(1 vote) • Correct me if I am wrong, but while formulating the linear equation in matrix form, shouldn't grad(f) be multiplied with the transpose of [X − X0] in order for matrix multiplication to work?
(1 vote) • This is not matrix multiplication, but rather a dot product between vectors. For a function with x and y in its input, f(x,y), grad(f) is a vector containing f_x in the first row and f_y in the second row. You're also taking the gradient at X_0, the vector containing x_0 and y_0 in its two rows (you can also think of this as the coordinates (x_0, y_0)). This means you'll have a constant in each row.

X − X_0 is just simple vector subtraction. X has one column and two rows: x in the first row and y in the second. X_0 is the same thing, except each variable gets the subscript naught. When you do the vector subtraction, you get a vector with x − x_0 in its first row and y − y_0 in its second (notice that these entries are no longer the bold-faced vector X; they are variables).

When you perform the dot product between grad(f) and the vector we just formed (if you need a refresher on dot products, Khan Academy has lessons on them in the Linear Algebra section), you get f_x*(x − x_0) + f_y*(y − y_0). Remember that each of the partials was evaluated at (x_0, y_0), so you'll end up with (using A and B as constants) A(x − x_0) + B(y − y_0).
(1 vote)
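The computation in the last answer can be checked numerically. A minimal sketch in plain Python (the function f and the expansion point are made up for illustration), comparing the linear approximation f(X0) + grad(f) · (X − X0) against the true function value near the point:

```python
# Linear approximation f(X0) + grad(f) . (X - X0), with each quantity
# written out as in the answer above. The function f and the point X0
# are made up for illustration.

def f(x, y):
    return x ** 2 + x * y

def grad_f(x, y):
    return [2 * x + y, x]  # [f_x, f_y]; constants once evaluated at X0

def linear_approx(x0, y0, x, y):
    g = grad_f(x0, y0)                  # gradient evaluated at X0
    diff = [x - x0, y - y0]             # vector subtraction X - X0
    # dot product: f_x*(x - x0) + f_y*(y - y0)
    return f(x0, y0) + g[0] * diff[0] + g[1] * diff[1]

print(linear_approx(1.0, 2.0, 1.1, 2.05))  # approx 3.45
print(f(1.1, 2.05))                        # approx 3.465, close to the estimate
```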