The Hessian

The Hessian is a matrix that organizes all the second partial derivatives of a function.

The Hessian matrix

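The "Hessian matrix" of a multivariable function $f(x, y, z, \dots)$, which different authors write as $H(f)$, $Hf$, or $H_f$, organizes all second partial derivatives into a matrix:
$$Hf = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial x\,\partial z} & \cdots \\
\dfrac{\partial^2 f}{\partial y\,\partial x} & \dfrac{\partial^2 f}{\partial y^2} & \dfrac{\partial^2 f}{\partial y\,\partial z} & \cdots \\
\dfrac{\partial^2 f}{\partial z\,\partial x} & \dfrac{\partial^2 f}{\partial z\,\partial y} & \dfrac{\partial^2 f}{\partial z^2} & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}$$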
So, two things to notice here:
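  • This only makes sense for scalar-valued functions.
  • This object Hf is no ordinary matrix; it is a matrix with functions as entries. In other words, it is meant to be evaluated at some point $(x_0, y_0, \dots)$.
    $$Hf(x_0, y_0, \dots) = \begin{bmatrix}
    \dfrac{\partial^2 f}{\partial x^2}(x_0, y_0, \dots) & \dfrac{\partial^2 f}{\partial x\,\partial y}(x_0, y_0, \dots) & \cdots \\
    \dfrac{\partial^2 f}{\partial y\,\partial x}(x_0, y_0, \dots) & \dfrac{\partial^2 f}{\partial y^2}(x_0, y_0, \dots) & \cdots \\
    \vdots & \vdots & \ddots
    \end{bmatrix}$$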
As such, you might call this object Hf a "matrix-valued" function. Funky, right?
One more important thing: the word "Hessian" also sometimes refers to the determinant of this matrix, rather than to the matrix itself.
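To make the "matrix-valued function" idea concrete, here is a minimal sketch that approximates a Hessian numerically: you hand it a scalar-valued function and a point, and it hands back an ordinary matrix of numbers. The name `numerical_hessian`, the step size `h`, and the sample function `g` are all assumptions made for illustration; central differences are just one simple way to estimate second partial derivatives, not the method this article relies on.

```python
from typing import Callable, List, Sequence


def numerical_hessian(
    f: Callable[[Sequence[float]], float],
    point: Sequence[float],
    h: float = 1e-4,
) -> List[List[float]]:
    """Approximate the Hessian of a scalar-valued f at `point` (central differences)."""
    n = len(point)

    def shifted(i: int, di: float, j: int, dj: float) -> float:
        # Evaluate f with coordinate i moved by di and coordinate j moved by dj.
        p = list(point)
        p[i] += di
        p[j] += dj
        return f(p)

    hessian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Entry (i, j) estimates the second partial derivative with respect
            # to variables i and j. When i == j this reduces to the usual
            # second difference with step 2h.
            hessian[i][j] = (
                shifted(i, h, j, h)
                - shifted(i, h, j, -h)
                - shifted(i, -h, j, h)
                + shifted(i, -h, j, -h)
            ) / (4 * h * h)
    return hessian


# Hypothetical example function g(x, y, z) = x^2 y + z^3, chosen only for illustration.
def g(p: Sequence[float]) -> float:
    return p[0] ** 2 * p[1] + p[2] ** 3


for row in numerical_hessian(g, [1.0, 2.0, 3.0]):
    print([round(entry, 4) for entry in row])
```

Notice that the output only exists once a point is supplied, which is exactly the sense in which Hf is a function whose values are matrices.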

Example: Computing a Hessian

Problem: Compute the Hessian of $f(x, y) = x^3 - 2xy - y^6$ at the point $(1, 2)$.
Solution: Ultimately we need all the second partial derivatives of f, so let's first compute both partial derivatives:
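$$\begin{aligned}
f_x(x, y) &= \frac{\partial}{\partial x}(x^3 - 2xy - y^6) = 3x^2 - 2y \\
f_y(x, y) &= \frac{\partial}{\partial y}(x^3 - 2xy - y^6) = -2x - 6y^5
\end{aligned}$$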
With these, we compute all four second partial derivatives:
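$$\begin{aligned}
f_{xx}(x, y) &= \frac{\partial}{\partial x}(3x^2 - 2y) = 6x \\
f_{xy}(x, y) &= \frac{\partial}{\partial y}(3x^2 - 2y) = -2 \\
f_{yx}(x, y) &= \frac{\partial}{\partial x}(-2x - 6y^5) = -2 \\
f_{yy}(x, y) &= \frac{\partial}{\partial y}(-2x - 6y^5) = -30y^4
\end{aligned}$$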
The Hessian matrix in this case is a 2×2 matrix with these functions as entries:
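$$Hf(x, y) = \begin{bmatrix} f_{xx}(x, y) & f_{yx}(x, y) \\ f_{xy}(x, y) & f_{yy}(x, y) \end{bmatrix} = \begin{bmatrix} 6x & -2 \\ -2 & -30y^4 \end{bmatrix}$$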
We were asked to evaluate this at the point (x,y)=(1,2), so we plug in these values:
$$Hf(1, 2) = \begin{bmatrix} 6(1) & -2 \\ -2 & -30(2)^4 \end{bmatrix} = \begin{bmatrix} 6 & -2 \\ -2 & -480 \end{bmatrix}$$
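If you'd like to double-check this by machine, here is a small sketch that hard-codes the second partial derivatives of $f(x, y) = x^3 - 2xy - y^6$ found in this example and evaluates them at $(1, 2)$. The function names `hessian_f` and `approx_f_yy` are made up for illustration, and the finite-difference check at the end is only a rough sanity test of the hand computation, not a proof.

```python
def f(x, y):
    # The function from this example: f(x, y) = x^3 - 2xy - y^6.
    return x**3 - 2 * x * y - y**6


def hessian_f(x, y):
    # Second partial derivatives computed by hand in this example.
    f_xx = 6 * x
    f_xy = -2
    f_yx = -2
    f_yy = -30 * y**4
    # Same layout as the matrix in the article: [[f_xx, f_yx], [f_xy, f_yy]].
    return [[f_xx, f_yx], [f_xy, f_yy]]


def approx_f_yy(x, y, h=1e-3):
    # Rough numerical estimate of f_yy using a central second difference.
    return (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / (h * h)


print(hessian_f(1, 2))  # [[6, -2], [-2, -480]]
print(approx_f_yy(1.0, 2.0))  # should land close to -480
```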
Now, the problem is ambiguous, since the "Hessian" can refer either to this matrix or to its determinant. What you want depends on context. For example, in optimizing multivariable functions, there is something called the "second partial derivative test" which uses the Hessian determinant. When the Hessian is used to approximate functions, you just use the matrix itself.
If it's the determinant we want, here's what we get:
$$\det\left(\begin{bmatrix} 6 & -2 \\ -2 & -480 \end{bmatrix}\right) = 6(-480) - (-2)(-2) = -2884$$
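The same arithmetic can be sketched in a few lines of code. The helper names `det2` and `hessian_det` are hypothetical, and the entries are the second partial derivatives of $f(x, y) = x^3 - 2xy - y^6$ worked out in this example ($6x$, $-2$, $-2$, and $-30y^4$).

```python
def det2(m):
    # Determinant of a 2x2 matrix [[a, b], [c, d]] is ad - bc.
    (a, b), (c, d) = m
    return a * d - b * c


def hessian_det(x, y):
    # Hessian determinant of f(x, y) = x^3 - 2xy - y^6 as a function of (x, y).
    return det2([[6 * x, -2], [-2, -30 * y**4]])


print(det2([[6, -2], [-2, -480]]))  # -2884
print(hessian_det(1, 2))  # -2884
```

Written out symbolically, the determinant at a general point is $(6x)(-30y^4) - (-2)(-2) = -180xy^4 - 4$, which gives $-2884$ at $(1, 2)$.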

Uses

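By capturing all the second-derivative information of a multivariable function, the Hessian matrix often plays a role analogous to the ordinary second derivative in single-variable calculus. Most notably, it arises in these two cases:
  • Approximating multivariable functions, where the matrix itself is used.
  • The second partial derivative test for optimizing multivariable functions, which uses the Hessian determinant.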
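As a taste of the approximation use, here is a rough sketch built on the example function $f(x, y) = x^3 - 2xy - y^6$ near the point $(1, 2)$. It assumes the standard second-order Taylor form (value, plus gradient term, plus one half of the Hessian term), which this article does not derive, and the name `quadratic_approximation` is made up for illustration.

```python
def f(x, y):
    # Example function from this article: f(x, y) = x^3 - 2xy - y^6.
    return x**3 - 2 * x * y - y**6


def quadratic_approximation(x, y, x0=1.0, y0=2.0):
    # Second-order Taylor approximation of f around (x0, y0) (assumed form).
    dx, dy = x - x0, y - y0
    f0 = f(x0, y0)
    # First partial derivatives at (x0, y0).
    fx = 3 * x0**2 - 2 * y0
    fy = -2 * x0 - 6 * y0**5
    # Hessian entries at (x0, y0); the mixed partials are equal here.
    fxx, fxy, fyy = 6 * x0, -2.0, -30 * y0**4
    return (
        f0
        + fx * dx
        + fy * dy
        + 0.5 * (fxx * dx * dx + 2 * fxy * dx * dy + fyy * dy * dy)
    )


# Compare the approximation with the true value a small step away from (1, 2).
print(f(1.01, 2.01), quadratic_approximation(1.01, 2.01))
```

The Hessian entries play the same role here that the second derivative plays in the quadratic term of a single-variable Taylor polynomial.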

Want to join the conversation?

  • Jorge R. Martinez Perez-Tejada
    Will there be videos and exercises (mostly interested in the exercises) for these topics any time soon?
    (34 votes)
  • Maria
    Is the Hessian in any way related to the Jacobian matrix?
    (7 votes)
  • Marcel Brown
    Should the determinant in the final step be: 180xy^4 - 4?
    (7 votes)
  • gschex1112
    Why is the last second partial derivative not -30y^4?
    (5 votes)
  • Emily H
    Why is fyx= d/dy(3x^2-2y) and not d/dx(-2x-6y^5)? Wouldn't it be the partial derivative with respect to x of the first partial derivative with respect to y? I ask the same for fxy= d/dx(-2x-6y^5) not being d/dy(3x^2-2y).
    (4 votes)
  • Aaron Hargrove
    What are some of the practical applications of the determinant of a Hessian matrix?
    (2 votes)
  • Briahnna A
    Around the last paragraph, the determinant is used to evaluate specific points. But I was wondering why, in my multivariable calculus classes, we also use the Hessian determinant and f_xx to determine local extrema. What is the background or basis for using the determinant of the Hessian matrix to decide whether critical points are local maxima or local minima? Basically, what is important about the Hessian determinant? Thank you!
    (4 votes)
    • Tejas
      We actually don't use the Hessian to determine whether the critical points are local maxima or local minima. We actually use the Hessian to determine whether they are local extrema or saddle points. As for using fxx, it doesn't have to be fxx. You could just as easily use fyy to determine whether the local extremum is a maximum or minimum.

      If it is a local minimum, the gradient is pointing away from this point. If it is a local maximum, the gradient is always pointing toward this point. Of course, at all critical points, the gradient is 0. That should mean that the gradient of nearby points would be tangent to the change in the gradient. In other words, fxx and fyy would be high and fxy and fyx would be low.

      On the other hand, if the point is a saddle point, then the gradient vectors will all be pointing around the critical point. Therefore at nearby points, the change in the gradient will be orthogonal to the gradient, not tangent. In other words, fxy and fyx would be high and fxx and fyy would be low, or fxx and fyy would have opposite signs. Either way, the Hessian determinant would be negative.
      (1 vote)
  • yogesh kumar
    Why does the Hessian make sense only for a scalar-valued function?
    (4 votes)
  • muhammadahme2019
    Where it says The Hessian matrix in this case is a 2 x 2 matrix with these functions as entries:
    the fxx and fxy are flipped. Cuz under fxx it should be fxy. Then under fyx it should be fyy.
    (2 votes)
  • disha.bandy
    Since Hessian matrices are only used for trigonometric functions, what can be used to optimise a 3 variable trigonometric function?
    (1 vote)