Conceptually, why is a positive definite Hessian at a specific point able to tell you whether that point is a maximum or a minimum?

This is not about calculating anything. But can anyone tell me why this is the case?

So, from Wikipedia:

If the Hessian is positive definite at x, then f attains a local minimum at x. If the Hessian is negative definite at x, then f attains a local maximum at x. If the Hessian has both positive and negative eigenvalues then x is a saddle point for f. Otherwise the test is inconclusive. This implies that, at a local minimum (resp. a local maximum), the Hessian is positive-semi-definite (resp. negative semi-definite).

Can someone explain, intuitively, why this is the case?


Solution 1:

It's pretty much the same as the 1-dimensional case. The second derivative gives you an idea of the local "curvature" of the function near the point, with a positive second derivative meaning that it's curving "up". In multiple dimensions, the Hessian matrix $H$ gives you the same information, except now you have infinitely many directions in which to look for curvature: the curvature along a unit direction $v$ is $v^T H v$. Positive definiteness says that all the eigenvalues are positive, which means that any time you look along an eigenvector, the function will be curving up. Because the Hessian is symmetric (assuming continuous second partial derivatives), its eigenvectors $e_i$ can be chosen to form an orthonormal basis, so any direction $v = \sum_i c_i e_i$ has curvature $v^T H v = \sum_i \lambda_i c_i^2 > 0$. In other words, looking in any direction you'll also see "curving up", because you can decompose the direction into eigenvector directions.

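You can extend this idea to the negative definite case fairly easily - the idea is the same. In the semidefinite cases, the directions with a zero eigenvalue are flat to second order, so the second derivative alone cannot decide what happens there. Looking along eigenvectors gives you 1-D slices of the function, and then you're back to 1-D calculus.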
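As a rough illustration of the "decompose the direction into eigenvector directions" argument, here is a minimal sketch in plain Python for the 2x2 case. The Hessian $[[2, 1], [1, 2]]$ and the helper names (`eigen_sym_2x2`, `curvature`, `curvature_from_eigen`) are assumptions made up for this example, not anything from the answer above. It compares the directional curvature $v^T H v$ with the eigenvalue-weighted sum $\sum_i \lambda_i c_i^2$.

```python
import math

def eigen_sym_2x2(a, b, c):
    """Eigenpairs of the symmetric matrix [[a, b], [b, c]], smallest eigenvalue first."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    # Angle of the principal axis belonging to the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    e_large = (math.cos(theta), math.sin(theta))
    e_small = (-math.sin(theta), math.cos(theta))
    return [(mean - radius, e_small), (mean + radius, e_large)]

def curvature(a, b, c, v):
    """Second directional derivative v^T H v along the direction v."""
    x, y = v
    return a * x * x + 2.0 * b * x * y + c * y * y

def curvature_from_eigen(a, b, c, v):
    """Same quantity written as sum_i lambda_i * c_i^2, with c_i = v . e_i."""
    total = 0.0
    for lam, e in eigen_sym_2x2(a, b, c):
        coeff = v[0] * e[0] + v[1] * e[1]
        total += lam * coeff * coeff
    return total

# Hypothetical positive definite Hessian [[2, 1], [1, 2]] (eigenvalues 1 and 3).
for k in range(8):
    angle = k * math.pi / 8
    v = (math.cos(angle), math.sin(angle))
    print(round(curvature(2, 1, 2, v), 6), round(curvature_from_eigen(2, 1, 2, v), 6))
```

The two columns should agree for every unit direction, and each value should lie between the smallest and largest eigenvalue, which is the sense in which "all eigenvalues positive" forces "curving up in every direction".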

Solution 2:

Roughly like this:

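A Taylor expansion around $x$ by $h$, with $H$ the Hessian of $f$ at $x$, is $$ f(x + h) = f(x) + \nabla f(x) \cdot h + \frac{1}{2} h^T H h + O(\lVert h \rVert^3) $$ At a critical point the gradient vanishes and this reduces to $$ f(x + h) = f(x) + \frac{1}{2} h^T H h + O(\lVert h \rVert^3) $$ For a minimum, neglecting $O(\lVert h \rVert^3)$ for small $h$, one would need $$ f(x + h) - f(x) \approx \frac{1}{2} h^T H h \ge 0 $$ and that is why positive semi-definiteness is needed. Conversely, if $H$ is positive definite, the quadratic term is strictly positive for $h \ne 0$ and dominates the cubic remainder for small $h$, so $x$ really is a local minimum.

For a maximum, one similarly needs $$ f(x + h) - f(x) \approx \frac{1}{2} h^T H h \le 0 $$ that is, negative semi-definiteness.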
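To make "neglecting $O(h^3)$ for small $h$" a bit more concrete, here is a small numerical sketch. The function $f(x, y) = x^2 + xy + y^2 + x^3$ is a hypothetical example chosen for illustration: it has a critical point at the origin with Hessian $[[2, 1], [1, 2]]$ there. The helper names are likewise made up.

```python
def f(x, y):
    # Hypothetical example: critical point at (0, 0), Hessian [[2, 1], [1, 2]] there.
    return x * x + x * y + y * y + x ** 3

def quadratic_model(h, k):
    # (1/2) * (h, k) H (h, k)^T with H = [[2, 1], [1, 2]].
    return 0.5 * (2 * h * h + 2 * h * k + 2 * k * k)

def remainder(h, k):
    # What is left of f(0 + h) - f(0) after subtracting the quadratic term.
    return f(h, k) - f(0.0, 0.0) - quadratic_model(h, k)

# Shrink the step along the fixed direction (1, -1) and compare the terms.
for t in (0.1, 0.05, 0.025):
    print(t, quadratic_model(t, -t), remainder(t, -t))
```

Under these assumptions, halving the step should shrink the quadratic term by a factor of about 4 and the remainder by a factor of about 8, so close enough to the critical point the sign of $f(x + h) - f(x)$ is decided by $\frac{1}{2} h^T H h$.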

Solution 3:

This is because of Taylor's formula at order $2$: \begin{align*}f(x+h,y+k)-f(x,y)&=hf'_x(x,y)+kf'_y(x,y)\\&\quad+\frac12\Bigl(h^2f''_{x^2}(x,y)+2hkf''_{xy}(x,y)+k^2f''_{y^2}(x,y)\Bigr)+o\bigl(\lVert(h,k)\rVert^2\bigr)\\ &=\frac12\Bigl(h^2f''_{x^2}(x,y)+2hkf''_{xy}(x,y)+k^2f''_{y^2}(x,y)\Bigr)+o\bigl(\lVert(h,k)\rVert^2\bigr), \end{align*} where the second equality holds because $f'_x(x,y)=f'_y(x,y)=0$ at a critical point. If the quadratic form $\;q(h,k)=\frac12\Bigl(h^2f''_{x^2}(x,y)+2hkf''_{xy}(x,y)+k^2f''_{y^2}(x,y)\Bigr)$ is positive definite, then $q(h,k)\ge c\,\lVert(h,k)\rVert^2$ for some constant $c>0$, so it dominates the $o$ term: the left-hand side is positive for all $(h,k)\ne(0,0)$ with $\lVert(h,k)\rVert$ small enough, hence $f(x+h,y+k)-f(x,y)>0$, and we have a local minimum. If it is negative definite, for the same reasons, we have a local maximum.
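For the two-variable case, the definiteness of $q$ can be read off from the three second partial derivatives. The sketch below uses the determinant form of the second-derivative test rather than eigenvalues (a swapped-in but equivalent check for a symmetric 2x2 matrix, since the determinant is the product of the two eigenvalues). The function name `classify_critical_point` and its arguments are hypothetical, and the inputs are assumed to be the second partials evaluated at a point where the gradient already vanishes.

```python
def classify_critical_point(fxx, fxy, fyy):
    """Second-derivative test for a critical point of a function of two variables.

    fxx, fxy, fyy are the second partial derivatives at the critical point.
    """
    det = fxx * fyy - fxy * fxy  # product of the two Hessian eigenvalues
    if det > 0 and fxx > 0:
        return "local minimum"   # both eigenvalues positive: q is positive definite
    if det > 0 and fxx < 0:
        return "local maximum"   # both eigenvalues negative: q is negative definite
    if det < 0:
        return "saddle point"    # eigenvalues of opposite sign
    return "inconclusive"        # det == 0: at least one zero eigenvalue

print(classify_critical_point(2, 1, 2))    # Hessian [[2, 1], [1, 2]]
print(classify_critical_point(-2, 0, -3))  # Hessian [[-2, 0], [0, -3]]
print(classify_critical_point(1, 0, -1))   # Hessian [[1, 0], [0, -1]]
print(classify_critical_point(1, 0, 0))    # Hessian [[1, 0], [0, 0]]
```

The exact comparisons with zero are fine for hand-picked values like these; with derivatives computed numerically, a small tolerance around zero would be a safer choice.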