Why/How does the determinant of the Hessian matrix, combined with the 2nd derivatives, tell us max., min., saddle points? Reasoning behind it?

Solution 1:

Given a smooth function $f: \mathbb{R}^n \to \mathbb{R}$, we can write a second-order Taylor expansion in the form: $$f(x + \Delta x) = f(x) + \nabla f(x) \Delta x + \frac{1}{2}(\Delta x)^t Hf(x) \Delta x + O(|\Delta x|^3)$$ where $\nabla f (x)$ is the gradient of $f$ at $x$ (written as a row vector), $Hf(x)$ is the Hessian matrix of $f$ at $x$ (which is symmetric, of course), and $\Delta x$ is a small displacement (written as a column vector).

Suppose you have a critical point at $x=a$, so that $\nabla f(a)= 0$. Then your Taylor expansion looks like: $$f(a + \Delta x) = f(a) + \frac{1}{2}( \Delta x)^t Hf(a) \Delta x + O(|\Delta x|^3).$$ Thus, for small displacements $\Delta x$, the Hessian tells us how the function behaves around the critical point.
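As a quick sanity check of this expansion (a minimal sketch; the function $f$ and the critical point are my own illustrative choices, not from the question), we can compare $f(a + \Delta x)$ against the quadratic model $f(a) + \frac{1}{2}(\Delta x)^t Hf(a) \Delta x$ numerically:

```python
import numpy as np

# Illustrative function: f(x, y) = x^2 + 3y^2 - x^2*y has a critical
# point at the origin, where its gradient (2x - 2xy, 6y - x^2) vanishes.
def f(p):
    x, y = p
    return x**2 + 3*y**2 - x**2 * y

a = np.array([0.0, 0.0])           # critical point
H = np.array([[2.0, 0.0],          # Hessian at the origin, computed by hand:
              [0.0, 6.0]])         # f_xx = 2, f_xy = 0, f_yy = 6

rng = np.random.default_rng(0)
dx = 1e-3 * rng.standard_normal(2)            # a small random displacement
quadratic_model = f(a) + 0.5 * dx @ H @ dx    # second-order Taylor model
print(f(a + dx), quadratic_model)             # agree up to O(|dx|^3)
```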

  • The Hessian $Hf(a)$ is positive definite if and only if $(\Delta x)^t Hf(a) \Delta x > 0$ for all $\Delta x \neq 0$. Equivalently, this is true if and only if all the eigenvalues of $Hf(a)$ are positive. Then no matter which direction you move away from the critical point, the value of $f(a + \Delta x)$ grows (for small $|\Delta x|$), so $a$ is a local minimum.

  • Likewise, the Hessian $Hf(a)$ is negative definite if and only if $(\Delta x)^t Hf(a) \Delta x < 0$ for all $\Delta x \neq 0$. Equivalently, this is true if and only if all the eigenvalues of $Hf(a)$ are negative. Then no matter which direction you move away from the critical point, the value of $f(a + \Delta x)$ decreases (for small $|\Delta x|$), so $a$ is a local maximum.

  • Now suppose that the Hessian $Hf(a)$ has mixed positive and negative (but all nonzero) eigenvalues. Then (for small $|\Delta x|$) the value of $f(a + \Delta x)$ decreases or increases as you move away from the critical point, depending on which direction you take, so $a$ is a saddle point.

  • Lastly, suppose that there exists some $\Delta x \neq 0$ such that $Hf(a) \Delta x = 0$. This is true if and only if $Hf(a)$ has a $0$ eigenvalue. In this case the test fails: along this direction the quadratic term vanishes, so we can't tell whether $f$ is increasing or decreasing as we move away from $a$; our second-order approximation isn't good enough and we need higher-order data to decide. (A small numerical sketch of this eigenvalue classification follows the list.)
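To make the case analysis above concrete, here is a minimal Python sketch (my own illustration, assuming the Hessian is already available as a symmetric NumPy array) that classifies a critical point from the signs of the eigenvalues:

```python
import numpy as np

def classify_critical_point(H, tol=1e-10):
    """Classify a critical point from the eigenvalues of its symmetric Hessian."""
    eigenvalues = np.linalg.eigvalsh(H)       # real eigenvalues, ascending
    if np.any(np.abs(eigenvalues) < tol):
        return "test fails (zero eigenvalue)"
    if np.all(eigenvalues > 0):
        return "local minimum"                # positive definite
    if np.all(eigenvalues < 0):
        return "local maximum"                # negative definite
    return "saddle point"                     # mixed signs

# Example: Hessian of f(x, y) = x^2 - y^2 at the origin.
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))  # saddle point
```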

What I've described for you here is the intuition for the general situation on $\mathbb{R}^n$, but since it seems like you're working in $\mathbb{R}^2$, the test becomes a bit simpler. In $\mathbb{R}^2$ we can only have two (possibly identical) eigenvalues $\lambda_1$ and $\lambda_2$ for $Hf(a)$, since it is a $2 \times 2$ matrix. We can take advantage of the fact that the determinant of a matrix is the product of the eigenvalues, and the trace is their sum: $\det(Hf(a))=\lambda_1 \lambda_2$ and $\operatorname{tr}(Hf(a))=\lambda_1 + \lambda_2$.

In this situation:

  1. $\det (Hf(a))=0$ means that there is a zero eigenvalue and so the test fails.

  2. $\det(Hf(a))<0$ means that the two eigenvalues have opposite signs, so we have a saddle point at $a$.

  3. $\det(Hf(a))>0$ means that both eigenvalues have the same sign: either both positive or both negative, and we must use the trace to decide which it is. In fact, rather than use the trace, it suffices to check the top left entry $\frac{\partial^2 f}{\partial x^2} (a)$ of $Hf(a)$, by Sylvester's criterion. In other words, $\frac{\partial^2 f}{\partial x^2}(a) > 0$ means both eigenvalues are positive (local min at $a$), whereas $\frac{\partial^2 f}{\partial x^2} (a) < 0$ means both eigenvalues are negative (local max at $a$). (A sketch of this two-variable test follows the list.)
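Putting the three cases together, here is a minimal sketch of the resulting two-variable test (the function name and arguments are my own; it takes the second partials of $f$ at the critical point):

```python
def second_derivative_test_2d(fxx, fxy, fyy):
    """2D second derivative test from the second partials at a critical point."""
    det = fxx * fyy - fxy**2        # determinant of the 2x2 Hessian
    if det == 0:
        return "test fails"
    if det < 0:
        return "saddle point"       # eigenvalues of opposite signs
    # det > 0: eigenvalues share a sign; the top-left entry decides which
    return "local minimum" if fxx > 0 else "local maximum"

# Example: f(x, y) = x^2 + y^2 at the origin has fxx = 2, fxy = 0, fyy = 2.
print(second_derivative_test_2d(2.0, 0.0, 2.0))  # local minimum
```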

Solution 2:

It rests on Taylor's formula in several variables: $$f(x)=f(c)+Df(c)\cdot(x-c)+\frac1{2!}(D^2f)_{x=c}(x-c)+o(\lVert x-c\rVert^2)$$ For a critical point, $Df(c)=0$, and the formula can be written as $$f(x)-f(c)=\frac1{2!}(D^2f)_{x=c}(x-c)+o(\lVert x-c\rVert^2)$$

where $(D^2f)_{x=c}$ denotes the quadratic form associated with the Hessian matrix at $x=c$.

When $\lVert x-c\rVert$ is small enough and the quadratic form is not $0$, the sign of this expression is the sign of the quadratic form. Hence if the quadratic form is positive for every nonzero displacement (i.e. positive definite), then $f(x)-f(c)>0$ near $c$ and $f(c)$ is a local minimum; if it is negative definite, $f(c)$ is a local maximum.
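For instance (my own illustrative example), take $f(x,y)=x^2+y^2$ with critical point $c=(0,0)$ and $x-c=(h,k)$. The Hessian is $2I$, so the quadratic form gives $\frac1{2!}(2h^2+2k^2)=h^2+k^2>0$ for every $(h,k)\neq(0,0)$; and indeed $f(x)-f(c)=h^2+k^2>0$, so $f(c)$ is a local minimum.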

For two variables, setting $x-c=(h,k)$, the quadratic form is, explicitly: $$\frac{\partial^2f}{\partial x^2}(c)h^2+2\frac{\partial^2f}{\partial x\partial y}(c)hk+\frac{\partial^2f}{\partial y^2}(c)k^2$$ This is a homogeneous quadratic form in two variables, and (for $k\neq 0$) its sign is the same as the sign of the dehomogenised quadratic polynomial in one variable obtained by setting $t=h/k$ and dividing by $k^2$: $$\frac{\partial^2f}{\partial x^2}(c)t^2+2\frac{\partial^2f}{\partial x\partial y}(c)t+\frac{\partial^2f}{\partial y^2}(c)$$ Now a non-zero quadratic polynomial has constant sign if and only if its (reduced) discriminant is negative, whence the condition for an extremum: $$\biggl[\biggl(\frac{\partial^2f}{\partial x\partial y}\biggr)^2-\frac{\partial^2f}{\partial x^2}\,\frac{\partial^2f}{\partial y^2}\biggr](c)<0.$$ (Note that this is exactly the condition $\det Hf(c)>0$ from Solution 1.) Furthermore, under these circumstances, the sign of the quadratic polynomial is the sign of its leading coefficient. Hence it is

  • positive (local minimum) if $\dfrac{\partial^2f}{\partial x^2}(c)>0$,
  • negative (local maximum) if $\dfrac{\partial^2f}{\partial x^2}(c)<0$. (A symbolic check of this criterion is sketched below.)
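As that symbolic check (a sketch assuming SymPy is available; the example function is my own), one can compute the Hessian and the reduced discriminant directly:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + x*y + y**2                      # illustrative example; critical point at (0, 0)

H = sp.hessian(f, (x, y))                  # matrix of second partials
c = {x: 0, y: 0}
fxx, fxy, fyy = H[0, 0].subs(c), H[0, 1].subs(c), H[1, 1].subs(c)

reduced_discriminant = fxy**2 - fxx * fyy  # negative => extremum
print(reduced_discriminant)                # -3 < 0, so (0, 0) is an extremum
print(fxx)                                 # 2 > 0, so it is a local minimum
```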

Solution 3:

You can see it this way: the determinant is the product of all eigenvalues of the Hessian matrix (two eigenvalues, in the case of two variables). The sign of the determinant therefore tells you whether the eigenvalues share a sign ($\det > 0$) or have opposite signs ($\det < 0$, a saddle point); when they share a sign, the sign of the trace (or of a diagonal entry) tells you which one, giving a local minimum or maximum. This eigenvalue test is the general version of the one-variable second derivative test.
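For example (a quick illustration of my own), $f(x,y)=x^2-y^2$ has Hessian $\begin{pmatrix}2&0\\0&-2\end{pmatrix}$ at the origin, so $\det Hf=-4<0$: the eigenvalues have opposite signs, and the origin is a saddle point.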

FYI: see the Wikipedia article on the second partial derivative test.