The "second derivative test" for $f(x,y)$

Solution 1:

That matrix is symmetric. By the spectral theorem of linear algebra, a symmetric matrix is orthogonally diagonalizable. That means there are two perpendicular directions along which the matrix acts as scaling by $\lambda_1$ and by $\lambda_2$.

These $\lambda_i$ are the quadratic coefficients of the parabolic approximations to the function $f$ at $(x_0,y_0)$ as you move through that point in the direction of each eigenspace. Since you are already looking at a critical point, each quadratic approximation has its vertex at $(x_0,y_0)$. If the two $\lambda_i$ are opposite in sign, you have two parabolas, orthogonal to each other, opening in opposite directions, clearly creating a saddle. If the two $\lambda_i$ have the same sign, then depending on that sign you have either a max or a min.

But the determinant of a $2\times2$ matrix equals the product of its two eigenvalues. So a negative determinant implies $\lambda_i$ of opposite sign, which implies a saddle point, and a positive determinant similarly implies either a max or a min.
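A quick numerical sketch of this (my own illustrative example, using $f(x,y)=x^2+3xy+y^2$, whose Hessian at the critical point $(0,0)$ is constant):

```python
import numpy as np

# Hessian of f(x, y) = x^2 + 3xy + y^2 at the critical point (0, 0):
# f_xx = 2, f_xy = f_yx = 3, f_yy = 2.
H = np.array([[2.0, 3.0],
              [3.0, 2.0]])

eigenvalues, _ = np.linalg.eigh(H)       # eigh is meant for symmetric matrices
print("det(H) =", np.linalg.det(H))      # -5.0
print("eigenvalues =", eigenvalues)      # [-1.  5.], opposite signs
# det(H) = lambda_1 * lambda_2 < 0, so (0, 0) is a saddle point.
```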


Locally at any $(x_0,y_0)$, there is a higher-dimensional version of the Taylor series, grouped here by increasing order of derivative: $$\begin{align*} f(x,y)&=f(x_0,y_0)+\Big[f_x(x_0,y_0)\cdot(x-x_0)+f_y(x_0,y_0)\cdot(y-y_0)\Big]\\ &\phantom{{}={}}+\frac12\Big[f_{xx}(x_0,y_0)\cdot(x-x_0)^2+f_{xy}(x_0,y_0)\cdot(x-x_0)(y-y_0)\\ &\phantom{{}={}}+f_{yx}(x_0,y_0)\cdot(y-y_0)(x-x_0)+f_{yy}(x_0,y_0)\cdot(y-y_0)^2\Big]+\cdots\\ &=f(x_0,y_0)+\nabla f(x_0,y_0)\cdot\left((x,y)-(x_0,y_0)\right)^T\\ &\phantom{{}={}}+\frac12\left((x,y)-(x_0,y_0)\right)\cdot H(x_0,y_0)\cdot\left((x,y)-(x_0,y_0)\right)^T+\cdots \end{align*}$$

When you are at a critical point, this simplifies to $$\begin{align*} f(x,y)&=f(x_0,y_0)+\frac12\left((x,y)-(x_0,y_0)\right)\cdot H(x_0,y_0)\cdot\left((x,y)-(x_0,y_0)\right)^T+\cdots \end{align*}$$

And if we changed coordinates to variables $s$ and $t$ that run in the directions of $H$'s eigenspaces, based at the critical point, we'd just have

$$f(s,t)=f(0,0)+\frac12\lambda_1s^2+\frac12\lambda_2t^2+\cdots$$ which I hope makes the parabolas, and the role of the eigenvalues of $H$, easy to see.
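As a concrete check (my own example, not part of the original answer): take $f(x,y)=xy$ at the critical point $(0,0)$. Its Hessian is $$H=\begin{pmatrix}0&1\\1&0\end{pmatrix},$$ with eigenvalues $\lambda_1=1$, $\lambda_2=-1$ and eigenvectors along $(1,1)$ and $(1,-1)$. Setting $s=\frac{x+y}{\sqrt2}$ and $t=\frac{x-y}{\sqrt2}$ gives $$f=xy=\frac12s^2-\frac12t^2,$$ exactly the form above: an upward parabola along one eigendirection, a downward parabola along the other, hence a saddle.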

Solution 2:

Let $f: \mathbb{R}^n \longrightarrow \mathbb{R}$ be a twice differentiable function.

The second derivative is defined as the derivative of the first derivative $x\mapsto \text{d}f(x) \in \mathcal{L}(\mathbb{R^n},\mathbb{R})$, and therefore belongs to the set of functions $\mathcal{L}(\mathbb{R}^n,\mathcal{L}(\mathbb{R}^n,\mathbb{R}))$, which is isomorphic to the set of bilinear forms $\mathbb{R}^n\times\mathbb{R}^n \longrightarrow\mathbb{R}$. As $f$ is twice differentiable, Schwarz's theorem shows that the bilinear form $\text{d}^2f(x)$ is actually symmetric.

The Hessian matrix is defined as the matrix associated to the bilinear form $\text{d}^2f(x)$. Let's take a look at what $f$ looks like locally; one has:

$f(x+h) = f(x) + \text{d}f(x).h + \frac{1}{2}\text{d}^2f(x).h^{(2)} + o(\|h\|^2)$, which one can rewrite (using Riesz's representation theorem to define the gradient vector, and the remarks above for the Hessian matrix) as $f(x+h)=f(x)+ \nabla f(x)\cdot h + \frac{1}{2}h^T\mathcal{H}f(x)h + o(\|h\|^2)$

Therefore if $x$ is a critical point one has locally:

$f(x+h)=f(x)+ \frac{1}{2}h^T\mathcal{H}f(x)h + o(\|h\|^2)$

From this we can see how the Hessian matrix determines the behavior of $f$ locally. If $\mathcal{H}f(x)$ is a positive definite symmetric bilinear form, then for $h$ small enough $f(x+h)-f(x)\geq 0$, so $f$ attains a local minimum at $x$; if $\mathcal{H}f(x)$ is negative definite, then for $h$ small enough $f(x+h)-f(x)\leq 0$, so $f$ attains a local maximum at $x$; if it is indefinite (it takes both signs), then $x$ is a saddle point; and in the remaining, degenerate cases (a semidefinite form with a zero eigenvalue), the expansion does not decide.

We can see that the eigenvalues of $\mathcal{H}f(x)$ are essential to determining whether the matrix is positive definite, negative definite, or indefinite, as sketched below.
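Here is a minimal sketch of that eigenvalue criterion in code (the function name `classify_critical_point` and the tolerance are my own choices, not part of the original answer):

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-12):
    """Classify a critical point from its symmetric Hessian matrix.

    A sketch of the eigenvalue criterion above; `tol` guards against
    treating a numerically tiny eigenvalue as strictly signed.
    """
    eigenvalues = np.linalg.eigvalsh(hessian)
    if np.all(eigenvalues > tol):
        return "local minimum"      # positive definite
    if np.all(eigenvalues < -tol):
        return "local maximum"      # negative definite
    if (eigenvalues > tol).any() and (eigenvalues < -tol).any():
        return "saddle point"       # indefinite
    return "inconclusive"           # some eigenvalue is (numerically) zero

# Hessian of f(x, y) = x^2 + y^2 at (0, 0):
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 2.0]])))  # local minimum
```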

In the case $n=2$, if we call $r=\frac{\partial^2 f}{\partial x^2}$, $s=\frac{\partial^2f}{\partial x \partial y}$, and $t=\frac{\partial^2f}{\partial y^2}$, we can write $q(h)=\text{d}^2f(x).(h_1,h_2)^{(2)} = rh_1^2 +2sh_1h_2 +th_2^2$.

If $r>0$, then $q(h_1,h_2)=r\left[\left(h_1+\frac{s}{r}h_2\right)^2 + \frac{rt-s^2}{r^2}h_2^2\right]$,

and if $rt-s^2>0$ then $q(h_1,h_2)\geq 0$, with $q(h_1,h_2)=0 \Rightarrow (h_1,h_2)=0$; so in this case $\mathcal{H}f(x)$ is positive definite.

If however $rt-s^2<0$, then $q$ changes sign (the two squared terms carry coefficients of opposite sign), so $\mathcal{H}f(x)$ is indefinite and $x$ is a saddle point.

The case $r<0$ is the same as above except that "positive" becomes "negative".

Last of all, if $r=0$ the factorization above breaks down. Note, though, that in this case $rt-s^2=-s^2\leq 0$: if $s\neq 0$ then $q(h_1,h_2)=2sh_1h_2+th_2^2$ still changes sign, giving a saddle, while if $s=0$ the determinant vanishes and we can't conclude.
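To see the $r$, $s$, $t$ case analysis on a concrete function (my own example, $f(x,y)=x^3+y^3-3xy$, not from the original answer):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 + y**3 - 3*x*y   # critical points: (0, 0) and (1, 1)

r = sp.diff(f, x, 2)      # d^2f/dx^2   = 6x
s = sp.diff(f, x, y)      # d^2f/dxdy   = -3
t = sp.diff(f, y, 2)      # d^2f/dy^2   = 6y
disc = r*t - s**2         # the quantity rt - s^2 used above

for point in [(0, 0), (1, 1)]:
    vals = {x: point[0], y: point[1]}
    print(point, "rt - s^2 =", disc.subs(vals), ", r =", r.subs(vals))
# (0, 0): rt - s^2 = -9        -> q changes sign: saddle point
# (1, 1): rt - s^2 = 27, r = 6 -> positive definite: local minimum
```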

Solution 3:

This is a proof which only uses directional derivatives.

We want to prove the Second Derivative Test.

Suppose the second partial derivatives of $f$ are continuous on a disk with center $(a, b)$, and suppose that $f_x(a,b)=f_y(a,b)=0$. Let

$H=\begin{vmatrix} f_{xx}(a,b) & f_{xy}(a,b) \\ f_{yx}(a,b) & f_{yy}(a,b) \end{vmatrix}=f_{xx}(a,b)f_{yy}(a,b)-[f_{xy}(a,b)]^2$

(a) If $H>0, f_{xx}(a,b)>0$, then $f(a,b)$ is a local minimum.

(b) If $H>0, f_{xx}(a,b)<0$, then $f(a,b)$ is a local maximum.

(c) If $H<0$, then $f(a,b)$ is a saddle point.

(d) If $H=0$, the test is inconclusive.
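(For instance, $f(x,y)=x^4+y^4$, $f(x,y)=-x^4-y^4$, and $f(x,y)=x^4-y^4$ all have $H=0$ at the origin, which is respectively a minimum, a maximum, and a saddle.)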

Let $u=(h, k)$ be a unit vector.

$$\begin{align} D^2_uf(x,y)&=D_u[D_uf(x,y)]\\ &=D_u[\nabla f\cdot u]=D_u[f_xh+f_yk]\\ &=(f_{xx}h+f_{yx}k)h+(f_{xy}h+f_{yy}k)k \end{align}$$

Since the second partial derivatives are continuous, $f_{xy}=f_{yx}$

$$\begin{align} D^2_uf(x,y)&=f_{xx}h^2+2f_{xy}hk+f_{yy}k^2\\ &=k^2\left(f_{xx}\Big(\frac{h}{k}\Big)^2+2f_{xy}\frac{h}{k}+f_{yy}\right) \end{align}$$ (assuming $k\neq 0$; if $k=0$, then $D^2_uf=f_{xx}h^2$, whose sign is just the sign of $f_{xx}$).

Let $z=\frac{h}{k}$ and $g(z)=f_{xx}z^2+2f_{xy}z+f_{yy}$, so that $D^2_uf(x,y)=k^2g(z)$.

We use this quadratic in $z$ to determine the range of possible signs of $D^2_uf(x,y)$.

Let $d$ be its discriminant: $d=(2f_{xy})^2-4f_{xx}f_{yy}=4(f_{xy}^2-f_{xx}f_{yy})=-4H$

(1) If $d<0$, then $g$ has no real roots, which means that $D^2_uf(x,y)$ can only be either always positive or always negative.

So the point $(a,b)$ must be a local minimum or a local maximum.

$$\begin{align} D^2_uf(x,y)&=f_{xx}h^2+2f_{xy}hk+f_{yy}k^2\\ &=f_{xx}\left(h^2+2\frac{f_{xy}}{f_{xx}}hk+\frac{f_{yy}}{f_{xx}}k^2\right)\\ &=f_{xx}\left[\left(h+\frac{f_{xy}}{f_{xx}}k\right)^2+\frac{f_{xx}f_{yy}-f_{xy}^2}{f_{xx}^2}k^2\right]\\ &=f_{xx}\left[\left(h+\frac{f_{xy}}{f_{xx}}k\right)^2+\frac{H}{f_{xx}^2}k^2\right] \end{align}$$

We know that $H>0$, so,

if $f_{xx}>0$, the expression above is always positive, which means a local minimum at $(a,b)$.

Otherwise, if $f_{xx}<0$, it is always negative, which means a local maximum at $(a,b)$.

(2) If $d>0$, then $g$ changes sign, which means that $D^2_uf(x,y)$ can be positive, zero, or negative, so

$H<0$ tells us that the function has a saddle point at $(a,b)$.

(3) If $d=0$, then $g$ has a double root, which means that $D^2_uf(x,y)$ can only be either nonnegative or nonpositive, vanishing in some direction, so

$H=0$ cannot tell us whether $(a,b)$ is a local minimum, a local maximum, or a saddle point.
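As a closing numerical illustration of this directional argument (my own example, $f(x,y)=x^2-y^2$ at $(0,0)$, so $f_{xx}=2$, $f_{xy}=0$, $f_{yy}=-2$ and $H=-4<0$):

```python
import numpy as np

# Sweep unit vectors u = (h, k) = (cos t, sin t) and evaluate
# D_u^2 f = f_xx h^2 + 2 f_xy h k + f_yy k^2 at the critical point.
fxx, fxy, fyy = 2.0, 0.0, -2.0
print("H =", fxx * fyy - fxy**2)               # -4.0: case (c)

angles = np.linspace(0.0, np.pi, 181)
h, k = np.cos(angles), np.sin(angles)
second_dir = fxx * h**2 + 2 * fxy * h * k + fyy * k**2
print("min =", second_dir.min(), "max =", second_dir.max())
# The second directional derivative takes both signs (-2 and 2),
# so (0, 0) is a saddle point, as case (c) predicts.
```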