Short answer: The trace gives the scalar product on the space of matrices: $\langle X,Y \rangle = \mathrm{tr}(X^\top Y)$. Since you're working with symmetric matrices, you can drop the transpose: $\langle X,Y \rangle = \mathrm{tr}(XY)$.
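If you want to see this concretely, here is a minimal NumPy sketch (the library choice is mine, not part of the answer) checking that the trace form is the entrywise (Frobenius) inner product, and that the transpose is indeed redundant for symmetric matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
X = A + A.T                      # symmetric
B = rng.standard_normal((4, 4))
Y = B + B.T                      # symmetric

lhs = np.trace(X.T @ Y)          # tr(X^T Y)
frob = np.sum(X * Y)             # entrywise (Frobenius) inner product
no_transpose = np.trace(X @ Y)   # tr(XY), valid since X is symmetric

print(np.allclose(lhs, frob), np.allclose(lhs, no_transpose))  # True True
```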

Long answer, with all the gory details: Given a function $f:\mathrm S_n^{++}\to\mathbf R$, the link between the gradient $\nabla_Xf$ of the function $f$ at $X$ (which is a vector) and its differential $d_Xf$ at $X$ (which is a linear form) is that for any $U\in\mathrm M_n$, $$ d_Xf(U) = \langle \nabla_Xf,U \rangle, $$ where $\langle\cdot,\cdot\rangle$ is the trace inner product above. For your function $f$, since you know the gradient, you can write the differential: $$ d_Xf(U) = \langle X^{-1},U \rangle = \mathrm{tr}(X^{-1}U). $$
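A quick numerical sanity check of this formula, assuming (as in the question) that $f(X) = \log\det X$, whose gradient is $X^{-1}$: compare a central finite difference of $f$ along a direction $U$ with $\mathrm{tr}(X^{-1}U)$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
X = A @ A.T + 4 * np.eye(4)      # a symmetric positive definite matrix
B = rng.standard_normal((4, 4))
U = B + B.T                      # a symmetric direction

f = lambda M: np.log(np.linalg.det(M))   # assumed f, as in the question
eps = 1e-6
numeric = (f(X + eps * U) - f(X - eps * U)) / (2 * eps)  # central difference
exact = np.trace(np.linalg.solve(X, U))                  # tr(X^{-1} U)

print(np.isclose(numeric, exact))  # True
```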

What about the second-order differential? Well, it's the differential of the differential. Let's take it slow. The differential of $f$ is the function $df:\mathrm S_n^{++}\to\mathrm L(\mathrm M_n,\mathbf R)$, defined by $df(X) = V\mapsto \mathrm{tr}(X^{-1}V)$. To find the differential of $df$ at $X$, we look at $df(X+\Delta X)$, and take the part that varies linearly in $\Delta X$. Since $df(X+\Delta X)$ is a function $\mathrm M_n\to\mathbf R$, if we hope to ever understand anything we should apply it to some matrix $V$: $$ df(X+\Delta X)(V) = \mathrm{tr}\left[ (X+\Delta X)^{-1} V \right], $$ and use the first-order approximation $(X+\Delta X)^{-1} \simeq X^{-1} - X^{-1}(\Delta X)X^{-1}$ from the passage you cited: \begin{align*} df(X+\Delta X)(V) &\simeq \mathrm{tr}\left[ \left(X^{-1} - X^{-1}(\Delta X)X^{-1}\right) V \right]\\ &= \mathrm{tr}(X^{-1}V) - \mathrm{tr}(X^{-1}(\Delta X)X^{-1}V)\\ &= df(X)(V) - \mathrm{tr}(X^{-1}(\Delta X)X^{-1}V). \end{align*} The part that varies linearly in $\Delta X$ is the term $-\mathrm{tr}(X^{-1}(\Delta X)X^{-1}V)$. So the second differential of $f$ at $X$ is the map $d^2_Xf\in\mathrm L(\mathrm M_n, \mathrm L(\mathrm M_n,\mathbf R))$ defined (writing $U$ for the increment $\Delta X$) by $$ d^2_Xf(U)(V) = -\mathrm{tr}(X^{-1}UX^{-1}V). $$
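The same finite-difference idea, applied one level up, checks this formula: differentiate $X\mapsto\mathrm{tr}(X^{-1}V)$ in the direction $U$ and compare with $-\mathrm{tr}(X^{-1}UX^{-1}V)$. Again just a sketch under the assumption $f(X)=\log\det X$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
X = A @ A.T + 4 * np.eye(4)       # symmetric positive definite
U = rng.standard_normal((4, 4)); U = U + U.T   # symmetric direction
V = rng.standard_normal((4, 4)); V = V + V.T   # symmetric test matrix

df = lambda M: np.trace(np.linalg.solve(M, V))  # df(M)(V) = tr(M^{-1} V)
eps = 1e-6
numeric = (df(X + eps * U) - df(X - eps * U)) / (2 * eps)  # central difference
Xinv = np.linalg.inv(X)
exact = -np.trace(Xinv @ U @ Xinv @ V)          # -tr(X^{-1} U X^{-1} V)

print(np.isclose(numeric, exact))  # True
```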