Prove $\frac{\partial \ln|X|}{\partial X} = 2X^{-1} - \operatorname{diag}(X^{-1})$.

Note: just in case any of this notation seems wrong, see The Matrix Cookbook: p. 15's (141), p. 9's (57) and p. 8's (43).

Prove that $\forall \, p \in \mathbb{N}$, $\forall \, X \in \mathbb{R}^{p \times p}$, if $X$ is a positive definite matrix, then $\frac{\partial \ln|X|}{\partial X} = 2X^{-1} - \operatorname{diag}(X^{-1})$.

In words: the derivative of the logarithm of the determinant of a matrix with respect to the matrix is twice the inverse minus the diagonal matrix of the inverse.

What I tried:

Firstly, we define differentiation of a scalar-valued function $u$ of a matrix (that is, $u: \mathbb{R}^{p \times p} \to \mathbb{R}$) with respect to a matrix $X$ (I guess the following definition doesn't rely on $X$ being positive definite, but I might be wrong):

$\frac{\partial u}{\partial X} := \begin{bmatrix} \frac{\partial u}{\partial X_{11}} & \cdots & \frac{\partial u}{\partial X_{1p}}\\ \vdots & \ddots & \vdots \\ \frac{\partial u}{\partial X_{p1}} & \cdots & \frac{\partial u}{\partial X_{pp}} \end{bmatrix}$, assuming of course each entry is defined.

Thus, $\frac{\partial \ln|X|}{\partial X} = \begin{bmatrix} \frac{\partial \ln|X|}{\partial X_{11}} & \cdots & \frac{\partial \ln|X|}{\partial X_{1p}}\\ \vdots & \ddots & \vdots \\ \frac{\partial \ln|X|}{\partial X_{p1}} & \cdots & \frac{\partial \ln|X|}{\partial X_{pp}} \end{bmatrix}$

We first note that for the case where the elements of $X$ are independent, a constructive proof involving cofactor expansion and the adjugate (classical adjoint) matrix can be given to show that $\frac{\partial \ln|X|}{\partial X} = X^{-T}$ (Harville). This is not always equal to $2X^{-1}-\operatorname{diag}(X^{-1})$. The fact alone that $X$ is positive definite is sufficient to conclude that $X$ is symmetric (symmetry is part of the definition used here), and thus its elements are not independent.

It can be shown that $\frac{\partial \ln|X|}{\partial X_{ij}}=\operatorname{tr}\!\left[X^{-1} \frac{\partial X}{\partial X_{ij}}\right]$. I prove this here and here.
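As a quick numerical sanity check of this identity (the random test matrix, step size, and `logdet` helper below are just my own scaffolding, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4
A = rng.standard_normal((p, p))
X = A @ A.T + p * np.eye(p)        # random symmetric positive definite test matrix
Xinv = np.linalg.inv(X)

def logdet(M):
    # log|M| via slogdet for numerical stability
    return np.linalg.slogdet(M)[1]

t = 1e-6
for i, j in [(0, 0), (0, 1), (2, 3)]:   # 0-indexed versions of (1,1), (1,2), (3,4)
    D = np.zeros((p, p))
    D[i, j] = D[j, i] = 1.0             # the symmetric dX/dX_ij described below
    numeric = (logdet(X + t * D) - logdet(X)) / t
    analytic = np.trace(Xinv @ D)
    print(i, j, round(numeric, 6), round(analytic, 6))  # the two columns agree
```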

Observe that, if $i=j$,

$\frac{\partial X}{\partial X_{ij}}$ is the matrix with a 1 in its $i$th row and $j$th column and 0 elsewhere,

and, otherwise, $\frac{\partial X}{\partial X_{ij}}$ is the matrix with a 1 in its $i$th row and $j$th column, a 1 in its $j$th row and $i$th column (since positive definite matrices are symmetric, so $X_{ji}=X_{ij}$), and 0 elsewhere.

Examples:

$\frac{\partial X}{\partial X_{11}} = \begin{bmatrix} \frac{\partial X_{11}}{\partial X_{11}} & \cdots & \frac{\partial X_{1p}}{\partial X_{11}}\\ \vdots & \ddots & \vdots \\ \frac{\partial X_{p1}}{\partial X_{11}} & \cdots & \frac{\partial X_{pp}}{\partial X_{11}} \end{bmatrix}$

$= \begin{bmatrix} 1 & 0 & \cdots & 0\\ 0 & 0 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}$

$\frac{\partial X}{\partial X_{12}} = \begin{bmatrix} \frac{\partial X_{11}}{\partial X_{12}} & \cdots & \frac{\partial X_{1p}}{\partial X_{12}}\\ \vdots & \ddots & \vdots \\ \frac{\partial X_{p1}}{\partial X_{12}} & \cdots & \frac{\partial X_{pp}}{\partial X_{12}} \end{bmatrix}$

$= \begin{bmatrix} 0 & 1 & 0 & \cdots & 0\\ 1 & 0 & 0 & \cdots & 0\\ 0 & 0 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}$

since $\frac{\partial X_{21}}{\partial X_{12}} = \frac{\partial X_{12}}{\partial X_{12}} = 1$, because $X_{21}=X_{12}$.
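These basis matrices are easy to generate programmatically; a minimal sketch (the function name `dX_dXij` is my own, and indices are 0-based):

```python
import numpy as np

def dX_dXij(p, i, j):
    # dX/dX_ij under the symmetric convention: a single 1 at (i, i) when
    # i == j, and 1s at both (i, j) and (j, i) otherwise
    D = np.zeros((p, p))
    D[i, j] = 1.0
    D[j, i] = 1.0   # same entry as the line above when i == j
    return D

print(dX_dXij(4, 0, 0))   # reproduces the dX/dX_11 example
print(dX_dXij(4, 0, 1))   # reproduces the dX/dX_12 example
```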

Thus, if we let $E = X^{-1}= \begin{bmatrix} e_{11} & \cdots & e_{1p}\\ \vdots & \ddots & \vdots \\ e_{p1} & \cdots & e_{pp} \end{bmatrix}$

and if we let $F_{ij} = X^{-1} \frac{\partial X}{\partial X_{ij}} = E \frac{\partial X}{\partial X_{ij}}$, then for $i=j$, $F_{ii}$ has the $i$th column of $E$ as its $i$th column and zeros elsewhere, so its only nonzero diagonal entry is $e_{ii}$; and for $i \ne j$, $F_{ij}$ has the $j$th column of $E$ as its $i$th column, the $i$th column of $E$ as its $j$th column, and zeros elsewhere, so its only nonzero diagonal entries are $e_{ij}$ and $e_{ji}$.

If we let $F = [f_{ij}]$ with $f_{ij} = \operatorname{tr}(F_{ij})$, then $f_{ij} = \frac{\partial \ln|X|}{\partial X_{ij}} = e_{ij}$ if $i=j$, and $f_{ij} = \frac{\partial \ln|X|}{\partial X_{ij}} = \operatorname{tr}(F_{ij}) = 2e_{ij}$ otherwise.

Example:

$F_{12} = E \frac{\partial X}{\partial X_{12}} = \begin{bmatrix} e_{11} & \cdots & e_{1p}\\ \vdots & \ddots & \vdots \\ e_{p1} & \cdots & e_{pp} \end{bmatrix}\begin{bmatrix} \frac{\partial X_{11}}{\partial X_{12}} & \cdots & \frac{\partial X_{1p}}{\partial X_{12}}\\ \vdots & \ddots & \vdots \\ \frac{\partial X_{p1}}{\partial X_{12}} & \cdots & \frac{\partial X_{pp}}{\partial X_{12}} \end{bmatrix}$

$= \begin{bmatrix} e_{11} & \cdots & e_{1p}\\ \vdots & \ddots & \vdots \\ e_{p1} & \cdots & e_{pp} \end{bmatrix}\begin{bmatrix} 0 & 1 & 0 & \cdots & 0\\ 1 & 0 & 0 & \cdots & 0\\ 0 & 0 & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}$

$=\begin{bmatrix} e_{12} & e_{11} & 0 & \cdots & 0\\ e_{22} & e_{21} & 0 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ e_{p2} & e_{p1} & 0 & \cdots & 0 \end{bmatrix}$, whose only nonzero diagonal entries are $e_{12}$ and $e_{21}$.

Thus, $\operatorname{tr}(F_{12})=e_{12}+e_{21} = 2e_{12} = 2e_{21}$, since $E = X^{-1}$ is itself symmetric.
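A short NumPy check of this step (the random SPD test matrix is my own choice):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
A = rng.standard_normal((p, p))
X = A @ A.T + p * np.eye(p)     # random symmetric positive definite matrix
E = np.linalg.inv(X)

D12 = np.zeros((p, p))
D12[0, 1] = D12[1, 0] = 1.0     # dX/dX_12, 0-indexed
F12 = E @ D12                   # columns of E land in columns 1 and 2
print(np.isclose(np.trace(F12), 2 * E[0, 1]))  # True: tr(F_12) = 2 e_12
```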

$\therefore \ \frac{\partial \ln|X|}{\partial X} = F = \begin{bmatrix} e_{11} & 2e_{12} & \cdots & 2e_{1p}\\ 2e_{21} & e_{22} & \cdots & 2e_{2p}\\ \vdots & \vdots & \ddots & \vdots \\ 2e_{p1} & 2e_{p2} & \cdots & e_{pp} \end{bmatrix}= 2E - \operatorname{diag}(E) = 2X^{-1} - \operatorname{diag}(X^{-1})$.

QED
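And a finite-difference check of the final matrix formula under this symmetric-perturbation convention (the test matrix and step size are my own choices):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3
A = rng.standard_normal((p, p))
X = A @ A.T + p * np.eye(p)             # random symmetric positive definite matrix
Xinv = np.linalg.inv(X)

def logdet(M):
    return np.linalg.slogdet(M)[1]

t = 1e-6
grad = np.zeros((p, p))
for i in range(p):
    for j in range(p):
        D = np.zeros((p, p))
        D[i, j] = D[j, i] = 1.0         # perturb (i, j) and (j, i) together
        grad[i, j] = (logdet(X + t * D) - logdet(X)) / t

target = 2 * Xinv - np.diag(np.diag(Xinv))
print(np.max(np.abs(grad - target)))    # ~1e-6: entrywise agreement
```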

Any mistakes? Are there simpler or alternative ways?


Perhaps I am missing something, but the result does not seem correct, judging from a specific example.

Let $f(X) = \log(\det(X))$.

I am assuming that when you write $\frac{\partial \ln|X|}{\partial X}$ you mean that the derivative of $f$ evaluated at $X$ in the direction $H$ is given by $Df(X)H = \operatorname{tr}\!\left(\frac{\partial \ln|X|}{\partial X}\, H\right)$.

If this is not correct, perhaps you could add your definition to avoid causing confusion and I will delete this answer.

Choose $X=\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$ and $H=\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$. Both are positive definite and symmetric and $X^{-1} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}$.

The usual formula gives $Df(X)H = \lim_{t\to 0} {f(X+tH)-f(X) \over t} = \operatorname{tr}(X^{-1} H)$.

With the above matrices we get $Df(X)H = 4$; taking $t=0.01$ and evaluating the quotient numerically to confirm, we get ${f(X+tH)-f(X) \over t} \approx 3.95$, which is close to $4$.

From the above formula we have $G= 2X^{-1}-\operatorname{diag}(X^{-1}) = \begin{bmatrix} 2 & -2 \\ -2 & 1 \end{bmatrix}$, yet $\operatorname{tr}(G H) = 2 \neq 4$.
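For completeness, the computation above can be reproduced in a few lines of NumPy (this is just the same example worked out in code):

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0]])
H = np.array([[2.0, 1.0], [1.0, 2.0]])
Xinv = np.linalg.inv(X)

def f(M):
    return np.log(np.linalg.det(M))

t = 0.01
print((f(X + t * H) - f(X)) / t)        # ~3.95, the difference quotient
print(np.trace(Xinv @ H))               # 4.0, from Df(X)H = tr(X^{-1} H)

G = 2 * Xinv - np.diag(np.diag(Xinv))
print(np.trace(G @ H))                  # 2.0, so tr(GH) != Df(X)H
```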


If we just use the chain rule, the positivity of $\det$ for positive definite matrices, and the formula for the derivative of $\det$ for invertible matrices, then $[d(\ln\det X)](H)=\operatorname{tr}(X^{-1}H)$, as per "Compute the derivative of the log of the determinant of A with respect to A", where one recognizes the Frobenius inner product with the gradient $X^{-T}$, in agreement with @loup blanc's answer above and with Harville.


@BCLC, your notation $\frac{\partial \ln|X|}{\partial X}$ is a bad one because the entries of $X$ are not independent. Where did you find this formula?

Let $f:X\in GL_n\rightarrow \log(|\det(X)|)$ and $g:X\rightarrow \det(X)$. Since, for every $H\in M_n$, $Dg_X(H)=\det(X)\operatorname{tr}(X^{-1}H)$, we have $Df_X(H)=\operatorname{tr}(X^{-1}H)$. Using the standard inner product over $M_n$, the gradient of $f$ is defined by $Df_X(H)=\langle\nabla f(X),H\rangle=\operatorname{tr}((\nabla f(X))^TH)$; then $\nabla f(X)=X^{-T}$.
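A finite-difference sketch confirming $\nabla f(X)=X^{-T}$ at a generic, not necessarily symmetric, invertible matrix (the test matrix and step size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
X = rng.standard_normal((n, n)) + n * np.eye(n)   # generic invertible, not symmetric

def f(M):
    return np.log(abs(np.linalg.det(M)))

t = 1e-6
grad = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        D = np.zeros((n, n))
        D[i, j] = 1.0                 # entries independent: perturb one at a time
        grad[i, j] = (f(X + t * D) - f(X)) / t

print(np.max(np.abs(grad - np.linalg.inv(X).T)))  # ~1e-6: gradient is X^{-T}
```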

Now, let $\phi:Y\in S_n^+\rightarrow \log(|\det(Y)|)$, where $S_n^+$ is the set of SPD matrices. Since the tangent vector space to $S_n^+$ is $S_n$, for every $K\in S_n$, $D\phi_Y(K)=\operatorname{tr}(Y^{-1}K)$. The restriction to $S_n$ of the previous inner product is an inner product. Then, for this inner product and $K\in S_n$, $\nabla \phi(Y)=Y^{-1}$.

Of course, we find the same formula, because the Taylor expansions of $f,\phi$ are $f(X+H)=f(X)+\operatorname{tr}(X^{-1}H)+o(\|H\|)$ and $\phi(Y+K)=\phi(Y)+\operatorname{tr}(Y^{-1}K)+o(\|K\|)$; that is not extraordinary, because $\phi$ is the restriction of $f$.

EDIT. Answer to BCLC. There are standard definitions for the derivative and the gradient of a real function. Here we consider a curious third definition:

Let $f:GL_n^+\rightarrow \mathbb{R}$ with $\frac{\partial f}{\partial X}=G$, and let $\phi$ be its restriction to $S_n^+$, with $\frac{\partial \phi}{\partial X}=\Gamma$. Here, $\Gamma$ is constructed as follows: if $i\not= j$, then $\gamma_{i,j}=\frac{\partial f}{\partial X_{i,j}}+\frac{\partial f}{\partial X_{j,i}}=2g_{i,j}$, and otherwise $\gamma_{i,i}=\frac{\partial f}{\partial X_{i,i}}=g_{i,i}$. In the standard approach, we differentiate first and, in a second step, we put $X_{i,j}=X_{j,i}$. Here we do the contrary: we put $X_{i,j}=X_{j,i}$ and, in a second step, we differentiate.
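In code, this construction reads as follows; for SPD $Y$, where $G=X^{-T}=X^{-1}$, it recovers the formula in question (the test matrix is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
A = rng.standard_normal((n, n))
Y = A @ A.T + n * np.eye(n)              # SPD, so both constructions apply

G = np.linalg.inv(Y).T                   # gradient of f over GL_n, evaluated at Y
Gamma = G + G.T - np.diag(np.diag(G))    # gamma_ij = g_ij + g_ji off-diag, g_ii on it

Yinv = np.linalg.inv(Y)
print(np.max(np.abs(Gamma - (2 * Yinv - np.diag(np.diag(Yinv))))))  # 0 up to rounding
```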

Anyway, we obtain the following interesting result: the gradient of the restriction is not the restriction of the gradient.