Derivative of the inverse of a matrix
In a scientific paper, I've seen the following
$$\frac{\delta K^{-1}}{\delta p} = -K^{-1}\frac{\delta K}{\delta p}K^{-1}$$
where $K$ is an $n \times n$ matrix that depends on $p$. In my own calculation I would have done the following:
$$\frac{\delta K^{-1}}{\delta p} = -K^{-2}\frac{\delta K}{\delta p}=-K^{-T}K^{-1}\frac{\delta K}{\delta p}$$
Is my calculation wrong?
Note: I think $K$ is symmetric.
Solution 1:
The main difficulty in matrix calculus is that matrices do not commute in general, yet one is tempted to reuse formulas from scalar calculus, such as $(x(t)^{-1})'=-x(t)^{-2}x'(t)$, with $x$ replaced by the matrix $K$. One has to be more careful here and pay attention to the order of the factors. The easiest way to get the derivative of the inverse is to differentiate the identity $I=KK^{-1}$ with the product rule, keeping the factors in order: $$ \underbrace{(I)'}_{=0}=(KK^{-1})'=K'K^{-1}+K(K^{-1})'. $$ Solving this equation for $(K^{-1})'$ by multiplying from the left with $K^{-1}$ (again paying attention to the order) gives $$ K(K^{-1})'=-K'K^{-1}\qquad\Rightarrow\qquad (K^{-1})'=-K^{-1}K'K^{-1}. $$ The scalar-style formula $-K^{-2}K'$ agrees with this only when $K^{-1}$ commutes with $K'$; symmetry of $K$ alone does not guarantee that.
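As a sanity check, here is a minimal numerical sketch comparing both formulas with a central finite difference. The matrix $K(p)$ below is a hypothetical symmetric $2\times 2$ example chosen only for illustration (it is not from the paper in the question), and the helper names are likewise made up.

```python
# Finite-difference check of d(K^{-1})/dp for a hypothetical symmetric 2x2 K(p).

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def neg(A):
    """Entrywise negation."""
    return [[-x for x in row] for row in A]

def max_abs_diff(A, B):
    """Largest entrywise absolute difference."""
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

def K(p):
    """Assumed example: symmetric, but K does not commute with dK/dp."""
    return [[2.0 + p, p * p], [p * p, 3.0]]

def dK(p):
    """Entrywise derivative of the example K(p)."""
    return [[1.0, 2.0 * p], [2.0 * p, 0.0]]

p, h = 1.0, 1e-6
Kinv = inv2(K(p))

# Central difference approximation of d(K^{-1})/dp.
Kplus, Kminus = inv2(K(p + h)), inv2(K(p - h))
fd = [[(Kplus[i][j] - Kminus[i][j]) / (2 * h) for j in range(2)]
      for i in range(2)]

correct = neg(matmul(matmul(Kinv, dK(p)), Kinv))  # -K^{-1} K' K^{-1}
naive = neg(matmul(matmul(Kinv, Kinv), dK(p)))    # -K^{-2} K'

err_correct = max_abs_diff(fd, correct)
err_naive = max_abs_diff(fd, naive)
print("error of -K^{-1} K' K^{-1}:", err_correct)
print("error of -K^{-2} K':       ", err_naive)
```

For this example the first error should sit at the level of finite-difference noise, while the second stays clearly away from zero even though $K$ is symmetric.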
Solution 2:
Yes, your calculation is wrong. Note that $K$ may not commute with $\frac{\partial K}{\partial p}$, so you must apply the chain rule correctly. The derivative of $\def\inv{\mathrm{inv}}\inv \colon \def\G{\mathord{\rm GL}}\G_n \to \G_n$ is not given by $\inv'(A)B = -A^{-2}B$, but by $\inv'(A)B = -A^{-1}BA^{-1}$. To see that, note that for small enough $B$ (so that $\|A^{-1}B\| < 1$ and the Neumann series converges) we have \begin{align*} \inv(A + B) &= (A + B)^{-1}\\ &= (\def\I{\mathord{\rm Id}}\I + A^{-1}B)^{-1}A^{-1}\\ &= \sum_{k \ge 0} (-1)^k (A^{-1}B)^kA^{-1}\\ &= A^{-1} - A^{-1}BA^{-1} + o(\|B\|) \end{align*} Hence $\inv'(A)B= -A^{-1}BA^{-1}$, and therefore, by the chain rule, $$ \partial_p (\inv \circ K) = (\inv'\circ K)\bigl(\partial_p K\bigr) = -K^{-1}(\partial_p K) K^{-1}. $$
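The first-order expansion above can also be checked numerically. The following is a rough sketch with hypothetical $2\times 2$ matrices `A` and `B0` (assumptions for illustration only): the remainder $(A+tB_0)^{-1} - \bigl(A^{-1} - tA^{-1}B_0A^{-1}\bigr)$ should scale like $t^2$, so shrinking $t$ by a factor of 10 should shrink the remainder by roughly a factor of 100.

```python
# Check that (A + B)^{-1} = A^{-1} - A^{-1} B A^{-1} + O(||B||^2) on a 2x2 example.

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[3.0, 1.0], [1.0, 3.0]]   # assumed invertible base point
B0 = [[1.0, 2.0], [2.0, 0.0]]  # assumed perturbation direction

def remainder(t):
    """Max-entry size of (A + t*B0)^{-1} minus its first-order approximation."""
    B = [[t * x for x in row] for row in B0]
    A_plus_B = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
    Ainv = inv2(A)
    first_order = matmul(matmul(Ainv, B), Ainv)  # A^{-1} B A^{-1}
    exact = inv2(A_plus_B)
    return max(abs(exact[i][j] - (Ainv[i][j] - first_order[i][j]))
               for i in range(2) for j in range(2))

ratio = remainder(1e-2) / remainder(1e-3)
print("remainder ratio when t shrinks by 10x:", ratio)
```

A ratio near 100 is consistent with a quadratic remainder, which is what the $o(\|B\|)$ term in the expansion asserts.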
Solution 3:
Actually, we can compute the derivative of the inverse directly from the definition of the derivative of a function. Write $K = K(p)$ and $\Delta K = K(p+\Delta p) - K(p)$. Then \begin{align} \frac{dK^{-1}}{dp} & =\lim_{\Delta p \to 0} \frac{(K+\Delta K)^{-1} - K^{-1}}{\Delta p} \\ & = \lim_{\Delta p \to 0} \frac{(K+\Delta K)^{-1}KK^{-1} - (K+\Delta K)^{-1}(K+\Delta K)K^{-1}}{\Delta p} \\ & = \lim_{\Delta p \to 0} \frac{(K+\Delta K)^{-1}(-\Delta K) K^{-1}}{\Delta p} \\ & = - K^{-1} \left(\lim_{\Delta p \to 0} \frac{\Delta K}{\Delta p}\right) K^{-1} \\ & = - K^{-1} (\partial_{p} K) K^{-1} \end{align} The fourth line uses the continuity of matrix inversion: $(K+\Delta K)^{-1} \to K^{-1}$ as $\Delta p \to 0$.
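The key algebraic step in this derivation, $(K+\Delta K)^{-1} - K^{-1} = -(K+\Delta K)^{-1}\,\Delta K\,K^{-1}$, is exact and does not require $\Delta K$ to be small. A small sketch on hypothetical $2\times 2$ matrices (the values of `K0` and `D` are assumptions for illustration) confirms it up to rounding error:

```python
# Exact identity behind the limit: (K + D)^{-1} - K^{-1} = -(K + D)^{-1} D K^{-1}.

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

K0 = [[3.0, 1.0], [1.0, 3.0]]  # assumed value of K(p)
D = [[1.0, 2.0], [2.0, 0.0]]   # assumed increment Delta K, deliberately not small

K_plus_D = [[K0[i][j] + D[i][j] for j in range(2)] for i in range(2)]
inv_shifted, inv_base = inv2(K_plus_D), inv2(K0)

lhs = [[inv_shifted[i][j] - inv_base[i][j] for j in range(2)] for i in range(2)]
prod = matmul(matmul(inv_shifted, D), inv_base)
rhs = [[-x for x in row] for row in prod]

gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print("max entrywise gap between both sides:", gap)
```

Only the final limit step, where $(K+\Delta K)^{-1}$ is replaced by $K^{-1}$, relies on $\Delta K \to 0$.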