How to show $\frac {\partial a^{T}X^{-1}b}{\partial X} = -\left( X^{-1}\right) ^{T}ab^{T}\left( X^{-1}\right) ^{T}$?

Solution 1:

Before deriving the gradient, here are some facts and notation used for brevity:

  • Trace and Frobenius product relation $$\left\langle A, B C\right\rangle={\rm tr}(A^TBC) := A : B C$$
  • Cyclic properties of the trace/Frobenius product \begin{align} A : B C &= BC : A \\ &= A C^T : B \\ &= B^T A : C \end{align}
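If you want to convince yourself of these identities numerically, here is a minimal sketch using NumPy. The helper name `frob` and the matrix shapes are my own choices for illustration, not part of the derivation; it also checks the rearrangement $B^T A : C$, which follows from the same cyclic property of the trace.

```python
import numpy as np

rng = np.random.default_rng(0)
# Arbitrary compatible shapes: A is 3x5, B is 3x4, C is 4x5, so that B @ C is 3x5.
A = rng.standard_normal((3, 5))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

def frob(P, Q):
    """Frobenius product P : Q, i.e. the sum of elementwise products."""
    return np.sum(P * Q)

lhs = frob(A, B @ C)                    # A : BC
as_trace = np.trace(A.T @ B @ C)        # tr(A^T B C)
swapped = frob(B @ C, A)                # BC : A
moved_right = frob(A @ C.T, B)          # A C^T : B
moved_left = frob(B.T @ A, C)           # B^T A : C

print(lhs, as_trace, swapped, moved_right, moved_left)
```

All five printed numbers should agree up to floating-point rounding.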

First, we obtain the differential of $X^{-1}$, which is needed for the gradient you are seeking. Differentiating the identity $X^{-1}X = I$ gives \begin{align} d\left(X^{-1}X\right) &= d\left(X^{-1}\right) X + X^{-1}\,dX = dI = 0 \\ \Leftrightarrow \quad d\left(X^{-1}\right) &= -X^{-1}\, dX\, X^{-1} \ . \end{align}
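As a sanity check on this differential, the following sketch compares the exact change in the inverse with the first-order expression for a small perturbation. The test matrix, its size, and the perturbation scale are arbitrary assumptions chosen only to keep $X$ well conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Shift by 2n*I so the random matrix is comfortably invertible (an arbitrary choice).
X = rng.standard_normal((n, n)) + 2 * n * np.eye(n)
dX = 1e-6 * rng.standard_normal((n, n))  # small perturbation of X

X_inv = np.linalg.inv(X)
exact_change = np.linalg.inv(X + dX) - X_inv   # (X + dX)^{-1} - X^{-1}
first_order = -X_inv @ dX @ X_inv              # -X^{-1} dX X^{-1}

# The two should differ only by terms of second order in dX.
err = np.linalg.norm(exact_change - first_order)
rel_err = err / np.linalg.norm(first_order)
print(err, rel_err)
```

The relative error should be tiny compared with 1, and it should shrink roughly in proportion to the size of `dX` if you make the perturbation smaller.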

Let $f := a^T X^{-1} b = a: X^{-1} b$.

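Now we can compute the differential $df$ and read off the gradient $\frac{\partial f}{\partial X}$ from it. Since $a$ and $b$ are constant, only $X^{-1}$ contributes, and the cyclic properties let us move the factors around until $dX$ stands alone: \begin{align} df &= a: d\left(X^{-1}\right) b \\ &= a: \left(-X^{-1}\, dX\, X^{-1} b\right) \\ &= -X^{-T} a : dX\, X^{-1} b \\ &= -X^{-T} a b^T X^{-T} : dX \end{align}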

Thus, the gradient is \begin{align} \frac{\partial f}{\partial X} = -X^{-T} a b^T X^{-T}, \end{align} where $X^{-T} := \left(X^{-1}\right)^T$, which is exactly the formula in the question.
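The result can also be checked against finite differences. Below is a minimal sketch, assuming NumPy and a randomly generated, well-conditioned $X$; the function names `f` and `grad_f`, the step size `h`, and the test data are illustrative choices of mine.

```python
import numpy as np

def f(X, a, b):
    """Scalar function f(X) = a^T X^{-1} b."""
    return a @ np.linalg.solve(X, b)

def grad_f(X, a, b):
    """Closed-form gradient -X^{-T} a b^T X^{-T}."""
    X_inv_T = np.linalg.inv(X).T
    return -X_inv_T @ np.outer(a, b) @ X_inv_T

rng = np.random.default_rng(0)
n = 4
# Shift by 2n*I so the random matrix is comfortably invertible (an arbitrary choice).
X = rng.standard_normal((n, n)) + 2 * n * np.eye(n)
a = rng.standard_normal(n)
b = rng.standard_normal(n)

# Central finite differences, perturbing one entry X[i, j] at a time.
h = 1e-6
numeric = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = h
        numeric[i, j] = (f(X + E, a, b) - f(X - E, a, b)) / (2 * h)

max_abs_diff = np.max(np.abs(numeric - grad_f(X, a, b)))
print(max_abs_diff)
```

The printed difference should be close to zero. Note the layout convention: entry $(i, j)$ of the gradient is the derivative of $f$ with respect to $X_{ij}$, which is why the transposes appear.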