Derivative of multivariate normal distribution wrt mean and covariance

I want to differentiate this wrt $\mu$ and $\Sigma$ :
$${1\over \sqrt{(2\pi)^k |\Sigma |}} e^{-0.5 (x-\mu)^T \Sigma^{-1} (x-\mu)} $$

I'm following the Matrix Cookbook and also a linked answer. The solution given in that answer doesn't match what I read in the cookbook.
For example, if I follow rule 81 from the cookbook and differentiate this term wrt $\mu$, I get a different answer:
$(x-\mu)^T \Sigma^{-1} (x-\mu)$

According to the cookbook, the answer should be $-(\Sigma^{-1} + \Sigma^{-T}) (x-\mu)$. Or am I missing something here? Also, how do I differentiate $(x-\mu)^T \Sigma^{-1} (x-\mu)$ with respect to $\Sigma$?


Solution 1:

For convenience, define some variables which are easier to type:

$$\eqalign{ M &= \Sigma^{-1} \cr z &= (\mu-x) \cr }$$

Now let's answer your second question first.
Rewrite the function in terms of the above variables and the Frobenius product, written with a colon so that $A:B={\rm tr}(A^TB)$, and find its differential:

$$\eqalign{ f &= z^TMz \cr &= M:z\,z^T \cr\cr df &= M:(dz\,z^T+z\,dz^T) + zz^T:dM \cr &= Mz:dz + z^TM:dz^T - zz^T:M\,d\Sigma\,M \cr &= Mz:dz + M^Tz:dz - M^Tzz^TM^T:d\Sigma \cr &= (M+M^T)\,z:dz - M^Tzz^TM^T:d\Sigma \cr &= (M+M^T)\,(\mu-x):d\mu - M^Tzz^TM^T:d\Sigma \cr }$$

The third line of $df$ uses $dM = -M\,d\Sigma\,M$, and the last line uses $dz=d\mu$. Setting $d\Sigma=0$ yields the gradient with respect to $\mu$:

$$\eqalign{ \frac{\partial f}{\partial \mu} &= (\Sigma^{-1}+\Sigma^{-T})\,(\mu-x) \cr }$$

and setting $d\mu=0$ yields the gradient with respect to $\Sigma$:

$$\eqalign{ \frac{\partial f}{\partial \Sigma} &= - M^Tzz^TM^T \cr }$$

Now back to your first function.
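As a sanity check on the two gradients of $f$ above, here is a minimal NumPy sketch that compares them against central finite differences. The helper names (`quad_form`, `grad_mu`, `grad_Sigma`, `numerical_grad`) and the random test data are my own, not from the cookbook. The test matrix is deliberately not symmetric, so the transposes in the formulas are actually exercised.

```python
import numpy as np

def quad_form(mu, Sigma, x):
    """f = (x - mu)^T Sigma^{-1} (x - mu); Sigma is treated as a general invertible matrix."""
    d = x - mu
    return d @ np.linalg.solve(Sigma, d)

def grad_mu(mu, Sigma, x):
    """Analytic gradient wrt mu: (M + M^T)(mu - x) with M = Sigma^{-1}."""
    M = np.linalg.inv(Sigma)
    return (M + M.T) @ (mu - x)

def grad_Sigma(mu, Sigma, x):
    """Analytic gradient wrt Sigma: -M^T z z^T M^T with z = mu - x."""
    M = np.linalg.inv(Sigma)
    z = mu - x
    return -M.T @ np.outer(z, z) @ M.T

def numerical_grad(fun, arg, eps=1e-6):
    """Central finite differences, perturbing one entry at a time."""
    arg = np.asarray(arg, dtype=float)
    g = np.zeros_like(arg)
    for idx in np.ndindex(arg.shape):
        step = np.zeros_like(arg)
        step[idx] = eps
        g[idx] = (fun(arg + step) - fun(arg - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
k = 3
x = rng.normal(size=k)
mu = rng.normal(size=k)
A = rng.normal(size=(k, k))
# Well-conditioned but deliberately non-symmetric test matrix (an assumption for the check).
Sigma = A @ A.T + k * np.eye(k) + 0.1 * rng.normal(size=(k, k))

err_mu = np.max(np.abs(
    grad_mu(mu, Sigma, x) - numerical_grad(lambda m: quad_form(m, Sigma, x), mu)))
err_Sigma = np.max(np.abs(
    grad_Sigma(mu, Sigma, x) - numerical_grad(lambda S: quad_form(mu, S, x), Sigma)))

print("max abs error, gradient wrt mu:   ", err_mu)
print("max abs error, gradient wrt Sigma:", err_Sigma)
```

Both errors should be at the level of finite-difference noise if the formulas are right.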
Let's write down its logarithm (dropping the additive constant $-\frac{k}{2}\log(2\pi)$, which does not affect the derivatives) and find the differential:

$$\eqalign{ L &= \frac{1}{2}\Big(\log\det(M) - f\Big) \cr &= \frac{1}{2}\Big({\rm tr}\log(M) - f\Big) \cr\cr dL &= \frac{1}{2}\Big(M^{-T}:dM - df\Big) \cr &= \frac{1}{2}\Big(M^{-T}:dM - (M+M^T)\,(\mu-x):d\mu + M^Tzz^TM^T:d\Sigma\Big) \cr &= \frac{1}{2}\Big(-M^{-T}:M\,d\Sigma\,M - (M+M^T)\,(\mu-x):d\mu + M^Tzz^TM^T:d\Sigma\Big) \cr &= \frac{1}{2}\Big(-M^T:d\Sigma - (M+M^T)\,(\mu-x):d\mu + M^Tzz^TM^T:d\Sigma\Big) \cr &=\frac{1}{2}(M^Tzz^TM^T-M^T):d\Sigma-\frac{1}{2}(M+M^T)\,(\mu-x):d\mu \cr }$$

Once again, holding one of the independent variables constant yields the gradient with respect to the other:

$$\eqalign{ \frac{\partial L}{\partial \mu} &= \frac{1}{2}(M+M^T)\,(x-\mu) \cr\cr \frac{\partial L}{\partial \Sigma} &= \frac{1}{2}(M^Tzz^TM^T-M^T) \cr }$$

To recover the gradient of the original function (let's call it $H$), simply apply the logarithmic derivative rule:

$$\eqalign{ \frac{\partial H}{\partial\mu} &= H\Bigg(\frac{\partial L}{\partial\mu}\Bigg) \cr \frac{\partial H}{\partial\Sigma} &= H\Bigg(\frac{\partial L}{\partial\Sigma}\Bigg) \cr }$$
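The logarithmic derivative rule can be checked numerically as well. The sketch below assumes the entries of $\Sigma$ are perturbed independently, so the symmetry constraint is not imposed during differentiation. It compares $H\,\partial L/\partial\mu$ and $H\,\partial L/\partial\Sigma$ against finite differences of the density itself. The function names and test data are illustrative only.

```python
import numpy as np

def density(mu, Sigma, x):
    """Multivariate normal density H(x; mu, Sigma), written exactly as in the question."""
    k = x.size
    d = x - mu
    quad = d @ np.linalg.solve(Sigma, d)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))

def grad_log_density(mu, Sigma, x):
    """Gradients of L = log H wrt mu and Sigma, using M = Sigma^{-1} and z = mu - x."""
    M = np.linalg.inv(Sigma)
    z = mu - x
    dL_dmu = 0.5 * (M + M.T) @ (x - mu)
    dL_dSigma = 0.5 * (M.T @ np.outer(z, z) @ M.T - M.T)
    return dL_dmu, dL_dSigma

def numerical_grad(fun, arg, eps=1e-6):
    """Central finite differences, perturbing one entry at a time."""
    arg = np.asarray(arg, dtype=float)
    g = np.zeros_like(arg)
    for idx in np.ndindex(arg.shape):
        step = np.zeros_like(arg)
        step[idx] = eps
        g[idx] = (fun(arg + step) - fun(arg - step)) / (2 * eps)
    return g

rng = np.random.default_rng(1)
k = 3
x = rng.normal(size=k)
mu = rng.normal(size=k)
A = rng.normal(size=(k, k))
Sigma = A @ A.T + k * np.eye(k)  # symmetric positive definite test covariance

H = density(mu, Sigma, x)
dL_dmu, dL_dSigma = grad_log_density(mu, Sigma, x)

# Logarithmic derivative rule: dH = H * dL.
num_mu = numerical_grad(lambda m: density(m, Sigma, x), mu)
num_Sigma = numerical_grad(lambda S: density(mu, S, x), Sigma)

ok_mu = np.allclose(H * dL_dmu, num_mu, rtol=1e-5, atol=1e-9)
ok_Sigma = np.allclose(H * dL_dSigma, num_Sigma, rtol=1e-5, atol=1e-9)
print("mu gradient matches:   ", ok_mu)
print("Sigma gradient matches:", ok_Sigma)
```

If $\Sigma$ is instead parameterized so that only its distinct entries vary, each off-diagonal entry appears twice and the gradient with respect to those parameters differs from the expression checked here.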

Solution 2:

I had the same question. After applying equation 81 from the Matrix Cookbook (taking $f$ to be the exponent $-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)$), I got this equation: $$ \frac{\partial{f}}{\partial{\mu}} = -\frac{1}{2}(\Sigma ^{-1} + (\Sigma^{-1})^{T}) (x - \mu)*(-1) $$ Since $ \Sigma $ is a covariance matrix, it is symmetric. The inverse of a symmetric matrix is also symmetric (see the question "Is the inverse of a symmetric matrix also symmetric?"). Therefore, we have $ (\Sigma^{-1})^{T} = \Sigma ^{-1} $.

Now, the above equation reduces to $$ \frac{\partial{f}}{\partial{\mu}} = \Sigma ^{-1}(x - \mu) $$
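The same symmetry argument can be applied to the gradients of the log-density $L$ from Solution 1. This is a sketch under one assumption: the entries of $\Sigma$ are treated as independent during differentiation, and symmetry is used only afterwards to simplify. With $zz^T=(x-\mu)(x-\mu)^T$ and $\Sigma^{-T}=\Sigma^{-1}$:

$$\eqalign{ \frac{\partial L}{\partial \mu} &= \frac{1}{2}(\Sigma^{-1}+\Sigma^{-T})\,(x-\mu) = \Sigma^{-1}(x-\mu) \cr\cr \frac{\partial L}{\partial \Sigma} &= \frac{1}{2}\Big(\Sigma^{-T}(x-\mu)(x-\mu)^T\Sigma^{-T}-\Sigma^{-T}\Big) = \frac{1}{2}\Big(\Sigma^{-1}(x-\mu)(x-\mu)^T\Sigma^{-1}-\Sigma^{-1}\Big) \cr }$$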