Derivative of multivariate normal distribution wrt mean and covariance
I want to differentiate this with respect to $\mu$ and $\Sigma$:
$${1\over \sqrt{(2\pi)^k |\Sigma |}} e^{-0.5 (x-\mu)^T \Sigma^{-1} (x-\mu)} $$
I'm following the Matrix Cookbook and also a linked answer. The solution given in that answer (the second link) doesn't match what I read in the Cookbook.
For example, for this term, if I follow rule 81 from the linked cookbook, I get a different answer (differentiating wrt $\mu$) :
$(x-\mu)^T \Sigma^{-1} (x-\mu)$
According to the cookbook, the answer should be: $-(\Sigma^{-1} + \Sigma^{-T}) (x-\mu)$. Or am I missing something here? Also, how do I differentiate $(x-\mu)^T \Sigma^{-1} (x-\mu)$ with respect to $\Sigma$?
Solution 1:
For convenience, define some variables which are easier to type $$\eqalign{ M &= \Sigma^{-1} \cr z &= (\mu-x) \cr }$$
Now let's answer your second question first.
Rewrite the function in terms of the above variables and the Frobenius product, $A:B={\rm tr}(A^TB)$, and find its differential, using $dz=d\mu$ and $dM=-M\,d\Sigma\,M$
$$\eqalign{
f &= z^TMz \cr
&= M:z\,z^T \cr\cr
df &= M:(dz\,z^T+z\,dz^T) + zz^T:dM \cr
&= Mz:dz + z^TM:dz^T - zz^T:M\,d\Sigma\,M \cr
&= Mz:dz + M^Tz:dz - M^Tzz^TM^T:d\Sigma \cr
&= (M+M^T)\,z:dz - M^Tzz^TM^T:d\Sigma \cr
&= (M+M^T)\,(\mu-x):d\mu - M^Tzz^TM^T:d\Sigma \cr
}$$
Setting $d\Sigma=0$ yields the gradient with respect to $\mu$ as
$$\eqalign{
\frac{\partial f}{\partial \mu} &= (\Sigma^{-1}+\Sigma^{-T})\,(\mu-x) \cr
}$$
and setting $d\mu=0$ yields the gradient with respect to $\Sigma$ as
$$\eqalign{
\frac{\partial f}{\partial \Sigma} &= - M^Tzz^TM^T \cr
}$$
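As a sanity check on the two gradients of $f$, here is a minimal NumPy sketch that compares them with central finite differences. It is not part of the original derivation: the helper `numerical_grad`, the random test data, and the step size are all illustrative assumptions. $\Sigma$ is deliberately chosen non-symmetric so that the transposes in the formulas actually matter, and every entry of $\Sigma$ is perturbed independently.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3
x = rng.normal(size=k)
mu = rng.normal(size=k)
# Hypothetical test matrix: invertible but NOT symmetric, so M != M^T.
Sigma = 3.0 * np.eye(k) + 0.3 * rng.normal(size=(k, k))

def f(mu, Sigma):
    """Quadratic form z^T Sigma^{-1} z with z = mu - x."""
    z = mu - x
    return z @ np.linalg.solve(Sigma, z)

# Closed-form gradients from the derivation above.
M = np.linalg.inv(Sigma)
z = mu - x
grad_mu = (M + M.T) @ z
grad_Sigma = -M.T @ np.outer(z, z) @ M.T

def numerical_grad(func, arg, eps=1e-6):
    """Central finite differences, one entry of `arg` at a time."""
    G = np.zeros_like(arg)
    for idx in np.ndindex(arg.shape):
        E = np.zeros_like(arg)
        E[idx] = eps
        G[idx] = (func(arg + E) - func(arg - E)) / (2 * eps)
    return G

num_mu = numerical_grad(lambda m: f(m, Sigma), mu)
num_Sigma = numerical_grad(lambda S: f(mu, S), Sigma)

print("max error, mu gradient:   ", np.max(np.abs(num_mu - grad_mu)))
print("max error, Sigma gradient:", np.max(np.abs(num_Sigma - grad_Sigma)))
```

Both printed errors should be tiny compared with the size of the gradient entries if the formulas are right.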
Now back to your first function.
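Let's write down its logarithm (dropping the additive constant $-\frac{k}{2}\log(2\pi)$, which does not depend on $\mu$ or $\Sigma$, and using $\log\det\Sigma=-\log\det M$) and find the differential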
$$\eqalign{
L &= \frac{1}{2}\Big(\log\det(M) - f\Big) \cr
&= \frac{1}{2}\Big({\rm tr}\log(M) - f\Big) \cr\cr
dL &= \frac{1}{2}\Big(M^{-T}:dM - df\Big) \cr
&= \frac{1}{2}\Big(M^{-T}:dM - (M+M^T)\,(\mu-x):d\mu + M^Tzz^TM^T:d\Sigma\Big) \cr
&= \frac{1}{2}\Big(-M^{-T}:M\,d\Sigma\,M - (M+M^T)\,(\mu-x):d\mu + M^Tzz^TM^T:d\Sigma\Big) \cr
&= \frac{1}{2}\Big(-M^T:d\Sigma - (M+M^T)\,(\mu-x):d\mu + M^Tzz^TM^T:d\Sigma\Big) \cr
&=\frac{1}{2}(M^Tzz^TM^T-M^T):d\Sigma-\frac{1}{2}(M+M^T)\,(\mu-x):d\mu \cr
}$$
Once again, holding one of the independent variables constant yields the gradient with respect to the other
$$\eqalign{
\frac{\partial L}{\partial \mu} &= \frac{1}{2}(M+M^T)\,(x-\mu) \cr\cr
\frac{\partial L}{\partial \Sigma} &= \frac{1}{2}(M^Tzz^TM^T-M^T) \cr
}$$
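The gradients of the log-density can be checked numerically in the same spirit. The sketch below is an illustration under stated assumptions, not the answer's own method: `log_density` and `numerical_grad` are hypothetical helper names, the test data are random, and the finite differences perturb each entry of $\Sigma$ independently (so no symmetry constraint is enforced during differentiation).

```python
import numpy as np

rng = np.random.default_rng(1)
k = 3
x = rng.normal(size=k)
mu = rng.normal(size=k)
B = rng.normal(size=(k, k))
Sigma = B @ B.T + k * np.eye(k)  # symmetric positive definite test covariance

def log_density(mu, Sigma):
    """Log of the multivariate normal density, constant term included."""
    z = mu - x
    _, logdet = np.linalg.slogdet(Sigma)
    quad = z @ np.linalg.solve(Sigma, z)
    return -0.5 * (k * np.log(2 * np.pi) + logdet + quad)

# Closed-form gradients from the derivation above.
M = np.linalg.inv(Sigma)
z = mu - x
grad_mu = 0.5 * (M + M.T) @ (x - mu)
grad_Sigma = 0.5 * (M.T @ np.outer(z, z) @ M.T - M.T)

def numerical_grad(func, arg, eps=1e-6):
    """Central finite differences, one entry of `arg` at a time."""
    G = np.zeros_like(arg)
    for idx in np.ndindex(arg.shape):
        E = np.zeros_like(arg)
        E[idx] = eps
        G[idx] = (func(arg + E) - func(arg - E)) / (2 * eps)
    return G

num_mu = numerical_grad(lambda m: log_density(m, Sigma), mu)
num_Sigma = numerical_grad(lambda S: log_density(mu, S), Sigma)

print("max error, mu gradient:   ", np.max(np.abs(num_mu - grad_mu)))
print("max error, Sigma gradient:", np.max(np.abs(num_Sigma - grad_Sigma)))
```

Because the constant $-\frac{k}{2}\log(2\pi)$ does not depend on $\mu$ or $\Sigma$, including it in `log_density` leaves both gradients unchanged.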
To recover the gradient of the original function (let's call it $H$) simply apply the logarithmic derivative rule
$$\eqalign{
\frac{\partial H}{\partial\mu} &= H\Bigg(\frac{\partial L}{\partial\mu}\Bigg) \cr
\frac{\partial H}{\partial\Sigma} &= H\Bigg(\frac{\partial L}{\partial\Sigma}\Bigg) \cr
}$$
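As a worked special case: if $\Sigma$ is symmetric, as a covariance matrix is, then $M^T=M$ and $zz^T=(x-\mu)(x-\mu)^T$, so the expressions collapse to the familiar forms. This is a sketch that still treats the entries of $\Sigma$ as independent variables, i.e. no symmetry constraint is imposed during differentiation:
$$\eqalign{
\frac{\partial H}{\partial\mu} &= H\,\Sigma^{-1}(x-\mu) \cr
\frac{\partial H}{\partial\Sigma} &= \frac{H}{2}\Big(\Sigma^{-1}(x-\mu)(x-\mu)^T\Sigma^{-1}-\Sigma^{-1}\Big) \cr
}$$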
Solution 2:
I had the same question. Applying equation 81 from the Matrix Cookbook to the exponent $f = -\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)$, I got: $$ \frac{\partial{f}}{\partial{\mu}} = -\frac{1}{2}(\Sigma ^{-1} + (\Sigma^{-1})^{T}) (x - \mu)\cdot(-1) $$ where the factor $(-1)$ comes from differentiating $(x-\mu)$ with respect to $\mu$. Since $ \Sigma $ is a covariance matrix, it is symmetric, and the inverse of a symmetric matrix is also symmetric (see "Is the inverse of a symmetric matrix also symmetric?"). Therefore, we have $ (\Sigma^{-1})^{T} = \Sigma ^{-1} $.
The equation above then reduces to $$ \frac{\partial{f}}{\partial{\mu}} = \Sigma ^{-1}(x - \mu) $$