Solution 1:

It's not the derivative with respect to a matrix really. It's the derivative of $f$ with respect to each element of a matrix and the result is a matrix.

Although the calculations are different, it is the same idea as a Jacobian matrix. Each entry is a derivative with respect to a different variable.

Same goes with $\frac{\partial f}{\partial \mu}$, it is a vector made of derivatives with respect to each element in $\mu$.

You could think of them as $$\bigg[\frac{\partial f}{\partial \Sigma}\bigg]_{i,j} = \frac{\partial f}{\partial \sigma^2_{i,j}} \qquad \text{and}\qquad \bigg[\frac{\partial f}{\partial \mu}\bigg]_i = \frac{\partial f}{\partial \mu_i}$$ where $\sigma^2_{i,j}$ is the $(i,j)$th covariance in $\Sigma$ and $\mu_i$ is the $i$th element of the mean vector $\mu$.

Solution 2:

You can view this in the same way you would view a function of any vector. A matrix is just a vector in a normed space where the norm can be represented in any number of ways. One possible norm would be the root-mean-square of the coefficients; another would be the sum of the absolute values of the matrix coefficients. Another is as the norm of the matrix as a linear operator on a vector space with its own norm.

What is significant is that the invertible matrices are an open set; so a derivative can make sense. What you have to do is find a way to approximate $$ f(x,\Sigma + \Delta\Sigma,\mu)-f(x,\Sigma,\mu)$$ as a linear function of $\Delta\Sigma$. I would use a power series to find a linear approximation. For example, $$ (\Sigma+\Delta\Sigma)^{-1}=\Sigma^{-1}(I+(\Delta\Sigma) \Sigma^{-1})^{-1} =\Sigma^{-1} \sum_{n=0}^{\infty}(-1)^{n}\{ (\Delta\Sigma)\Sigma^{-1}\}^{n} \approx \Sigma^{-1}(I-(\Delta\Sigma)\Sigma^{-1})$$ Such a series converges for $\|\Delta\Sigma\|$ small enough (using whatever norm you choose.) And, in the language of derivatives, $$ (\frac{d}{d\Sigma} \Sigma^{-1})\Delta\Sigma = -\Sigma^{-1}(\Delta\Sigma)\Sigma^{-1} $$ Remember, that the derivative is a linear operator on $\Delta\Sigma$; if you squint you can almost see the classical term $\frac{d}{dx}x^{-1} =-x^{-2}$. Chain rules for derivatives apply. So that's how you can handle the exponential composed with matrix inversion.