Matrix/Vector Derivative

Solution 1:

Think of what happens when you compute the derivative with $n = 1$: you get $(x-\mu)^{\top} \Sigma (x-\mu) = \sigma (x-\mu)^2$ for some constant $\sigma$ representing the matrix, so the derivative with respect to $\mu$ is $2 \sigma(\mu-x)$. The same thing happens in dimension $n$: the derivative of this function of $\mu$ is its gradient. Write $x = (x_1, \dots, x_n)$, $\mu = (\mu_1, \dots, \mu_n)$ and $\Sigma = (\sigma_{ij})$. Then
$$ f(\mu) = (x-\mu)^{\top} \Sigma (x-\mu) = \sum_{i=1}^n \sum_{j=1}^n (x_i - \mu_i)(x_j - \mu_j)\sigma_{ij}. $$
Computing the partial derivative with respect to the $k^{\text{th}}$ variable, with $1 \le k \le n$, you get
$$\begin{align*} \frac{\partial f}{\partial\mu_{k}} &= \sum_{i=1}^n \sum_{j=1}^n \left( -\delta_{ik} (x_j - \mu_j) \sigma_{ij} - (x_i -\mu_i) \delta_{jk} \sigma_{ij} \right)\\ &= \sum_{j=1}^n (\mu_j - x_j) \sigma_{kj} + \sum_{i=1}^n (\mu_i - x_i) \sigma_{ik}, \end{align*}$$
where $\delta_{ij} = 1$ if $i = j$ and $0$ otherwise. If you look at the vector $\nabla f = \left( \frac{ \partial f}{\partial \mu_1} , \dots, \frac{ \partial f}{\partial \mu_n}\right)$, you see that its components are precisely those of the vector $\Sigma(\mu-x) + \Sigma^{\top} (\mu-x)$. If the matrix $\Sigma$ is symmetric, this is $2\Sigma(\mu-x)$.
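
If you want to convince yourself numerically, here is a minimal sketch (not part of the original answer) that compares the closed-form gradient $\Sigma(\mu-x) + \Sigma^{\top}(\mu-x)$ against a central finite-difference approximation, using numpy and arbitrary test values:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x = rng.normal(size=n)
mu = rng.normal(size=n)
Sigma = rng.normal(size=(n, n))   # deliberately not symmetric

def f(m):
    # f(mu) = (x - mu)^T Sigma (x - mu)
    d = x - m
    return d @ Sigma @ d

# Closed-form gradient from the calculation above
grad_exact = Sigma @ (mu - x) + Sigma.T @ (mu - x)

# Central finite differences, one coordinate at a time
eps = 1e-6
grad_fd = np.array([
    (f(mu + eps * e) - f(mu - eps * e)) / (2 * eps)
    for e in np.eye(n)
])

print(np.allclose(grad_exact, grad_fd, atol=1e-5))  # True
```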

Hope that helps.

Solution 2:

There is a short and direct way to calculate this. The expression $(x-\mu)^T\Sigma(x-\mu)$ is called a quadratic form, and it is well known that the derivative of such a form is

$$\frac{\partial x^TAx }{\partial x}=(A+A^T)x$$
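
As a quick sanity check (a minimal numpy sketch, not part of the original answer), this identity can be verified by finite differences with a generic matrix $A$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n))       # generic matrix
x = rng.normal(size=n)

q = lambda v: v @ A @ v           # quadratic form v^T A v

eps = 1e-6
grad_fd = np.array([(q(x + eps * e) - q(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])

print(np.allclose((A + A.T) @ x, grad_fd, atol=1e-5))  # True
```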

This works even if $A$ is not symmetric. In your particular example, you use the chain rule as,

$$\frac{\partial (x-\mu)^T\Sigma(x-\mu) }{\partial \mu}=\frac{\partial (x-\mu)^T\Sigma(x-\mu) }{\partial (x-\mu)}\frac{\partial (x-\mu)}{\partial \mu}$$

Thus,

$$\frac{\partial (x-\mu)^T\Sigma(x-\mu) }{\partial (x-\mu)}=(\Sigma +\Sigma^T)(x-\mu)$$

and

$$\frac{\partial (x-\mu)}{\partial \mu}=-I,$$

where $I$ is the $n\times n$ identity matrix.

Combining these equations, you get the final answer,

$$\frac{\partial (x-\mu)^T\Sigma(x-\mu) }{\partial \mu}=(\Sigma +\Sigma^T)(\mu-x)$$
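
For completeness, a small numerical sketch (again not from the original answer) checking the combined result in the common case where $\Sigma$ is symmetric, so that $(\Sigma+\Sigma^T)(\mu-x) = 2\Sigma(\mu-x)$:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
x, mu = rng.normal(size=n), rng.normal(size=n)
S = rng.normal(size=(n, n))
Sigma = (S + S.T) / 2             # symmetric test matrix

f = lambda m: (x - m) @ Sigma @ (x - m)

eps = 1e-6
grad_fd = np.array([(f(mu + eps * e) - f(mu - eps * e)) / (2 * eps)
                    for e in np.eye(n)])

# (Sigma + Sigma^T)(mu - x) reduces to 2 Sigma (mu - x) here
print(np.allclose(2 * Sigma @ (mu - x), grad_fd, atol=1e-5))  # True
```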