'Trace trick' for expectations of quadratic forms

Solution 1:

Where does the trace come from?

A real number can be thought of as a $1 \times 1$ matrix, and its trace is itself. Thus $$(x-\mu)^\top \Sigma^{-1} (x-\mu) = \operatorname{tr}\left((x-\mu)^\top \Sigma^{-1} (x-\mu)\right)$$

More than one step is taken at once.

After applying the above step, use the cyclic property of the trace, $\operatorname{tr}(ABC) = \operatorname{tr}(CAB)$, to obtain $$\operatorname{tr}\left((x-\mu)^\top \Sigma^{-1} (x-\mu)\right) = \operatorname{tr}\left((x-\mu)(x-\mu)^\top \Sigma^{-1} \right)$$ Because the trace is linear and $\Sigma^{-1}$ is a constant matrix, you can push the expectation inside the trace: $$E \operatorname{tr}\left((x-\mu)(x-\mu)^\top \Sigma^{-1} \right) = \operatorname{tr}\left(E\left[(x-\mu)(x-\mu)^\top\right] \Sigma^{-1} \right).$$
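As a sanity check, here is a minimal Python sketch (not part of the original answer) that verifies both steps exactly on a small hypothetical distribution. It assumes $x \in \mathbb{R}^2$ takes three made-up values with made-up probabilities, and that $\mu$ and $\Sigma$ are the mean and (invertible) covariance of $x$, as the notation suggests. Under that assumption the right-hand side becomes $\operatorname{tr}(\Sigma\Sigma^{-1}) = \operatorname{tr}(I_2) = 2$. `Fraction` arithmetic is used so the comparison is exact rather than approximate.

```python
from fractions import Fraction as F

# Hypothetical discrete distribution for x in R^2: (value, probability).
# The three points are not collinear, so the covariance is invertible.
outcomes = [((F(0), F(0)), F(1, 2)), ((F(2), F(1)), F(1, 4)), ((F(1), F(3)), F(1, 4))]
DIM = 2

def outer(u, v):
    """Outer product u v^T as a list of lists."""
    return [[ui * vj for vj in v] for ui in u]

def matmul(A, B):
    """Plain matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (m00, m01), (m10, m11) = M
    det = m00 * m11 - m01 * m10
    return [[m11 / det, -m01 / det], [-m10 / det, m00 / det]]

def quad(v, M):
    """Scalar quadratic form v^T M v."""
    return sum(v[i] * M[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))

# Mean mu = E[x] and centered values x - mu.
mu = [sum(p * x[i] for x, p in outcomes) for i in range(DIM)]
centered = [([x[i] - mu[i] for i in range(DIM)], p) for x, p in outcomes]

# Covariance Sigma = E[(x - mu)(x - mu)^T], computed entry by entry.
Sigma = [[sum(p * outer(c, c)[i][j] for c, p in centered) for j in range(DIM)]
         for i in range(DIM)]
Sigma_inv = inv2(Sigma)

# Step 1 + cyclic property, pointwise: (x-mu)^T S^{-1} (x-mu) == tr((x-mu)(x-mu)^T S^{-1}).
pointwise_ok = all(quad(c, Sigma_inv) == trace(matmul(outer(c, c), Sigma_inv))
                   for c, _ in centered)

# Left side: E[(x - mu)^T Sigma^{-1} (x - mu)].
lhs = sum(p * quad(c, Sigma_inv) for c, p in centered)
# Right side after moving E inside the trace: tr(E[(x-mu)(x-mu)^T] Sigma^{-1}).
rhs = trace(matmul(Sigma, Sigma_inv))

print(lhs, rhs, pointwise_ok)  # 2 2 True
```

Any invertible covariance in any dimension $d$ would give $d$ here; the 2x2 case just keeps the hand-written inverse short.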

Solution 2:

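To add to @angryavian's answer, you can swap expectation and trace because the expectation of a random $N \times N$ matrix $A = (a_{ij})$ is taken entry by entry, and the trace is a finite sum of diagonal entries, so linearity of expectation applies: $$\operatorname{tr}(E[A]) = \operatorname{tr}\begin{bmatrix} E[a_{11}] & \cdots & E[a_{1N}] \\ \vdots & \ddots & \vdots \\ E[a_{N1}] & \cdots & E[a_{NN}] \end{bmatrix} = \sum_{i=1}^N E[a_{ii}] = E\left[\sum_{i=1}^N a_{ii}\right] = E[\operatorname{tr}(A)]$$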
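A minimal Python sketch of this entrywise argument, using a hypothetical random $3 \times 3$ matrix $A$ that takes two made-up values with probabilities $1/3$ and $2/3$. It computes $E[A]$ entry by entry, then compares $\operatorname{tr}(E[A])$ with $E[\operatorname{tr}(A)]$ using exact `Fraction` arithmetic.

```python
from fractions import Fraction

# Hypothetical random matrix A: (value, probability) pairs.
A_values = [
    ([[1, 2, 0], [4, 5, 6], [7, 8, 9]], Fraction(1, 3)),
    ([[-3, 0, 1], [2, 10, 0], [0, 1, -4]], Fraction(2, 3)),
]
N = 3

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

# E[A] taken entry by entry: (E[A])_ij = E[a_ij].
E_A = [[sum(p * M[i][j] for M, p in A_values) for j in range(N)] for i in range(N)]

tr_of_expectation = trace(E_A)                                 # tr(E[A])
expectation_of_tr = sum(p * trace(M) for M, p in A_values)     # E[tr(A)]

print(tr_of_expectation, expectation_of_tr)  # 7 7
```

Only the diagonal entries of $E[A]$ enter the trace, which is why the off-diagonal entries in the display above never matter.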