Deriving the formula for the multivariate Gaussian distribution

The PDF for the distribution is

$$f(x)=\frac{1}{\sqrt{(2\pi)^n|\boldsymbol\Sigma|}} \exp\left(-\frac{1}{2}({\boldsymbol x}-{\boldsymbol \mu})^T {\boldsymbol\Sigma}^{-1}({\boldsymbol x}-{\boldsymbol \mu}) \right)$$

If I were trying to derive it from scratch, I would start with the univariate Gaussian distribution $$f(x)=\frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{1}{2}\frac{({x}-{\mu})^2}{\sigma^2}\right),$$

and do the following:

  1. ignore the normalisation factor for now,
  2. replace $(x-\mu)^2$ with $({\boldsymbol x}-{\boldsymbol \mu})^T({\boldsymbol x}-{\boldsymbol \mu})$, and
  3. replace $\sigma^2$ with $({\boldsymbol x}-{\boldsymbol \mu})^T \boldsymbol\Sigma ({\boldsymbol x}-{\boldsymbol \mu})$, which should equal the variance along the direction of $({\boldsymbol x}-{\boldsymbol \mu})$ and is therefore exactly what I want.

This would result in the following formula

$$f(x)=\frac{1}{\sqrt{(2\pi)^n|\boldsymbol\Sigma|}} \exp\left(-\frac{1}{2}\frac{({\boldsymbol x}-{\boldsymbol \mu})^T({\boldsymbol x}-{\boldsymbol \mu})}{({\boldsymbol x}-{\boldsymbol \mu})^T \boldsymbol\Sigma ({\boldsymbol x}-{\boldsymbol \mu})}\right),$$

which doesn't seem right.
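A quick numerical sketch (with $\boldsymbol\mu$ and $\boldsymbol\Sigma$ chosen arbitrarily here) makes one symptom concrete: the proposed exponent is unchanged when $\boldsymbol x-\boldsymbol\mu$ is rescaled, so the resulting function is constant along every ray from $\boldsymbol\mu$ and cannot integrate to 1, whereas the exponent in the correct formula grows quadratically with the distance from $\boldsymbol\mu$.

```python
import numpy as np

# Illustrative values (not from any particular problem).
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

def proposed_exponent(x):
    """Quadratic form from the conjectured formula."""
    d = x - mu
    return (d @ d) / (d @ Sigma @ d)

def correct_exponent(x):
    """Quadratic form from the actual multivariate Gaussian PDF."""
    d = x - mu
    return d @ np.linalg.inv(Sigma) @ d

d = np.array([1.0, 2.0])
# Scaling d by 10 leaves the proposed exponent unchanged...
print(proposed_exponent(mu + d), proposed_exponent(mu + 10 * d))
# ...but multiplies the correct exponent by 100.
print(correct_exponent(mu + d), correct_exponent(mu + 10 * d))
```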

My two questions are

  • What is the problem with the approach above? Please let me know if it doesn't make sense at all.
  • Could you recommend a good derivation of the multivariate Gaussian? (I couldn't find anything on Stack Exchange or Quora.)

If putting $({\boldsymbol x}-{\boldsymbol \mu})^T \boldsymbol\Sigma ({\boldsymbol x}-{\boldsymbol \mu})$ into the exponent made sense, then the analogous univariate expression, $(x-\mu)\,\sigma^2\,(x-\mu),$ would make sense too, but it doesn't. In the univariate case you have $$ \sigma^2 = \operatorname{E}\left( (X-\mu)^2 \right). $$ In the multivariate case you have $$ \boldsymbol\Sigma = \operatorname{E}\left((\boldsymbol X-\boldsymbol \mu) (\boldsymbol X-\boldsymbol\mu)^T\right), $$ an $n\times n$ matrix, where $\boldsymbol \mu$ is an $n\times 1$ vector.
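The multivariate definition can be checked empirically: the sample average of the outer products $(\boldsymbol X-\boldsymbol\mu)(\boldsymbol X-\boldsymbol\mu)^T$ over many draws should approximate $\boldsymbol\Sigma$. A minimal sketch (with arbitrary illustrative $\boldsymbol\mu$ and $\boldsymbol\Sigma$):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Draw many samples from N(mu, Sigma); rows of X are individual draws.
X = rng.multivariate_normal(mu, Sigma, size=200_000)

# Average of the outer products (X_i - mu)(X_i - mu)^T.
D = X - mu
Sigma_hat = D.T @ D / len(D)

print(np.round(Sigma_hat, 2))  # close to Sigma
```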

If $X\sim \operatorname{N}(\mu,\sigma^2)$ then $\dfrac{X-\mu} \sigma \sim\operatorname{N}(0,1).$ Similarly if $\boldsymbol X \sim \operatorname{N}(\boldsymbol\mu,\Sigma)$ then $\Sigma^{-1/2}(\boldsymbol X-\boldsymbol\mu) \sim \operatorname{N}(\boldsymbol0,I_n)$ where $I_n$ is the $n\times n$ identity matrix.

But what is $\Sigma^{-1/2}$? It is a consequence of the finite-dimensional version of the spectral theorem that a nonnegative-definite symmetric real matrix has a nonnegative-definite symmetric real square root, and this is it.
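Both claims can be sketched numerically: build $\Sigma^{1/2}$ from the eigendecomposition $\Sigma = Q\operatorname{diag}(\lambda)Q^T$ by taking square roots of the eigenvalues, confirm that it squares back to $\Sigma$, and check that $\Sigma^{-1/2}(\boldsymbol X-\boldsymbol\mu)$ has approximately identity covariance (the values of $\boldsymbol\mu$ and $\Sigma$ below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Spectral decomposition of the symmetric positive-definite Sigma.
lam, Q = np.linalg.eigh(Sigma)
Sigma_half = Q @ np.diag(np.sqrt(lam)) @ Q.T       # symmetric square root
Sigma_inv_half = Q @ np.diag(lam ** -0.5) @ Q.T    # its inverse

# It really is a square root of Sigma.
assert np.allclose(Sigma_half @ Sigma_half, Sigma)

# Whitening: Sigma^{-1/2}(X - mu) should have identity covariance.
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Z = (X - mu) @ Sigma_inv_half.T   # apply Sigma^{-1/2} to each row
print(np.round(Z.T @ Z / len(Z), 2))  # approximately the 2x2 identity
```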

For $\boldsymbol X\sim\operatorname{N}(\boldsymbol 0, I_n),$ the density is $$ \frac 1 {\sqrt{2\pi}^n} \exp \left( \frac{-1} 2 \boldsymbol x^T \boldsymbol x \right). $$ For $\boldsymbol Y = A\boldsymbol X+\boldsymbol b,$ where $A$ is a $k\times n$ matrix of full rank $k\le n$ (so that $AA^T$ is invertible) and $\boldsymbol b$ is a $k\times 1$ vector, the density is $$ \frac 1 {\sqrt{2\pi}^k} \cdot \frac 1 {\sqrt{\det (AA^T)}} \exp\left( \frac{-1} 2 (\boldsymbol y - \boldsymbol b)^T (A A^T)^{-1} (\boldsymbol y - \boldsymbol b) \right). $$ Notice that $$ \operatorname{var}(\boldsymbol Y) = A\Big( \operatorname{var}(\boldsymbol X) \Big) A^T = A I_n A^T = A A^T. $$ So the multiplication by $(AA^T)^{-1}$ corresponds to the division by $\sigma^2.$
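The identity $\operatorname{var}(\boldsymbol Y) = AA^T$ is easy to verify by simulation; in this sketch $A$ and $\boldsymbol b$ are arbitrary illustrative values with $k = n = 2$, so $AA^T$ is invertible and the density exists:

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[1.0, 0.5],
              [0.0, 2.0]])
b = np.array([3.0, -1.0])

# Rows of X are draws from N(0, I_2); apply Y = A X + b row by row.
X = rng.standard_normal((500_000, 2))
Y = X @ A.T + b

# The sample covariance of Y should approximate A A^T.
cov_Y = np.cov(Y, rowvar=False)
print(np.round(cov_Y, 2))  # close to A @ A.T
```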