Generalization of variance to random vectors

Let $X$ be a random variable. Then its variance (dispersion) is defined as $D(X)=E((X-E(X))^2)$. As I understand it, this is supposed to be a measure of how far off from the average we should expect to find the value of $X$.

This would seem to suggest that the natural generalization of variance to the case where $X = (X_1,X_2,\ldots,X_n)$ is a random vector should be $D(X)=E((X-E(X))^T(X-E(X)))$. Here vectors are understood to be columns, as usual. This generalization would again, quite naturally, measure how far off from the average (expectation) we can expect to find the value of the vector $X$.

The usual generalization, however, is $D(X)=E((X-E(X))(X-E(X))^T)$, the variance-covariance matrix, which, as I see it, measures the correlations between the components.

Why is this the preferred generalization? Is $E((X-E(X))^T(X-E(X)))$ also used and does it have a name?

The variance-covariance matrix does seem to contain more information. Is this the main reason or is there something deeper going on here?


If $X \in \mathbb{R}^{n\times1}$ is a column-vector-valued random variable, then $$V=E((X-E(X))(X-E(X))^T)$$ is the variance of $X$ according to the definition given in Feller's famous book. But many authors call it the covariance matrix because its entries are the covariances between the scalar components of $X$.
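For concreteness, here is a small numpy sketch (not part of the original answer) estimating $V$ from samples of a made-up $3$-dimensional random vector; the entry $V_{ij}$ is the covariance of $X_i$ and $X_j$, and the diagonal holds the componentwise variances:

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 samples of a 3-dimensional random vector with dependent components;
# each column of X is one realization.
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
X = A @ rng.standard_normal((3, 10_000))

# Sample version of E((X - E X)(X - E X)^T).
m = X.mean(axis=1, keepdims=True)
V = (X - m) @ (X - m).T / X.shape[1]

# V[i, j] estimates the covariance of X_i and X_j; the diagonal holds
# the variances of the components.
print(V)
print(np.cov(X, bias=True))   # numpy's own estimate agrees
```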

It is the natural generalization of the $1$-dimensional case. For example, the $1$-dimensional normal distribution has density proportional to $$ \exp\left( \frac{-(x-\mu)^2}{2\sigma^2} \right) $$ where $\sigma^2$ is the variance. The multivariate normal has density proportional to $$ \exp\left( -\frac12 (x-\mu)^T V^{-1} (x-\mu) \right) $$ with $V$ as above.
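As a quick numerical check of that density formula (a sketch assuming scipy is available; the matrix and point below are arbitrary choices for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
V = np.array([[2.0, 0.3],
              [0.3, 1.0]])     # the variance (covariance matrix) of X
x = np.array([0.5, 0.5])

# Density proportional to exp(-(x-mu)^T V^{-1} (x-mu) / 2), with
# normalizing constant 1 / sqrt((2*pi)^n det(V)) for n = 2.
q = (x - mu) @ np.linalg.solve(V, x - mu)
by_hand = np.exp(-0.5 * q) / np.sqrt((2 * np.pi) ** 2 * np.linalg.det(V))

print(by_hand)
print(multivariate_normal(mean=mu, cov=V).pdf(x))   # same value
```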

The variance satisfies the identity $$ \operatorname{var}(AX) = A\Big(\operatorname{var}(X)\Big) A^T. $$ The matrix $A$ need not be $n\times n$. It could be $k\times n$, so that $AX$ is $k\times1$ and then both sides of this identity are $k\times k$.
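A minimal simulation sketch of this identity with a non-square $A$ (the particular matrices are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# X is 3-dimensional; A is 2x3, so AX is 2-dimensional and var(AX) is 2x2.
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  2.0, 1.0]])
X = rng.multivariate_normal(mean=np.zeros(3),
                            cov=[[1.0, 0.2, 0.0],
                                 [0.2, 2.0, 0.3],
                                 [0.0, 0.3, 1.5]],
                            size=100_000).T   # columns are realizations

print(np.cov(A @ X))            # sample var(AX), a 2x2 matrix
print(A @ np.cov(X) @ A.T)      # A var(X) A^T -- approximately equal
```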

It follows from the (finite-dimensional) spectral theorem that every symmetric non-negative-definite real matrix is the variance of some random vector.
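One concrete way to see this (a sketch, not the article's argument): by the spectral theorem write $V = Q\,\operatorname{diag}(\lambda)\,Q^T$ with $\lambda \ge 0$, set $B = Q\,\operatorname{diag}(\sqrt{\lambda})$, and let $X = BZ$ where $Z$ has independent standard components; then $\operatorname{var}(X) = B\operatorname{var}(Z)B^T = BB^T = V$.

```python
import numpy as np

rng = np.random.default_rng(2)

V = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 1.0]])      # symmetric, non-negative definite

# Spectral decomposition V = Q diag(w) Q^T, then B = Q diag(sqrt(w)).
w, Q = np.linalg.eigh(V)
B = Q @ np.diag(np.sqrt(w))

# X = B Z with Z standard: var(X) = B var(Z) B^T = B B^T = V.
Z = rng.standard_normal((3, 200_000))
X = B @ Z

print(np.cov(X))    # close to V
```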

Look at these:

  • http://en.wikipedia.org/wiki/Covariance_matrix
  • http://en.wikipedia.org/wiki/Multivariate_random_variable
  • http://en.wikipedia.org/wiki/Multivariate_normal_distribution
  • http://en.wikipedia.org/wiki/Wishart_distribution
  • http://en.wikipedia.org/wiki/Estimation_of_covariance_matrices

The last-listed article above has a very elegant argument. The trick of considering a scalar to be the trace of a $1\times 1$ matrix is very nice.
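For readers who have not seen it: a $1\times 1$ matrix equals its own trace, so by the cyclic property of the trace one has, for instance, $$ (x-\mu)^T V^{-1} (x-\mu) = \operatorname{tr}\!\left( V^{-1}\,(x-\mu)(x-\mu)^T \right), $$ which is (roughly) how the quadratic forms in the log-likelihood get rewritten as traces there.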


The covariance matrix has more information, indeed: it has the variance of each component (on the diagonal) and also the covariances between components (off the diagonal). Your value is the sum of the variances of the components. This is often not a very useful measure. For one thing, the components might correspond to entirely different magnitudes, and hence it would make little or no sense to sum the variances of each one. Think, for example, of $X = (X_1,X_2,X_3)$ where $X_1$ is the height of a man, measured in meters, $X_2$ his waist circumference in centimeters, and $X_3$ his weight, in kilograms...
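To make the relationship concrete (a small numpy sketch, not part of the original answer): the proposed scalar $E\big((X-E(X))^T(X-E(X))\big)$ is exactly the trace of the variance-covariance matrix, i.e. the sum of the componentwise variances.

```python
import numpy as np

rng = np.random.default_rng(3)

# Columns of X are realizations of a 3-dimensional random vector.
X = rng.multivariate_normal(mean=[0, 0, 0],
                            cov=[[1.0, 0.2, 0.1],
                                 [0.2, 4.0, 0.0],
                                 [0.1, 0.0, 9.0]],
                            size=100_000).T

m = X.mean(axis=1, keepdims=True)

# Sample version of E((X - E X)^T (X - E X)): a single number.
scalar = np.mean(np.sum((X - m) ** 2, axis=0))

# It coincides with the trace of the variance-covariance matrix.
print(scalar)
print(np.trace(np.cov(X, bias=True)))
```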