Expected distance between two vectors that belong to two different Gaussian distributions

Let $X$, $Y$ be two random vectors that follow Gaussian distributions with mean vectors $\mu_x$, $\mu_y$ and covariance matrices $\Sigma_x$, $\Sigma_y$, respectively. The probability density functions of $X$, $Y$ are given, respectively, by $$ f_{X}(\mathbf{x})=\frac{1}{(2\pi)^{\frac{n}{2}}\lvert \Sigma_x \rvert^{\frac{1}{2}}} \exp\Big\{-\frac{1}{2}(\mathbf{x}-\mu_x)^\top\Sigma_x^{-1}(\mathbf{x}-\mu_x)\Big\} $$ and $$ f_{Y}(\mathbf{y})=\frac{1}{(2\pi)^{\frac{n}{2}}\lvert \Sigma_y \rvert^{\frac{1}{2}}} \exp\Big\{-\frac{1}{2}(\mathbf{y}-\mu_y)^\top\Sigma_y^{-1}(\mathbf{y}-\mu_y)\Big\}, $$ where $\mathbf{x},\mathbf{y}\in\Bbb{R}^n$. We will be thinking of $\mathbf{x}$, $\mathbf{y}$ as "members" of the distributions of $X$, $Y$, respectively.

If we have two fixed vectors, say $\mathbf{x}$, $\mathbf{y}$, then the squared Euclidean distance between them would be equal to $$ \big\lVert \mathbf{x} - \mathbf{y} \big\rVert^2. $$

If we think of $\mathbf{x}$, $\mathbf{y}$ as above, i.e., as members of the distributions of $X$, $Y$, respectively, then what would be the expected value of this squared distance?

Thank you very much for your help!


Solution 1:

If $X$ and $Y$ are independent and normal $(\mu_X,\Sigma_X)$ and $(\mu_Y,\Sigma_Y)$ respectively, then:

$$E(\|X-Y\|^2)=\|\mu_X-\mu_Y\|^2+\mathrm{tr}(\Sigma_X+\Sigma_Y)$$

To show this, note that, by independence, $X-Y$ is normal $(\mu_X-\mu_Y,\Sigma_X+\Sigma_Y)$ and that every random variable $Z$ normal $(\mu,\Sigma)$ can be written as $Z=\mu+LU$ where $LL^\top=\Sigma$ and $U$ is standard normal, hence a little bit of matrix calculus should yield the result.
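For concreteness, here is a minimal sketch of that representation, assuming $\Sigma$ is positive definite so that a Cholesky factor $L$ exists (the parameters and the use of NumPy are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters only.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Z = mu + L U with L L^T = Sigma and U standard normal.
L = np.linalg.cholesky(Sigma)            # L @ L.T == Sigma
U = rng.standard_normal((2, 100_000))    # columns are i.i.d. standard normal vectors
Z = mu[:, None] + L @ U                  # columns are samples of N(mu, Sigma)

print(Z.mean(axis=1))   # should be close to mu
print(np.cov(Z))        # should be close to Sigma
```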

To wit, note that the decomposition $$\|Z\|^2=Z^\top Z=\mu^\top\mu+\mu^\top LU+U^\top L^\top\mu+U^\top L^\top LU,$$ and the fact that $E(U)=0$ and $E(U^\top)=0^\top$ yield $$E(\|Z\|^2)=\mu^\top\mu+E(U^\top L^\top LU).$$ Now, $\mu^\top\mu=\|\mu\|^2$ and $$U^\top L^\top LU=\sum_{k,\ell}(L^\top L)_{k,\ell}U_kU_\ell,\quad E(U_k^2)=1,\quad E(U_kU_\ell)=0\ (k\ne\ell),$$ hence $$E(U^\top L^\top LU)=\sum_{k}(L^\top L)_{k,k}=\mathrm{tr}(L^\top L)=\mathrm{tr}(LL^\top)=\mathrm{tr}(\Sigma).$$ Finally, as desired, $$E(\|Z\|^2)=\|\mu\|^2+\mathrm{tr}(\Sigma).$$
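A quick Monte Carlo sanity check of the resulting identity, with arbitrary illustrative parameters (a sketch, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters in R^3.
mu_X, mu_Y = np.array([1.0, 0.0, -1.0]), np.array([0.5, 2.0, 0.0])
Sigma_X = np.diag([1.0, 2.0, 0.5])
Sigma_Y = np.array([[1.0, 0.3, 0.0],
                    [0.3, 1.0, 0.2],
                    [0.0, 0.2, 1.0]])

m = 500_000
X = rng.multivariate_normal(mu_X, Sigma_X, size=m)   # independent draws of X
Y = rng.multivariate_normal(mu_Y, Sigma_Y, size=m)   # independent draws of Y

mc = np.mean(np.sum((X - Y) ** 2, axis=1))                             # estimate of E||X - Y||^2
closed_form = np.sum((mu_X - mu_Y) ** 2) + np.trace(Sigma_X + Sigma_Y)
print(mc, closed_form)   # the two numbers should agree to a couple of decimals
```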

Remarks:

  • This nowhere uses the explicit forms of the densities. As a matter of fact, when solving problems about normal random variables, a useful principle is to avoid manipulating the Gaussian densities themselves as much as possible. Instead, write each $(\mu,\Sigma)$ normal random variable as $\mu+LU$ with $LL^\top=\Sigma$, as we did, and proceed with the standard normal $U$.
  • The mapping $(x,y)\mapsto\|x-y\|^2$ is not a metric; only $(x,y)\mapsto\|x-y\|$ is.

Solution 2:

If $x,y$ are independent, and thus uncorrelated, then their joint distribution $p(x,y)$ is again Gaussian, with mean $[\mu_x^T,\mu_y^T]^T$ and block-diagonal covariance $\text{diag}\{\Sigma_x,\Sigma_y\}$ (of dimensions $2n\times 2n$).

Then, $E_{p(x,y)}[\|x-y\|^2]=E_{p(x,y)}[\|x\|^2+\|y\|^2-2x^Ty]=\mu_x^T\mu_x+\mu_y^T\mu_y+\operatorname{trace}{(\Sigma_x+\Sigma_y)}$, since $x,y$ are independent (but see the edit below, which corrects the cross term).

The expectation, of course, is taken with respect to the joint probability of the two vectors. Thus, $x,y$ are considered members of the joint distribution, rather than of their respective marginals, for the question to be meaningful.

(Edit: as was pointed out in the comments, the mistake above is the implicit assumption that $E_{p(x,y)}[x^Ty]=0$. It is rather $E_{p(x,y)}[x^Ty]=\mu_x^T\mu_y$, which makes the overall expected value equal to:

$E_{p(x,y)}[\|x-y\|^2]=\|\mu_x-\mu_y\|^2+\operatorname{trace}{(\Sigma_x+\Sigma_y)}$

Edit (cont'd): my approach is based on the formulas for the expectations of inner and outer products with respect to a distribution: $E[z^Tz]=\operatorname{trace}{(\mu_z\mu_z^T+\Sigma_z)}=\|\mu_z\|^2+\operatorname{trace}{(\Sigma_z)}$ and $E[zz^T]=\mu_z\mu_z^T+\Sigma_z$, where $z$ follows $N(\mu_z,\Sigma_z)$.)
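A small numerical check of these two moment formulas, again with arbitrary illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative parameters for z ~ N(mu_z, Sigma_z).
mu_z = np.array([2.0, -1.0, 0.5])
Sigma_z = np.array([[1.0, 0.2, 0.0],
                    [0.2, 1.5, 0.3],
                    [0.0, 0.3, 0.8]])

z = rng.multivariate_normal(mu_z, Sigma_z, size=400_000)

# Inner product: E[z^T z] = ||mu_z||^2 + trace(Sigma_z)
print(np.mean(np.sum(z * z, axis=1)), mu_z @ mu_z + np.trace(Sigma_z))

# Outer product: E[z z^T] = mu_z mu_z^T + Sigma_z
emp_outer = z.T @ z / len(z)
print(np.abs(emp_outer - (np.outer(mu_z, mu_z) + Sigma_z)).max())  # should be small
```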


You are welcome,

Giannis.

Solution 3:

I would like to point out that the identity @Did states has nothing to do with the Gaussian distribution and holds for any uncorrelated random vectors $X,Y$ with finite second moments. Rather, it is a simple consequence of the bias-variance decomposition of the mean squared error:

$$E[\|\hat\theta-\theta\|^2]=\|E[\hat\theta]-\theta\|^2+ tr(V(\hat\theta)),$$

where $\hat \theta$ is some random vector thought of as an estimator for parameter $\theta$. The identity @Did mentions is the case where $\hat\theta=X-Y,\theta=0,$ and $X\sim N(\mu_X,\Sigma_X),Y\sim N(\mu_Y,\Sigma_Y)$ are uncorrelated (so that the variance of the difference is the sum of their variances).
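Written out, this special case is exactly the identity above:

$$E\big[\|X-Y\|^2\big]=\big\|E[X-Y]\big\|^2+tr\big(V(X-Y)\big)=\|\mu_X-\mu_Y\|^2+tr(\Sigma_X+\Sigma_Y).$$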


Here is a simple proof to convince you. First, recall some properties:

  1. $V(X)=E[XX']-E[X]E[X']$
  2. $tr(A+B)=tr(A)+tr(B)$
  3. $tr(AB)=tr(BA)$.

Then we have

$$\begin{align}tr(V(\hat\theta))&=tr(V(\hat\theta-\theta))\\ &=tr\big(E[(\hat\theta-\theta)(\hat\theta-\theta)']-E[\hat\theta-\theta]E[(\hat\theta-\theta)']\big)\quad (1)\\ &=tr\big(E[(\hat\theta-\theta)(\hat\theta-\theta)']\big)-tr\big(E[\hat\theta-\theta]E[(\hat\theta-\theta)']\big)\quad (2)\\ &=tr\big(E[(\hat\theta-\theta)'(\hat\theta-\theta)]\big)-tr\big(E[\hat\theta-\theta]'E[\hat\theta-\theta]\big)\quad (3)\\ &=E[\|\hat\theta-\theta\|^2]-\|E[\hat\theta]-\theta\|^2\\ \implies E[\|\hat\theta-\theta\|^2]&=\|E[\hat\theta]-\theta\|^2+tr(V(\hat\theta))\end{align}$$
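Since the decomposition does not rely on normality, here is a quick numerical illustration with a deliberately non-Gaussian, hypothetical estimator whose components are independent Uniform$(0,1)$ variables (parameters chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical non-Gaussian estimator: components i.i.d. Uniform(0, 1).
m = 500_000
theta_hat = rng.uniform(0.0, 1.0, size=(m, 3))
theta = np.array([0.2, 0.9, 0.4])            # arbitrary fixed target

mse = np.mean(np.sum((theta_hat - theta) ** 2, axis=1))   # E||theta_hat - theta||^2

bias_sq = np.sum((0.5 - theta) ** 2)   # ||E[theta_hat] - theta||^2, since E[theta_hat] = (0.5, 0.5, 0.5)
trace_var = 3 * (1.0 / 12.0)           # tr(V(theta_hat)); Var of Uniform(0,1) is 1/12 per component

print(mse, bias_sq + trace_var)        # the two values should agree closely
```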