I've just encountered the Wasserstein metric, and it isn't obvious to me why it is in fact a metric on the space of probability measures on a given metric space $X$. Apart from non-negativity and symmetry (which are obvious), I don't know how to proceed.

Do you guys have any advice or links to useful references?

Thanks in advance!
Cyril


Solution 1:

So I assume that what puzzles you is the triangle inequality and the identity $W_{p}(\mu,\mu)=0$, where $W_{p}$ denotes the $p$-Wasserstein metric.

Here's some preliminary information. I will denote by $\Pi(\mu,\nu)$ the collection of all transference plans from $\mu$ to $\nu$, i.e. $\pi\in\Pi(\mu,\nu)$ iff $\mu$ is the first marginal of $\pi$ and $\nu$ is the second. This can also be expressed in the form $\mu=(\mathrm{pr}_{1})_{\#}\pi$ and $\nu=(\mathrm{pr}_{2})_{\#}\pi$, where $\#$ denotes the push-forward. If $(X,d)$ is Polish, then for every pair of probability measures $\mu,\nu$ there exists an optimal transference plan $\pi\in\Pi(\mu,\nu)$, so that $W_{p}(\mu,\nu)=\left(\int_{X\times X}d(x,y)^{p}\,d\pi(x,y)\right)^{\frac{1}{p}}$. The proof of this can be found in the book 'Topics in Optimal Transportation', Cédric Villani, 2003; the key point is that $\Pi(\mu,\nu)$ is compact in the topology of weak convergence of measures (which is shown by using Prokhorov's theorem).
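To make the definitions concrete, here is a minimal sketch in a finite toy setting. The space $X=\{0,1,2\}$ with $d(x,y)=|x-y|$, the measures `mu` and `nu`, and the helper names `marginals` and `transport_cost` are all assumptions chosen for illustration; a plan is stored as a matrix with `plan[i][j]` the mass sent from point `i` to point `j`.

```python
# Toy setting (an assumption for illustration): X = {0, 1, 2} with d(x, y) = |x - y|.
points = [0.0, 1.0, 2.0]

def marginals(plan):
    """First and second marginals of a plan stored as plan[i][j] = pi({(x_i, x_j)})."""
    first = [sum(row) for row in plan]
    second = [sum(row[j] for row in plan) for j in range(len(plan[0]))]
    return first, second

def transport_cost(plan, p):
    """The integral of d(x, y)^p against the plan, which is a finite sum here."""
    n = len(points)
    return sum(abs(points[i] - points[j]) ** p * plan[i][j]
               for i in range(n) for j in range(n))

mu = [0.5, 0.5, 0.0]
nu = [0.0, 0.5, 0.5]

# Two elements of Pi(mu, nu): move every unit of mass one step to the right,
# or use the product measure mu x nu.
shift_plan = [[0.0, 0.5, 0.0],
              [0.0, 0.0, 0.5],
              [0.0, 0.0, 0.0]]
product_plan = [[a * b for b in nu] for a in mu]

p = 2
for name, plan in [("shift", shift_plan), ("product", product_plan)]:
    print(name, marginals(plan), transport_cost(plan, p) ** (1 / p))
```

Both plans have marginals `mu` and `nu`, and each printed cost is an upper bound for $W_{2}(\mu,\nu)$, since $W_{p}$ is obtained by minimizing over all of $\Pi(\mu,\nu)$.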

Now to the metric itself.

The triangle inequality uses the so-called "Gluing lemma" (also found in Villani's book). It states that if $\mu_{1},\mu_{2},\mu_{3}$ are Borel probability measures on $X$, and $\pi_{1,2}\in\Pi(\mu_{1},\mu_{2})$ and $\pi_{2,3}\in\Pi(\mu_{2},\mu_{3})$ are optimal transference plans, then there exists a Borel probability measure $\mu$ on $X^{3}$ whose marginal on the first two factors $X\times X$ is $\pi_{1,2}$ and whose marginal on the last two factors $X\times X$ is $\pi_{2,3}$. This measure in a sense glues together $\pi_{1,2}$ and $\pi_{2,3}$. It follows by a simple argument using the marginal properties of each measure that the marginal of $\mu$ on the first and third factors $X\times X$, denoted by $\pi_{1,3}$, is a transference plan in $\Pi(\mu_{1},\mu_{3})$ (not necessarily optimal!) $(*)$. Using the Minkowski inequality in $L^{p}(X^{3},\mu)$ $(**)$, the marginal properties of the measures $(***)$, the optimality of $\pi_{1,2}$ and $\pi_{2,3}$ $(****)$, and the triangle inequality for $d$ in the unlabeled step, we obtain \begin{align*} W_{p}(\mu_{1},\mu_{3}) &\overset{(*)}{\leq} \bigg(\int_{X\times X}d(x,z)^{p}\,d\pi_{1,3}(x,z)\bigg)^{\frac{1}{p}}\overset{(***)}{=}\bigg(\int_{X\times X\times X}d(x,z)^{p}\,d\mu(x,y,z)\bigg)^{\frac{1}{p}} \\ &\leq \bigg(\int_{X\times X\times X}(d(x,y)+d(y,z))^{p}\,d\mu(x,y,z)\bigg)^{\frac{1}{p}} \\ &\overset{(**)}{\leq}\bigg(\int_{X\times X\times X}d(x,y)^{p}\,d\mu(x,y,z)\bigg)^{\frac{1}{p}}+\bigg(\int_{X\times X\times X}d(y,z)^{p}\,d\mu(x,y,z)\bigg)^{\frac{1}{p}} \\ &\overset{(***)}{=}\bigg(\int_{X\times X}d(x,y)^{p}\,d\pi_{1,2}(x,y)\bigg)^{\frac{1}{p}}+\bigg(\int_{X\times X}d(y,z)^{p}\,d\pi_{2,3}(y,z)\bigg)^{\frac{1}{p}} \\ &\overset{(****)}{=}W_{p}(\mu_{1},\mu_{2})+W_{p}(\mu_{2},\mu_{3}). \end{align*} So we have the triangle inequality.
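On a finite space the glued measure can be written down explicitly, which may help to see what the lemma does. The following is only a sketch under assumptions: the space $X=\{0,1,2\}$ with $d(x,y)=|x-y|$, the two plans, and the helper names `glue` and `lp_cost` are made up for illustration, and the plans are not claimed to be optimal (optimality is only needed for the very last equality in the chain above).

```python
# Toy setting (an assumption for illustration): X = {0, 1, 2} with d(x, y) = |x - y|.
points = [0.0, 1.0, 2.0]
n = len(points)

def glue(plan12, plan23, mu2):
    """Discrete gluing: mu[i][j][k] = plan12[i][j] * plan23[j][k] / mu2[j].

    Where mu2[j] == 0 the column plan12[.][j] vanishes anyway, so we put 0 there.
    """
    return [[[plan12[i][j] * plan23[j][k] / mu2[j] if mu2[j] > 0 else 0.0
              for k in range(n)] for j in range(n)] for i in range(n)]

def lp_cost(plan, p):
    """(integral of d^p against a plan on X x X)^(1/p), a finite sum here."""
    total = sum(abs(points[i] - points[j]) ** p * plan[i][j]
                for i in range(n) for j in range(n))
    return total ** (1 / p)

mu2 = [0.0, 0.5, 0.5]
plan12 = [[0.0, 0.5, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, 0.0]]  # in Pi(mu1, mu2)
plan23 = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 0.0, 0.5]]  # in Pi(mu2, mu3)

glued = glue(plan12, plan23, mu2)

# Marginals of the glued measure on the factor pairs (1,2), (2,3) and (1,3).
back12 = [[sum(glued[i][j][k] for k in range(n)) for j in range(n)] for i in range(n)]
back23 = [[sum(glued[i][j][k] for i in range(n)) for k in range(n)] for j in range(n)]
plan13 = [[sum(glued[i][j][k] for j in range(n)) for k in range(n)] for i in range(n)]

p = 2
print(back12 == plan12, back23 == plan23)
print(lp_cost(plan13, p), "<=", lp_cost(plan12, p) + lp_cost(plan23, p))
```

The last line compares the two ends of the middle part of the chain: the cost of the glued plan `plan13` against the sum of the costs of `plan12` and `plan23`.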

As for $W_{p}(\mu,\mu)=0$, take the homeomorphism $f:X\to\Delta$ given by $x\mapsto(x,x)$, where $\Delta$ is the "diagonal" of $X\times X$. Then take $\nu:=f_{\#}\mu$ (which is a Borel probability measure on the diagonal $\Delta$) and furthermore define a Borel probability measure $\pi$ on the product space $X\times X$ by setting $\pi(A)=\nu(A\cap\Delta)$ for all Borel sets $A$. Now $\pi$ is a transference plan from $\mu$ to itself (we do not need to know in advance that it is optimal), as is straightforward to check, and it vanishes outside the diagonal (i.e. $\pi(\Delta^{c})=0$). Since $d$ vanishes on the diagonal, the integral over $\Delta$ is zero, and since $\pi(\Delta^{c})=0$, so is the integral over $\Delta^{c}$. We conclude that \begin{equation*} W_{p}(\mu,\mu)^{p}\leq \int_{X\times X}d(x,y)^{p}\,d\pi(x,y)=\int_{\Delta}d(x,y)^{p}\,d\pi(x,y)+\int_{\Delta^{c}}d(x,y)^{p}\,d\pi(x,y)=0+0=0, \end{equation*} whence $W_{p}(\mu,\mu)=0$.
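In a finite setting the diagonal plan is just a diagonal matrix, which makes the argument easy to check by hand. This is a sketch under assumptions: the space $X=\{0,1,2\}$ with $d(x,y)=|x-y|$ and the weights in `mu` are arbitrary choices for illustration.

```python
# Toy setting (an assumption for illustration): X = {0, 1, 2} with d(x, y) = |x - y|.
points = [0.0, 1.0, 2.0]
mu = [0.2, 0.3, 0.5]
n = len(points)

# Push mu forward under x -> (x, x): all mass sits on the diagonal.
diag_plan = [[mu[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

# Both marginals of the diagonal plan.
first = [sum(diag_plan[i][j] for j in range(n)) for i in range(n)]
second = [sum(diag_plan[i][j] for i in range(n)) for j in range(n)]

# Cost of the plan: d vanishes on the diagonal and the plan vanishes off it.
p = 2
cost = sum(abs(points[i] - points[j]) ** p * diag_plan[i][j]
           for i in range(n) for j in range(n))

print(first, second, cost)
```

Every term of the sum is zero, either because $d(x_i,x_i)=0$ or because the plan puts no mass at $(x_i,x_j)$ with $i\neq j$, which is the discrete version of the splitting into $\Delta$ and $\Delta^{c}$.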