What is the motivation for the Lévy–Prokhorov metric?

From Wikipedia

Let $(M, d)$ be a metric space with its Borel sigma algebra $\mathcal{B} (M)$. Let $\mathcal{P} (M)$ denote the collection of all probability measures on the measurable space $(M, \mathcal{B} (M))$.

For a subset $A \subseteq M$, define the $\varepsilon$-neighborhood of $A$ by $$ A^{\varepsilon} := \{ p \in M ~|~ \exists q \in A, \ d(p, q) < \varepsilon \} = \bigcup_{q \in A} B_{\varepsilon} (q), $$ where $B_{\varepsilon} (q)$ is the open ball of radius $\varepsilon$ centered at $q$.

The Lévy–Prokhorov metric $\pi : \mathcal{P} (M)^{2} \to [0, + \infty)$ is defined by setting the distance between two probability measures $\mu$ and $\nu$ to be $$ \pi (\mu, \nu) := \inf \left\{ \varepsilon > 0 ~|~ \mu(A) \leq \nu (A^{\varepsilon}) + \varepsilon \ \text{and} \ \nu (A) \leq \mu (A^{\varepsilon}) + \varepsilon \ \text{for all} \ A \in \mathcal{B}(M) \right\}. $$
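To see the definition in action, here is a small brute-force sketch (my own illustration; the names `prokhorov_discrete` and `eps_grid` are made up) that approximates $\pi$ from above for finitely supported measures on the real line by scanning candidate values of $\varepsilon$ on a grid. For such measures it suffices to test subsets $A$ of the two supports, since shrinking $A$ to the support can only make the inequalities harder.

```python
from itertools import chain, combinations

def prokhorov_discrete(mu, nu, eps_grid):
    """Approximate the Levy-Prokhorov distance between two finitely
    supported probability measures on the line (dicts {point: mass}),
    returning the smallest grid value of eps satisfying the definition."""
    def subsets(points):
        return chain.from_iterable(combinations(points, r)
                                   for r in range(len(points) + 1))

    def mass(m, A):
        return sum(m.get(x, 0.0) for x in A)

    def enlarged_mass(m, A, eps):
        # mass that m assigns to the open eps-neighborhood A^eps
        return sum(w for x, w in m.items()
                   if any(abs(x - a) < eps for a in A))

    def ok(eps):
        # worst-case Borel sets A are subsets of the two supports
        return (all(mass(mu, A) <= enlarged_mass(nu, A, eps) + eps
                    for A in subsets(list(mu)))
                and all(mass(nu, A) <= enlarged_mass(mu, A, eps) + eps
                        for A in subsets(list(nu))))

    return min((e for e in eps_grid if ok(e)), default=None)
```

For two unit point masses one unit apart this yields $\pi(\delta_0, \delta_1) = 1$, consistent with the general fact $\pi(\delta_x, \delta_y) = \min(d(x, y), 1)$.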

  1. I wonder what the purpose, motivation and intuition of the L-P metric are?
  2. Is the following alternative a reasonable metric, or some generalized metric, between measures: $$ \sup_{A \in \mathcal{B}(M)} |\mu(A) - \nu(A)|? $$ If yes, is it simpler and easier to understand, and therefore perhaps more useful, than the L-P metric?
  3. A related metric between distribution functions is the Lévy metric:

    Let $F, G : \mathbb{R} \to [0, 1]$ be two cumulative distribution functions. Define the Lévy distance between them to be $$ L(F, G) := \inf \{ \varepsilon > 0 ~|~ F(x - \varepsilon) - \varepsilon \leq G(x) \leq F(x + \varepsilon) + \varepsilon \ \text{for all} \ x \in \mathbb{R} \}. $$

    I wonder how to picture the following intuition:

    Intuitively, if between the graphs of $F$ and $G$ one inscribes squares with sides parallel to the coordinate axes (at points of discontinuity of a graph vertical segments are added), then the side-length of the largest such square is equal to $L(F, G)$.
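    To make the inscribed-square picture concrete, here is a small worked example (my own addition, not part of the quoted text): let $F$ and $G$ be the distribution functions of the point masses $\delta_0$ and $\delta_a$ for some $0 < a \le 1$, i.e. $F = \mathbf{1}_{[0, \infty)}$ and $G = \mathbf{1}_{[a, \infty)}$. The completed graphs (with vertical segments filling the jumps) enclose exactly the rectangle $[0, a] \times [0, 1]$, whose largest inscribed axis-parallel square has side $\min(a, 1) = a$. And indeed $L(F, G) = a$: for $\varepsilon \ge a$ both inequalities hold (whenever $F(x - \varepsilon) = 1$ we have $x \ge \varepsilon \ge a$, so $G(x) = 1$), while for $\varepsilon < a$ the point $x = \varepsilon$ gives $$ F(x - \varepsilon) - \varepsilon = 1 - \varepsilon > 0 = G(\varepsilon), $$ violating the first inequality.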

Thanks and regards!


Solution 1:

Most of what occurs to me has already been said, but you may find the following picture useful.

If $d_C$ is the Chebyshev metric on $\mathbb{R}^2$, i.e. with points $\mathbf{p} = (x_1,y_1)$ and $\mathbf{q} = (x_2,y_2)$ in $\mathbb{R}^2$,

$d_C(\mathbf{p,q}) := |x_1-x_2| \vee |y_1-y_2|$,

and $h_C$ is the Hausdorff metric on closed subsets of $\mathbb{R}^2$ induced by $d_C$, i.e. with $A$ and $B$ being closed subsets of $\mathbb{R}^2$,

$h_C(A,B):= \sup_{\mathbf{p} \in A} d_C(\mathbf{p},B) \vee \sup_{\mathbf{q} \in B} d_C(\mathbf{q},A)$,

where as usual $d_C(\mathbf{p},B) = \inf_{\mathbf{r} \in B} d_C(\mathbf{p,r})$ etc,

then the Lévy metric between two distribution functions $F$ and $G$ is simply the Hausdorff distance $h_C$ between the closures of the completed graphs of $F$ and $G$.
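This picture can be sanity-checked numerically on a toy example (my own sketch; all function names here are made up): take $F$ and $G$ to be the distribution functions of $\delta_0$ and $\delta_{1/2}$. Approximating the Lévy distance straight from its definition on a grid, and computing the Chebyshev–Hausdorff distance between sampled completed graphs, both give $1/2$.

```python
def F(x): return 1.0 if x >= 0.0 else 0.0   # CDF of the point mass at 0
def G(x): return 1.0 if x >= 0.5 else 0.0   # CDF of the point mass at 1/2

def levy(F, G, xs, eps_grid):
    """Levy distance straight from the definition, approximated on grids."""
    def ok(e):
        return all(F(x - e) - e <= G(x) <= F(x + e) + e for x in xs)
    return min((e for e in eps_grid if ok(e)), default=None)

def chebyshev(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def hausdorff(A, B, d=chebyshev):
    """Hausdorff distance between two finite point sets under metric d."""
    def one_sided(X, Y):
        return max(min(d(x, y) for y in Y) for x in X)
    return max(one_sided(A, B), one_sided(B, A))

xs = [i / 10 for i in range(-20, 21)]
eps_grid = [i / 10 for i in range(1, 11)]

# Sampled completed graphs: horizontal pieces plus the vertical jump segment.
graph_F = [(-2, 0), (-1, 0), (0, 0), (0, 0.5), (0, 1), (1, 1), (2, 1)]
graph_G = [(x + 0.5, y) for (x, y) in graph_F]
```

On these samples `levy(F, G, xs, eps_grid)` and `hausdorff(graph_F, graph_G)` both return `0.5`; note the vertical jump segments must be included in the graphs, exactly as described above.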

Solution 2:

I think you get a complete picture of the Prokhorov metric $\pi$ by combining what Dirk has pointed out and what we already know about the total variation metric. Essentially, $\pi$ is a measure-theoretic analogue of the Hausdorff metric, but loosened up modulo the total variation metric. I will explain what I mean by this.

Suppose we have a probability measure $\mu$. We can imagine two different types (I will call them Type I and Type II) of ways of slightly changing the measure $\mu$. To keep the exposition simple, let's suppose $\mu$ is just a pile of $N$ point masses, i.e., $$\mu = \frac1N \sum_{i=1}^N \delta_{x_i}$$ where $x_1, x_2, \cdots, x_N$ are $N$ points in space and $\delta_x$ denotes the point mass at $x$, i.e., the Dirac measure. A Type I change is when you cut out a tiny chunk of $\mu$ and then move that chunk arbitrarily. To be precise, we will say that the new probability measure $\nu$ is obtained from $\mu$ by a Type I change within $\epsilon > 0$ if we have $y_1, \cdots, y_N$ (another list of $N$ points) such that $$\nu := \frac1N \sum_{i=1}^N \delta_{y_i}$$ and $$ \#\{1 \le i \le N: x_i \ne y_i \} \le \epsilon N. $$

An essential property of the total variation metric $\delta(\mu, \nu)$ (between probability measures) is that it allows changes of Type I. In other words, there is a constant $C$ such that $\delta(\mu, \nu) \le C \epsilon$ whenever $\nu$ is obtained from $\mu$ by a Type I change within $\epsilon$.
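A quick numerical check of this property (my own sketch, with the normalization $\delta(\mu, \nu) = \sup_A |\mu(A) - \nu(A)|$, which for finitely supported measures equals half the $\ell^1$ distance between the mass functions, and which is exactly the candidate metric from question 2 above). Here $C = 1$ works:

```python
def tv(mu, nu):
    # total variation delta(mu, nu) = sup_A |mu(A) - nu(A)| for finitely
    # supported probability measures = half the l1 distance of the masses
    pts = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(p, 0.0) - nu.get(p, 0.0)) for p in pts)

def empirical(points):
    # uniform probability measure on a list of points
    m = {}
    for p in points:
        m[p] = m.get(p, 0.0) + 1.0 / len(points)
    return m

N = 10
xs = list(range(N))                  # original N point masses
ys = xs[:8] + [100.0, 200.0]         # Type I change: 2 of 10 atoms moved far away

mu, nu = empirical(xs), empirical(ys)
assert tv(mu, nu) <= 0.2 + 1e-12     # bounded by the moved fraction eps = 0.2
```

Notice that the bound does not care *where* the two atoms went: total variation is blind to distances, which is precisely why it only captures Type I changes.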

A Type II change is when you move each of the particles individually within a small distance. To be precise, the definition of a Type II change replaces the condition $$ \#\{1 \le i \le N: x_i \ne y_i \} \le \epsilon N $$ with the condition $$ d(x_i, y_i) < \epsilon \ \ \forall \, 1 \le i \le N. $$

The Hausdorff metric allows changes of Type II in the following sense: there is a constant $C$ such that whenever $x_1, \cdots, x_N$ and $y_1, \cdots, y_N$ are two lists of $N$ points in space satisfying the above condition, the Hausdorff distance between the two sets $\{x_i : 1 \le i \le N\}$ and $\{y_i : 1 \le i \le N\}$ is $\le C\epsilon$.

The Prokhorov metric $\pi$ allows changes of both Type I and Type II. In fact, you should be able to prove the following fact: $$ \#\{1 \le i \le N: d(x_i, y_i) \ge \epsilon_2 \} \le \epsilon_1 N \implies \nu(A) \le \mu(A^{\epsilon_2}) + \epsilon_1 \ \ \forall A. $$ This is just a Type I change within $\epsilon_1$ followed by a Type II change within $\epsilon_2$. So the Prokhorov metric is simply what you would have come up with if you tried to define a metric $\pi$ with the nice property that $\pi(\mu, \nu) \le \epsilon$ whenever $\nu$ is obtained from a "pile of dirt of unit mass" $\mu$ by moving the particles in a $1-\epsilon$ portion of it each within distance $\epsilon$, and the remaining $\epsilon$ portion arbitrarily.
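The displayed implication can be checked by brute force on a toy instance (my own sketch; the variable names are made up): move four of five atoms by less than $\epsilon_2 = 0.1$ and one atom (an $\epsilon_1 = 1/5$ fraction) arbitrarily far, then verify $\nu(A) \le \mu(A^{\epsilon_2}) + \epsilon_1$ for all worst-case sets $A$; subsets of the support of $\nu$ suffice, since intersecting $A$ with the support leaves $\nu(A)$ unchanged and can only shrink $A^{\epsilon_2}$.

```python
from itertools import chain, combinations

# mu = (1/N) sum delta_{x_i}, nu = (1/N) sum delta_{y_i} on the real line
N = 5
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.05, 1.05, 2.05, 3.05, 50.0]   # 4 small moves (< eps2), 1 wild move
eps1, eps2 = 1 / N, 0.1

def nu_mass(A):
    return sum(1 for y in ys if y in A) / N

def mu_enlarged(A, eps):
    # mu-mass of the open eps-neighborhood A^eps
    return sum(1 for x in xs if any(abs(x - a) < eps for a in A)) / N

subsets = chain.from_iterable(combinations(ys, r) for r in range(N + 1))
assert all(nu_mass(A) <= mu_enlarged(A, eps2) + eps1 + 1e-12 for A in subsets)
```

The extreme case $A = \{50.0\}$ shows why the $+\epsilon_1$ slack is needed: the wild atom carries $\nu$-mass $1/5$ sitting nowhere near the support of $\mu$.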

Of course we can think of another metric $\pi'$ that satisfies this nice property simply by definition.

$$ \pi'(\mu, \nu) := \inf_{\gamma \in \Gamma(\mu, \nu)} \kappa(\gamma) $$

where $\Gamma(\mu,\nu)$ denotes the set of all couplings of $\mu$ and $\nu$, and

$$ \kappa(\gamma) := \inf\{ \epsilon: \gamma\{ (x,y): d(x, y) > \epsilon \} < \epsilon \} $$

The nice property of the Prokhorov metric can then be re-expressed as $\pi \le \pi'$, since $\Gamma(\mu,\nu)$ can be thought of as the collection of all possible ways of moving the pile of dirt $\mu$ into the new pile of dirt $\nu$. Less obvious is the fact that the reverse inequality $\pi \ge \pi'$ also holds (a consequence of Strassen's theorem). So in the end these aren't really two metrics $\pi$ and $\pi'$; they are one and the same metric $\pi = \pi'$.
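For intuition about $\kappa$, here is a direct computation for a finitely supported coupling (my own sketch; the grid scan approximates the infimum from above). With $\mu = \frac12(\delta_0 + \delta_1)$ and $\nu = \frac12(\delta_{0.1} + \delta_5)$, the "identity" matching moves half the mass by $0.1$ and half by $4$, so the far-moving mass $1/2$ forces $\kappa$ down no further than $1/2$.

```python
def kappa(gamma, eps_grid):
    """kappa(gamma) = inf{eps : gamma{(x, y): |x - y| > eps} < eps},
    approximated by the smallest admissible eps on a grid.
    gamma is a finitely supported coupling given as {(x, y): mass}."""
    def mass_far(eps):
        return sum(w for (x, y), w in gamma.items() if abs(x - y) > eps)
    return min((e for e in eps_grid if mass_far(e) < e), default=None)

# identity matching between mu = (d_0 + d_1)/2 and nu = (d_{0.1} + d_5)/2
gamma = {(0.0, 0.1): 0.5, (1.0, 5.0): 0.5}
```

On the grid $\{0.1, 0.2, \ldots, 1.0\}$ this returns $0.6$ (the true infimum, not attained because of the strict inequality, is $0.5$). Since $\pi' \le \kappa(\gamma)$ for every coupling $\gamma$, any single coupling like this one already yields an upper bound on $\pi(\mu, \nu)$.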

The Prokhorov metric reduces to the total variation metric when the discrete metric is assigned to the space $M$. So another way of thinking of $\pi$ is that it is a generalization of the total variation metric that takes the topology of the space into account.
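To see the reduction in one line (my own addition): under the discrete metric $d(x, y) = \mathbf{1}_{x \ne y}$, every neighborhood with $0 < \varepsilon \le 1$ satisfies $A^{\varepsilon} = A$, so the pair of defining inequalities collapses to $|\mu(A) - \nu(A)| \le \varepsilon$ for all $A$, and hence $$ \pi(\mu, \nu) = \sup_{A \in \mathcal{B}(M)} |\mu(A) - \nu(A)|, $$ which is exactly the candidate metric from question 2.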