What is the intuition for why convergence in distribution does not imply convergence in probability?

Solution 1:

First, for convergence in distribution the random variables may be defined on different probability spaces, whereas convergence in probability requires them all to be defined on the same space (unless the limiting 'random variable' is degenerate, i.e., a constant).

Second, even if we artificially require that all the random variables be defined on the same space, it is easy to find sequences that converge in distribution but not in probability. Here is (perhaps) the most trivial example.

Let the sample space have only two points $a$ and $b$, each with probability $1/2.$ Let all $X_n$ be the same with $X_n(a) = 0$ and $X_n(b) = 1$; let $Y$ be defined as $Y(a) = 1$ and $Y(b) = 0$. (Notice the crucial switch.) Then $F_{X_n}(x)$ has jumps of $1/2$ at $x = 0$ and $x = 1.$ Also, the CDF of $Y$ is exactly the same function. So $F_{X_n}(x) \equiv F_Y(x)$ and $|F_{X_n}(x) - F_Y(x)| \equiv 0$, making convergence in distribution trivial.

By contrast, $|X_n(a) - Y(a)| = 1$ and $|X_n(b) - Y(b)| = 1,$ so $P(|X_n - Y| = 1) = 1,$ and convergence in probability is impossible.
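
For a concrete sanity check, here is a minimal numpy sketch of this two-point space (the encoding $a \mapsto 0$, $b \mapsto 1$ and the variable names are arbitrary choices of mine, not part of the example itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# Encode the sample space: a -> 0, b -> 1, each with probability 1/2.
omega = rng.integers(0, 2, size=100_000)

X_n = omega      # X_n(a) = 0, X_n(b) = 1 (the same for every n)
Y = 1 - omega    # Y(a) = 1,  Y(b) = 0 (the crucial switch)

# Identical distributions: both are Bernoulli(1/2), so the CDFs agree.
print(X_n.mean(), Y.mean())           # both close to 0.5

# But pointwise the two always differ by exactly 1.
print(np.mean(np.abs(X_n - Y) == 1))  # exactly 1.0
```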

Finally, when the limit is a constant, the two modes of convergence coincide: convergence in distribution to a constant $c$ is equivalent to convergence in probability to $c$.
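
Here is a one-line sketch of the nontrivial direction. Suppose $X_n \to c$ in distribution; for any $\varepsilon > 0,$ the points $c \pm \varepsilon$ are continuity points of the limiting step-function CDF, so

$$P(|X_n - c| > \varepsilon) \le F_{X_n}(c - \varepsilon) + 1 - F_{X_n}(c + \varepsilon) \longrightarrow 0 + 1 - 1 = 0.$$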

Solution 2:

Your example consists of the pair $(X,Y)$ chosen uniformly at random from $\{(1,0),(0,1)\}$. Considered individually, $X$ and $Y$ have the same distribution. However, they are not independent, so the pair $(X,Y)$ is not the same as two independent copies of $X$.

There are other simple counterexamples out there, such as the following. Let $X$ and $X_1, X_2, \ldots$ be independent standard Gaussians. They all have the same law, so the sequence $X_i$ converges to $X$ in distribution. But clearly it doesn't converge to $X$ in probability: $X_i - X$ has the $N(0,2)$ distribution for every $i,$ so $P(|X_i - X| > \varepsilon)$ is the same positive constant for all $i.$ Here everything is independent, so the obstruction is different: the problem is that the sequence $X_i$ is independent of $X$.
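
A quick numpy check of this constancy (the sample size and the unit threshold are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100_000  # number of simulated sample points

X = rng.standard_normal(m)  # the limiting variable
for i in (1, 10, 100, 1000):
    X_i = rng.standard_normal(m)  # same law as X, but independent of it
    # X_i - X ~ N(0, 2) for every i, so this stays near 0.48 forever.
    print(i, np.mean(np.abs(X_i - X) > 1.0))
```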

Perhaps it's best to contrast this with an example in which $X_i$ does converge to $X$ in probability. Let $X$ be any (reasonable) random variable, let $Z_i$ be independent standard Gaussians (also independent of $X$), let $Y_i = X + Z_i$, and let $X_i = (Y_1 + \cdots + Y_i)/i$. Then $X_i = X + (Z_1 + \cdots + Z_i)/i$, and by the law of large numbers the averaged noise tends to zero, so $X_i$ converges to $X$ in probability (indeed almost surely). The picture here is that the $X_i$ are noisy versions of $X$, but the noise tends to zero. That is exactly what fails to happen in the two counterexamples (yours and mine).
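
The same phenomenon in a numpy sketch, taking $X$ exponential purely for concreteness (any choice of $X$ works):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10_000  # number of simulated sample points

X = rng.exponential(size=m)  # an arbitrary "reasonable" random variable
for i in (1, 10, 100, 1000):
    Z = rng.standard_normal((i, m))  # Z_1, ..., Z_i at each sample point
    X_i = X + Z.mean(axis=0)         # X_i = (Y_1 + ... + Y_i)/i with Y_j = X + Z_j
    # The averaged noise is N(0, 1/i), so this probability tends to 0.
    print(i, np.mean(np.abs(X_i - X) > 0.1))
```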