How to interpret convergence in probability?
Q1) We say that $X_n\to X$ in probability if $$\forall \varepsilon>0, \lim_{n\to \infty }\mathbb P\{|X_n-X|>\varepsilon\}=0.$$
What does it mean concretely? What is the interpretation behind it?
Q2) What would be the difference between
1) $$\forall \varepsilon>0, \lim_{n\to \infty }\mathbb P\{|X_n-X|\leq \varepsilon\}=1$$
2) $$\forall \varepsilon>0, \mathbb P\{\lim_{n\to \infty }|X_n-X|\leq \varepsilon\}=1$$
3) $$\mathbb P\{\forall \varepsilon>0, \lim_{n\to \infty }|X_n-X|\leq \varepsilon\}=1$$
4) $$\lim_{n\to \infty }\mathbb P\{\forall \varepsilon>0, |X_n-X|\leq \varepsilon\}=1.$$
I'm not really sure how to interpret these four limits, since they look almost the same to me. I can see that 1) is nothing more than convergence in probability. If someone could explain the differences between all these limits, it would help me a lot to better understand these notions of convergence.
1) is convergence in probability as you say. It means that you can use $X_n$ to get an arbitrarily good confidence interval for $X$: you choose the width of your interval, and how confident you want to be, and I can find you an $n$ that does that.
2) and 3) are equivalent, and they are both equivalent to $$\mathbb P(\lim_{n\to\infty}|X_n-X|=0)=1$$ which is "almost sure convergence". The point here is that $\lim_{n\to\infty}|X_n-X|$ is a number that does not depend on $n$, so if it is less than $\epsilon$ for every $\epsilon>0$ then it must be $0$.
4) is a bit different. Again, $\forall \epsilon>0:|X_n-X|\leq \epsilon$ is simply the event $|X_n-X|=0$, so this requires that $\mathbb P(X_n=X)\to 1$.
You can certainly have 2)/3) without 4): $X_n$ can tend to $X$ without ever being equal to it. But you can also have 4) without 2)/3): suppose the $X_n$ are independent, with $X_n=1$ with probability $1/n$ and $0$ otherwise, and $X=0$. Then $\mathbb P(X_n=X)\to 1$, but by the second Borel-Cantelli lemma (the series $\sum 1/n$ diverges) we have $X_n-X=1$ infinitely often almost surely, so $\mathbb P(X_n\to X)=0$.
The last example above also satisfies 1), so it is an example of convergence in probability without almost sure convergence.
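For what it's worth, here is a quick numerical sketch of this last example (my own illustration, not part of the argument above; the $X_n$ are taken independent, which is exactly what the second Borel-Cantelli lemma needs): the fraction of indices with $X_n=1$ dies out, so $\mathbb P(X_n=X)\to 1$, yet along a single sample path the value $1$ keeps reappearing.

```python
import numpy as np

# Sketch of the example above (illustration only): X_n = 1 with probability 1/n, X = 0,
# with the X_n independent so the second Borel-Cantelli lemma applies
# (sum of 1/n diverges, hence X_n = 1 infinitely often with probability one).
rng = np.random.default_rng(0)
N = 200_000
n = np.arange(1, N + 1)
path = rng.random(N) < 1.0 / n            # one sample path of (X_1, ..., X_N)

print("fraction of ones among the last 50000 indices:", path[-50_000:].mean())  # ~ 0: P(X_n = X) -> 1
print("total number of ones up to N:", int(path.sum()))                         # grows like log N
print("largest index n <= N with X_n = 1:", int(n[path][-1]))                   # the ones never stop
```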
About your first question: basically, convergence in probability is nothing other than $L^1$ convergence once you have decided to ignore high peaks (and here you can choose the height threshold as small or as large as you please). In fact, convergence in probability is equivalent to $$\forall M>0, \int(|X_n-X|\land M ) d\mathbb{P}\rightarrow0, n\rightarrow\infty$$ and also to $$\exists M>0, \int(|X_n-X|\land M ) d\mathbb{P}\rightarrow0, n\rightarrow\infty,$$ where $a\land b:=\min(a,b)$.
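To make the "ignoring high peaks" picture concrete, here is a small Monte Carlo sketch (my own example, not from the text): take $X_n = n$ with probability $1/n$ and $0$ otherwise, and $X=0$. Then $X_n\to 0$ in probability and $\int (X_n\land M)\, d\mathbb{P} = M/n \to 0$ for every fixed truncation level $M$ (once $n>M$), yet $\int X_n\, d\mathbb{P} = 1$ for all $n$: all the mass that prevents $L^1$ convergence sits in the peak that the truncation cuts off.

```python
import numpy as np

# Hypothetical example (not from the answer): X_n = n with probability 1/n, else 0; X = 0.
# X_n -> 0 in probability and E[min(X_n, M)] -> 0 for any fixed M,
# but E[X_n] = 1 for every n, so there is no L^1 convergence.
rng = np.random.default_rng(1)
samples, M = 1_000_000, 5.0
for n in (10, 100, 1000):
    Xn = np.where(rng.random(samples) < 1.0 / n, float(n), 0.0)
    print(f"n={n:4d}  P(X_n > 0.1) ~ {np.mean(Xn > 0.1):.4f}  "
          f"E[X_n] ~ {Xn.mean():.3f}  E[min(X_n, M)] ~ {np.minimum(Xn, M).mean():.4f}")
```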
Edit: let me explain why you should fall in love with this result (if it isn't clear yet).
- It clarifies what convergence in probability lacks in order to be promoted to $L^1$ convergence: control on how much mass you lose by deciding not to look at high peaks. This leads instantly to the formulation of a uniform integrability condition and to the result: convergence in probability + uniform integrability = convergence in $L^1$.
- As a first application: if the whole sequence is dominated by an $L^1$ random variable, the uniform integrability condition is automatically satisfied, and you instantly get the version of the dominated convergence theorem in which a.s. convergence is replaced by convergence in probability.
- Now that I have shifted the usefulness of convergence in probability onto the usefulness of $L^1$ convergence, you may very well wonder why on earth you should be interested in $L^1$ convergence. Let me just mention one example from this circle of ideas. It is a well-known fact (see for example David Williams, Probability with Martingales) that a martingale $(X_n)_{n\in\mathbb{N}}$ bounded in $L^1$ converges in probability to a random variable $X_\infty$, and so, if your martingale is also uniformly integrable, you also get that $(X_n)_{n\in\mathbb{N}}$ converges in $L^1$ to $X_\infty$. This immediately gives you the result that $\forall n \in \mathbb{N}, \mathbb{E}(X_\infty | X_n)=X_n$, so a kind of Poisson reproducing formula from harmonic analysis holds for uniformly integrable martingales: instead of convolving the boundary values with the Poisson kernel in order to reproduce your harmonic function, you reproduce your martingale by taking the conditional expectation of its "boundary value" $X_\infty$.
Now let's prove the claims. To simplify the notation, define $Y_n:=|X_n-X|$. What we then have to prove is the equivalence of:
1) $\forall\varepsilon>0, \mathbb{P}(Y_n>\varepsilon)\rightarrow0, n\rightarrow\infty$;
2) $\forall M>0, \int(Y_n\land M ) d\mathbb{P}\rightarrow0, n\rightarrow\infty$;
3) $\exists M>0, \int(Y_n\land M ) d\mathbb{P}\rightarrow0, n\rightarrow\infty$.
That 1) implies 2) is a simple consequence of the fact that, for every $M>0$, 1) implies that the sequence $(Y_n\land M)_{n\in\mathbb{N}}$ converges in probability to zero, and so, by the variant of the bounded convergence theorem in which convergence in probability takes the place of a.s. convergence, we get 2).
The fact that 2) implies 3) is obvious.
Now assume 3). Suppose, in order to get a contradiction, that 1) doesn't hold. Then there exist $\varepsilon$ with $0<\varepsilon <M$ (we may take $\varepsilon$ smaller than the $M$ from 3), since shrinking $\varepsilon$ only makes the probabilities below larger), $\delta>0$, and a strictly increasing sequence of positive integers $(n_k)_{k\in\mathbb{N}}$ such that $$\forall k\in\mathbb{N}, \mathbb{P}(Y_{n_k}>\varepsilon)\ge\delta.$$ Then $$\forall k\in\mathbb{N}, \delta\le\mathbb{P}(Y_{n_k}>\varepsilon)\le\mathbb{P}(Y_{n_k}-(Y_{n_k}\land M)>\frac{\varepsilon}{2})+\mathbb{P}(Y_{n_k}\land M>\frac{\varepsilon}{2})\\\le\mathbb{P}(\{Y_{n_k}-(Y_{n_k}\land M)>\frac{\varepsilon}{2}\}\cap\{Y_{n_k}\ge M\})+\mathbb{P}(\{Y_{n_k}-(Y_{n_k}\land M)>\frac{\varepsilon}{2}\}\cap\{Y_{n_k}< M\})+\mathbb{P}(Y_{n_k}\land M>\frac{\varepsilon}{2})\\=\mathbb{P}(Y_{n_k}>M+\frac{\varepsilon}{2})+\mathbb{P}(\emptyset)+\mathbb{P}(Y_{n_k}\land M>\frac{\varepsilon}{2})\\\le\mathbb{P}(Y_{n_k}>M+\frac{\varepsilon}{2})+\frac{2}{\varepsilon}\int Y_{n_k}\land Md\mathbb{P},$$ using Markov's inequality in the last step. Since $\int Y_{n_k}\land M\, d\mathbb{P}\to0$ by 3), it follows that $$\liminf_{k\rightarrow\infty}\mathbb{P}(Y_{n_k}>M+\frac{\varepsilon}{2})\ge\delta.$$ Then $$0=\lim_{k\rightarrow\infty}\int Y_{n_k}\land Md\mathbb{P}\ge\liminf_{k\rightarrow\infty}\int_{Y_{n_k}>M+\frac{\varepsilon}{2}} Y_{n_k}\land Md\mathbb{P}\\\ge\liminf_{k\rightarrow\infty}M\mathbb{P}(Y_{n_k}>M+\frac{\varepsilon}{2})\ge M\delta>0,$$ a contradiction. So 3) implies 1).
About your second question: the first is equivalent to convergence in probability, and the second and the third are nothing other than almost sure convergence, while the fourth is something strange: it is strictly stronger than convergence in probability (e.g. $X_n=1/n$ converges to zero in probability but doesn't satisfy (4)), yet it doesn't imply a.s. convergence (as the typewriter sequence of functions shows), nor is it implied by a.s. convergence (as $X_n=1/n$ shows). Basically it states that the probability of the set where $X_n$ and $X$ are equal grows to $1$.
This is an interesting question, and I have struggled to understand the various modes of convergence myself, so I'll try to give you the intuition I have gained so far.
Convergence in probability requires that $\forall \varepsilon>0, \,P(|X_n-X|> \varepsilon) \to 0$.
If you consider the $x$-$y$ plane, this can be thought of as the joint distribution of the vector $(X,X_n)$ getting concentrated around the line $y=x$.
More precisely, the region $|x-y|> \varepsilon$ is the region outside the band delimited by the two lines $y=x+\varepsilon$ and $y=x-\varepsilon$ (lines parallel to $y=x$, shifted upwards and downwards).
Then what you want is that, as $n$ increases, the distribution of $(X,X_n)$ places less and less probability on that region, or equivalently accumulates more and more mass around the line $y=x$ (this sentence is not exactly right, as the probabilities need not be monotonic, but it is good for intuition).
Now let's try to understand the difference from almost sure convergence. Even if you have convergence in probability, i.e. $(X,X_n)$ is getting concentrated around $y=x$, it is still possible for the sequence $(X,X_n)_n$ to fall outside the band around the line only occasionally, yet infinitely often.
Almost sure convergence guarantees that this doesn't happen, i.e. with probability one the sequence will eventually get trapped in the strip.
To give an example take $\Omega=[0,1]$ with the Lebesgue measure.
Consider the following sequence of events (call them $A_n$):
$[0,1]$,
$[0,\frac{1}{2}]$, $[\frac{1}{2},\frac{2}{2}]$,
$[0,\frac{1}{4}]$, $[\frac{1}{4},\frac{2}{4}]$, $[\frac{2}{4},\frac{3}{4}]$, $[\frac{3}{4},\frac{4}{4}]$,…
and so on.
If you take $X_n = I_{A_n}$, then $X_n \to 0$ in probability (since $P(A_n)\to 0$), but for every $\omega \in \Omega$ there is a subsequence along which $X_n(\omega)$ is constantly $1$, so $X_n(\omega)$ does not converge to $0$ for any $\omega$.
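If it helps, here is a small computational sketch of this construction (my own code, using the enumeration of the intervals described above): the length of $A_n$, i.e. $P(X_n=1)$, goes to $0$, yet any fixed $\omega$ keeps landing in one interval of every dyadic block, so $X_n(\omega)=1$ for infinitely many $n$.

```python
# Illustration of the construction above (my own sketch): the n-th set A_n is the j-th
# dyadic interval of length 2**-k, where the indexing walks through the blocks
# [0,1]; [0,1/2],[1/2,1]; [0,1/4],...,[3/4,1]; and so on.
def interval(n):
    """Return (left, right, length) of A_n for n >= 1."""
    k = 0
    while 2 ** (k + 1) - 1 < n:   # block k contains indices 2**k, ..., 2**(k+1) - 1
        k += 1
    j = n - 2 ** k                # position inside block k, 0 <= j < 2**k
    return j / 2 ** k, (j + 1) / 2 ** k, 1.0 / 2 ** k

omega = 0.3
hits = [n for n in range(1, 200) if interval(n)[0] <= omega <= interval(n)[1]]
print("P(A_n) for n = 1, 10, 100:", [interval(n)[2] for n in (1, 10, 100)])  # -> 0
print("indices n with X_n(omega) = 1:", hits)  # one hit per block: infinitely often
```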
Almost sure convergence translates into the following equivalent expressions:
i) $P(\lim_{n \to \infty} X_n = X)=1$
ii) $\forall \varepsilon>0 , \, P(\bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} \{|X_k-X|>\varepsilon\})=0$
iii) $\forall \varepsilon>0 ,\, P(\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} \{|X_k-X|\leq \varepsilon\})=1$.
Finally, the expression $\lim_{n\to \infty }P(\forall \varepsilon>0, |X_n-X|\leq \varepsilon)=1$, as Especially Lime says, is equivalent to $\lim_{n\to \infty } P(|X_n-X|=0)=1$.
You can easily see that $P(|X_n-X| \leq \varepsilon) \geq P(|X_n-X|=0)$, which implies convergence in probability. However, the converse is not true; take for example $X_n=\frac{1}{n}$ and $X=0$.
Why is this? The expression $\lim_{n\to \infty } P(|X_n-X|=0)=1$ measures only whether $X_n$ becomes exactly equal to $X$, and not arbitrarily close to it, as every other mode of convergence requires.
So effectively it is convergence in probability restricted to exact equality, missing all the cases where $X_n$ gets close to $X$ without ever reaching it exactly.