What is the difference between the weak and strong law of large numbers?

I don't really understand exactly what the difference between the weak and strong law of large numbers is.

Weak:

\begin{align*} \lim_{n \rightarrow \infty} \mathbb{P}[\,|\bar{X}_n - \mu| \leq \epsilon\,] = 1 \end{align*}

Strong:

\begin{align*} \mathbb{P}[\lim_{n \rightarrow \infty} \bar{X}_n = \mu ] = 1 \end{align*}

Isn't this a very subtle difference? Since I can choose my $\epsilon$ arbitrarily small, for $n \rightarrow \infty$ I can write

\begin{align*} |\bar{X}_n - \mu| \leq \epsilon \\ - \epsilon \leq \bar{X}_n - \mu \leq \epsilon \\ \mu - \epsilon \leq \bar{X}_n \leq \mu + \epsilon \end{align*}

which, as $\epsilon \to 0$, should of course be the same as $\lim_{n \rightarrow \infty} \bar{X}_n = \mu$.

So: In what sense are those conditions actually "different"?


Regarding the weak law, I'd like to know if these two expressions are actually the same:

\begin{align*} \lim_{n \rightarrow \infty} \mathbb{P}[\,|\bar{X}_n - \mu| > \epsilon\,] = \mathbb{P}[\,|\lim_{n \rightarrow \infty} \bar{X}_n - \mu| > \epsilon\,] \end{align*}

I ask because the weak law is always written like the left-hand side, while the strong law always has $\lim_{n \rightarrow \infty}$ inside the probability operator.


Solution 1:

The weak law of large numbers refers to convergence in probability, whereas the strong law of large numbers refers to almost sure convergence.

We say that a sequence of random variables $\{Y_n\}_{n=1}^{\infty}$ converges in probability to a random variable $Y$ if, for all $\epsilon>0$, $\lim_n P(|Y_n-Y|>\epsilon)=0$.

We say that a sequence of random variables $\{Y_n\}_{n=1}^{\infty}$ converges almost surely to a random variable $Y$ if $\lim_n Y_n(\omega)=Y(\omega)$ for almost every $\omega$, that is, $P(\{\omega:\lim_nY_n(\omega)=Y(\omega)\})=1$.

Almost sure convergence implies convergence in probability, but the converse is not true (that is why the laws of large numbers are called strong and weak, respectively). To see that the converse is not true, consider independent discrete random variables $Y_n$ satisfying $P(Y_n=1)=1/n$ and $P(Y_n=0)=1-1/n$. Given $0<\epsilon<1$, $P(|Y_n|\leq\epsilon)=P(Y_n=0)=1-1/n\rightarrow 1$, so $Y_n\rightarrow 0$ in probability. However, as $\sum_n P(Y_n=1)=\infty$ and the $Y_n$ are independent, the second Borel-Cantelli lemma gives that, for almost every $\omega$, $Y_n(\omega)=1$ for infinitely many $n$'s. Hence the sequence $\{Y_n\}$ does not converge to $0$ almost surely.
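A quick simulation makes this concrete (a minimal NumPy sketch; the seed and variable names are my own). We draw a single sample path of the independent $Y_n$ and record every index at which $Y_n = 1$:

```python
import numpy as np

# Sketch of the counterexample: independent Y_n with P(Y_n = 1) = 1/n.
rng = np.random.default_rng(0)

n_max = 1_000_000
n = np.arange(1, n_max + 1)

# One sample path Y_1(omega), ..., Y_{n_max}(omega).
path = rng.random(n_max) < 1.0 / n

# Convergence in probability: P(|Y_n| > eps) = 1/n -> 0.
# Failure of a.s. convergence: ones keep appearing along the path.
hits = np.flatnonzero(path) + 1
print("indices n with Y_n = 1:", hits)
# Expect about log(n_max) ~ 14 hits, spaced further and further
# apart but never stopping -- this path does not settle at 0.
```

The hits thin out, since their probability tends to $0$, yet by Borel-Cantelli they never stop on almost every path.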

Concerning your reasoning: the fact that $\lim_nP(|\bar{X}_n-\mu|\leq\epsilon)=1$ does not imply that $|\bar{X}_n-\mu|\leq\epsilon$ for all large $n$. In my example above, you do not have $|Y_n|\leq\epsilon$ for all large $n$, since $Y_n=1$ for infinitely many $n$'s.

Solution 2:

I generally think of the following example. Consider the interval $\Omega = [0, 1]$. This will be our sample space (a quite explicit one), equipped with the uniform probability. A random variable is essentially a mapping from $\Omega$ to $\mathbb{R}$. We will define the random variables $X_n: [0, 1] \to \{0, 1\}$ as follows: for $\omega \in [0, 1] = \Omega$, $X_n(\omega) = 1$ if $\omega \in A_n$, where $A_n$ is a "problematic" interval defined below, and $X_n(\omega) = 0$ otherwise.

This problematic interval starts as $A_1 = [0, 0.1], A_2 = [0.1, 0.2]$ and shifts to the right, keeping its length, until it reaches the right boundary, that is, $A_{10} = [0.9, 1]$. Then its length halves and it returns to the left boundary, $A_{11} = [0, 0.05]$, and it continues like that. As you can see, $P(X_n = 1) = P(A_n)$, which is the length of $A_n$. So by our construction $P(X_n = 1) \to 0$, that is, $P(X_n = 0) \to 1$: this is convergence in probability. However, it is interesting to observe that for any outcome of the sample space, that is, $\forall x \in [0, 1]$, the problematic interval $A_n$ will hit your outcome $x$ infinitely many times. Therefore, for any outcome of your experiment, the value of $X_n$ will never converge to $0$. The "errors" ($X_n = 1$) will occur with decreasing frequency, but they will always occur.
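Here is a minimal Python sketch of this construction (the function name and the particular outcome $x$ are my own choices): it builds the sweeping intervals level by level and lists the indices $n$ at which a fixed outcome is hit.

```python
# Sketch of the sliding-interval construction. Level k consists of
# intervals of length 0.1 / 2**(k-1) sweeping [0, 1] left to right;
# concatenating the levels gives A_1, A_2, A_3, ...
def intervals(n_levels):
    A, length = [], 0.1
    for _ in range(n_levels):
        m = round(1 / length)                 # intervals in one sweep
        A += [(i * length, (i + 1) * length) for i in range(m)]
        length /= 2
    return A

A = intervals(8)
x = 0.7357                                    # a fixed outcome omega

# P(X_n = 1) = length of A_n -> 0: convergence in probability.
print("length of A_n at the last level:", A[-1][1] - A[-1][0])

# But every sweep covers all of [0, 1], so x lands in one interval of
# every level: X_n(x) = 1 infinitely often, no pointwise convergence.
hits = [n + 1 for n, (a, b) in enumerate(A) if a <= x < b]
print("indices n with X_n(x) = 1:", hits)
```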

The strong law of large numbers says something different: when you do an experiment (sample from the sample space), if you wait long enough, the value of $X_n$ will converge to $0$, unless (in some cases) you are extremely unlucky and sampled from a probability $0$ event. That would be the case if we had chosen, for example, $A_n = [0, 1/n]$: for all outcomes except $x = 0$, the sequence $X_n$ eventually converges to $0$. The term "extremely unlucky" here stands for sampling $x = 0$.
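For contrast, a sketch of this almost-surely convergent variant (again my own code): any sampled $x > 0$ is hit only finitely often, since $x \in [0, 1/n]$ forces $n \leq 1/x$.

```python
import numpy as np

# Variant A_n = [0, 1/n]: P(X_n = 1) = 1/n -> 0 as before, but now
# X_n(x) = 1 only while n <= 1/x, so X_n(x) -> 0 for every x > 0.
rng = np.random.default_rng(1)
x = rng.random()                      # uniform draw from [0, 1)

last_hit = int(1 / x)                 # largest n with x <= 1/n
print(f"x = {x:.4f}: X_n(x) = 1 only for n <= {last_hit}, then 0 forever")
```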

In the weak law of large numbers, by contrast, when you do a single experiment, however long you wait, you may not see convergence, as is the case in our example.

Solution 3:

The difference lies in the relative position of the probability symbol with respect to the limit symbol in the two definitions:

$$ \begin{cases} \text{WLLN}: \color{blue}{\displaystyle \lim_{n\to \infty}}\color{red}{\Pr}\left(\vert \bar X_n - \mathbb E[X] \vert < \varepsilon \right) = 1\\[3ex] \text{SLLN}: \color{red}{\Pr}\left(\color{blue}{\displaystyle \lim_{n\to \infty}} \vert \bar X_n - \mathbb E[X]\vert =0\right) = 1 \end{cases}$$

Hence, in the WLLN, we are contemplating a sequence of probabilities indexed by the sample size, $\{\Pr_1, \Pr_2,\dots,\Pr_n,\dots \}$: for any arbitrarily small $\varepsilon$ we may choose, the probability that the difference between the sample mean and the population mean (the expectation of the random variable) is even smaller than the chosen $\varepsilon$ forms a sequence of probability values indexed by $n$, and the WLLN asserts that this sequence converges to $1$.
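This sequence of probabilities can be estimated by simulation (a Monte Carlo sketch for fair coin flips; $\varepsilon$, the seed, and the sample sizes are my own choices):

```python
import numpy as np

# Estimate P(|X_bar_n - mu| > eps) for fair coin flips at several n.
rng = np.random.default_rng(2)
mu, eps, n_paths = 0.5, 0.01, 20_000

for n in [100, 1_000, 10_000, 100_000]:
    xbar = rng.binomial(n, mu, size=n_paths) / n   # n_paths sample means
    p = np.mean(np.abs(xbar - mu) > eps)
    print(f"n = {n:>6}: estimated P(|X_bar_n - mu| > {eps}) = {p:.3f}")
# The estimates drop toward 0, i.e. P(|X_bar_n - mu| <= eps) -> 1.
```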

In the SLLN, the concept is a sequence of differences between sample means of increasing size and the population mean. With probability $1$, or almost surely, this difference vanishes to $0$ as $n$ goes to infinity.
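The SLLN statement is therefore about individual paths, which one can also watch in a simulation (a single-path sketch, same caveats as above): one realized sequence of running sample means settles down to $\mu$.

```python
import numpy as np

# One experiment: a single path of running sample means of coin flips.
rng = np.random.default_rng(3)
flips = rng.integers(0, 2, size=100_000)
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)

# The path itself converges to mu = 0.5 (for almost every outcome).
for n in [100, 1_000, 10_000, 100_000]:
    print(f"n = {n:>6}: X_bar_n = {running_mean[n - 1]:.4f}")
```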

In a way, it is the difference between the assurance that something does happen (SLLN) versus the assurance that what we are after will happen with increasing probability (WLLN), accounting for the fact that SLLN $\implies$ WLLN, but not the other way around.