Can someone explain the Borel-Cantelli Lemma?

I’m looking for an informal and intuitive explanation of the Borel-Cantelli Lemma. The symbolic version can be found here.

What is confusing me is what ‘probability of the limit superior equals $ 0 $’ means.

Thanks!

Let $\{E_n\}$ be a sequence of events. Each event $E_i$ is a collection of outcomes. The limit superior of the collection $\{E_n\}$ is the collection of all those outcomes that appear in infinitely many events. The Borel Cantelli Lemma says that if the sum of the probabilities of the $\{E_n\}$ are finite, then the collection of outcomes that occur infinitely often must have probability zero.

To give an example, suppose I randomly pick a real number $x \in [0,1]$ using an arbitrary probability measure $\mu$. I then challenge my (infinitely many) friends to guess a subset $E_n$ of $[0,1]$ containing $x$. If infinitely many friends guess correctly, I have to buy pizza for everyone. To make sure they don't just guess the whole interval, I make them pay for choosing large subsets -- the cost of choosing a set $E$ is $\mu(E)$. I check my bank account at the end of the guessing and find that I have only made a finite amount of money, i.e. $\sum \mu(E_n) < \infty$ (cheap friends!). Is it likely I will have to buy everyone pizza? No! Because the set $E_{\infty}$ of numbers that infinitely many friends agreed on has measure zero, so my random number $x$ has probability zero of being in there.

Here is a proof I wrote up for sci.math, but never posted:

Borel-Cantelli: Suppose that $\sum_{i=1}^\infty P(A_i)$ is finite, then the probability that infinitely many of the $A_i$ occur is $0$.

Proof: Let $B_k$ be the event $\cup_{i=k}^\infty A_i$ for $k=1,2,\ldots,$. If $x$ is in the event $A_i$'s i.o., then $x\in B_k$ for all $k$. So $x\in \cap_{k=1}^\infty B_k$.

Conversely, if $x\in B_k$ for all $k$, then we can show that $x$ is in $A_i$'s i.o. Indeed, $x\in B_1 = \cup_{i=1}^\infty A_i$ means that $x\in A_{j(1)}$ for some $j(1)$. However $x\in B_{j(1)+1}$ implies that $x\in A_{j(2)}$ for some $j(2)$ that is strictly larger than $j(1)$. Thus we can produce an infinite sequence of integer $j(1)< j(2)< j(3)<\ldots$ such that $x\in A_{j(i)}$ for all $i$.

Let $E$ be the event $\{x:\, x\in A_i \mbox{ i.o.}\}$. We have $$ E = \bigcap_{k=1}^\infty \bigcup_{i=k}^\infty A_i\tag{1} $$ From $E\subseteq B_k$ for all $k$, it follows that $P(E)\leq P(B_k)$ for all $k$. By union bound, we know that $P(B_k)\leq \sum_{i=k}^\infty P(A_i)$. So $P(B_k)\rightarrow 0$, by the hypothesis that $\sum_{i=1}^\infty P(A_i)$ is finite. Therefore, $P(E)=0$.

For events, $\bigcap\approx\inf$ and $\bigcup\approx\sup$. Since $$ \limsup_{n\to\infty}a_n=\operatorname*{\inf\vphantom{p}}_{n\ge1}\sup_{k\ge n}a_k\tag{2} $$ it makes sense to call $$ \limsup_{n\to\infty}A_n=\bigcap_{n\ge1}\bigcup_{k\ge n} A_k\tag{3} $$

In infinite probability spaces the probability of an event being $0$ does not mean it can't happen. This can be confusing, but such is life. Consider for instance flipping a fair coin infinitely many times. What is the probability that in all these flips you'll get heads? Answer: 0 (the probability of all flips heads must be less than the probability of first $n$ flips being heads, but that is equal precisely to $(1/2)^n$, so the probability of all heads is smaller then $(1/2)^n$ for all $n$, which means it's equal to $0$). It can of course happen, but with probability $0$.

In finite probability spaces an event having probability $0$ can safely be interpreted as "can't happen". But in infinite probability spaces you need to refine your intuition.

If I understand your question correctly, this is really what you are asking about and not so much about Borel-Cantelli. Just to make one more comment then, Borel-Cantelli says that if the total sum of the probabilities of events is finite (i.e., small) then the probability for such events occurring infinitely often is $0$. It can happen, but it's bloody unlikely. For some events to occur infinitely often it must be the case that the total sum of individual probabilities is large (i.e., infinity). The Wiki page you cited gives several examples. I hope this helps.

How can you derive $\sin(x) = \sin(x+2\pi)$ from the Taylor series for $\sin(x)$?

How does Gödel Completeness fail in second-order logic?

Is it simply a coincidence that if you differentiate the formula for the volume of sphere you get the formula for the surface area of sphere? [duplicate]

Reference to self-study Abstract Algebra and Category Theory

Why König's lemma isn't "obvious"?

Why formulate continuity in terms of pre-images instead of image?

How many ways are there to fill a 3 × 3 grid with 0s and 1s?

Practical uses of matrix multiplication

The number of partitions of $n$ into distinct parts equals the number of partitions of $n$ into odd parts

All derivatives zero at a point $\implies$ constant function?

The set $\{1,2,3,\ldots,n\}$, where $n \geq 5$, can be divided into two subsets so that the sum of the first is equal to the product of the second

Does symbol "$+$" denote an operation in the notation of a complex number: "$a+ib$"? In case it does, which operation does "$+$" denote?