How do concepts such as limits work in probability theory, as opposed to calculus?

When I am flipping a fair coin and say that as the number of trials approaches $\infty$ the proportion of heads approaches $50\%$, what do I really mean?

Intuitively, I would associate it with the concept of a limit, as used in calculus:

$$ \lim_{t \to \infty} \left(\frac{H}{H+T}\right)=0.5 \\ \text{Where $t$ = the number of trials, $H$ = the number of heads, and $T$ = the number of tails} $$

However, this intuition seems to break down when I use the formal definition of a limit:

$$ \lim_{t \to \infty} \left(\frac{H}{H+T}\right)=0.5 \text{ if and only if}\\ \text{for every $\varepsilon>0$, there exists $N>0$ such that for all $t$} \\ \text{if $t>N$ then $|0.5-\frac{H}{H+T}|<\varepsilon$} $$

Well, I don't know whether there is an $N > 0$ that satisfies this definition! It all depends on what comes up: however many times I flip the coin, $\frac{H}{H+T}$ might just equal $0$. So how might I define terms such as "approaches" in probability theory if the conventional definition does not work?

Edit: User nicomezi has pointed out that this is a huge topic. Therefore, I will accept even a very short introduction to this subject as an answer.


There are several notions of convergence in probability.

Your example is an instance of the law of large numbers, which has a weak form and a strong form.

The weak form states that $H/(H+T)$ "converges in probability" to $0.5$. Formally, for any $\epsilon > 0$, $$\lim_{t \to \infty}P\left(\left|\frac{H}{H+T} - 0.5\right| > \epsilon\right) = 0.$$
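To get an empirical feel for convergence in probability, here is a small Python sketch (the function name `deviation_prob` and the parameter choices are illustrative, not part of the theorem) that estimates $P\left(\left|\frac{H}{H+T} - 0.5\right| > \epsilon\right)$ by Monte Carlo for increasing $t$; the estimated probability shrinks toward $0$:

```python
import random

def deviation_prob(t, eps=0.05, runs=2000, seed=0):
    """Estimate P(|H/(H+T) - 0.5| > eps) after t fair-coin flips,
    by simulating many independent runs of length t."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(runs):
        heads = sum(rng.random() < 0.5 for _ in range(t))  # number of heads in t flips
        if abs(heads / t - 0.5) > eps:
            bad += 1
    return bad / runs

for t in (10, 100, 1000):
    print(t, deviation_prob(t))
```

The printed estimates decrease with $t$, matching the weak law: for any fixed $\epsilon$, the probability of a large deviation goes to $0$.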

The strong form states that $H/(H+T)$ "converges almost surely" to $0.5$. Formally, $$P\left(\lim_{t \to \infty} \frac{H}{H+T} = 0.5\right) = 1.$$ Almost sure convergence implies convergence in probability, hence the strong law implies the weak law (but is harder to prove).
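A quick way to see the strong law at work is to simulate one long run of flips and watch the running proportion of heads; a minimal Python sketch (seed and checkpoints chosen arbitrarily):

```python
import random

rng = random.Random(42)
heads = 0
for t in range(1, 100_001):
    heads += rng.random() < 0.5  # count a head with probability 0.5
    if t in (10, 100, 1000, 10_000, 100_000):
        print(t, heads / t)  # running proportion of heads so far
```

In a typical run the early proportions wander, but the later ones settle near $0.5$; the strong law says this settling happens with probability $1$.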


Response to comment:

If you imagine flipping a coin many, many times, then you can keep track of the sequence $\frac{H}{H+T}$ as $t$ increases. In your words, this sequence is random, since it "depends on what comes up," so in different parallel universes this sequence will be different. You are correct that it is possible that you always flip tails, in which case the sequence is $0,0,\ldots$.

But given a particular sequence, you simply have a sequence of real numbers, so the limit in $\lim_{t \to \infty} \frac{H}{H+T}$ is the usual limit you are familiar with (which may not exist for some sequences). The law of large numbers states that with probability $1$, this sequence not only has a limit, but that limit is $0.5$. So scenarios like the one you mentioned (always flipping tails) happen with probability $0$.
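To make concrete why the all-tails scenario has probability $0$: the chance that the first $t$ flips are all tails is $0.5^t$, which vanishes rapidly as $t$ grows. A quick check in Python:

```python
# Probability that the first t fair flips are all tails: 0.5**t -> 0 as t grows.
for t in (10, 50, 100):
    print(t, 0.5 ** t)
```

Already at $t = 100$ this probability is below $10^{-30}$; in the limit, any single fixed infinite sequence of outcomes has probability $0$.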


In probability, you work with distributions. When you increase the number of trials and take the average, the distribution of that average tends to a Gaussian (by the central limit theorem), with a variance that shrinks as the number of trials grows.

The conclusions that you can draw are based on this probabilistic inference. For an infinite number of drawings, the distribution tends to a Dirac delta centered at $0.5$.
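A rough simulation of this narrowing (the function name `sample_means` and the run counts are illustrative): the empirical standard deviation of the sample proportion shrinks like $0.5/\sqrt{t}$, so the distribution piles up ever more tightly around $0.5$.

```python
import random
import statistics

def sample_means(t, runs=2000, seed=1):
    """Sample proportions H/(H+T) over many independent runs of t flips each."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(t)) / t for _ in range(runs)]

for t in (10, 100, 1000):
    means = sample_means(t)
    # The spread of the sample proportion shrinks roughly like 0.5 / sqrt(t).
    print(t, round(statistics.stdev(means), 4))
```

The printed standard deviations drop by roughly a factor of $\sqrt{10}$ at each step, consistent with the distribution collapsing toward a point mass at $0.5$.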
