Flip a coin until a head comes up. Why is "infinitely many tails" an event we need to consider?

Added: In this answer I take the sample space to be the full "Bernoulli space" $\mathcal{B} = \{T,H\}^{\infty}$. As Qiaochu Yuan points out in a comment to my other answer, this is a valid way to go: we identify a finite sequence $T^n H$ with the subset of infinite sequences with initial segment $T^n H$, and with respect to the natural (product) measure on $\mathcal{B}$, this event has probability $2^{-n-1}$, as it should. This is a more complicated space than the discrete space appearing in my other answer, but it is arguably more natural: the other space was contrived to answer a very specific problem about Bernoulli trials, whereas this space is the space of all Bernoulli trials.

You could ask the same question about any element of your sample space: "Why should we bother to have $(H,T,T,H,T,H,H,H,T,\ldots)$ in it -- after all, it only occurs with probability zero."

In this case, every singleton set occurs with probability zero, but if you took them all out you would have no sample space!
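
To spell out why each singleton has probability zero (a sketch; $\mu$ for the product measure and $C_n(\omega)$ for the set of sequences that agree with a fixed sequence $\omega$ on the first $n$ flips are my notation): the sets $C_n(\omega)$ decrease to $\{\omega\}$ and each has measure $2^{-n}$, so

$$\mu(\{\omega\}) = \lim_{n \to \infty} \mu(C_n(\omega)) = \lim_{n \to \infty} 2^{-n} = 0.$$

This does not contradict $\mu(\mathcal{B}) = 1$, because $\mathcal{B}$ is uncountable and countable additivity says nothing about uncountable unions of singletons.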

Based on your other recent question, I think you should start learning about measures and countable additivity. This has been the mathematical underpinning of probability theory for almost $80$ years.

Note also that the Bernoulli space $\mathcal{B}$ has a beautiful structure -- it can be viewed as $(\mathbb{Z}/2\mathbb{Z})^{\infty}$ and thus endowed with both a group structure and a compatible topology, under which it becomes a compact, totally disconnected abelian topological group. You wouldn't just start pulling points out of a compact topological group, would you? That will ruin everything...


I had previously written an answer which got a fair number of votes. In retrospect, this was a nice answer, but not to this question! (Added: I changed my mind again after reading Qiaochu Yuan's comment. Apologies for leaving two somewhat lengthy answers to the same question. Maybe it will be "educational"?)

The point is that the sample space here is much simpler than the "Bernoulli space" of countably infinite $\{H,T\}$-valued sequences. This sample space consists of elements $H,TH,TTH,\ldots,T^n H,\ldots$ -- i.e., one element for each $n \in \mathbb{N}$, together with -- perhaps -- the one infinite sequence of all tails.

In particular this sample space is countably infinite. If one is interested only in countably infinite sample spaces then there is no need to bring out measure-theoretic ideas: it is enough to enumerate the elements of the space and assign each element a non-negative real probability in such a way that the sum of all of these numbers is $1$. Here the element $T^n H$ -- i.e., first $n$ tails, then heads -- should be assigned the probability $\frac{1}{2^{n+1}}$. If we sum over all these singleton probabilities we get $\sum_{n=0}^{\infty} \frac{1}{2^{n+1}} = 1$.
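
As a quick sanity check on that sum, here is a minimal sketch in exact rational arithmetic, assuming a fair coin. The function names are mine, chosen only for illustration. It computes the total probability of the outcomes $H, TH, \ldots, T^N H$ and the mass left over for everything else, which is all that the all-tails point could ever receive.

```python
from fractions import Fraction


def prob_first_head_after(n: int) -> Fraction:
    """Probability of the outcome T^n H for a fair coin: 1 / 2^(n+1)."""
    return Fraction(1, 2 ** (n + 1))


def partial_sum(N: int) -> Fraction:
    """Total probability of the outcomes H, TH, ..., T^N H."""
    return sum((prob_first_head_after(n) for n in range(N + 1)), Fraction(0))


def leftover_mass(N: int) -> Fraction:
    """Mass not yet accounted for; an upper bound on P(all tails)."""
    return 1 - partial_sum(N)


if __name__ == "__main__":
    for N in (0, 1, 4, 9, 19):
        print(N, partial_sum(N), leftover_mass(N))
```

The leftover mass after $N+1$ terms is exactly $2^{-(N+1)}$, which tends to $0$, so the only probability consistent with the other assignments is zero for the all-tails point.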

In particular, if we want to put the last point $TTTTT\ldots$ (infinitely many tails) into our space, then we need to assign it probability zero. If you have a discrete probability space then it's true that you don't lose anything by omitting singleton elements of probability zero, any more than you lose anything by omitting zero terms from an infinite series. On the other hand you don't have to omit it: it's legal to put it in and assign it probability zero. Although it won't make any mathematical difference, I would favor doing this from a modelling perspective, because after all flipping a coin infinitely many times and having it come up tails every time is not logically impossible: it just happens that the probability we want to assign to it is zero.


In my opinion it buys us nothing. Let the random variable $X$ be the number of tosses until the first head. If the probability of a tail is anything other than $1$, the sum of the probabilities of $X=n$, as $n$ ranges over the positive integers, is equal to $1$. So "infinitely many tails," if we were interested in it, would have probability $0$. Since it has probability $0$, and all other possible values of $X$ have non-zero probability, we can just forget about this "possibility."
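
For the record, the computation behind that claim, assuming independent tosses and writing $q$ for the probability of a tail and $p = 1 - q$ for the probability of a head (my notation): if $q < 1$, then

$$\sum_{n=1}^{\infty} \Pr[X = n] = \sum_{n=1}^{\infty} q^{n-1} p = \frac{p}{1-q} = 1, \qquad \Pr[\text{first } n \text{ tosses are all tails}] = q^n \to 0.$$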


The number of coin tosses until the first head is not a well-defined notion, because of that singular event. Compare it to "the number of coin tosses until the last head".

On the other hand, you could restrict yourself to the set of samples where this random variable $X$ is finite, and then you could ask for $\mathbb{E}[X|B]$, where $B$ is the event "$X<\infty$". Since $\Pr[B]=1$, we can ignore the conditioning without affecting the result.

More formally, the r.v. $X$ cannot really attain the value $\infty$. Instead, you need to define it somehow on the tails sample. The exact way won't affect anything since this is a zero-probability event. In fact, r.v.s are usually only defined "up to measure zero" anyhow. So the value at the tails sample is "canceled out".
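
To illustrate the "canceled out" remark, here is a small sketch in exact arithmetic, assuming a fair coin and the discrete model in which the all-tails sample carries probability $0$. The function name and the parameter `value_at_all_tails` are hypothetical, introduced only for this example. Whatever finite value $X$ is given on the all-tails sample, its contribution to the expectation is that value times zero.

```python
from fractions import Fraction


def expectation_partial(N: int, value_at_all_tails: int = 0) -> Fraction:
    """Partial sum of E[X] over X = 1..N, plus the all-tails contribution.

    X = n (first head on toss n) has probability 1 / 2^n for a fair coin.
    The all-tails sample has probability 0, so any finite value assigned
    to X there contributes nothing.
    """
    finite_part = sum((Fraction(n, 2 ** n) for n in range(1, N + 1)), Fraction(0))
    all_tails_part = value_at_all_tails * Fraction(0)
    return finite_part + all_tails_part


if __name__ == "__main__":
    for N in (1, 5, 10, 20):
        print(N, expectation_partial(N), expectation_partial(N, 10 ** 6))
```

The partial sums equal $2 - (N+2)/2^N$ and so converge to $2$, regardless of the value chosen at the all-tails sample.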