Coin toss question

Solution 1:

The problem can be mapped onto a one-dimensional discrete random walk. Imagine that you start at 0 and take a step $+1$ (for the outcome HT) with probability $1/4$, a step $-1$ (for the outcome TH) with probability $1/4$, and a step $0$ (for the outcomes HH and TT) with probability $1/2$. The question of the number of coin tosses after which both players have accumulated the same number of heads is then equivalent to asking after how many steps the random walk returns to the origin.

Let us introduce the random variable $Z_i$ at step $i$, taking the value $+1$ (for HT), $-1$ (for TH), and $0$ otherwise. As described above, $P(Z_i = 1) = 1/4$, $P(Z_i = -1) = 1/4$, and $P(Z_i = 0) = 1/2$. Furthermore, introduce $S_n = \sum_{i=1}^n Z_i$; then $S_n$ is the number of heads player 1 has obtained minus the number of heads player 2 has obtained after $n$ steps. The sequence $\{S_n\}$ is called a random walk. The question of both players having the same number of heads corresponds to the problem of the first return of the walk to the origin.
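
As a quick sanity check, here is a minimal Python simulation of the game under this mapping (the function name and the `max_rounds` truncation are choices made for this sketch, not part of the argument):

```python
import random

def rounds_until_equal_heads(rng, max_rounds=10_000):
    """Play one game: each round both players toss a fair coin; stop at the
    first round where the cumulative head counts are equal again."""
    diff = 0  # S_n = heads(player 1) - heads(player 2)
    for n in range(1, max_rounds + 1):
        diff += rng.randint(0, 1) - rng.randint(0, 1)  # step Z_n in {-1, 0, +1}
        if diff == 0:
            return n
    return None  # no return within the cap; the distribution is heavy-tailed

rng = random.Random(0)
samples = [rounds_until_equal_heads(rng) for _ in range(100_000)]
for n in (1, 2, 3):
    print(n, samples.count(n) / len(samples))  # compare pi_1 = 1/2, pi_2 = 1/8, pi_3 = 1/16
```

The empirical frequencies should match the first-return probabilities $\pi_n$ derived below.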

Let us first solve a simpler problem: what is the probability $P_n$ of being back at the origin after $n$ steps? For this we need equally many $+1$'s and $-1$'s among the $Z_i$. At each step there are three possible outcomes, $Z = \pm 1, 0$. To be back at the origin, we need $m$ steps of $+1$, $m$ steps of $-1$, and $n-2m$ steps of $0$ (with $0 \leq m \leq n/2$); this event has probability $$ \frac{n!}{m!\, m!\, (n-2m)!} P(Z_i = 1)^m P(Z_i = -1)^m P(Z_i = 0)^{n-2m} = \frac{n!}{(m!)^2 (n-2m)!\, 2^{n+2m}} .$$ The probability of being back at the origin after $n$ steps is therefore $$ P_n = \sum_{m=0}^{\lfloor n/2 \rfloor} \frac{n!}{(m!)^2 (n-2m)!\, 2^{n+2m} }= {2n \choose n}2^{-2n}.$$ What remains to be done is to obtain the probability of the first return.
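
The closed form of this sum can be verified in exact arithmetic with a short Python check (this snippet is illustrative, not part of the proof):

```python
from fractions import Fraction
from math import comb, factorial

def P_exact(n):
    """P_n as the multinomial sum over m, computed with exact fractions."""
    return sum(
        Fraction(factorial(n),
                 factorial(m) ** 2 * factorial(n - 2 * m) * 2 ** (n + 2 * m))
        for m in range(n // 2 + 1)
    )

for n in range(10):
    assert P_exact(n) == Fraction(comb(2 * n, n), 4 ** n)
print("sum identity verified for n = 0..9")
```

An analytic proof of the identity is given in the edit at the end of this solution.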

There is a quite general relation between the probability $P_n$ of a return after $n$ steps and the probability $\pi_n$ of a first return, $$ P_n= \pi_1 P_{n-1} + \cdots + \pi_n P_0.$$ The interpretation is simple: to be back at the origin at step $n$, either you return for the first time after one step and then return again after $n-1$ steps, or you return for the first time after two steps and then again after $n-2$ steps, ..., or you return for the first time after $n$ steps (with $P_0 = 1$).

The right-hand side is a convolution of two probability distributions. Therefore, it is convenient to introduce the generating functions $P(x) = \sum_{n=0}^\infty P_n x^n$ and $\Pi (x) = \sum_{n=0}^\infty \pi_n x^n$ (we set $\pi_0=0$). Multiplying the aforementioned equation by $x^n$ and summing over $n$, we obtain the relation $P(x) = 1 + P(x) \Pi(x)$ (remember $\pi_0 =0$ and $P_0 =1$). Solving this equation gives $$\Pi(x) = \frac{ P(x) -1 }{P(x)}.$$ All that remains to be done is to evaluate $P(x)$. This can be done with the binomial series, with the result $$ P(x) = \frac{1}{\sqrt{1-x}}$$ and therefore we obtain $$\Pi(x) = 1- \sqrt{1-x}.$$ Expanding this result in a Taylor series, we obtain the probability of the first return (the probability that the game stops after $n$ rounds) $$ \pi_n = \frac{{2n \choose n} }{(2n -1) 2^{2n}}, \quad n\geq1.$$
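
One can check this first-return formula against the renewal equation in exact arithmetic (an illustrative sketch; the function names are mine):

```python
from fractions import Fraction
from math import comb

def P(n):   # probability of being back at the origin after n rounds
    return Fraction(comb(2 * n, n), 4 ** n)

def pi(n):  # claimed first-return probability, n >= 1
    return Fraction(comb(2 * n, n), (2 * n - 1) * 4 ** n)

# renewal identity: P_n = pi_1 P_{n-1} + ... + pi_n P_0
for n in range(1, 12):
    assert P(n) == sum(pi(k) * P(n - k) for k in range(1, n + 1))
print("first-return formula consistent with the renewal equation")
```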

The expected number of rounds can also be calculated: $$ \bar n = \sum_{n=1}^\infty n\, \pi_n \to \infty,$$ i.e., it diverges. The reason is that the asymptotic behavior of $\pi_n$ is given by $$\pi_n \sim \frac{1}{2 \sqrt{\pi}\, n^{3/2}}.$$
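
The asymptotics can also be seen numerically (a small check, not a proof):

```python
from math import comb, pi, sqrt

for n in (10, 100, 1000, 10_000):
    exact = comb(2 * n, n) / ((2 * n - 1) * 4 ** n)
    print(n, exact * 2 * sqrt(pi) * n ** 1.5)  # ratio pi_n / asymptote -> 1
```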

Edit:

A question has arisen about how to evaluate the sum leading to $P_n$. Borrowing from Mike's solution, there is an alternative way to get $P_n$ (and thereby a proof that the sum really equals what I wrote). Instead of viewing the game as having $n$ rounds, we can think of it as having $2n$ rounds, in which we split the tosses of player 1 and player 2. If player 1 tosses H, we move $+1/2$; for T, $-1/2$. If player 2 tosses T, we move $+1/2$; for H, $-1/2$. After two such rounds this yields the same random walk as described above (HH and TT: no move; HT: move $+1$; TH: move $-1$). The probability $P_n$ of returning after $n$ rounds of the original game is therefore the probability of returning after $2n$ rounds in the new game. There are ${2n \choose n}$ paths that return after $2n$ rounds, because we can choose the $n$ steps that move up (H for player 1 and T for player 2); in the remaining steps we have to move down. In total there are $2^{2n}$ paths of length $2n$, so the probability of returning to the origin after $2n$ steps ($n$ steps in the original game) is given by $$ P_n = {2n \choose n} 2^{-2n}.$$
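
For small $n$ this count can also be confirmed by brute force over all $4^n$ round outcomes (an illustrative check; `count_returns` is my own helper):

```python
from itertools import product
from math import comb

STEPS = (1, -1, 0, 0)  # HT, TH, HH, TT

def count_returns(n):
    """Number of the 4^n outcome sequences whose net displacement is 0."""
    return sum(1 for seq in product(STEPS, repeat=n) if sum(seq) == 0)

for n in range(1, 8):
    assert count_returns(n) == comb(2 * n, n)
print("P_n = C(2n,n)/4^n confirmed by brute force for n = 1..7")
```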

Solution 2:

The probability of the game ending after $n$ rounds is $\frac{2}{4^n}C_{n-1}$, where $C_n = \frac{1}{n+1} \binom{2n}{n}$ is the $n$th Catalan number, and the expected number of rounds is infinite.


To see that the probability is $\frac{2}{4^n}C_{n-1}$, we use the interpretation of the $(n-1)$st Catalan number $C_{n-1}$ as the number of Dyck paths from $(0,0)$ to $(n,n)$ that lie strictly below the diagonal $y=x$, touching it only at $(0,0)$ and $(n,n)$.

Assume Player 1 has more heads except at the beginning and at the end. Let $P1(k)$ denote the number of heads Player 1 has after $k$ rounds and $P2(k)$ the number of heads Player 2 has after $k$ rounds, and let $Z_k = P1(k) - P2(k)$. Let $(x_k,y_k)$ denote the position on the lattice after $2k$ steps of a Dyck path, and let $X_k = x_k - k$.

Associate the outcomes of the coin flips (H = heads, T = tails, with Player 1's outcome first) with two steps in a Dyck path (R = right, U = up) in the following fashion:

HT $\Leftrightarrow$ RR

HH $\Leftrightarrow$ RU

TT $\Leftrightarrow$ UR

TH $\Leftrightarrow$ UU

We have the following:

  1. If the flips are HT, then $Z_k$ increments by 1. With an RR move, $X_k$ increments by 1.
  2. If the flips are HH, then $Z_k$ doesn't change. With an RU move, $X_k$ doesn't change.
  3. If the flips are TT, then $Z_k$ doesn't change. With a UR move, $X_k$ doesn't change.
  4. If the flips are TH, then $Z_k$ decrements by 1. With a UU move, $X_k$ decrements by 1.

$Z_0 = X_0 = 0$, and both the game and the Dyck path end when $Z_n = X_n = 0$.

So we see that there is a bijection between the ways for the game to end after $n$ rounds with Player 1 ahead except at the beginning and the end, and the Dyck paths from $(0,0)$ to $(n,n)$ that lie strictly below the diagonal $y=x$, touching it only at $(0,0)$ and $(n,n)$. Thus the number of ways for the game to end after $n$ rounds with Player 1 ahead except at the beginning and the end is $C_{n-1}$. Double that to account for Player 2 being ahead instead, and then divide by $4^n$, the number of possible sequences of flips over $n$ rounds, to obtain the probability of ending after $n$ rounds.
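
Again, a brute-force check is possible for small $n$ (illustrative only; `count_first_returns` is my own helper): enumerate all $4^n$ flip sequences and count those whose head counts first tie at round $n$.

```python
from itertools import product
from math import comb

STEPS = (1, -1, 0, 0)  # HT, TH, HH, TT

def count_first_returns(n):
    """Flip sequences (out of 4^n) whose head counts first tie at round n."""
    count = 0
    for seq in product(STEPS, repeat=n):
        partial, ok = 0, True
        for i, step in enumerate(seq):
            partial += step
            if partial == 0 and i < n - 1:  # tied too early
                ok = False
                break
        if ok and partial == 0:
            count += 1
    return count

def catalan(n):
    return comb(2 * n, n) // (n + 1)

for n in range(1, 8):
    assert count_first_returns(n) == 2 * catalan(n - 1)
print("the game ends at round n in exactly 2*C_{n-1} of the 4^n sequences")
```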


To see that the expected number of rounds is infinite, let $X_n$ denote the absolute value of the difference between the numbers of heads obtained by the two players after the $n$th round. Then $X_n$ is a discrete-time Markov chain with transition probabilities $p_{00} = \frac{1}{2}$, $p_{01} = \frac{1}{2}$, and, for $i \geq 1$, $p_{i,i+1} = p_{i, i-1} = \frac{1}{4}$, $p_{ii} = \frac{1}{2}$. We want $M_0$, the mean recurrence time for state 0.

Such Markov chains are well-studied; see, for example, Example 5 in these notes. There the author solves the more general case in which $p_{00} = 1- \lambda$, $p_{01} = \lambda$, and, for $i \geq 1$, $p_{i,i+1} = \lambda(1-\mu), p_{i, i-1} = \mu (1 - \lambda)$, $p_{ii} = \lambda \mu + (1 - \lambda)(1 - \mu)$.

The solution is that if $\lambda < \mu$, then $$M_0 = \frac{1}{1 - \frac{(1 - \mu)\lambda}{(1- \lambda) \mu}},$$ and if $\lambda \geq \mu$, then $M_0 = \infty$. (As joriki points out in the comments, the author's expression for $\pi_0$ should actually be the expression for $M_0$.)

Since we have $\lambda = \mu = \frac{1}{2}$, $M_0 = \infty$.
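
A short simulation illustrates the divergence (a sketch under my own truncation scheme, not a proof): the mean recurrence time, truncated at a cap, keeps growing as the cap is raised.

```python
import random

def recurrence_time(rng, cap):
    """One excursion of the chain X_n from state 0, truncated at `cap` steps."""
    x = 0
    for t in range(1, cap + 1):
        r = rng.random()
        if x == 0:
            if r < 0.5:        # p_01 = 1/2 (else stay, p_00 = 1/2)
                x = 1
        elif r < 0.25:         # p_{i,i+1} = 1/4
            x += 1
        elif r < 0.5:          # p_{i,i-1} = 1/4
            x -= 1
        # else stay, p_ii = 1/2
        if x == 0:
            return t
    return cap  # truncated excursion: a lower bound on the true return time

rng = random.Random(1)
for cap in (10, 100, 1000, 10_000):
    est = sum(recurrence_time(rng, cap) for _ in range(20_000)) / 20_000
    print(cap, est)  # grows roughly like sqrt(cap): M_0 = infinity
```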

Solution 3:

The problem is symmetric with respect to the two players. Treating the first round separately, we have probability $1/2$ of stopping at $n=1$ (both players with $0$ heads or both with $1$ head), and probability $1/2$ of continuing with a difference of $1$ in the head counts. It will be convenient to denote by $q_n$ the conditional probability that we end after a further $n$ rounds if the first round left us with a difference of $1$. The overall probability of ending in round $n$ is then $1/2$ for $n=1$ and $q_{n-1}/2$ for $n>1$.

The sign of the difference cannot change since we stop when it becomes zero, so we can assume without loss of generality that it's non-negative. Thus we are interested in the probability distribution $p_n(k)$, where $n$ is the round and $k$ is the difference in head counts. This is a Markov chain with an absorbing state at $k=0$. The evolution of the probabilities is given by

$$p_{n+1}(k)=\frac{1}{4}p_n(k-1)+\frac{1}{2}p_n(k)+\frac{1}{4}p_n(k+1)$$

for $k>1$ and $p_{n+1}(1)=\frac{1}{2}p_n(1) + \frac{1}{4}p_n(2)$ for $k=1$. Let's ignore the complication at the origin for a bit and just treat the recursion relation for general $k$. This is a linear operator being applied to the sequence $p_n (k)$ to obtain the sequence $p_{n+1}(k)$, and the eigenfunctions of this linear operator are the sequences $e^{ik\phi}$ with $-\pi < \phi \le \pi$ (since other values of $\phi$ just reproduce the same sequences). We can obtain the corresponding eigenvalue $\lambda(\phi)$ from

$$\lambda(\phi) e^{ik\phi}=\frac{1}{4}e^{i(k-1)\phi}+\frac{1}{2}e^{ik\phi}+\frac{1}{4}e^{i(k+1)\phi}\;,$$

$$\lambda(\phi) =\frac{1}{4}e^{-i\phi}+\frac{1}{2}+\frac{1}{4}e^{i\phi}\;,$$

$$\lambda(\phi)=\frac{1+\cos\phi}{2}=\cos^2\frac{\phi}{2}\;.$$

We have to combine the sequences $e^{ik\phi}$ and $e^{-ik\phi}$ into sines and cosines to get real sequences. Here's where the boundary at $k=0$ comes into play again. The equation $\lambda(\phi) p_n(1)=\frac{1}{2}p_n(1) + \frac{1}{4}p_n(2)$ provides an additional condition that selects a particular linear combination of the sines and cosines. In fact, it selects the sines, since this equation differs only by the term $\frac{1}{4}p_n(0)$ from the general recursion relation, and they can only both be satisfied if this term would have been zero anyway, which is the case for the sines.
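
This claim is easy to verify numerically (a quick check with numpy; the truncation at $K$ lattice sites is mine): applying one step of the evolution, including the boundary row, to the sequence $\sin(k\phi)$ reproduces it up to the factor $\cos^2(\phi/2)$.

```python
import numpy as np

K, phi = 2000, 0.3                        # truncation size and a test value of phi
p = np.sin(np.arange(1, K + 1) * phi)     # candidate eigenvector sin(k*phi), k = 1..K

q = 0.5 * p.copy()                        # (1/2) p(k)
q[:-1] += 0.25 * p[1:]                    # + (1/4) p(k+1)
q[1:]  += 0.25 * p[:-1]                   # + (1/4) p(k-1); absent for k = 1 (boundary row)

lam = np.cos(phi / 2) ** 2
print(np.max(np.abs(q[:-1] - lam * p[:-1])))  # ~ 1e-15, exact up to roundoff
```

Only the last entry $k=K$ is off, because the truncation cuts away its neighbor $p(K+1)$.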

Since we know the time evolution of the eigenfunctions (they are multiplied by the corresponding eigenvalue in each round), we can now decompose our initial sequence, $p_1(1)=1$ and $p_1(k)=0$ for $k\neq1$, into sines and write its time evolution as the sum of the time evolution of the sines. Thus,

$$p_1(k)=\int_0^\pi f(\phi) \sin (k\phi)\mathrm{d}\phi$$

for $k \ge 1$, and we obtain our initial sequence by taking $f(\phi)=(2/\pi)\sin\phi$. Then the time evolution is given by

$$p_n(k)=\frac{2}{\pi}\int_0^\pi \sin\phi \sin (k\phi)\left(\frac{1+\cos\phi}{2}\right)^{n-1}\mathrm{d}\phi\;.$$
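
As a cross-check (an illustrative numpy sketch; `p_direct` just iterates the recursion from above, while `p_integral` evaluates the integral by the trapezoidal rule):

```python
import numpy as np

def p_direct(n, K=200):
    """Evolve p_n(k) from p_1(1) = 1 with the absorbing boundary at k = 0."""
    p = np.zeros(K)
    p[1] = 1.0
    for _ in range(n - 1):
        q = np.zeros(K)
        q[1] = 0.5 * p[1] + 0.25 * p[2]                           # boundary row
        q[2:-1] = 0.25 * p[1:-2] + 0.5 * p[2:-1] + 0.25 * p[3:]   # bulk rows
        p = q
    return p

def p_integral(n, k, m=20_000):
    """Trapezoidal evaluation of the integral representation of p_n(k)."""
    phi = np.linspace(0.0, np.pi, m + 1)
    f = np.sin(phi) * np.sin(k * phi) * ((1 + np.cos(phi)) / 2) ** (n - 1)
    return (2 / np.pi) * np.sum((f[1:] + f[:-1]) / 2) * (np.pi / m)

for n, k in [(1, 1), (5, 2), (10, 3), (20, 1)]:
    print(n, k, p_direct(n)[k], p_integral(n, k))  # the two columns agree
```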

Now the probability $q_n$ that we end after round $n$ is just the probability $p_n(1)$ times the probability $1/4$ that we move from $k=1$ to $k=0$, and so

$$q_n=\frac{1}{2\pi}\int_0^\pi \sin^2\phi \left(\frac{1+\cos\phi}{2}\right)^{n-1}\mathrm{d}\phi\;.$$

According to Wolfram, this is

$$q_n=\frac{\Gamma(n+1/2)}{\sqrt{\pi}\,\Gamma(n+2)}\;,$$

which can be simplified for integer $n$: using $\Gamma(n+1/2)=\frac{(2n)!}{4^n\, n!}\sqrt{\pi}$ gives $q_n = \frac{1}{4^n}\cdot\frac{1}{n+1}\binom{2n}{n} = \frac{C_n}{4^n}$, so the overall probability $q_{n-1}/2$ of ending in round $n$ agrees with Mike's $\frac{2}{4^n}C_{n-1}$ from Solution 2. We can check that the $q_n$ sum up to $1$:

$$\sum_{n=1}^\infty q_n=\frac{1}{2\pi}\int_0^\pi \sin^2\phi \sum_{n=1}^\infty\left(\frac{1+\cos\phi}{2}\right)^{n-1}\mathrm{d}\phi$$

$$=\frac{1}{2\pi}\int_0^\pi \frac{\sin^2\phi}{1-\frac{1+\cos\phi}{2}}\mathrm{d}\phi$$

$$=\frac{1}{\pi}\int_0^\pi \frac{\sin^2\phi}{1-\cos\phi}\mathrm{d}\phi$$

$$=\frac{1}{\pi}\int_0^\pi \frac{\sin^2\phi}{1-\cos^2\phi}\left(1+\cos\phi\right)\mathrm{d}\phi$$

$$=\frac{1}{\pi}\int_0^\pi \left(1+\cos\phi\right)\mathrm{d}\phi$$

$$=1\;.$$
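
Numerically, the closed form, the integral, and the Catalan expression $C_n/4^n$ can all be compared (an illustrative sketch; the helper names are mine):

```python
import numpy as np
from math import comb, gamma, pi, sqrt

def q_integral(n, m=20_000):
    """Trapezoidal evaluation of the integral defining q_n."""
    phi = np.linspace(0.0, pi, m + 1)
    f = np.sin(phi) ** 2 * ((1 + np.cos(phi)) / 2) ** (n - 1)
    return np.sum((f[1:] + f[:-1]) / 2) * (pi / m) / (2 * pi)

for n in (1, 2, 5, 10):
    closed = gamma(n + 0.5) / (sqrt(pi) * gamma(n + 2))
    catalan = comb(2 * n, n) / ((n + 1) * 4 ** n)
    print(n, q_integral(n), closed, catalan)  # all three agree

# partial sums of q_n creep toward 1, using q_{n+1}/q_n = (2n+1)/(2n+4)
q, total = 0.25, 0.0  # q_1 = 1/4
for n in range(1, 200_000):
    total += q
    q *= (2 * n + 1) / (2 * n + 4)
print(total)  # close to 1, with a slowly decaying tail
```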

We can also try to find the first moment of this probability distribution:

$$\sum_{n=1}^\infty n q_n=\frac{1}{2\pi}\int_0^\pi \sin^2\phi \sum_{n=0}^\infty n\left(\frac{1+\cos\phi}{2}\right)^{n-1}\mathrm{d}\phi$$

$$=-\frac{1}{\pi}\int_0^\pi \sin\phi \sum_{n=0}^\infty \frac{\mathrm{d}}{\mathrm{d}\phi}\left(\frac{1+\cos\phi}{2}\right)^n\mathrm{d}\phi$$

$$=\frac{1}{\pi}\int_0^\pi \cos\phi \sum_{n=0}^\infty \left(\frac{1+\cos\phi}{2}\right)^n\mathrm{d}\phi$$

$$=\frac{2}{\pi}\int_0^\pi \frac{\cos\phi}{1-\cos\phi}\mathrm{d}\phi\;.$$

This integral diverges at the lower limit $\phi=0$ (which corresponds to the walk wandering off to large $k$), and thus, as Mike had already pointed out, the expected number of rounds is infinite.
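
The divergence is also visible in the partial sums of $n\,q_n$ (a small numeric illustration):

```python
# partial sums of n*q_n grow without bound, roughly like sqrt(N)
q, total = 0.25, 0.0  # q_1 = 1/4, and q_{n+1}/q_n = (2n+1)/(2n+4)
for n in range(1, 1_000_001):
    total += n * q
    q *= (2 * n + 1) / (2 * n + 4)
    if n in (100, 10_000, 1_000_000):
        print(n, total)
```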