A question related to two competing patterns in coin tossing

Solution 1:

Let $p$ be the probability of heads and $q = 1 - p$ the probability of tails. Then the answers are

$$1) \frac{q}{1+q}$$

$$2) \frac{1+pq+p^2q+p^2q^2}{p^3q(1+q)}.$$
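
As a quick sanity check of both formulas: for a fair coin ($p = q = 1/2$) they give $1/3$ and $46/3 \approx 15.33$. Below is a minimal Monte Carlo sketch in Python (the function name `play` is just for illustration, and the fair coin is an assumption for concreteness):

```python
import random

def play(p, rng):
    """Toss a coin with P(H) = p until HTHTH or HTHH appears.

    Returns ('A' or 'B', number of tosses). The two patterns cannot
    end on the same toss, so the order of the checks is immaterial.
    """
    window, tosses = "", 0
    while True:
        window = (window + ("H" if rng.random() < p else "T"))[-5:]
        tosses += 1
        if window.endswith("HTHTH"):
            return "A", tosses
        if window.endswith("HTHH"):
            return "B", tosses

rng = random.Random(0)
results = [play(0.5, rng) for _ in range(200_000)]
share_a = sum(1 for w, _ in results if w == "A") / len(results)
mean_tosses = sum(n for _, n in results) / len(results)
print(share_a, mean_tosses)  # roughly 0.3333 and 15.333 (= 1/3 and 46/3)
```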

The ideas and notation in the following derivation are courtesy of Section 8.4 ("Flipping Coins") of Concrete Mathematics. See that section for more details and examples.

Imagine this as a contest in which Player A wins if HTHTH appears first and Player B wins if HTHH appears first. Let $S_A$ and $S_B$ denote the sum of the winning sequences of tosses for HTHTH and HTHH, respectively. (This is overloading the addition operator, but it works.) Let $N$ denote the sum of the sequences in which neither HTHTH nor HTHH has occurred yet. ($N$ includes the empty sequence.) Thus $$S_A = \text{HTHTH + THTHTH + HHTHTH + TTHTHTH + THHTHTH + HHHTHTH} + \cdots,$$ $$S_B = \text{HTHH + THTHH + HHTHH + TTHTHH + THHTHH + HHHTHH} + \cdots,$$ $$N = \text{1 + H + T + HH + HT + TH + TT} + \cdots.$$

Then, by substituting $p$ for H and $q$ for T, $S_A$, $S_B$, and $N$ become the probability that $A$ wins, the probability that $B$ wins, and the expected number of tosses until the game ends, respectively. The first two claims are straightforward. The third follows from the fact that if $X$ is a nonnegative random variable then $E[X] = \sum_{k=0}^{\infty} P(X > k).$
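
To make the formal sums concrete, one can enumerate every sequence up to a length cap, classify it as a term of $S_A$, $S_B$, or $N$, and apply the substitution numerically. A sketch, assuming a fair coin (the cap `L` and the helper name `weight` are illustrative; the partial sums converge, slowly, to the exact values as `L` grows):

```python
from fractions import Fraction
from itertools import product

A, B = "HTHTH", "HTHH"
p = q = Fraction(1, 2)  # fair coin; any p and q = 1 - p would do

def weight(s):
    """The substitution: p for each H, q for each T."""
    return p ** s.count("H") * q ** s.count("T")

L = 16  # length cap; the formal sums are infinite, so these are partial sums
S_A = S_B = N = Fraction(0)
for k in range(L + 1):
    for letters in product("HT", repeat=k):
        s = "".join(letters)
        if A not in s and B not in s:
            N += weight(s)          # a term of N (neither pattern yet)
        elif s.endswith(A) and A not in s[:-1] and B not in s:
            S_A += weight(s)        # a winning sequence for A
        elif s.endswith(B) and B not in s[:-1] and A not in s:
            S_B += weight(s)        # a winning sequence for B

# The partial sums grow toward 1/3, 2/3 and 46/3 as L increases.
print(float(S_A), float(S_B), float(N))
```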

As sums of sequences, we also have $$N \text{ HTHTH} = S_A + S_A \text{ TH} + S_A \text{ THTH} + S_B \text{ THTH},$$ $$N \text{ HTHH} = S_A \text{ H} + S_A \text{ THH} + S_B + S_B \text{ THH}.$$

The idea behind the first equation is that every sequence obtained by appending HTHTH to a sequence in $N$ must be a winning sequence for $A$ or $B$ plus possibly some extra tosses. In particular, each sequence in $N$ HTHTH must be exactly one of 1) a winning sequence for $A$, 2) a winning sequence for $A$ followed by TH, 3) a winning sequence for $A$ followed by THTH, or 4) a winning sequence for $B$ followed by THTH. These cover all possibilities: cases 1)-3) correspond to the suffixes of HTHTH that are also prefixes of HTHTH (namely HTHTH, HTH, and H), while case 4) corresponds to the suffix of HTHH (namely H) that is a prefix of HTHTH. The idea behind the second equation is similar.
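
This overlap bookkeeping can be automated. Here is a small sketch of a helper (the name `tails` is hypothetical) that, given patterns $W$ and $V$, lists the leftover letters $V[k{:}]$ for every $k$ at which the length-$k$ prefix of $V$ equals the length-$k$ suffix of $W$; it reproduces the terms in both equations:

```python
def tails(W, V):
    """Leftover letters V[k:] for every k such that the length-k prefix
    of V equals the length-k suffix of W (the overlap cases)."""
    return [V[k:] for k in range(1, len(V) + 1)
            if k <= len(W) and V[:k] == W[-k:]]

A, B = "HTHTH", "HTHH"
print(tails(A, A))  # ['THTH', 'TH', '']  ->  S_A (THTH + TH + 1)
print(tails(B, A))  # ['THTH']            ->  S_B THTH
print(tails(A, B))  # ['THH', 'H']        ->  S_A (THH + H)
print(tails(B, B))  # ['THH', '']         ->  S_B (THH + 1)
```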

Then, substituting $p$ for H and $q$ for T we have $$N p^3q^2 = S_A + S_A pq + S_A p^2q^2 + S_B p^2q^2,$$ $$N p^3q = S_A p + S_A p^2 q + S_B + S_B p^2q.$$

Simultaneously solving these two equations with $S_A + S_B = 1$ yields $$S_A = \frac{q}{1+q},$$ $$S_B = \frac{1}{1+q},$$ $$N = \frac{1+pq+p^2q+p^2q^2}{p^3q(1+q)}.$$
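
If one wants to check this algebra mechanically, the two substituted equations together with $S_A + S_B = 1$ form a linear system in $S_A$, $S_B$, $N$. A sketch using sympy (the dependency on sympy is an assumption):

```python
from sympy import symbols, solve, simplify

p = symbols('p', positive=True)
q = 1 - p
S_A, S_B, N = symbols('S_A S_B N')

eqs = [
    N * p**3 * q**2 - (S_A + S_A*p*q + S_A*p**2*q**2 + S_B*p**2*q**2),
    N * p**3 * q - (S_A*p + S_A*p**2*q + S_B + S_B*p**2*q),
    S_A + S_B - 1,
]
sol = solve(eqs, [S_A, S_B, N], dict=True)[0]
print(simplify(sol[S_A] - q / (1 + q)))  # 0
print(simplify(sol[S_B] - 1 / (1 + q)))  # 0
print(simplify(sol[N] - (1 + p*q + p**2*q + p**2*q**2) / (p**3*q*(1 + q))))  # 0
```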


Update, in response to OP's question in the comments: Why does the substitution of $p$ for H and $q$ for T, together with the fact that $E[X] = \sum_{k=0}^{\infty} P(X > k)$ for a nonnegative random variable, imply that $N$ gives the expected number of tosses until the game ends?

Let $X$ be the number of tosses until the game ends. $P(X > 0)$ is just 1. $P(X > 1)$ is the probability that we haven't seen either pattern after a single toss; i.e., H + T, with $p$ subbed in for H and $q$ for T. $P(X > 2)$ is the probability that we haven't seen either pattern after two tosses, which is HH + HT + TH + TT, again with $p$ subbed in for H and $q$ for T. $P(X > 3)$ is similar, and so on. This doesn't get interesting until $P(X > 4)$, which is the sum of all length-four sequences of H and T except HTHH. Adding up, over all lengths, the sequences in which neither HTHH nor HTHTH has occurred yet gives exactly $N$, so subbing $p$ for H and $q$ for T in $N$ yields $E[X] = \sum_{k=0}^{\infty} P(X > k)$.
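
Numerically, each $P(X > k)$ can be computed by tracking the longest suffix of the history that is a prefix of either pattern (the same states Solution 2 introduces below) and summing these tails. A sketch for a fair coin; the truncation cap of 400 steps is an assumption, justified because the tail decays geometrically:

```python
from fractions import Fraction

# State = longest suffix of the history that is a prefix of HTHTH or HTHH.
# Each state maps to (state after H, state after T); None means the game ended.
step = {
    "":     ("H", ""),
    "H":    ("H", "HT"),
    "HT":   ("HTH", ""),
    "HTH":  (None, "HTHT"),  # an H here completes HTHH
    "HTHT": (None, ""),      # an H here completes HTHTH
}
p = q = Fraction(1, 2)  # fair coin for concreteness
alive = {"": Fraction(1)}  # P(game still running and in state s) after k tosses
tail_sum = Fraction(0)
for k in range(400):  # truncation cap
    tail_sum += sum(alive.values())  # adds P(X > k)
    nxt = {}
    for s, w in alive.items():
        for t, pr in ((step[s][0], p), (step[s][1], q)):
            if t is not None:
                nxt[t] = nxt.get(t, Fraction(0)) + w * pr
    alive = nxt
print(float(tail_sum))  # 15.3333... = 46/3
```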

See also the solution to Problem 8.21 in Concrete Mathematics, p. 578 (second edition).

Solution 2:

There is a direct, and rather automatic, way to compute the probability of hitting A = HTHTH before B = HTHH.

Both patterns begin with HTH, hence one can wait until HTH first appears. Then either (1) the next letter is H, or (2) the next two letters are TH, or (3) the next two letters are TT. If (1) happens, B wins. If (2) happens, A wins. If (3) happens, one has to wait for more letters to know who wins. The important fact in case (3) is that, since the last two letters are TT, the winning pattern, A or B, must be produced entirely from scratch.

Hence $p_B = p_1 + p_3 p_B$ and $p_A = p_2 + p_3 p_A$, where $p_i$, for $i = 1, 2, 3$, is shorthand for the conditional probability that case ($i$) happens starting from the word HTH. Since $p_1 = p$, $p_2 = qp$ and $p_3 = q^2$, one gets $p_B = p_1/(1-p_3) = p/(1-q^2)$, hence $$ p_B=\frac{1}{1+q},\quad p_A=\frac{q}{1+q}. $$
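
The three-case recursion can also be simulated directly: the tosses spent re-producing HTH in case (3) cannot change the winner, so one may jump from decision point to decision point. A sketch (the function name is hypothetical, and the fair coin is an assumption):

```python
import random

def winner_from_HTH(p, rng):
    """Run the three-case recursion starting just after an occurrence of HTH.

    Case (3) restarts the race without affecting the winner, so the
    simulation skips straight to the next decision point.
    """
    while True:
        if rng.random() < p:      # case (1): next letter H -> HTHH, B wins
            return "B"
        if rng.random() < p:      # case (2): next letters TH -> HTHTH, A wins
            return "A"
        # case (3): next letters TT -> wait for a fresh HTH and repeat

rng = random.Random(1)
trials = 200_000
share_a = sum(winner_from_HTH(0.5, rng) == "A" for _ in range(trials)) / trials
print(share_a)  # about 1/3 = q/(1+q) for the fair coin
```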

Similarly, a standard, and rather automatic, way to compute the expected number of tosses before the game ends is to consider a Markov chain on the state space made of the prefixes of the words one wishes to complete. Here, the states are 0 (the empty prefix), 1 = H, 2 = HT, 3 = HTH, B = HTHH, 4 = HTHT and A = HTHTH. The transitions (on heads and tails, respectively) are from 0 to 1 and 0, from 1 to 1 and 2, from 2 to 3 and 0, from 3 to B and 4, and from 4 to A and 0. The transitions from A and from B are irrelevant since the game is then over. The next step is to compute $n_s$, the expected number of tosses needed to produce A or B starting from state $s$, for each $s$ among 0, 1, 2, 3 and 4, knowing that one is ultimately interested only in $n_0$.

The $n_s$ are the solutions of a linear system which reflects the structure of the underlying Markov chain: $$ n_0=1+pn_1+qn_0,\quad n_1=1+pn_1+qn_2,\quad n_2=1+pn_3+qn_0, $$ $$ n_3=1+qn_4,\quad n_4=1+qn_0. $$ (In the last two equations the heads transitions end the game, hence contribute no further tosses.) Solving this system backwards, that is, going from the last equation back to the first one, yields $n_4$ in terms of $n_0$, then $n_3$, $n_2$ and $n_1$ in terms of $n_0$, and finally an equation for $n_0$ alone, which yields the formula for $n_0$ from Solution 1: $$ n_0=\frac{1+pq+p^2q+p^2q^2}{p^3q(1+q)}. $$

An accessible reference for these techniques (in the context of genomic sequence analysis) is the book DNA, Words and Models by Robin, Rodolphe and Schbath (Cambridge University Press).
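
As a mechanical check, the five equations can be handed to a computer algebra system. A sketch using sympy (the dependency on sympy is an assumption) that confirms the closed form:

```python
from sympy import symbols, solve, simplify

p = symbols('p', positive=True)
q = 1 - p
n0, n1, n2, n3, n4 = symbols('n0 n1 n2 n3 n4')

eqs = [
    n0 - (1 + p*n1 + q*n0),
    n1 - (1 + p*n1 + q*n2),
    n2 - (1 + p*n3 + q*n0),
    n3 - (1 + q*n4),
    n4 - (1 + q*n0),
]
sol = solve(eqs, [n0, n1, n2, n3, n4], dict=True)[0]
closed_form = (1 + p*q + p**2*q + p**2*q**2) / (p**3 * q * (1 + q))
print(simplify(sol[n0] - closed_form))  # 0
```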