Distribution of the number of trials required for the first occurrence of the event 50 S containing at least one SSSSS.

Solution 1:

This answer is a generating function approach based upon the Goulden-Jackson Cluster Method. We will show for $5\leq r\leq k$: \begin{align*} \color{blue}{P(M_r=k)}&\color{blue}{=\left(\sum_{j\geq 1}(-1)^{j+1}\binom{k-r+1}{j}\binom{k-5j}{k-r}\right.}\\ &\qquad\color{blue}{\left.-\sum_{j\geq 1}(-1)^{j+1}\binom{k-r}{j}\binom{k-1-5j}{k-1-r}\right)p^rq^{k-r}}\tag{1} \end{align*} where the sums in (1) are finite since $\binom{s}{t}=0$ for integral $0<s<t$.

First step: A generating function

We consider the set of words of length $k\geq 0$ built from an alphabet $$\mathcal{V}=\{S,F\}$$ and the set $B=\{SSSSS\}$ of bad words, which are not allowed to be part of the words we are looking for in a first step. We will derive a generating function $G(z)$ where $[z^k]G(z)$, the coefficient of $z^k$ of $G(z)$ gives the number of binary words of length $k$ which do not contain $SSSSS$.

Since we want the number of words which do contain $SSSSS$, we take the generating function of all binary words which is $1+2z+4z^2+8z^3+\cdots = \frac{1}{1-2z}$ and subtract $G(z)$ from it. This way we get $H(z) = \frac{1}{1-2z}-G(z)$.

According to the paper (p.7) the generating function $G(z)$ is \begin{align*} G(z)=\frac{1}{1-dz-\text{weight}(\mathcal{C})}\tag{2} \end{align*} with $d=|\mathcal{V}|=2$, the size of the alphabet and $\mathcal{C}$ the weight-numerator of bad words with \begin{align*} \text{weight}(\mathcal{C})=\text{weight}(\mathcal{C}[SSSSS]) \end{align*}

We calculate according to the paper \begin{align*} \text{weight}(\mathcal{C}[S^5])&=-z^5-z\cdot \text{weight}(\mathcal{C}[S^5])-\cdots-z^4\cdot\text{weight}(\mathcal{C}[S^5])\tag{3}\\ \end{align*}

and get \begin{align*} \text{weight}(\mathcal{C})=\text{weight}(\mathcal{C}[S^5])=-\frac{z^5(1-z)}{1-z^5} \end{align*}

It follows from (2) and (3):

\begin{align*} G(z)&=\frac{1}{1-dz-\text{weight}(\mathcal{C})}\\ &=\frac{1}{1-2z+\frac{z^5(1-z)}{1-z^5}}\\ &=\frac{1-z^5}{1-2z+z^6}\tag{4}\\ \end{align*}

From (4) we obtain \begin{align*} H(z) = \frac{1}{1-2z}-\frac{1+z^5}{1-2z+z^6}\tag{5} \end{align*}

Second step: A refinement

Since we are looking for the number of valid words of length $k$ which contain $50 S$ resp. $r\geq 5$ S in general, we need a refinement of $H(z)$ to keep track of the number of successes $S$. In order to do so we mark successes with $s$.

We obtain from (3) \begin{align*} \text{weight}(\mathcal{C}[S^5])&=-(sz)^5-(sz)\text{weight}(\mathcal{C}[S^5])-\cdots-(sz)^4\text{weight}(\mathcal{C}[S^5]) \end{align*} and get \begin{align*} \text{weight}(\mathcal{C})=-\frac{(sz)^5(1-sz)}{1-(sz)^5} \end{align*}

Using this generalized weight we obtain a generating function $H(z;s)$ \begin{align*} H(z;s)&=\frac{1}{1-(1+s)z}-\frac{1}{1-(1+s)z+\frac{(sz)^5(1-sz)}{1-(sz)^5}}\\ &=\frac{1}{1-(1+s)z}-\frac{1-(sz)^5}{1-(1+s)z+s^5z^6} \end{align*}

Third step: Words terminating with success $S$.

The coefficient $[s^rz^k]H(z;s)$ gives the number of words of length $k$ containing exactly $r$ S with S-runs of length $5$, but not necessarily with an S at the end. To force this we subtract the words of length $k$ which contain exactly $r$ S and S-runs of S of length $5$ and terminate with $F$.

This way we get finally the wanted generating function \begin{align*} \color{blue}{H(z;s)(1-z)}&\color{blue}{=\frac{1-z}{1-(1+s)z}-\frac{\left(1-(sz)^5\right)(1-z)}{1-(1+s)z+s^5z^6}}\tag{6}\\ &=s^5z^5+(s+f)s^5z^6+\left(s^2+3sf+f^2\right)s^5z^7\\ &\qquad+\left(s^3+5s^2f+5sf^2+f^3\right)s^5z^8\\ &\qquad+\left(s^4+7s^3f+\color{blue}{12}s^2f^2+7sf^3+f^4\right)s^5z^9+\cdots \end{align*} where the last line was calculated with the help of Wolfram Alpha.

Note the coefficients of the series correspond with the table entries stated by @GCab.

Looking for instance at the blue marked coefficient $12$ of $s^7f^2z^9$ this gives the number of words of length $9$ containing $7$ S at least one run of $5$ S and ending with S. These words are \begin{align*} \color{blue}{SSSSS}SFFS&\qquad SSFF\color{blue}{SSSSS}\\ \color{blue}{SSSSS}FSFS&\qquad SFSF\color{blue}{SSSSS}\\ \color{blue}{SSSSS}FFSS&\qquad SFFS\color{blue}{SSSSS}\\ SF\color{blue}{SSSSS}FS&\qquad FSSF\color{blue}{SSSSS}\\ FS\color{blue}{SSSSS}FS&\qquad FSFS\color{blue}{SSSSS}\\ F\color{blue}{SSSSS}FSS&\qquad FFSS\color{blue}{SSSSS} \end{align*} where the right-most run of $5$ S is marked blue.

Coefficients of $H(z;s)(1-z)$:

We finally calculate the coefficients of $H(z;s)(1-z)$. We start with

\begin{align*} [s^rz^k]H(z;s) &=[s^rz^k]\frac{1}{1-(1+s)z}-[s^rz^k]\frac{1}{1-(1+s)z+\frac{(sz)^5(1-(sz))}{1-(sz)^5}}\\ &=[s^rz^k]\frac{1}{1-(1+s)z}-[s^rz^k]\frac{1-(sz)^5}{1-(1+s)z+s^5z^6} \end{align*}

At first the easy part:

\begin{align*} \color{blue}{[s^rz^k]\frac{1}{1-(1+s)z}}=[s^rz^k]\sum_{j=0}^{\infty}(1+s)^jz^j =[s^r](1+s)^k \,\,\color{blue}{=\binom{k}{r}}\tag{7} \end{align*}

Now the somewhat longish part. We obtain

\begin{align*} \color{blue}{[s^rz^k]}&\color{blue}{\frac{1}{1-(1+s)z+s^5z^6}}\\ &=[s^rz^k]\sum_{j=0}^\infty\left((1+s)z-s^5z^6\right)^j\\ &=[s^rz^k]\sum_{j=0}^\infty z^j\left((1+s)-s^5z^5\right)^j\\ &=[s^r]\sum_{j=0}^k[z^{k-j}]\sum_{l=0}^j\binom{j}{l}(-1)^ls^{5l}z^{5l}(1+s)^{j-l}\tag{8}\\ &=[s^r]\sum_{j=0}^k[z^j]\sum_{l=0}^{k-j}\binom{k-j}{l}(-1)^ls^{5l}z^{5l}(1+s)^{k-j-l}\tag{9}\\ &=[s^r]\sum_{j=0}^{\left\lfloor k/5\right\rfloor}[z^{5j}]\sum_{l=0}^{k-5j}\binom{k-5j}{l}(-1)^ls^{5l}z^{5l}(1+s)^{k-5j-l}\tag{10}\\ &=[s^r]\sum_{j=0}^{\left\lfloor k/5\right\rfloor}\binom{k-5j}{j}(-1)^js^{5j}(1+s)^{k-6j}\tag{11}\\ &=\sum_{j=0}^{\min\{\left\lfloor k/5\right\rfloor, \left\lfloor r/5\right\rfloor\}}(-1)^j\binom{k-5j}{j}[s^{r-5j}](1+s)^{k-6j}\\ &=\sum_{j\geq 0}(-1)^j\binom{k-5j}{j}\binom{k-6j}{r-5j}\tag{12}\\ &\,\,\color{blue}{=\sum_{j\geq 0}(-1)^j\binom{k-r}{j}\binom{k-5j}{k-r}}\tag{13} \end{align*}

Comment:

  • In (8) we apply the rule $[z^s]z^tA(z)=[z^{s-t}]A(z)$. We also set the upper limit of the outer sum to $k$ since other values do not contribute to the coefficient of $z^k$.

  • In (9) we change the order of summation of the outer sum $j \to k-j$.

  • In (10) we observe we have to take multiples of $5$ only of the index $j$ due to the term $z^{5l}$.

  • In (11) we select the coefficient of $z^{5j}$.

  • In (12) we select the coefficient of $s^{r-5j}$.

  • In (13) we use the binomial identity \begin{align*} \binom{k-5j}{j}\binom{k-6j}{r-6j}&=\frac{(k-5j)!}{j!}\,\frac{1}{(r-6j)!(k-r-j)!}\\ &=\frac{1}{j!(r-6j)!}\,\frac{(k-5j)!}{(k-r)!}=\binom{r-5j}{j}\binom{k-5j}{k-r} \end{align*}

and it follows from (6) and (13): \begin{align*} [s^rz^k]&\frac{1-(sz)^5}{1-(1+s)z+s^5z^6}\\ &=\left([s^rz^k]-[s^{r-5}z^{k-5}]\right)\frac{1}{1-(1+s)z+s^5z^6}\\ &=\sum_{j\geq 0}(-1)^j\binom{k-r}{j}\binom{k-5j}{k-r}-\sum_{j\geq 0}(-1)^j\binom{k-r}{j}\binom{k-5-5j}{k-r}\\ &=\binom{k}{r}+\sum_{j\geq 1}\binom{k-r}{j}\binom{k-5j}{k-r} +\sum_{j\geq 1}(-1)^j\binom{k-r}{j-1}\binom{k-5j}{k-r}\\ &=\binom{k}{r}+\sum_{j\geq 1}(-1)^j\binom{k-r+1}{j}\binom{k-5j}{k-r}\tag{14} \end{align*}

and we obtain \begin{align*} \color{blue}{[s^rz^k]H(z;s)}&=[s^rz^k]\frac{1}{1-(1+s)z}-[s^rz^k]\frac{1-(sz)^5}{1-(1+s)z+s^5z^6}\\ &\,\,\color{blue}{=\sum_{j\geq 1}(-1)^{j+1}\binom{k-r+1}{j}\binom{k-5j}{k-r}}\tag{15} \end{align*}

Last step: Putting all together

We consider the (interesting) case $5\leq r\leq k$ only. Taking the results from (6) and (15) we can now write the coefficients of $H(z;s)(1-z)$ as

\begin{align*} [s^rz^k]&H(z;s)(1-z)\\ &=[s^rz^k]H(z;s)-[s^rz^{k-1}]H(z;s)\\ &=\sum_{j\geq 1}(-1)^{j+1}\binom{k-r+1}{j}\binom{k-5j}{k-r}\\ &\qquad-\sum_{j\geq 1}(-1)^{j+1}\binom{k-r}{j}\binom{k-1-5j}{k-1-r} \end{align*} and the claim (1) follows.

Solution 2:

a) Your approach to deduce the recurrence is correct, the problem is to fix the appropriated initial conditions and bounds of validity.

b) To the purpose of solving that clearly we need to proceed as follows.

Given a sequence of Bernoulli trials, with probability of success $p$ (failure $q=(1-p)$), allow me to represent that by a binary string $1 = $ success, $0 =$ failure so as to keep congruence with other posts I am going to link to.
For the same reason and for putting your recurrence with proper initial conditions, allow me to change your denominations and consider

the binary strings of length $n$, having $m$ zeros and $s$ ones, including a one which is fixed at the end of the string;
also let' go general and consider run of consecutive ones of length $r$.

We indicate as $$ P(s,r,n) = N_c (s,r,n)\, p^{\,s } q^{\,n - s} $$ the probability that in a string of length $n$, with total $s$ ones and terminating with a one, there might be runs of consecutive ones of length $r$ or greater.

Now your recurrence reads $$ \eqalign{ & P(s,r,n) = q\,P(s,r,n - 1) + pq\,P(s - 1,r,n - 2) + p^{\,2} q\,P(s - 1,r,n - 2) + \cr & + \cdots + p^{\,4} q\,P(s - r + 1,r,n - r) + \left( \matrix{ n - 1 - r \hfill \cr s - 1 - r \hfill \cr} \right)p^{\,s} q^{\,n - s} \cr} $$

Note that each term is a homogeneous polynomial in $p^s\, q^{n-s}$, so we do not need to bring them around and we can profitably concentrate on the number of strings given by $N_c$, that is $$ \bbox[lightyellow] { \eqalign{ & N_c (s,r,n) = \cr & = \left\{ {\matrix{ 1 & {\left| \matrix{ \;0 \le r \le s \hfill \cr \;1 \le s = n \hfill \cr} \right.} \cr {\sum\limits_{k = 0}^{r - 1} {N_c (s - k,r,n - 1 - k)} + \binom{ n - r - 1 }{ s - r - 1 } } & {\left| \matrix{ \;0 \le r \le s \hfill \cr \;1 \le s < n \hfill \cr} \right.} \cr 0 & {{\rm otherwise}} \cr } } \right. \cr} } \tag{1}$$

Regarding the conditions,

  • the case $s=n$ was not covered in the construction, and must be added;
  • because of the one in last position, $s$ shall be greater than $1$;
  • the remaining are obvious.

The recurrence above has been checked with direct computation for the smaller values of the parameters.
Example:
Nb_s_ones_r_cons_2

c) The recurrence (1) can solved in a closed form as a finite sum.

Consider in fact the strings of this type

Nb_s_ones_r_cons_1

Their total number is $\binom{n}{s}$ and those having a run of length $r$ or greater are $N_c (s+1,r, n+1)$. Therefore, the complement of $N_c$ will represent the strings of the same architecture, which have runs up to $r-1$.

The number of strings composed as above but excluding the last one, which have runs of length up to $r-1$ is given by $$ N_b (s,r - 1,m + 1) $$ where $$ N_b (s,r,m)\quad \left| {\;0 \leqslant \text{integers }s,m,r} \right.\quad = \sum\limits_{\left( {0\, \leqslant } \right)\,\,k\,\,\left( { \leqslant \,\frac{s}{r+1}\, \leqslant \,m} \right)} {\left( { - 1} \right)^k \binom{m}{k} \binom { s + m - 1 - k\left( {r + 1} \right) } { s - k\left( {r + 1} \right)}\ } $$ as explained in various posts, refer mainly to this and to this other one.

But because of the presence of the one at the end, we have to deduct from the above the strings which end in zero plus $ r-1$ ones, giving a final run of $r$.
These are $$ N_b (s-r+1,r - 1,m ) $$ and we conclude that $$ \bbox[lightyellow] { \eqalign{ & N_c (s + 1,r,n + 1) = N_c (s + 1,r,s + m + 1) = \cr & = \left( \matrix{ s + m \cr s \cr} \right) - N_b (s,r - 1,m + 1) + N_b (s - r + 1,r - 1,m) = \cr & = \left( \matrix{ n \cr s \cr} \right) - N_b (s,r - 1,n - s + 1) + N_b (s - r + 1,r - 1,n - s) \quad \left| {\;0 \le s,r,m} \right. \cr} } \tag{2}$$

$N_b$ is more present in literature, has plenty of recurrent relations, and a simple o.g.f. . Not to make the answer too long, I am not going further into details.

d) Summing on $n$.

Consider the strings composed as shown in the sketch in para. c) above.

Their total number is $\binom {s+m}{s} = \binom {n}{s}$ and each has the same probability $p^{s+1}\, q^m = p^{s+1}\, q^{n-s}$.

Keeping $n$ fixed, and summing over $s$ we get $$ \sum\limits_{\left( {0\, \le } \right)\,s\,\left( { \le \,n} \right)} {\binom{ n }{ s } p^{\,s + 1} q^{\,n - s} } = p\left( {p + q} \right)^{\,n} = p $$ which is obvious, since if we add the complementary strings ending in zero we get $(p+q)^{n+1} =1$.

Keeping instead $s$ fixed and summing on $n$, which means to sum on $m$, gives $$ \eqalign{ & \sum\limits_{\left( {0\, \le \,s\, \le } \right)\,n\,} {\binom{ n }{s} p ^{\,s + 1} q^{\,n - s} } = \sum\limits_{\left( {0\, \le } \right)\,\,m\,} {\binom{ s + m }{m} p^{\,s + 1} q^{\,m} } = \cr & = p^{\,s + 1} \sum\limits_{\left( {0\, \le } \right)\,\,m\,} {\binom{ - s - 1 }{m} \left( { - 1} \right)^{\,m} q^{\,m} } = {{p^{\,s + 1} } \over {\left( {1 - q} \right)^{\,s + 1} }} = 1 \cr} $$ which is the Negative Binomial distribution.

Since, by its combinatoric meaning we have $$ \left\{ \matrix{ 0 \le N_c (s + 1,r,n + 1) \le N_c (s + 1,r - 1,n + 1) \le \binom{n }{ s } \hfill \cr N_c (s + 1,s + 2,n + 1) = 0 \hfill \cr N_c (s + 1,s + 1,n + 1) = 1 \hfill \cr N_c (s + 1,1,n + 1) = \binom{n }{ s } \hfill \cr N_c (s + 1,0,n + 1) = \binom{n }{ s } \hfill \cr } \right. $$ then $$ 0 \le P_c (s + 1,r,p) = \sum\limits_{\left( {0\, \le \,s\, \le } \right)\,n\,} {N_c (s + 1,r,n + 1)p^{\,s + 1} q^{\,n - s} } \le 1 \quad \left| \matrix{ \,0 \le s \hfill \cr \;0 \le r \le s + 1 \hfill \cr \;0 < p < 1 \hfill \cr} \right. $$ converges (albeit slowly), and given $s,p$, it is a CDF in $(s+1-r)$ (in case with a further shift of the support).

Unfortunately, as to my knowledge, the sum in $n$ of $N_c$ (and of $N_b$) does not have a closed form : re. to this already cited post.
It is possible however to compute, from (2), a double o.g.f. if you are interested in.