Counting binary sequences with no more than $2$ equal consecutive numbers

I invented the following problem, but I can't solve it.

There are $n$ $1$'s and $n$ $0$'s and my question is what is the number of ways to arrange them avoiding $3$ equal consecutive numbers.

So for instance, $n=3$ gives 001011, 001101, 010011, 010101, 010110, 011001, 011010, and the same sequences that start with $1$, that's $14$ in total. The first values are: $2,6,14,34,84,208$. (starting with $n=1$) It seems to be almost an exponential function.

I started defining the function $N(a,b)$ which gives all possible sequences with $a$ zeroes and $b$ ones such that no $3$ consecutive numbers are equal, and that begin with a $0$. The same for $E(a,b)$ which gives all possible sequences with $a$ zeroes and $b$ ones such that no $3$ consecutive numbers are equal, and that begin with a $1$. It's not hard to see that $N(a,b)=E(b,a)$.

I found a kind of recursive formula for $N(a,b)$.

Such a sequence can start with $00$. The rest of it is a sequence that starts with a $1$, that gives $E(a-2,b)=N(b,a-2)$ possibilities.

If it starts with $01$, every tail is possible unless those that start with $11$. The question is what this number of sequences is that start with $11$. But because after these two $1$'s comes a $0$, it's simply $N(a-1,b-3)$.

Adding everything up gives $$N(a,b)=N(a-1,b-1)+N(b-1,a-1)+N(b,a-2)-N(a-1,b-3).$$

This is another approach of mine:

Every sequence consists a certain number of series of $1$'s and of $0$'s. Let $n$ be divisible by $2$ (The other case is almost the same.)

The number of series is at least $n/2$ and at most $n$. The number of $0$-series can be equal to or one more then the number of $1$-series. Because every series has at least one element, the total is $$N(a,a)=1+\sum_{k=n/2}^{n-1}\binom{k}{n-k}\left(\binom{k}{n-k}+\binom{k+1}{n-k-1}\right)$$ but this summation is not worth being called a solution. And I can't simplify it.

That's all I have found. Maybe these OEIS references may help you answering my question:

http://oeis.org/A177790

http://oeis.org/A003440

And here's a list of $N(a,b)$ for $a,b\leq 10$. ($a$ increases to the right and $b$ increases downwards)

$$\begin{array}{|c|c|} \hline b\backslash a&0&1&2&3&4&5&6&7&8&9&10\\ \hline 0&&1&1&0&0&0&0&0&0&0&0\\ \hline 1&0&1&2&2&1&0&0&0&0&0&0\\ \hline 2&0&1&3&5&5&3&1&0&0&0&0\\ \hline 3&0&0&2&7&12&13&9&4&1&0&0\\ \hline 4&0&0&1&6&17&29&33&26&14&5&1\\ \hline 5&0&0&0&3&16&42&71&84&72&45&20\\ \hline 6&0&0&0&1&10&42&104&175&214&196&136\\ \hline 7&0&0&0&0&4&30&110&259&434&545&527\\ \hline 8&0&0&0&0&1&15&86&286&648&1082&1389\\ \hline 9&0&0&0&0&0&5&50&241&741&1627&2709\\ \hline 10&0&0&0&0&0&1&21&156&663&1916&4098\\ \hline \end{array}$$

(I can always make it longer if someone asks me to.)

I'll complete the work begun by Christian Blatter:

Let $$ s(x,y) \;=\; \frac {\left(1+x+x^2\right) \left(1+y+y^2\right)} {1-\left(x+x^2\right) \left(y+y^2\right)} $$ as above.

Note, that this follows immediately from the fact that allowed words consist of a sequence of elements from {01, 011, 001, 0011} optionally preceded by 1 or 11 and followed by 0 or 00. Hence the class of allowed words is $$ \mathcal{S}\;=\; (\mathcal{E}+\mathcal{Z_1}+\mathcal{Z_1}^2) \times \text{SEQ}((\mathcal{Z_0}+\mathcal{Z_0}^2)\times(\mathcal{Z_1}+\mathcal{Z_1}^2)) \times (\mathcal{E}+\mathcal{Z_0}+\mathcal{Z_0}^2) $$ which translates directly to give $s(x,y)$.

If we let $$ g(x) \;=\; g(z,x) \;=\; s(x\sqrt{z},\sqrt{z}/x) \;=\; \frac {\left(x^2+x \sqrt{z}+z\right) \left(1+x \sqrt{z}+x^2 z\right)} {x^2 (1 - z - z^2) - x (1 + x^2) z \sqrt{z}},$$ then the constant term of $g(x)$ (i.e. coefficient of $x^0$ in the expansion of $g(x)$) gives the generating function $h(z)$ for sequences with the same number of $0$s and $1$s where the exponent of $z$ records the half-length. This is what we require.

Now, the constant term in a function $g(x)=g(z,x)$ is given by the sum of the residues of $g(x)/x$ at those poles $\alpha$ of $g(x)$ for which $\lim\limits_{z\rightarrow0}\alpha(z)=0$ (see e.g. Stanley, Enumerative Combinatorics II Sect. 6.3).

It simply remains to work through the details. There are two suitable poles (at $x=0$ and $x=\frac{1-z-z^2-\sqrt{1-2 z-z^2-2 z^3+z^4}}{2 z \sqrt{z}}$). Summing the residues of $g(x)/x$ at these values gives $$ h(z)\;=\; \frac{(1-z)^2 \sqrt{1+z+z^2}}{z^2 \sqrt{1-3 z+z^2 }} -\frac{1}{z^2}. $$ Since the radius of convergence of $h(z)$ is $\frac{1}{2} \left(3-\sqrt{5}\right)$, the growth rate is its reciprocal $\frac{1}{2} \left(3+\sqrt{5}\right) \approx 2.56126$.

The following is a first step to remove the combinatorics out of the way.

We work in the realm of formal power series. Let $x$, $y$ be indeterminates and assign any $01$-word with $r$ zeros and $s$ ones the "worth" $x^r y^s$. Denote by $f_0(x,y)$ and $f_1(x,y)$ the total worth of the allowed words ending in $0$, resp. $1$. Then $$f_0(x,y)=\bigl(1+f_1(x,y)\bigr)(x+x^2)\ ,\qquad f_1(x,y)=\bigl(1+f_0(x,y)\bigr)(y+y^2)\ .\qquad(*)$$ These equations can be explained as follows: You get any allowed word ending with $0$ by writing $0$ or $00$ after the empty word or after an allowed word ending with $1$. Thereby the worth of each affected word takes up a factor $x$ or $x^2$. The total worth of all words created in this way is the right side of the first equation. The second equation is explained similarly.

One can solve the equations $(*)$ for the $f_i(x,y)$. It follows that the total worth of all allowed words is given by $$s(x,y)=1 +f_0(x,y)+f_1(x,y)={(1+x+x^2)(1+y+y^2)\over1- (x+x^2)(y+y^2)}\ .$$ It remains to compute the coefficient $g_n$ of $x^n y^n$ in the Taylor series of $s(x,y)$. In his answer David Bevan has determined the generating function of the $g_n$, namely $$1+\sum_{n=1}^\infty g_n(z)= h(z)\;=\; \frac{(1-z)^2 \sqrt{1+z+z^2}}{z^2 \sqrt{1-3 z+z^2 }} -\frac{1}{z^2}\ .$$

It is not difficult to verify that the function $g(z):=h(z)-1$ satisfies the quadratic equation

$$(1-3z+z^2) g(z)\Bigl(1+z^2 +{z^2\over2}g(z)\Bigr)-2z\equiv0\ .$$

From this equation we can distill a recursion formula for the $g_n$: One has $$g_n=0\quad (n\leq 0),\quad g_1=2\ ,$$ and then $$g_n=3 g_{n-1}-2 g_{n-2}+3 g_{n-3}- g_{n-4}-{1\over2}\sum_{k=1}^{n-3}\bigl(g_k-3 g_{k-1}+g_{k-2}\bigr) g_{n-k-2}\qquad(n\geq2)\ .$$ This formula produces for $n\geq1$ the values $$2, 6, 14, 34, 84,208, 518, 1296, \ldots\quad,$$ as listed in http://oeis.org/A177790, quoted by the OP.

Here's some asymptotics from a (not very rigorous) probabilistic argument:

Let ${\bf x} = (x_i), \;i= 1,2, \dots 2m$, be $2m$ geometric iid variables with $p=1/2$, $p(x_i)=2^{-x_i}$, $x_i \in \{ 1,2,3, \dots\}$ (these correspond to the run lengths of 0s and 1s in the original problem).

Let $s_1=\sum_{i=1}^m x_i$ , $s_2=\sum_{i=m+1}^{2m} x_i$ be the sums of each half, and define the sets-events:

$A: s_1=n \cap s_2=n$

$C: x_i \le 2 \,\, \forall i$

Note that $p({\bf x})= 2^{-s}$, hence the points inside $A$ (fixed $s=s_1+s_2=2n$) are equiprobable.

Also note that the conditioned variable $x_i | x_i\le 2$ has mean $4/3$ and variance $2/9$.

Now,

$$ P(C| A) = \frac{P(A |C)\, P(C)}{P(A)}$$

where $P(C)= \left(3/4\right)^{2m} $

Because the set $A$ is composed of equiprobable points, we have $P(A) = 2^{-2n} |A |$

And because of the same fact (equiprobable points), $$P(C| A ) = \frac{|A \cap C|}{|A|}$$

Then $$ |A \cap C| = 4^n \left(3/4\right)^{2m} P(A|C) =N(n,m)$$

where $2 N(n,m)$ counts all the valid configurations with $m$ runlengths of each color (factor 2 is to account for symmetry of colours). We are interested in

$$N(n)=\sum_m 4 N(n,m)$$

(the extra 2 factor accounts -approximation that could be refined- for the configurations which runlengths differ in one).

First approximation: $P(A|C)$ is peaked around $ 4/3 m=n$. Just replacing we get

$$ \log N(n) \approx \frac{n}{2} \log (27/4) $$

but this is very rough.

Better approximation:

For each fixed $m$, the sum of each half can be approximated (CLT) by a normal $ \mathcal{N}(\frac{4}{3}m,\frac{2}{9}m) $ Hence

$$P(A|C) \approx \left( \frac{1}{\sqrt{2 \pi 2 m/9}} \exp{[-\frac{(3n-4m)^2}{4m}}]\right)^2$$

We approximate the sum on $m$ by a integral, and substitute $m=r n$.

$$N(r) \approx \alpha 4^n \int \frac{1}{r} (3/4)^{rn} e^{-n\frac{(3-4r)^2}{2r}} dr= \alpha \int \frac{1}{r} exp\left(n \left[ b + 2 a r -\frac{(3-4r)^2}{2r} \right] \right) dr= $$

with $a=\log(3/4)$, $b=\log(4)$. Applying Laplace method, the function inside the exponential has maximum at $r_0=\frac{3}{2\,\sqrt{4-\mathrm{log}\left( \frac{3}{4}\right) }}=0.7244...$ and its value is $$f_0 = \mathrm{log}\left( 4\right) -6\,\sqrt{4-\mathrm{log}\left( \frac{3}{4}\right) }+12 =0.96226302631297$$

Then,

$\log(N(n)) = 0.96226302631297 n - \frac{1}{2}\log(n) + o(\log(n))$

Some computed exact values of $log(N)$ and the error of the approximation above (and the "zero order" one.

  n    exact      apr    apr0
 10    9.0114  -0.5401   0.5363
 20   18.3564  -0.609    0.739
 30   27.802   -0.6347   0.8411
 40   37.2947  -0.6486   0.8962
 50   46.8149  -0.6578   0.9237
100   94.6049  -0.6812   0.8722
150  142.5286  -0.6945   0.6871
250  238.5197  -0.7147   0.1731
350  334.5956  -0.7325  -0.4257
450  430.7133  -0.7496  -1.0662
550  526.8560  -0.7663  -1.7318
650  623.0153  -0.7828  -2.414
750  719.1864  -0.7992  -3.108

Update: a much simpler (and perhaps more precise) asymptotic can be obtained by using $N(n) \approx 2 \sum {k \choose n-k}^2$ with ${m \choose p m} \approx \exp(m H(p))$, changing sum by integral and using the same substitution and Laplace method. It gives the same result as David's answer.

Counting binary sequences with no more than $2$ equal consecutive numbers

Related

Recent Posts