Do runs of every length occur in this string?

sequences-and-series

In reference to the strings defined here (constructed by repeatedly appending the last "half" of the current string), consider the particular infinite string $s$ generated by starting with $\text{abc}$:

$$\begin{align} \quad &\text{abc}\\ &\text{abcbc}\\ &\text{abcbccbc}\\ &\text{abcbccbcccbc}\\ &\cdots\\ &\text{______________________________}\\ s = \ &\text{abcbccbcccbcbcccbccbcbcccbccccb...} \end{align} $$

More formally, the general rewriting rule is $$a_0 a_1 \cdots a_{n-1} \ \ \to \ \ a_0 a_1 \cdots a_{n-1} a_{\left\lfloor\frac{n}{2}\right\rfloor } a_{\left\lfloor\frac{n}{2}\right\rfloor+1} \cdots a_{n-1}. $$ Clearly,

$\text{a}$ occurs only in the initial position,
$\text{b}^k$ occurs infinitely often for $k=1$, but never occurs for $k\ge2$,

and one may conjecture that

$\text{c}^k$ occurs for every $k\ge 1$ (and hence, infinitely often for each $k\ge 1$).

How to prove or disprove the conjecture?

Some possibly-relevant facts:

Computations show that the index of the first occurrence of $\text{c}^k$ is as follows, for some small $k$: $$\begin{align} &\text{substring} \quad & \text{index}\\ &\text{c} &\text{2}\\ &\text{cc} &\text{4}\\ &\text{ccc} &\text{7}\\ &\text{cccc} &\text{26}\\ &\text{ccccc} &\text{27308}\\ &\text{cccccc} &\approx 10^{519}\\ &\text{ccccccc} &? \ (\gt 10^{40677})\\ \end{align} $$ (The exact index for the first occurrence of $\text{c}^6$ is a $520$-digit number, and $\text{c}^7$ does not occur in the first $10^{40677}$ terms of $s$.)
Let $L_n$ be the length of the $n$th intermediate string in the generating process illustrated above. Then $$L_{n+1} = L_n + \left\lfloor\frac{L_n + 1}{2}\right\rfloor,\ \ L_0 = 3.$$ Hence, $L_n$ grows exponentially: $$L_n \gtrsim 3(\frac{3}{2})^n.$$
Let $s_n$ be the $n$th intermediate string, and let $t_n$ be the $n$th appended string (so $s_n = s_{n-1} t_n$). Now, every intermediate string ends with $\text{bc}$, so $\text{c}^k$ would first occur only when some $t_n$ begins with $\text{c}^{k-1}$, in which case $s_n= s_{n-1} t_n$ will contain the first occurrence of $\text{c}^k$ beginning at index $L_{n-1}-1$.
After $\text{c}^k$ first occurs, the number of instances of $\text{c}^k$ grows approximately exponentially in the number of iterations, as does the length, and the ratio $$p_k = \frac{\text{number of instances of c}^k\text{ on the }n\text{th iteration}}{L_n} \approx \frac{1}{\text{index of the first occurrence of c}^k} $$ is approximately a constant independent of $n$. Since $\text{c}^{k+1}$ first occurs when one of these instances of $\text{c}^k$ happens to begin the "last half" of an intermediate string, this may be compared to a sequence of Bernoulli trials, each with success probability $p_k$. For such a process, the expected number of trials to get the first success is just $1/p_k$, so the index of the first occurrence of $\text{c}^{k+1}$ would be compared to $$\frac{2}{3} L_{n_k + 1/p_k} \approx 2(\frac{3}{2})^{n_k + i_k} $$ where $n_k$ is the number of iterations to get the first occurrence of $\text{c}^k$, and $i_k$ is the corresponding index. E.g., the first-occurrence index of $\text{c}^6$ would be compared to $2(\frac{3}{2})^{n_5 + i_5} = 2(\frac{3}{2})^{22 + 27308} \approx 10^{4813}$ (when in fact it is approximately $10^{519}$). Similarly, the first-occurrence index for $\text{c}^7$ (if it exists) would be compared to $2(\frac{3}{2})^{n_6 + i_6} \approx 10^{10^{518}}$. These comparisons are quite poor, but may help to understand how the first-occurrence indices can be so enormous.

Solution 1:

Okay, to the OP of this question. Since you requested a computer program.

Below is a string based program written in Javascript that is able to calculate index of $cccc$ in your string. See for yourself It does construct your string.

input = 'abc';
m=20;
for (i = 0; i < m; i++) {
l = input.length;
n=(l/2);
input = input + input.substring(n, l);
}
var index=input.search(/cccc/i);
alert(index)

For $m=20$ it constructs a string where $cccc$ is at the index $26$,$ccc$ is at $7$,$cc$ is at $4$.

For $m=30$ it constructs a string where $ccccc$ is at the index of $27308$. (which follows your conjecture).

For $m=40$ it constructs a string, where $cccccc$ is not there and it returns $-1$, to denote that search found nothing.

Intrestingly, when I increase m to $45$, it exceeds the computational power and does not return anything on the browser (I am using chrome on win 7 64-bit). And this is because the browser storage for string as per this answer is about 5 MB which is a string of 2,621,440 characters.

You can however, will get success in re-writing this code in Java because it supports maximum string length of $2^{31}-1$ possibly over 2 billion as per this answer.

But, for that you need at least $1024 MB$ of heap size. I firmly believe your conjecture can be proved :).

Good luck!

PS: I did not had success on recreating this on Java, since I am not an expert. However, I will return to repost the solution in Java.

Do runs of every length occur in this string?

Solution 1:

Related

Recent Posts