Interesting behavior of the $2$-tag system $\{a\rightarrow abc,b\rightarrow ab,c\rightarrow c\}$

Consider the following $2$-tag system:

$$\begin{cases}a\rightarrow abc\\b\rightarrow ab\\c\rightarrow c\end{cases}$$

In brief, a $2$-tag system is a Turing-complete string substitution system wherein you check the current first letter against the rules given, append the corresponding letters (e.g. $a$ maps to $abc$) to the end of the string, and then remove the first two letters from the string. Repeat until string disappears or cycle is detected.

I assert that for any starting string $(abc)^{k}$, this system will always cycle back to $(abc)^k$.

e.g. for $k=1$:

abc
  cabc
    bcc
      cab
        bc
          ab
            abc

I am fairly certain this is true but can't prove it.

If anyone can provide a proof, counterexample, or persuasive argument for or against its validity, I'll consider this answered.

I also suspect there's no easy way to calculate the period of the cycle and would welcome input on that too. For reference, the first 25 periods (courtesy of r.e.s.) are:

$\{6, 24, 78, 96, 63, 1030, 6558, 3936, 5922, 75756, 40474, 1594320, 79630, 7166907, 837353,\\1311364, 9620370, 6835198, 577674197, 2092070640, 784506057, 1116341675, 282429536478, 11933733565, 169161110854\}$

My gut sense is that this is not really a Collatz-level problem; the fact that it's in a fairly well-behaved bounded cycle suggests to me that by gluing a few of these rules together properly, it should be provable that $(abc)^k$ necessarily repeats. That said, my gut is often bad at number theory.


Some observations

The number of $b$'s is fixed at $k$, and they're always preceded by $a$, unless it's a $b$ at the start of the string. At each end of the string, and between every $b$ and $a$, there may be between $0$ and $2$ $c$'s, and the way they float around and interact with parity seems like what drives the cycles. Those constraints can be confirmed by establishing that, bottom line, the only unique substrings you can generate within this system are $\{ab,abc,abcc\}$. All possible strings reachable from $(abc)^k$ will be some combination of those pasted together, possibly with a wraparound truncation to start. You can confirm this via inspection or with the "table method" described in De Mol (2010).

This provides a strict lower bound of $2k/3$ for the length of the shortest word in the cycle (note that $(ab)^k$ always appears), with $4k/3$ as the upper bound on word length. We can take a stab at an upper bound on period by treating $b$ as dividing the string into $k+1$ partitions, each of which contains $0$ to $2$ $c$'s, yielding at most $3^{k+1}$ unique configurations. Since the combination of leading and trailing $c$'s can't exceed two, we eliminate the possibilities $\{ccScc,cScc,ccSc\}$. This leaves us with an upper bound of $$f(k) := 3^{k+1}-3.$$

This seems to work well for small $k$ and provides the exact period for $k \in \{1,2,3,7,12,23\}$, which is as high as I can confirm.


We write $U = ab, V = abc, W = abcc$.


Definition:

  • A nonempty string will be called "regular" if it is the concatenation of finitely many copies of $U, V, W$.

  • If $X$ is any regular string, then we write $X'$ for the string obtained from $X$ by removing the leading $a$. Such a string will be called "quasi-regular".

  • A "normal" string is a string that is regular or quasi-regular.


The system can then be translated to a system on normal strings.

More precisely, we have the following rules (*): \begin{eqnarray} UX &\rightarrow& XV\\ VX &\rightarrow& X'W\\ WX &\rightarrow& XW\\ U'X &\rightarrow& X'U\\ V'X & \rightarrow& XU\\ W'X & \rightarrow& X'V \end{eqnarray} where $X$ denotes any regular string.

Note that this rule is injective: from the resulting string, we can deduce what the original string is.


Definition:

  • The "weight" of a regular string is the total number of $U, V, W$ in the string.
  • The weight of a quasi-regular string is the weight of the regular string it comes from.

(Equivalently, the weight of any normal string is just the number of $b$'s appearing in it.)


It is obvious from the above rules that the weight of a normal string does not change after applying the rules.

Therefore, starting from any normal string, the generated system can have only finitely many possible strings.

Since the procedure continues forever, the system is eventually periodic.

However, the rule being injective, the system must be purely periodic, hence will eventually return to the initial status.

This works for any normal string. In particular, it works for the string $V^k$.


(*)There is an exception when $X$ is the empty string, which we exclude here.