Computational verification of Collatz problem

Solution 1:

Edit 2019-08-30:

Added algorithm in pseudocode

Edit 20190831

added Python code and description and reference to C implementation on codereview

The Collatz function is defined as $$ \text{collatz}(n):=\begin{cases} 3n+1,& n\equiv 1 \pmod 2 \\ \frac n 2, & n \equiv 0 \pmod 2 \end{cases}$$

A trajectory of n with respect to a function $f$ or an $f$-trajectory of $n$ is the sequence $$n, \;f(n), \;f(f(n)), \;f^3(n),\;\ldots$$

A subsequence of such a trajectory I will call a subtrajectory.

We are interested if the collatz-trajectory of a positive integer $n$ is either unbounded or if it will cycle. At the moment the trajectories of all numbers investigated so far will cycle. The cycle for all these number is the cycle $4,2,1,4,..$

If a trajectory cycles then a subtracjectory must contain identical values and vice versa.

We define now the following function that is related to the Collatz function: $$ \text{c}(n):=\begin{cases} \frac {3n+1} 2, & n\equiv 1 \pmod 2 \\ \frac n 2, & n \equiv 0 \pmod 2 \end{cases}\tag{1.1} $$ A c-trajectory of $n$ will be a Collatz-subtrajectory of $n$.

Instead of the $c$-trajectory of $n$ $$n, c(n), c^2(n),\ldots$$ we can construct a new sequence

$$n+1, c(n)+1, c^2(n)+1, \ldots$$

This is a trajectory with respect to the function $d$

$$d(n):=c(n-1)+1\tag{2.1}$$

$$\begin{array} 27&41&62&31&47&71&\ldots\\ 28&42&63&32&48&72\ldots \end{array}\tag{2.2}$$

From $(2.1)$ follows

$$c(n)=d(n+1)-1$$ and by induction one can prove $$d^k(n)=c^k(n-1)+1\tag{2.3}$$ $$c^k(n)=d^k(n)-1$$

From $(1.1)$ and $(2.1)$ we get $$ \text{d}(n):=\begin{cases} \frac{n+1} 2,& n\equiv 1 \pmod 2 \\ \frac {3n} 2, & n \equiv 0 \pmod 2 \end{cases}\tag{2.4}$$

From $c$ and $d$ we can generate new functions

$$c^+(n)=\begin{cases} \frac{3n+1}2 , & n\equiv 1 \pmod 2 \\ \frac n {2^k},& n=2^ka, k>0, a\equiv 1\pmod 2 \end{cases} $$

$$d^+(n)=\begin{cases} \frac{n+1}2 , & n\equiv 1 \pmod 2 \\ \left(\frac {3} {2}\right)^kn,& n=2^ka, k>0, a\equiv 1\pmod 2 \end{cases} $$

We can rewrite this definitions as

$$c^+(n)=\begin{cases} c(n) , & n\equiv 1 \pmod 2 \\ c^k(n),& n=2^ka, k>0, a\equiv 1\pmod 2 \end{cases} $$

$$d^+(n)=\begin{cases} d(n) , & n\equiv 1 \pmod 2 \\ d^k(n),& n=2^ka, k>0, a\equiv 1\pmod 2 \end{cases} $$

and we see that $c^+$-trajectories are $c$-subtrajectories and $d^+$-trajectories are $d$-subtrajectories.

Finally we define

$$T(n)=\begin{cases} c^+(n) , & n\equiv 1 \pmod 2 \\ c^+(d^+(n+1)-1),& n\equiv 1\pmod 2 \end{cases} $$

An again we have that a trajectory of $T$ is a subtrajectory of $c$. If $n$ is odd this is trivial, if $n$ is even then $$T(n)=c^+(d^+(n+1)-1)=c^+(d^{k_1}(n+1)-1)=c^+(c^{k_1}(n))=c^{k_2}(c^{k_1}(n))=c^{k_2+k_1}(n)$$

The function $T$ is the function that you use for your calculations.

The following algorithm assumes that $k$ is a positive integer and $u$ is an odd positive integer. There are two different variables $n_c$ and $n_d$ instead of one variable to show which values are from the trajectory of $c$ and therefore of the Collatz function and which values are from the trajectory of $d$ and therefore from the sequence that we get by adding $1$ to the trajectory values of the Collatz function. The termination condition depends on the purpose of the algorithm. Step 2 is used to simplify the comments and should not be implemented. $$ \begin{array}[lrc]\\ Step&Precondition&Action&Comment &&Comment\\ 1&&n_c\gets n_0&/* n_0 \; \text{is the start value}&*/\\ 2&&&/*x\gets n_c&*/&\\ 3&/*n_c \text{ is odd}*/&n_d\gets n_c+1&/*x+1&*/\\ 4&/*n_d=2^ku*/&n_d\gets 3^ku&/*d^+(x+1)&*/&/*a(x)*/\\ 5&/*n_d \text{ is odd}*/&n_c\gets n_d-1&/*d^+(x+1)-1&*/\\ 6&/*n_c=2^ku*/&n_c\gets u&/*c^+(d^+(x+1)-1)&*/&/*b(a(x))*/\\ 7&&\mathbf{if }\;n_c =1 \; \mathbf{then}&/* \text{or} \; n_c<n&*/\\ &&\quad \text{stop}\\ &&\mathbf{else}\\ &&\quad \mathbf{goto} \text{ Step 2} \end{array} $$

This algorithm can be easily transformed to a pseudocode/Python3 program.

% is the modulo operator
// is integer division
** is the power operator
x += y means x=x+1, similar holds for other operators

Here is the program:

n=n0
while n>1:
    n+=1
    k=0
    while n%2==0:
        k+=1
        n//=2
    n*=3**k
    n-=1
    while n%2==0:
        n//=2

It can be rewritten by using some functions and replacing the variable k by e.

ctz(n) returns e, where $n=2^eu$, $u$ is odd
rsh(n,e) returns $\frac n{2^e}$
lut(e) returns $3^e$

the new program:

n=n0
while n>1:
    n+=1
    e=ctz(n)
    n=rsh(n,e)
    n*=lut(e)
    n-=1
    n=rsh(n,ctz(n))

The function ctz can be implemented by counting how often n can be repeatedly divided by two until the result is odd or by counting the number of trailing $0$ of the binary representation of n.
The function rsh can be implemented by multiplying n n-times by $2$ or by shifting the binary representation $n$-times to the right.
The function lut(e) returns $3^k$ and can be implemented by a lookup table if the number e will not become too large.

This program now looks like to the C-implementation of the algorithm posted by the OP at codereview.stackexchange.

You can get the $c^+$-trajectory from the $c$-trajectory in the following way: If you current value on the trajectory is odd, than proceed on the $c$-trajectory to the next value. If it is even then proceed to the next odd value (the second branch of the definition of $c^+$) The same holds for the construction of $d^+$ from $d$. This method is shown on the picture. The circled numbers are the values of the $c^+$ (first line) and $d^+$ (second line) trajectory of 27. The last two lines show how to construct the trajectory of $T$ from a trajectory of $c$ and $d$. If you start from an odd value $n$ then got to the opposite even value n+1 of the $d$ trajectory. From this go to the next odd value of the $d$-trajectory. Then go to the opposite even value of the $c$-trajectory by subtracting $1$ and from this go to the next odd value of the $c$-trajectory.

enter image description here

At the moment I cannot see any advantage in using the function $T$ instead of $c^+$ or $d^+$.

I evaluated the number of function calls one needs using $c^+$, $d^+$ and $T$ until the the trajectory reaches $1$. For all odd numbers $n \in \{3,...,N\}$ I summed these path lengths up and got the following numbers

     N  c+ all   c+ 2nd  d+ all   d+ 2nd    T all
  1000   16506     5469   16267     5461     5452
 10000  229650    76314  226297    76302    76275
100000 2848611   949409 2829632   949374   949358

So from this we see that the number of function calls need to reach the value $1$ in the trajectory is for the functions $d$ and $c$ about the same and three times higher than for the function $T$. But note that a call of the function $T$ contains a call to the second branch of $c^+ $ and a call to the second branch of $d^+$. So all in all one I cannot see that there is any large improvement in using $T$

To check if the trajectory of all numbers $n$ less than $N$ cycles one does not calculate the trajectory values until they reach $1$ but only until it reaches a value less than the start value $n$. I also calculated the number of iterations for different $N$

      N    c+all   c+2nd    d+all   d+2nd    T all
   1000     2696     895     2166     637     892
  10000    25909    8662    21002    6145    8660
 100000   260246   86777   210708   61692   86760
1000000  2612479  871075  2114522  620923  871073

Conclusion

The OP asked if his procedure is correct and I showed here that he uses the function $T$ and that a trajectory of $T$ is a subtrajectory of the Collatz function. So his procedure is correct. Additionally I showed that he cannot expect a substantial performance gain by using $T$ instead of $c^+$ because the number of iteration is the same (maybe they differ by a constant factor).

This is the Python 3 program that generates the data of the table

def c(n):
    # this is the function c+
    if n%2==1:
        return (3*n+1)//2
    else:
        while n%2==0:
            n//=2
        return n
def d(n):
    # this is the function d+
    if n%2==1:
        return (n+1)//2
    else:
        m=1
        while n%2==0:
            n//=2
            m*=3
        return m*n
def T(n):
    # this is the function T
    if n%2==1:
        return c(d(n+1)-1)
    else:
        return(c(n))

def statistics(n,f):
    if f == d:
        i=n+1
    else:
        i=n
    # stop_value=i  # stop if trajectory <=n 
    stop_value=2 # stop if trajectory <=2
    cnt=0
    even_cnt=0
    while i>stop_value:
        i=f(i)
        cnt+=1
        if i%2==0:
            even_cnt+=1
    return(cnt,even_cnt)

for N in [1000,10000,100000]:
    print(N)
    for f in (c,d,T):
        all_calls=0
        even_calls=0
        for N in range(3,N,2):
            tmp=statistics(N,f)
            all_calls+=tmp[0]
            even_calls+=tmp[1]
        print(f,all_calls,even_calls)

Computational verification of Collatz problem

Solution 1:

Related

Recent Posts