Is it already known that $\sum_{i=1}^x\cos(S(i))\sim ax\cos(b\ln x)$, as $x\to\infty$, where $S(i)$ is the number of Collatz steps from $i$ to $1$?

Some commenters were concerned with the lack of intermediate points. Here's a plot with all values up to 10000. I think there's definitely something going on here.

[Plot: $\sum_{i=1}^x \cos(S(i))$ against $x$]

Interestingly, $\cos(S(i))$ itself has a strong cosine shape.

[Plot: $\cos(S(i))$ against $i$]

Code:

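import numpy as np
import matplotlib.pyplot as plt

N = 10000

# Memo table: number of Collatz steps from n to 1.
mem_S = {1: 0}

def collatz_steps(n):
    if n in mem_S:
        return mem_S[n]
    if n % 2 == 0:
        ans = collatz_steps(n // 2) + 1
    else:
        ans = collatz_steps(3 * n + 1) + 1
    mem_S[n] = ans  # store the result so later calls can reuse it
    return ans

n = np.arange(1, N + 1)
# S[k - 1] holds S(k); the function has its own name so the array does not shadow it.
S = np.array([collatz_steps(k) for k in range(1, N + 1)])
cos_S = np.cos(S)
sum_cos_S = np.cumsum(cos_S)

# Cumulative sum of cos(S(i)).
plt.plot(n, sum_cos_S)
plt.show()

# cos(S(i)) itself.
plt.plot(n, cos_S)
plt.show()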

Edit: I have a much better idea of what's going on now. Long story short, this pattern happens because of a numerical coincidence:

$$ 2\pi\frac{\log 2 - \frac{1}{3}\log 6}{\frac{1}{3}\log 6} = 1.0088... \approx 1 $$

In the second plot above, $\cos(S(n))$ looks "on average" like $\cos(b\log n)$. So the nice-looking wave we get for $\sum_{m=1}^n \cos(S(m))$ just comes from the sum averaging out the noise in $\cos(S(n))$.

The question is why $\cos(S(n))$ seems to look so much like $\cos (b \log n)$ for some $b$.

Here's the plot that answers that question. ($n$ goes up to 100000 in this plot):

[Scatter plot: $S(n)$ against $\log n$, coloured by $e^{iS(n)}$]

The individual points are coloured according to $e^{iS(n)}$. (Command to plot is plt.scatter(np.log(n), S, c=(S % (2*np.pi)), cmap="twilight"))

The first thing to notice is that the points clump up into clusters, and those clusters are organized into a lattice. I have a rough explanation for the existence of the lattice, although the fact that it's so neat and orderly still surprises me. The rough explanation is that for large numbers, the $+1$ in $3n+1$ is tiny compared to the overall size of the number. So generally, we can expect that a $3n+1$ step is more or less like adding $\log 3$ onto $\log n$ and an $n/2$ step is like subtracting $\log 2$. Getting from $n$ to 1 on a linear scale means getting from $\log n$ to 0 on a log scale. So the points of the lattice are given by $(a \log 2 - b \log 3, a+b)$ for integers $a,b$, and the small $+1$ errors create a spread out cluster of points around each lattice point.
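The lattice picture can be checked directly. The sketch below is only an illustration, and the helper names `halvings_and_triplings` and `lattice_offset` are made up for it. It counts the $n/2$ steps ($a$) and the $3n+1$ steps ($b$) on the way from $n$ to 1, then measures how far $\log n$ sits from the lattice coordinate $a\log 2 - b\log 3$. That offset is exactly the accumulated effect of the $+1$ terms, so it should be small and positive. The loop assumes the trajectory of each tested $n$ reaches 1.

```python
import math

def halvings_and_triplings(n):
    """Count the n/2 steps (a) and the 3n+1 steps (b) on the way from n to 1."""
    a = b = 0
    while n != 1:
        if n % 2 == 0:
            n //= 2
            a += 1
        else:
            n = 3 * n + 1
            b += 1
    return a, b

def lattice_offset(n):
    """Horizontal distance between log n and the lattice coordinate a*log 2 - b*log 3."""
    a, b = halvings_and_triplings(n)
    return a * math.log(2) - b * math.log(3) - math.log(n)

if __name__ == "__main__":
    for n in (8, 27, 97, 871, 6171):
        a, b = halvings_and_triplings(n)
        # S(n) = a + b is the vertical lattice coordinate.
        print(n, a, b, a + b, round(lattice_offset(n), 4))
```

A power of two lands exactly on a lattice point (offset 0), while other starting values land slightly to one side of theirs, which is the spread seen in each cluster.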

Now that the lattice is explained, the next question is why there are vertical stripes of colour, corresponding to $e^{iS(n)}$. The stripes could just as well have been tilted at some other angle, in which case the pattern would be destroyed by destructive interference.

The lattice basis vectors given by the expression above are $(\log 2, 1)$ and $(-\log 3, 1)$. We can change basis to replace $(-\log 3, 1)$ with $(\log 2 + \log 3, 0) = (\log 6, 0)$. The $S(n)$ component of this vector is 0, so lattice points separated by this vector all have the same value of $S(n)$, and thus the same value of $e^{iS(n)}$.

So the pattern must have $\log 6$ periodicity along the $\log n$ axis. But in fact, the wavelength we observe is $\frac{1}{3} \log 6$, a special case of $\log 6$ periodicity.

We can now use the remaining basis vector, $(\log 2, 1)$, to check whether the stripes are vertical. Incrementing $S(n)$ changes the phase $e^{iS(n)}$ by a factor of $e^i$. If the stripes are vertical, then adding $\log 2$ to $\log n$ should change the phase by $e^{2\pi i \log 2 / \left(\frac{1}{3}\log 6\right)}$. Since

$$ 2\pi\frac{\log 2 - \frac{1}{3}\log 6}{\frac{1}{3}\log 6} = 1.0088... \approx 1 $$

these phase differences are approximately equal, as expected. This implies the stripes are indeed approximately vertical. We could get exactly vertical stripes by scaling $S(n)$ very slightly before applying the cosine. Without this correction, the formula will eventually fail for large $n$. This method also gives a value for $b$, being $2\pi$ over the wavelength:

$$ b = \frac{6\pi}{\log 6} = 10.5201... $$
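For anyone who wants to check the numbers, here is a small sketch using only the standard library. The variable names are my own. It evaluates the coincidence, the value of $b$, and the factor by which $S(n)$ would have to be scaled for the stripes to be exactly vertical under the lattice model (my reading of the "very slight" scaling mentioned above).

```python
import math

# Observed wavelength of the stripes along the log n axis.
wavelength = math.log(6) / 3

# Phase change required by vertical stripes when log n grows by log 2,
# taken modulo one full turn (2*pi).
required_phase = 2 * math.pi * math.log(2) / wavelength - 2 * math.pi

# Actual phase change when S(n) grows by 1 is exactly 1 radian.
actual_phase = 1.0

# Frequency of the wave in log n.
b = 2 * math.pi / wavelength

if __name__ == "__main__":
    print("required phase per step:", required_phase)
    print("mismatch per step:", required_phase - actual_phase)
    print("b =", b)
    # Under the lattice model, cos(required_phase * S(n)) would have exactly vertical stripes.
    print("scale factor for S(n):", required_phase / actual_phase)
```

The mismatch per step is a bit under 0.01 radians, so the drift only becomes a sizeable fraction of a turn once $S(n)$ reaches the hundreds.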


Here is a non-rigorous heuristic approach:

The values of $S(x)$ are random locally, but we can make a global estimate $s(x)$. In half the cases, $x$ is even, where we get $S(x)=1+S(\frac{x}{2})$. In the other half of cases, $x$ is odd and we get $S(x)=2+S(\frac{3x+1}{2})$. Dropping the $+1$, which is negligible for large $x$, this gives: $$s(x)=\frac{1}{2}\left(1+s\left(\frac{x}{2}\right)\right) + \frac{1}{2}\left(2+s\left(\frac{3x}{2}\right)\right)$$ Taking $u=\frac{3x}{2}$, we have: $$s(u) = 2s\left(\frac{2u}{3}\right) - s\left(\frac{u}{3}\right) - 3$$ We can see that $s(x)$ must grow logarithmically. Let us estimate $s(x)=b\log{x}$. Then, our equation above gives us the constraint: $$b\log{u} = 2b(\log{u}+\log{2}-\log{3}) - b(\log{u}-\log{3}) - 3$$ $$b(2\log{2}-\log{3}) = 3 \implies b = \frac{3}{2\log{2}-\log{3}} \approx 10.43$$
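As a sanity check on the algebra, the sketch below (names are illustrative) computes $b$ and confirms numerically that $s(x)=b\log x$ satisfies the recurrence $s(u) = 2s(\frac{2u}{3}) - s(\frac{u}{3}) - 3$ for a few arbitrary values of $u$.

```python
import math

# Heuristic growth rate: b = 3 / (2 log 2 - log 3).
b = 3 / (2 * math.log(2) - math.log(3))

def s(x):
    """Global estimate s(x) = b log x."""
    return b * math.log(x)

def residual(u):
    """Left side minus right side of s(u) = 2 s(2u/3) - s(u/3) - 3."""
    return s(u) - (2 * s(2 * u / 3) - s(u / 3) - 3)

if __name__ == "__main__":
    print("b =", b)
    for u in (10.0, 1e3, 1e6):
        print(u, residual(u))
```

The residuals should vanish up to floating-point rounding.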

Now, we have a rough estimate for $S(i)$, given by $s(i)$. Let's try to use this to estimate our summation: $$f(x) = \sum_{i=1}^x \cos(S(i)) \approx \sum_{i=1}^x \cos(s(i)) = \sum_{i=1}^x \cos(b\log{i}) \sim \int_1^x \cos(b\log{u}) du$$

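Evaluating the integral, we get: $$f(x) \approx \frac{x\big(b\sin(b\log{x})+\cos(b\log{x})\big)-1}{b^2+1} \approx \bigg(\frac{b}{b^2+1}\bigg)x\sin(b\log{x})$$ where the last step keeps only the sine term, which dominates because $b \approx 10$.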

Since phase changes do not affect the estimation much, we get something of the form $ax\cos(b\log{x})$, where $a=\frac{b}{b^2+1} \approx 0.095$.
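The integral can be checked numerically. An antiderivative of $\cos(b\log u)$ is $\frac{u\,(b\sin(b\log u)+\cos(b\log u))}{b^2+1}$, so the integral from 1 to $x$ is that expression at $x$ minus $\frac{1}{b^2+1}$. The sketch below compares this closed form with a plain trapezoid rule after substituting $u=e^t$. The function names and the step count are arbitrary choices for illustration.

```python
import math

B = 3 / math.log(4 / 3)  # heuristic value of b, about 10.43

def closed_form(x, b=B):
    """Integral of cos(b log u) du over [1, x] from the antiderivative."""
    t = math.log(x)
    return (x * (b * math.sin(b * t) + math.cos(b * t)) - 1) / (b * b + 1)

def numeric_integral(x, b=B, steps=200_000):
    """Trapezoid rule for the same integral, written as cos(b t) e^t dt on [0, log x]."""
    upper = math.log(x)
    h = upper / steps

    def f(t):
        return math.cos(b * t) * math.exp(t)

    total = 0.5 * (f(0.0) + f(upper))
    for k in range(1, steps):
        total += f(k * h)
    return total * h

if __name__ == "__main__":
    x = 1000
    print(closed_form(x), numeric_integral(x))
    print("a =", B / (B * B + 1))
```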

Our values of $b$ match, but the values of $a$ do not. I am unsure why this happens. Maybe this method can be refined to get more accurate values, or I have made a mistake in the calculation, or the experimental value of $a$ will approach the estimated value.


Differentiating both $\sum \cos(S(i))$ and $i\cos\log i$ and then taking arc cosines roughly translates your claim to $S(i)\sim \log(i)$, which is plausible if you believe multiplications by three and divisions by two appear in roughly the same ratio in the Collatz sequence for all starting values.

In fact, we clearly have $S(i)\geq \log_2(i)$ and your conjecture may be equivalent to a strong Collatz conjecture of the form that $S(i)$ is not only finite but behaves like $S(i)\leq C\log i$ in some asymptotic and amortized sense.
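The lower bound is easy to confirm empirically. Here is a minimal sketch, assuming every starting value up to `N` reaches 1 (true for the small range used here). It checks $S(i)\geq\log_2(i)$ in exact integer arithmetic, as $2^{S(i)}\geq i$, and prints the range of $S(i)/\log i$. The bound `N` and the function name are arbitrary.

```python
import math

def collatz_steps(n):
    """Number of Collatz steps from n to 1."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

N = 10_000
steps = {i: collatz_steps(i) for i in range(1, N + 1)}

# S(i) >= log2(i) is equivalent to 2**S(i) >= i, which avoids floating point.
violations = [i for i in range(1, N + 1) if 2 ** steps[i] < i]

# Ratio S(i) / log(i), using the natural logarithm, for i >= 2.
ratios = [steps[i] / math.log(i) for i in range(2, N + 1)]

if __name__ == "__main__":
    print("violations:", violations)
    print("min ratio:", min(ratios))
    print("mean ratio:", sum(ratios) / len(ratios))
    print("max ratio:", max(ratios))
```

The minimum ratio is attained at powers of two, where $S(2^k)=k$ and the ratio is $1/\log 2$.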

PS: undoing your operations to hand-wavily reduce your conjecture to something more intuitively believable is in no way supposed to take away from your discovery. In fact, maths is often about transforming a problem until it suddenly looks manageable, and the comparison of the scatter plots of $S(i)$ and of $\cos(S(i))$ in the other answer shows that your transformations do appear to reveal a cleaner, more manageable structure. Indeed, the fact that the values of $\cos(S(i))$ change phase continuously, even though the $S(i)$ don't, is very interesting to me, and I'm hoping someone who knows number theory better than I do can elucidate.

Edit: according to https://www.ams.org/journals/mcom/2003-72-242/S0025-5718-02-01425-4/home.html researchers conjecture that $S(i)\sim 6.95\log(i)$ for most values of $i$.