Efficiency of the prime generating constant $2.920050977316 \dots$ for the purpose of compressing a list of primes.

The constant $c \approx 2.920050977316 \dots$ has the property that if $\{a_n\}$ is the sequence defined by $a_1 = c, a_{n+1} = \lfloor a_n \rfloor (a_n - \lfloor a_n \rfloor + 1),$ then $\lfloor a_n \rfloor$ is the $n$th prime.

We need to calculate primes in order to calculate the constant rather than the other way around, so this does not yield a method of generating new primes. However, we might be able to use the constant to compress an already known list of primes. The question then becomes, how many digits of $c$ are needed to generate the first $n$ primes? Note that it takes $\sim n \log n$ digits to store the first $n$ primes naively and $\sim n \log \log n$ digits by storing the differences between consecutive primes.


We can refine the result of the paper as follows, where we will use $\lfloor x\rfloor$ for the floor function and $\{x\}=x-\lfloor x\rfloor$ to be the fractional part function.

Let $s_0,s_1,s_2,\ldots,s_n$ be a finite sequence of integers such that $s_k \leq s_{k+1} < 2s_k$ for every $k$. Let $I$ be the set of $c$ for which the sequence inductively defined by $a_1=c$ and $a_{n+1}=\lfloor a_n\rfloor(1+\{a_n\})$ has the property that $\lfloor a_k\rfloor = s_k$ for each $k$. There is some $z$ such that $$I=\left[z, z + \frac{1}{\prod_{i=0}^{n-1}s_i}\right).$$

This tells us that we need to choose the starting value in an interval of length $\frac{1}{\prod_{i=0}^{n-1}s_i}$, which generically takes $\log_{10}\left(\prod_{i=0}^{n-1}s_i\right)$ digits of precision. However, just writing out all the terms in the sequence would require writing about the same number of digits since $\log_{10}(\prod s_i) = \sum \log_{10}(s_i)$. Representing sequences this way doesn't have any advantage at least asymptotically - and this is not surprising, since this method is a fairly literal transcription of sequences satisfying the given inequalities - and this inequality alone is not a very helpful way to single out the sequence of primes.

Let's define $f(x)=\lfloor x\rfloor (1+\{x\})$. Note that $f$ is just a linear function of slope $n$ on each interval $[n,n+1)$. Define its iterates as $f^0(x)=x$ and $f^{n+1}(x)=f(f^{n}(x))$. Let's say that a given constant $c$ represents a sequence $s_0,\ldots,s_n$ if $\lfloor f^k(x) \rfloor = s_k$ for each $k$. Let me give two inductive proofs of the claimed theorem using this terminology.

Proof 1: We prove the statement directly by induction on $n$. The statement is trivial if $n=0$. Suppose that we wish to represent $s_0,s_1,s_2,\ldots,s_{n+1}$. By the inductive hypothesis, the set of representations of $s_1,s_2,\ldots,s_{n+1}$ is an interval $I$ of the prescribed length. A representation of $s_0,s_1,s_2,\ldots,s_{n+1}$ is just some $x$ such that $\lfloor x\rfloor = s_0$ and $f(x) \in I$. The second part of this condition simplifies to saying $s_0(x-s_0+1) \in I$, since we know that floor of $x$. Note that, since $I$ is a subset of $[s_1,s_1+1) \subseteq [s_0,2s_0)$, the condition that $s_0(x-s_0+1) \in I$ implies $\lfloor x\rfloor = s_0$. The set of representations of $s_0,s_1,s_2,\ldots, s_{n+1}$ therefore is the preimage of $I$ under a linear function of slope $s_0$, hence has length $\frac{1}{s_0}$ times the length of $I$. This immediately proves the desired statement.

Proof 2: For this proof, we use an inductive argument that builds up the sequence from the end rather than from the start, which enables somewhat clearer computation of the interval itself. We need to prove an additional statement as part of the inductive hypothesis, however:

If $I$ is the interval of representations for $s_0,s_1,s_2,\ldots,s_n$, then $f^n$ is linear when restricted to $I$ and has image $[s_n,s_n+1)$.

Essentially, we need that $f$ behaves well when we know what it represents and that it hits every possible terminating value. Clearly, if $n=0$, the sequence has only one term and is represented exactly on the interval $[s_0,s_0+1)$, on which $f^0$ is the identity function.

Next, let $I$ be the interval of $x$ representing the sequence $s_0,s_1,\ldots,s_{n-1}$. Note that $f^{n-1}$ restricted to $I$ is, by hypothesis, some linear function taking $I$ to $[s_{n-1},s_{n-1}+1)$. However, $f$ restricted to the interval $[s_{n-1},s_{n-1}+1)$ is also linear, satisfying $f(x)=s_{n-1}(x-s_{n-1}+1)$. The composition $f^n$ is therefore also linear on the interval $I$ - and, by calculation, takes $I$ to the interval $[s_{n-1},2s_{n-1})$. Note that, by hypothesis, the interval $[s_n,s_n+1)$ is contained within $[s_{n-1},2s_{n-1})$ and, in fact, is a proportion of exactly $\frac{1}{s_{n-1}}$ of that interval, hence its preimage under $f^n$ within $I$ - which is the set of sequences representing $s_0,s_1,\ldots,s_{n-1},s_n$ is the same proportion of $I$. In particular, since $I$ had length $\frac{1}{\prod_{i=0}^{n-2}s_i}$, it must be that this preimage has length $\frac{1}{\prod_{i=0}^{n-1}s_i}$, as was desired. Moreover, observe that we have already established the refinement we used for the inductive hypothesis.

If we work out the details of the calculation implied here, we end up determining that $$z=s_0+\sum_{k=0}^{n-1} \frac{s_{k+1}-s_k}{\prod_{i=0}^k s_k}$$ which also extends to finding the (unique) $c$ that generates an infinite sequences satisfying the condition that $s_k \leq s_{k+1}<2s_k$ for every $k$ and the additional condition that $s_{k+1}<2s_k-1$ for infinitely many $k$.