Write the integers from any $n$ through $0$ descending in a column, where $n \geq 2$, and begin a second column with the value $2n$. For each entry after that, if the two numbers on that line share a factor, copy the the entry unchanged, but if they're coprime, subtract $1$.

We'll refer to the first column as $a$, where each value is the same as its index, and the second column as $b$, where the $a$th row's entry is $b_a$. The $0$-index refers to the bottom row. Equivalently,

$$ b_a = \begin{cases} 2n & \textrm{if } a = n \\ b_{a+1} - 1 & \textrm{if }\gcd(a+1,b_{a+1})=1 \\ b_{a+1} & \textrm{otherwise}\end{cases}$$


Consider the following example where $n=8$. I've also included a column showing $\gcd(a,b_a)$, and colored those $b_a$ that share a factor with $a$ and thus don't change.

$$ \begin{array}{|c|c|c|} \hline a & b_a & (a,b_a) \\ \hline 8 & \color{red}{16} & 8 \\ \hline 7 & 16 & 1 \\ \hline 6 & \color{red}{15} & 3 \\ \hline 5 & \color{red}{15} & 5 \\ \hline 4 & 15 & 1 \\ \hline 3 & 14 & 1 \\ \hline 2 & 13 & 1 \\ \hline 1 & 12 & 1 \\ \hline 0 & 11 & 11 \\ \hline \end{array} $$


Assertion: $b_0$ will always be prime.

Why? Well, suppose not, that some smaller prime $p<b_0$ divides it. In particular, let $p$ be the smallest prime factor that divides $b_0$. Since $b_0 \neq b_n$, and $p\geq 2$, we have $p<n$, so if a prime does divide $b_0$, it must be in our column of $a$ values.

$p \mid b_0 \implies p \mid b_p$. This is because $p$ can only divide $b_0$ if it has already been established by dividing $b_{kp}$ for some $k\geq 1$. A factor cannot have its first appearance at $b_0$ unless it is prime.

That said, $p \mid b_p \implies b_p = b_{p-1}$. However, that means $b_{p-1} \not\equiv b_0 \pmod {p}$, regardless of which $b_a$ decrement or not; there are one too few to make it back to our asserted divisibility, and we're left with $b_0 \not\equiv 0 \pmod {p}$, i.e. $p \nmid b_0$, a contradiction. (Recall that $b_1 - b_0 = 1$ always, preventing a constant $0 \pmod p$ all the way down.)

$$ \begin{array}{|l|l|} \hline n & 2n \\ \hline \dots & \dots \\ \hline p & b_p \equiv 0 \pmod{p} \\ \hline p-1 & b_{p-1} \equiv 0 \pmod {p} \\ \hline p-2 & b_{p-2} \equiv \{0\text{ or } p-1\} \pmod{p} \\ \hline p-3 & b_{p-3} \equiv \{0\text{ or } p-1 \text{ or }p-2\} \pmod{p} \\ \hline \dots & \dots \\ \hline 0 & b_0 \not\equiv 0 \pmod{p} \\ \hline \end{array} $$


Conclusion: As we've established there can be no smallest prime factor dividing $b_0$, it must be prime. Now that we have prime $b_0$, we can apply the same process arbitrarily with any $n$, and immediately we've shown there exists a prime in any $(n,2n)$ interval.


It's pretty clear I got the logic wrong for an important chunk of the proof, and I'm working on a clever way to solve that, but in the meantime, I have an idea for a less elegant fix.

If you look at the actual mechanism of what's going on, it's basically this. The subtracting one only when coprime essentially maintains a number (the difference $b_a - a$ for any $a$) which it's trying to rule out as a prime. This starts off as $n$, which is automatically bumped up to $n+1$ on the next line since $n \mid 2n$. Thereafter, whenever any factor in $a$ is shared by a factor in $b_a -a$, it's marking $b_a-a$ as composite and moving on. You can see that in this partial chart for $n=113$, where the right hand column is just the difference of the first two:

$$ \begin{array}{|l|l|l|} \hline 113 & 226 = 2 \cdot 113 & 113 \\ \hline 112 = 2^4\cdot 17 & 226 = 2 \cdot 113 & 114=2\cdot 3 \cdot 19 \\ \hline 111 = 3\cdot 37 & 226 = 2 \cdot 113 & 115 = 5 \cdot 23 \\ \hline 110 = 2\cdot 5\cdot 11 & 225 = 3^2 \cdot 5^2 & 115 = 5 \cdot 23 \\ \hline 109 & 225 = 3^2 \cdot 5^2 & 116 = 2^2 \cdot 29 \\ \hline 108 = 2^2 \cdot 3^3 & 224=2^5 \cdot 7 & 116 = 2^2 \cdot 29 \\ \hline 107 & 224=2^5 \cdot 7 & 117 = 3^2 \cdot 13 \\ \hline 106 = 2 \cdot 53 & 223 & 117 = 3^2 \cdot 13 \\ \hline 105 = 3 \cdot 5 \cdot 7 & 222 = 2\cdot 3 \cdot 37 & 117 = 3^2 \cdot 13 \\ \hline 104 = 2^3 \cdot 13 & 222 = 2\cdot 3 \cdot 37 & 118 = 2\cdot 59 \\ \hline 103 & 222 = 2\cdot 3 \cdot 37 & 119 = 7 \cdot 17 \\ \hline 102 = 2 \cdot 3 \cdot 17 & 221=13 \cdot 17 & 119 = 7 \cdot 17 \\ \hline 101 & 221=13 \cdot 17 & 120 = 2^3 \cdot 3 \cdot 5 \\ \hline 100 = 2^2 \cdot 5^2 & 220 = 2^2 \cdot 5 \cdot 11 & 120 = 2^3 \cdot 3 \cdot 5 \\ \hline \dots & \dots & \dots \\ \hline 88 = 2^3 \cdot 11 & 214 = 2 \cdot 107 & 126 = 2 \cdot 3^2 \cdot 7 \\ \hline 87 = 3 \cdot 29 & 214 = 2 \cdot 107 & 127 \\ \hline 86 = 2 \cdot 43 & 213 = 3 \cdot 71 & 127 \\ \hline \dots & \dots & \dots \\ \hline \end{array} $$

It takes $14$ non-decrements, which is exactly the amount needed to get from $113$ through the big gap there up to the next prime $127$, and thereafter there are no more shared factors and it stays $127$ the whole way down, and it does indeed always work like this.

So the size of the prime gap is one determiner of how long that "trial division" section lasts, and the other is the size of the factors themselves. As I said, any factor present will do, and I can't discern much rhyme or reason to it, so that leaves us with making a worst-case upper bound estimate of the sum of the least prime factors comprising every number in the prime gap. In this example, I think that adds up to $60$ or so, but it's one of if not the worst case around.

To make this rigorous, we can use the current best upper bound established on prime gap size for sufficiently large $x$ of $x^{0.525}$. If we consider some large $x$ as having a gap of that size, we can immediately mark half of those entries as being even, which means in the worst case, it would require two $a$-decrements to move past each of those entries within the gap. So that half of the gap is just

$$x^{0.525} / 2 \times 2 = x^{0.525},$$

and leaves us half left to deal with. Here, we could undoubtedly continue to whittle down our estimate by taking out other small factors, but I'm not sure that really helps anyway. Ignoring removing small factors, our bottom line is that we need

$$x^{0.525} x^k < x,$$

where $x^k$ represents an upper bound for the sum of the least prime factors in that gap, and it looks like we need $k<0.475$. I would expect that $x^k$ to work out to something more like $\log{x}$, but I'm not aware of any bounds that immediately say that.

So no, this isn't a completed proof either, but I thought I would share some of my thinking. I'm still hoping for a nice elegant solution to pop out. That said, if this approach can be made to work, that should instantly prove my approach valid for large $n$... but of course, using something more powerful than Bertrand's postulate to help do it sort of defeats the purpose. More updates later.


One other thing worth a mention. There's an easy workaround for scenarios where this fails. If $b_0=cd$, some composite, restart the process using $c, (c+1)d$, and repeat as necessary. This lets you do fun stuff like hit the prime values in $p(p+1)$.

For example, starting with $\{29, 29\cdot 30\}$ will yield $b_0=851=23\cdot 37$. Restart with $\{23, 23\cdot 37 + 23\}$, and you'll get a valid $b_0=853$. This seems to work fine empirically, but I have to doubt there's any way to justify it rigorously.


Update: Just a quicky. I got to thinking about Arnaud's note about reverse engineering, and I've got an idea to float. I tried doing some mapping of the origin possibilities for various $b_0$, and while the primes are nice and robust, the composites are not. The very best they have to offer in the first 500 or so is probably:

graph

which makes sense, what with $209$ being a larger semiprime and $233$ up top being one half of a problem semiprime that shows up a bit.

I had hoped that that possibility graphs for the primes could be infinite, but if my code is right, it turns out they're merely far larger than the non-primes. Here's a sample:

\begin{array}{|l|l|l|l|} \hline \mathbf{b_0} & & \textbf{nodes} & \textbf{max length} \\ \hline 101 & 101 & 6206 & 818 \\ \hline 102 & 2\cdot 3\cdot 17 & 1 & 0 \\ \hline 103 & 103 & 9779 & 918 \\ \hline 104 & 2^3\cdot 13 & 1 & 0 \\ \hline 105 & 3\cdot 5\cdot 7 & 4 & 2 \\ \hline 106 & 2\cdot 53 & 1 & 0 \\ \hline 107 & 107 & 11059 & 1074 \\ \hline 108 & 2^2\cdot 3^3 & 1 & 0 \\ \hline 109 & 109 & 6293 & 1094 \\ \hline 110 & 2\cdot 5\cdot 11 & 1 & 0 \\ \hline 111 & 3\cdot 37 & 4 & 2 \\ \hline 112 & 2^4\cdot 7 & 1 & 0 \\ \hline 113 & 113 & 8886 & 1184 \\ \hline 114 & 2\cdot 3\cdot 19 & 1 & 0 \\ \hline 115 & 5\cdot 23 & 8 & 4 \\ \hline 116 & 2^2\cdot 29 & 1 & 0 \\ \hline 117 & 3^2\cdot 13 & 4 & 2 \\ \hline 118 & 2\cdot 59 & 1 & 0 \\ \hline 119 & 7\cdot 17 & 44 & 14 \\ \hline 120 & 2^3\cdot 3\cdot 5 & 1 & 0 \\ \hline 121 & 11^2 & 70 & 22 \\ \hline 122 & 2\cdot 61 & 1 & 0 \\ \hline 123 & 3\cdot 41 & 4 & 2 \\ \hline 124 & 2^2\cdot 31 & 1 & 0 \\ \hline 125 & 5^3 & 20 & 8 \\ \hline 126 & 2\cdot 3^2\cdot 7 & 1 & 0 \\ \hline 127 & 127 & 12230 & 1268 \\ \hline \end{array}

I also analyzed some parameters from the first $15000$ non-prime graphs. There are a few strong correlations, particularly that between large semiprimes and larger graphs, but the most promising find is what looks like a strong bound on the ratio of total nodes in the graph to $b_0$. It was $<1$ always, and looked to be decreasing, suggesting a strong bound might be possible. (This same ratio was $>1$ for all primes, and scaled very close to linearly.)

Since the maximum length (or height if you like) of the graph is the critical piece that determines whether or not this whole conjecture works, and since that length is a subset of the total graph, a hard limit on the number of nodes would effectively be a proof that the conjecture holds up.

To be clear, "nodes" correspond to starting pairs of numbers which would lead to a given $b_0$. The pair of numbers in question are the ones we previously called $n$ and $2n$, but we're being more flexible now. So, if it turned out there were some compelling reason why any given composite $m$ must have less than $m$ different starting pairs that led to its being $b_0$, that would be sufficient for a proof.


Latest attempt

All right. I'm going to try justifying the original $(n,2n)$ approach again.

First, however, I think it serves to look at $(n,n+2)$ as the seed pair. $n=16$ looks good for illustrative purposes. Here's a chart for it; as someone else pointed out, the $b$ column is unnecessary in this case. We could replace it with $c=b-a$, which is more clear and will share all of $b$'s relevant factors, since we're only interested in where $a$ and $b$ overlap. That said, we'll leave it in for this one.

$$ \begin{array}{|ll|ll|ll|} \hline \textbf{a} & & \textbf{b} & & \textbf{c} & \\ \hline 16 &= 2^4 & 18 &= 2 \cdot 3^2 & 2 & \\ \hline 15 &= 3\cdot 5 & 18 &= 2 \cdot 3^2 & 3 \\ \hline 14 &= 2 \cdot 7 & 18 &= 2 \cdot 3^2 & 4 &= 2^2 \\ \hline 13 & & 18 &= 2 \cdot 3^2 & 5 & \\ \hline 12 &= 2^2 \cdot 3 & 17 & & 5 & \\ \hline 11 & & 16 &= 2^4 & 5 & \\ \hline 10 &= 2 \cdot 5 & 15 &= 3\cdot 5 & 5 & \\ \hline 9 &= 3^2 & 15 &= 3\cdot 5 & 6 &= 2 \cdot 3 \\ \hline 8 &= 2^3 & 15 &= 3\cdot 5 & 7 \\ \hline 7 & & 14 &= 2 \cdot 7 & 7 & \\ \hline 6 &= 2 \cdot 3 & 14 &= 2 \cdot 7 & 8 &= 2^3 \\ \hline 5 & & 14 &= 2 \cdot 7 & 9 &=3^2 \\ \hline 4 &= 2^2 & 13 & & 9 &=3^2 \\ \hline 3 & & 12 &= 2^2 \cdot 3 & 9 &=3^2 \\ \hline 2 & & 12 &= 2^2 \cdot 3 & 10 &=2\cdot 5 \\ \hline 1 & & 12 &= 2^2 \cdot 3 & 11 &\\ \hline 0 & & 11 & & 11 &\\ \hline \end{array} $$

We're using the same system here for determining the successive values in $b$ as we did earlier: subtract $1$ when coprime with $a$, otherwise move it down unchanged.

I'd mostly like to use this table to point out there's nothing magical or inexplicable happening here. It's probably most clear in $c$: we're simply counting up from $2$, and keeping each value until it matches with a factor in $a$, and then we increment by one. Any factor at all will do, so long as it's shared with $a$.

A few things to notice. First, since $a$ is ascending with no pauses and $c$ is descending but waiting for a match before incrementing, it's natural that $c$ will grow more slowly, but given the large number of small factors available as $a$ flies by, it'll still grow a respectable amount.

Second thing, and this is really the important one, is to note the $11$ at the bottom of the column. Our whole system is predicated on the notion that this number will always be a prime, provided you plug in reasonable seed values. And this table shows why.

To state the obvious first off, we had to end on something. We didn't know it had to be prime, perhaps, but it's obvious $c$ was counting up and going to end somewhere. More to the point, note that we're not claiming that it's going to reach any specific prime yet, just that it's slowly growing. So the question is, why should we expect that bottom value to be prime necessarily?

Look at the penultimate prime, the $7$. It won't always be $7$, but there will always be a next-to-last prime, and after we hit it, there's often a spatter of small factor annihilation just like we see below. Whether this happened at $7$ or at $737$, the space and factors needed to bridge the gap to the next prime will always be available.

The upshot is that a prime will always be waiting there, since obviously no big factors are showing up between $1$ and $0$. In particular, only smaller factors come after the penultimate prime. Usually there's plenty of room; this example shows as close as it ever gets to having the prime superseded by small factors.

I realize this isn't proof-level justification that that can never happen. That said, I think I could explicitly point out a sufficiently covering bijective mapping of factors from one column to the other that always takes place, but at the moment I'm satisfied if that was persuasive.

And that's the bulk of it. I think taking $(n,n+2)$ better illustrates the underlying mechanism, but if you look closely, you'll notice that line $7$ with $14$ next to it. That means that from there down, this chart is identical to if we had used $(7,14)$ as our seed pair from the outset.

The same applies to any $(p,2p)$; there are arbitrarily many $(n,n+2)$ charts that can be cut off to get whatever pair you like. Presumably this is true for $(n,2n)$ as well, although we'll avoid that just to play it safe. And of course there is no need of actually finding such charts; if you subscribe to the validity of the example process, that should suffice to show the validity of using any $(p,2p)$ as a seed pair.

A couple of closing notes, then. When we do use $(p,2p)$, it has the additional handy feature of providing not only a prime in that range, but the very next prime larger than $p$. This should make sense after having seen our example.

And finally, do note that this gives us what we were after all along: a proof of primes in every $2n$ interval. We can of course apply this as much as we like using any arguments we want for $p$. Some of my additional data also suggest that after five or ten early exceptions have passed, we should be able to use $(4p,5p)$, and sometime after getting up to $1000$ or so, $(9p,10p)$ and even $(19p,20p)$, giving us far tighter bounds on those intervals.

I think that covers it. So what crucial element did I miss this time? Specifically, is the factor matchup stuff a critical tricky part which defeats the whole purpose if I omit it, or is it as straightforward to actually prove as I hope?

(...actually since writing that, I went and ran a battery of tests against that general principle of factor matching. It is ROBUST. This is the least of what it can do reliably. Still not a proof, but I'm much more convinced one would be easy to come up with now.)


Solution 1:

Partial answer.

Conjecture 1: $b_0$ is the smallest prime larger than $n$.

Conjecture 2: $b_0$ is always a prime number as soon as $b_n$ is greater than $n+1$ and lower than some increasing bound. For a fixed $n$, all those prime values of $b_0$ make up a set of consecutive primes.

What is proved so far:

Regarding Conjecture 1

  • If the bottom-right value is a prime, then it's the smallest prime larger than $n$.
  • The conjecture is true when the gap between $n$ and the next prime is $|p-n|\leq 4$

Regarding Conjecture 2

The table below shows the range of $b_n$ values for which $b_0$ is a prime. Blockquote

Proof of Conjecture 1 in the case where $n=p-1$ with $p$ prime.

The second row is $(p-2, p+(p-2))$, which are coprime numbers, and therefore by an immediate induction since $p$ is prime you can see that every subsequent row is of the form $$(a,p+a)$$ down to the last row $(0,p)$ as promised.$\,\,\square$

Proof in the case where $n=p-2$ with $p$ prime ($p>2$).

The second row is $(p-3, 2(p-2))$ and these two are not coprime: since $p>2$ is prime, $p-3$ is even. Therefore the third row is $(p-4, (p-4)+p)$ and from here we conclude the same way as before. $\,\,\square$

Proof in the case where $n=p-3$ with $p$ prime.

There you start to see some new arguments, where the proof is not constructive.

The second row is $(p-4, (p-4)+(p-2))$. They're coprime since $p$ is odd. You go down to $(p-5, (p-5)+(p-2))$. As long as you keep coprime pairs, you go down as $(p-k, (p-k)+(p-2))$. But the trick is that $p-2$ can't be prime, otherwise you wouldn't be in the case $n=p-3$, $p$ prime but rather $n=q-1$, $q$ prime (first case treated above) with $q=p-2$. So at the very least, when $a$ becomes a factor of $p-2$, you will get $(a,a+(p-2))$ and from there get down to $(a-1,(a-1)+(p-1))$.

From then on you can't stay at a difference $b-a=p-1$ for long, since $p-1$ is even. As soon as $a$ becomes even you will get up to $b-a=p$ and win.$\,\,\square$

Proof (sketch) in the case where $n=p-4$ with $p$ prime ($p>2$).

The proof for $n=p-3$ can be repeated: you're going to get rid of the difference $b-a=p-3$ very fast since $p$ is odd, you're getting rid of $b-a=p-2$ sooner or later since $p-2$ can't be a prime, and then you're getting rid of $b-a=p-1$ in at most two moves since $p$ is odd.$\,\,\square$



  • One problem in the general case is that you can't reverse-engineer the table, e.g. $(1,8)$ could come from $(2,8)$ or it could come from $(2,9)$.

  • If you add a column $b-a$, it starts at $n$, and goes non-decreasing. If it ever reaches a prime number, then it will stay at that prime number, since from then down you will have $(a=k, b=p+k)$ down to $(0,p)$ and the output will therefore be the smallest prime greater than $n$.

  • So all you've got to do is prove that you do reach a prime at some point. You could try to do that assuming Bertrand's postulate, it would already be some achievement.

Solution 2:

Let me start by saying, this is awesome!

Here is a partial answer.

Let me call the number next to $i$ on the table $a_i$. Also, I would rather work with $b_i=a_i-i$. Notice that $$ \operatorname{gcd}(i, a_i) = \operatorname{gcd}(i, a_i -i) = \operatorname{gcd}(i, b_i). $$ As we go down the table, we follow the rules:

  • $a_n = 2n$, so $b_n = n$.
  • $a_{n-1} = 2n$, so $b_{n-1} = n+1$.
  • If $(a_i, i) = 1$, then $a_{i-1} = a_i - 1$, so $b_{i - 1} = b_i$
  • If $(a_i, i) \neq 1$, then $a_{i-1} = a_i$, so $b_{i - 1} = b_i + 1$
  • At the end, $a_0 = b_0 = q$.

Now, if we look at the sequence $b_i$ as $i$ decreases, it will increase until it hits a prime and then it won't ever increase. I have no clue why it would reach this prime before $n$ steps.

I am like 85% confident on my coding skills and I think this works for all $n$'s up to $80000$. Also, if you look at the number of steps before you reach a prime, the numbers look half as long (as in it looks like the square root), so I am going to guess that the sequence reaches a prime pretty fast.

Solution 3:

The argument given makes no sense to me (and, judging from the comments, I'm not alone). To try to fix it, I suggest that you

  1. Use some notation which lets you talk unambiguously about different rows in the table. It's fairly standard to use subscripts for states in a process, so define $$b_a = \begin{cases} 2n & \textrm{if } a = n \\ b_{a+1} - [a+1, b_{a+1} \textrm{ coprime}] & \textrm{otherwise} \end{cases}$$
  2. Fix $2 \le p < q$ to be the smallest non-trivial factor of $q$ (assumed composite).
  3. Work up from $a=0$ to $a=p$ rather than beginning the argument at $a=p$.

But it's not going to be an easy task, because there are unstated assumptions which don't seem to be justified. In particular, the line

And if $p \mid q$, then $p \mid q+p$. But if it did, then because the right side would be unchanged on the next line $p-1$

seems to assume that if $b_0$ is composite with prime factor $p$ then $b_p = b_0 + p$. It's easy to derive a contradiction from "$b_0$ is composite with prime factor $p$ and $b_p = b_0 + p$". It's easy to show that if $p$ is the smallest prime factor of $b_p$ then $b_0 = b_p - p$. But neither of those is anywhere near sufficient: the goal is to derive a contradiction from the much simpler statement that $b_0$ is composite.

Edit: it's now claimed explicitly that $p | b_0$ implies $p | b_p$, but to me it looks like a proof by assertion. This needs much more detail to show that there's a justified argument.


Another issue which I think should be addressed is the strength of the argument. In particular, why should the same argument not hold when we change the definition to $b_n = n^2$? It's still the case that if $b_0$ is composite then it has a prime factor $p$ which has appeared in the first column, but under these starting conditions e.g. $n=10$ yields $b_0 = 95$.