How often is a sum of $k$ consecutive primes also prime?

Let's define a $k$-sum as a sum of $k$ consecutive primes. For example, $15=3+5+7$ is a $3$-sum. How many $k$-sums are themselves prime? Here's one way to formulate the question more precisely: What is a simple formula $f_k(x)$ such that the number of prime $k$-sums less than $x$ is $\sim f_k(x)$?

Here's some handwaving. The number of primes less than $x$ is $\pi(x)\sim\frac{x}{\ln x}$, so the number of $k$-sums less than $x$ is $$n_k(x)\sim\pi(x/k)\sim\frac{x/k}{\ln(x/k)}=\frac{x}{k\ln x - k\ln k}\sim\frac{x}{k\ln x}.$$

If the primes and $k$-sums are totally uncorrelated (?), then the number of prime $k$-sums should be something like $$\pi_k(x)\sim\frac{\pi(x)n_k(x)}{x}\sim\frac{x}{k(\ln x)^2}?$$

But that's not right, because all $k$-sums (except the first) have the same parity as $k$. So for $k$ even, $\pi_k(x)\sim 0$, whereas for $k$ odd, we should multiply the above guess by 2: $$\pi_k(x)\sim\frac{2x}{k(\ln x)^2}?$$

There are just naive heuristics, and I don't know how to go further. Do you?

Bonus questions: Roughly how large is the first prime $k$-sum, as a function of $k$? What are the statistics for primes that are $k$-sums for at least $n$ different values of $k>1$? For example, $863$ is a $k$-sum for $3$ values of $k$: $5$, $7$, and $15$.

(This is a followup question to "What does this music video teach us about 863?")


Edit: Note that Project Euler problem 50 also asks about large prime sums of consecutive primes. I haven't seen any link to theory, though.


Solution 1:

Heuristics suggest the asymptotics are $2C \pi(\pi(x/k)) \sim \frac{2Cx}{k (\ln x)^2}$ where $C$ is a constant correction factor to account for the bias in distribution of $k$-sums. The fraction of $k$-sums that are nonzero mod $p$ is slightly larger than in the uniform distribution if $k$ is odd, and slightly smaller if $k$ is even (which would cause similar correction factors in questions like when is half the sum of $4$ consecutive primes equal to a prime).

The bias is due to correlation between nearby partial sums of the sequence of primes, mod $p$. Except when adding $p$, adjacent sums must be different, and the increased chance of leaving in one step means a larger chance of returning in two steps. The same alternating pattern for even/odd number of steps disturbs the uniform $1/p$ odds of returning to the same place after $k$ steps (which is the same event as a $k$-sum of $0$).


Fix an odd number $k$, and an odd prime $p$.

If, for purposes of this problem,

the sequence of primes, reduced modulo an odd prime $p$, behaves like an i.i.d sequence of uniformly distributed nonzero residues mod $p$.

then the sequence of $k$-tuples in the $k$-sums mod $p$ is a uniform random walk on the directed graph $G$ whose vertices are the subset of $(\mathbb{Z}/p)^k$ that excludes all the planes $x_i = x_{i+1}$, and edges join $(x_1,\dots,x_k) \to (x_2,\dots,x_k, y)$ with $y \neq x_k$. The graph is strongly connected and the walk is aperiodic, so there is a unique stationary distribution, to which the walk converges exponentially fast. For making the analysis more precise it could be important to know whether this convergence rate is uniform in $p$, but I will ignore that complication. The stationary distribution is the uniform distribution, since the indegree = outdegree = $(p-1)$ for all vertices. Hence,

the distribution mod $p$ of the probabilistic $k$-sums is the same as that of of $k$-sum on vertices of $G$ (in the uniform distribution on $G$)

and we are reduced to a problem of counting solutions to $x_1 + x_2 + \dots x_k = a$ with $x_i \neq 0$ mod $p$. By inclusion-exclusion the number of solutions is a polynomial $P_a(t)$ evaluated at $t=p$. The polynomial is monic of degree $k-1$, does not depend on $p$, and the dependence on $a$ is only in whether $a=0$ or not. Hence there are, for every $k$, two polynomials counting the number of $k$-sums with $a=0$ and $a \neq 0 \mod p$, which will be denoted $P_k(t)$ and $Q_k(t)$, subject to the recursions

$Q_k(t) = P_{k-1}(t) + (t-2)Q_{k-1}(t)$ and $P_k(t) = (t-1)Q_{k-1}(t)$, with initial conditions $P_1(t)=0$ and $Q_1(t)=1$

Of course $P_k + (t-1)Q_k = (t-1)^k$, and the recursion shows $P_k - Q_k= -(P_{k-1}-Q_{k-1})$ alternates sign, so that $(P_k(t),Q_k(t))=(\frac{(t-1)^k + (-1)^{k}(t-1)}{t},\frac{(t-1)^k + (-1)^{k-1}}{t})$. As a check on the calculations, these counts are nearly equal, in fact the bias is the smallest possible given that all nonzero residue classes must receive the same deviation from the uniform distribution. Setting $t=p$ and dividing by $(p-1)^k$ gives the asymptotic proportion of zero and nonzero $k$-sums mod $p$ as

$p_0 = \frac{1}{p} + (-1)^k \frac{1}{p(p-1)^{k-1}}$ and $p_{\neq 0} = (1 - \frac{1}{p}) + (-1)^{k-1} \frac{1}{p(p-1)^{k-1}}$

so that in the optimistic assumption of rapid convergence (of the actual prime sums) to that asymptotics, the density of $k$-sums that are nonzero mod $p$ can be written as $p_{\neq 0} = (1 - \frac{1}{p})c_p$ with $c_p= 1 + \frac{(-1)^{k-1}}{(p-1)^k}$ the local correction term.

The infinite product $C = \displaystyle \prod_p c_p$ converges to the presumed correction factor. For $k=2$ it is the twin prime constant, and there is no closed-form evaluation for any $k > 1$.

Solution 2:

For fixed $k$, the n-th $k$-sum is about $kn\log n$ and so $n_k(x)\approx x/(k\log x).$ For any prime $p$ you can find the chance that a $k$-sum is divisible by $p$ as the coefficient of $x^0$ in $$ (x+x^2+\cdots+x^{p-1})^k=\left(\frac{x^p-1}{x-1}\right)^k\pmod{x^p} $$ divided by $p^k$, call it $c_k(p)$.

Then the heuristic probability that a $k$-sum is prime needs to be modified by $$ C_k=\prod_{p\text{ prime}}\frac{p-pc_k(p)}{p-1} $$ to get $$ \frac{C_kx}{\log^kx}. $$