What is the sum of the binomial coefficients ${n\choose p}$ over prime numbers?

Solution 1:

Following Qiaochu Yuan's approach, the inequalities $$ \frac{2^n}{\log n} \ll S_n \ll \frac{2^n }{\log n} $$ seems plausible. The lower bound is a conjecture, but it is possible to prove the upper bound.

Notations in this answer

$T_n \sim \mathrm{B}(n,\frac12)$ is the binomial distribution.

$S_n=\sum_{p\leq n} \binom np$ summed over $p$ prime.

$\pi(y)=\sum_{p\leq y}1$ is the prime counting function.

$A(n)\ll B(n)$ means $|A(n)|\leq CB(n)$ for some absolute constant $C>0$.

Lower Bound (Conjecture)

Fix $x>0$. We have $$ P\left(p \mathrm {\ is \ prime}, \ T_n=p, \ \frac n2 -x\sqrt n\leq T_n \leq \frac n2 + x\sqrt n\right) \leq \frac {S_n}{2^n}. $$ Since binomial coefficients $\binom nk$ peak at $k=n/2$ and become smaller when $k$ is further away from $n/2$, we take the following as a lower bound of the probability.

$$ \left(\pi(\frac n2+x\sqrt n)-\pi(\frac n2-x\sqrt n)\right)P\left(T_n=\lfloor \frac n2+x\sqrt n\rfloor\right). $$

By Stirling's formula, and $\log (1+t)=t-\frac{t^2}2+O(\frac1{t^3})$ for $|t|\leq 1/2$, we have $$ P\left(T_n=\lfloor \frac n2+x\sqrt n\rfloor\right)\sim \frac{2}{\sqrt{2\pi n}} e^{-2x^2}. $$

If we have the following conjecture (see this survey by Yildrim for more information), $$ \pi(\frac n2+x\sqrt n)-\pi(\frac n2-x\sqrt n)\sim \frac{2x\sqrt n}{\log n}, $$ then we have the conjectural lower bound $$ \frac{4x\cdot 2^n}{e^{2x^2}\sqrt{2\pi}\log n} \lesssim S_n. $$

Upper Bound (Easy Version)

By Hoeffding's inequality, we give a bound of sum over primes further away from $n/2$. $$ P\left(p \mathrm {\ is \ prime}, \ T_n=p, \ |T_n-\frac n2|>\sqrt{n \log\log n} \right) $$ $$ \leq P\left( |T_n-\frac n2|\geq \sqrt{n \log\log n}\right)\leq 2e^{-2\log\log n}\ll \frac{1}{(\log n)^2}. $$ For the primes close to $n/2$, we apply Brun-Titchmarsh inequality, $$ P\left(p \mathrm {\ is \ prime}, \ T_n=p, \ |T_n-\frac n2|\leq \sqrt {n \log\log n }\right) $$ $$\leq \left(\pi(\frac n2 + \sqrt {n \log\log n})-\pi(\frac n2-\sqrt {n \log\log n})\right)P\left(T_n=\lfloor \frac n2\rfloor\right) $$ $$ \ll \frac{\sqrt{n\log\log n}}{\log n} \cdot \frac{1}{\sqrt n} = \frac{\sqrt{\log\log n}}{\log n}. $$ Therefore, we have the upper bound $$ S_n\ll \frac{2^n\sqrt{\log\log n}}{\log n}. $$

Upper Bound (Added on 9/28)

With more care, we can remove $\sqrt{\log\log n}$ from the upper bound.

Again, by Hoeffding's inequality, $$ P\left(p \mathrm {\ is \ prime}, \ T_n=p, \ |T_n-\frac n2|>\sqrt{n \log\log n} \right) \ll \frac1{(\log n)^2}. $$

For the primes in $|T_n-\frac n2|\leq\sqrt{n \log\log n} $, consider the subintervals $$ \frac n2 + x\sqrt n \leq p < \frac n2 + (x+1)\sqrt n $$ for nonnegative integers $x\leq \sqrt{\log\log n}$ first.

Then the negative integers $-\sqrt{\log\log n}\leq x$ are treated similarly.

The number of primes in this interval is by Brun-Titchmarsh inequality, $\ll \frac{\sqrt n}{\log n}$, while $$P(T_n=p)\leq P\left(T_n=\lfloor \frac n2 + x\sqrt n\rfloor\right)\sim \frac{2}{\sqrt{2\pi n}} e^{-2x^2}.$$

Note that the last asymptotic still holds if $|x|\leq \sqrt{\log\log n}$. Then we have

$$ P\left(p \mathrm {\ is \ prime}, \ T_n=p, \ \frac n2 + x\sqrt n \leq p < \frac n2 + (x+1)\sqrt n\right) $$ $$ \ll \frac{\sqrt n}{\log n} \cdot \frac{e^{-2x^2}}{\sqrt n}. $$ Thus by summing over $x$, $$ P\left(p \mathrm {\ is \ prime}, \ T_n=p, \ |T_n-\frac n2|\leq \sqrt {n \log\log n }\right)$$ $$\ll \sum_{x=0}^{\infty}\frac{e^{-2x^2}}{\log n}\ll \frac 1{\log n}. $$ Therefore, we obtain $$ S_n\ll \frac{2^n}{\log n}. $$

Update on 2019/3/4

Nilotpal Kanti Sinha and I started working on writing a paper on this subject. Here is current progress. The proofs are too long to be contained here, but the main idea of splitting up the sum into short intervals is present in this answer. To prove 1, we need Huxley's zero density estimate and its consequence on the primes in the short intervals. (Chapter 5 of this note by Angel Kumchev: https://tigerweb.towson.edu/akumchev/a5.pdf).

  1. We have for almost all $n$, $$ S_n= \frac{2^n}{\log n}+O\left(\frac{2^n}{(\log n)^2}\right) \ \textrm{as }n\rightarrow \infty. $$

Here, almost all means that the number of $n\in [1,N]\cap \mathbb{Z}$ for which the asymptotic formula fails is $o(N)$.

  1. We have $$ \alpha:=\liminf_{n\rightarrow\infty}\frac{S_n\log n}{2^n}\leq 1\leq \limsup_{n\rightarrow\infty} \frac{S_n \log n}{2^n} \leq 4. $$

  2. The statement $\alpha>0$ implies that, there is $b>0$ and $N_0(b)>0$ such that, $$ \pi\left(\frac n2 +\sqrt {n\log\log n}\right)-\pi\left( \frac n2-\sqrt {n\log\log n}\right)\geq \frac{b\sqrt n}{\log n} \ \textrm{for all }n\geq N_0(b). $$