Asymptotics for a partial sum of binomial coefficients

Solution 1:

This is more of an extended comment than an answer, but you may find it useful.

In Exercise 9.42 of Concrete Mathematics (page 492 in the second edition), the authors establish the asymptotic formula $$\sum_{k=0}^{\lambda n} {n\choose k}=2^{n H(\lambda)-\lg(n)/2+O(1)}$$ where $0<\lambda< 1/2$, $H(\lambda)= \lambda \lg({1\over \lambda})+(1-\lambda)\lg({1\over 1-\lambda})$, and $\lg$ is the binary logarithm.

The sum on the left is an exponentially small fraction of the full sum $2^n$. Note that this is a multiplicative approximation: it is the ratio of the sum to the approximation, not their difference, that remains bounded as $n\to\infty$.
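
For a quick numerical check, here is a minimal Python sketch (the values of $\lambda$ and $n$ are arbitrary illustrative choices, chosen so that $\lambda n$ is an integer). It compares $\lg$ of the exact partial sum with $nH(\lambda)-\lg(n)/2$; the difference is the $O(1)$ term in the exponent and should stay bounded as $n$ grows.

```python
from math import comb, log2

def H(lam):
    """Binary entropy: lam*lg(1/lam) + (1-lam)*lg(1/(1-lam))."""
    return -lam * log2(lam) - (1 - lam) * log2(1 - lam)

lam = 0.3
for n in (100, 400, 1600, 6400):
    k_max = int(lam * n)                       # lambda*n is an integer for these choices
    exact = sum(comb(n, k) for k in range(k_max + 1))
    # exponent of the approximation without its O(1) term
    approx_log2 = n * H(lam) - log2(n) / 2
    # the difference is the O(1) term in the exponent; it should remain bounded
    print(n, round(log2(exact) - approx_log2, 4))
```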

Their result has an interpretation using probability. Write $$\sum_{k=0}^{\lambda n} {n\choose k} =2^n\,\mathbb{P}(X_n/n\leq \lambda)$$ where $X_n$ is a binomial$(n,1/2)$ random variable. The theory of large deviations suggests an approximation $$\mathbb{P}(X_n/n\leq \lambda)\approx \exp(-nI(\lambda))$$ where $I$ is the rate function $I(x)=x\log(x)+(1-x)\log(1-x)+\log(2)$. This gives the leading factor in their approximation; they also divide by $\sqrt{n}$ for further accuracy.
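
A similar sketch (again purely illustrative, with arbitrary $\lambda$ and $n$) shows the rate function emerging: it computes $-\frac1n\log\mathbb{P}(X_n/n\le\lambda)$ exactly and compares it with $I(\lambda)$. The convergence is slow, because the polynomial prefactor contributes a $(\log n)/n$ term to the exponent; this is exactly why the further $\sqrt{n}$ factor mentioned above helps.

```python
from math import comb, log

def I(x):
    """Rate function for Bernoulli(1/2) increments: x log x + (1-x) log(1-x) + log 2."""
    return x * log(x) + (1 - x) * log(1 - x) + log(2)

lam = 0.3
for n in (100, 400, 1600, 6400):
    k_max = int(lam * n)
    tail = sum(comb(n, k) for k in range(k_max + 1))   # = 2^n * P(X_n/n <= lam)
    log_prob = log(tail) - n * log(2)                  # log P(X_n/n <= lam), avoids overflow
    print(n, round(-log_prob / n, 5), "vs", round(I(lam), 5))
```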

If you are willing to use the standard normal distribution function $\Phi(z)=\mathbb{P}(Z\leq z)$, then by the central limit theorem, we have $$ \mathbb{P}(X_n\leq \lambda n) =\mathbb{P}\left({X_n-n/2\over\sqrt{n/4}}\leq {\lambda n-n/2\over\sqrt{n/4}} \right) \approx\mathbb{P}\left(Z\leq \sqrt{n}(2\lambda-1) \right).$$ In other words, we have the approximation $$ \sum_{k=0}^{\lambda n} {n\choose k} \approx 2^n\, \Phi(\sqrt{n}(2\lambda-1)).$$ This seems to be at least as accurate as the Concrete Mathematics approximation, and you can get more accuracy by using the "continuity correction".
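
For concreteness, a small Python sketch comparing the exact tail probability with the plain normal approximation and with a continuity-corrected version $\Phi\big((\lfloor\lambda n\rfloor+\tfrac12-n/2)/\sqrt{n/4}\big)$; the particular $n$ and $\lambda$ below are arbitrary illustrative choices.

```python
from math import comb, erf, sqrt

def Phi(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

lam, n = 0.45, 1000                       # illustrative values only
k_max = int(lam * n)
exact = sum(comb(n, k) for k in range(k_max + 1)) / 2**n
clt = Phi(sqrt(n) * (2 * lam - 1))
# continuity correction: treat the event {X_n <= k_max} as {X_n <= k_max + 1/2}
corrected = Phi((k_max + 0.5 - n / 2) / sqrt(n / 4))
print(exact, clt, corrected)
```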

More detailed asymptotic results for binomial tails can be found in this paper by Andrew Carter and David Pollard. In particular, see Theorem 1. I hope you find what you want there; happy hunting!

Solution 2:

The question asks for an approximation to the sum $$\sum_{j=0}^{cn} {n\choose j}$$ for some fixed constant $c$. We provide more than this: an approximation to the finite sum of binomial coefficients $$\sum_{j=a}^b {n\choose j}$$ for integers $0\le a \le b \le n$. The asymptotic approximation (which proves useful numerically, even for modest values of $n$) can be obtained by combining two ideas.

First, the sum of a function $g$ over the integers can be written as an integral of $g$ plus an asymptotic series in the odd derivatives of $g$, using the midpoint form of the Euler-Maclaurin sum formula, $$\sum_{j=a}^b g(j) \sim \int_{a-1/2}^{b+1/2}g(x)\,dx + \sum_{j=1}^{\cdots} {-(1-2^{1-2j})B_{2j}\over (2j)!}\big(g^{(2j-1)}(b+1/2)-g^{(2j-1)}(a-1/2)\big)$$ Here $B_{2j}$ is a Bernoulli number.
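
As an illustration of the midpoint formula (not part of the derivation that follows), the sketch below applies it to $g(x)=e^{cx}$, for which the integral and the odd derivatives are elementary, and compares the result with the exact sum; the values of $c$, $a$, $b$ are arbitrary.

```python
from math import exp, factorial

B = {2: 1/6, 4: -1/30, 6: 1/42}           # Bernoulli numbers B_2, B_4, B_6

def midpoint_euler_maclaurin(c, a, b, terms=3):
    """Midpoint Euler-Maclaurin approximation to sum_{j=a}^{b} e^{c j}."""
    g = lambda x: exp(c * x)
    total = (g(b + 0.5) - g(a - 0.5)) / c             # integral of e^{cx} over [a-1/2, b+1/2]
    for j in range(1, terms + 1):
        coeff = -(1 - 2 ** (1 - 2 * j)) * B[2 * j] / factorial(2 * j)
        # the (2j-1)-th derivative of e^{cx} is c^{2j-1} e^{cx}
        total += coeff * (c ** (2 * j - 1) * g(b + 0.5) - c ** (2 * j - 1) * g(a - 0.5))
    return total

c, a, b = 0.3, 0, 20
exact = sum(exp(c * j) for j in range(a, b + 1))
print(exact, midpoint_euler_maclaurin(c, a, b))
```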

Second, the binomial coefficient itself is the real function $$g(x) = {n\choose x} = {\Gamma(n+1)\over \Gamma(n-x+1)\Gamma(x+1)}\ ,\qquad\hbox{$x>-1$}$$ evaluated at the integers; and an asymptotic series for that real function can be developed using the asymptotic expansion of the $\Gamma$-function (Stirling's approximation) $$\ln \Gamma(x) \sim (x-1/2)\ln x - x + {1\over 2}\ln (2\pi) + \sum_{m=1}^{\cdots} {B_{2m}\over (2m)(2m-1) x^{2m-1}}$$ Combining the two series gives an asymptotic approximation to the desired sum in which all the terms are known, the various integrations required being tractable.
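
For illustration, here is a minimal Python sketch of this second ingredient: a truncated Stirling series for $\ln\Gamma$, and the real-argument binomial coefficient built from it, checked against the library routine `lgamma` (the test values of $n$ and $x$ are arbitrary).

```python
from math import lgamma, log, pi

BERNOULLI = [1/6, -1/30, 1/42, -1/30]      # B_2, B_4, B_6, B_8

def ln_gamma_stirling(x, terms=4):
    """Truncated Stirling series for ln Gamma(x); accurate for moderately large x."""
    s = (x - 0.5) * log(x) - x + 0.5 * log(2 * pi)
    for m, B2m in enumerate(BERNOULLI[:terms], start=1):
        s += B2m / ((2 * m) * (2 * m - 1) * x ** (2 * m - 1))
    return s

def ln_binom(n, x):
    """ln C(n, x) = ln Gamma(n+1) - ln Gamma(n-x+1) - ln Gamma(x+1), for real x."""
    return ln_gamma_stirling(n + 1) - ln_gamma_stirling(n - x + 1) - ln_gamma_stirling(x + 1)

n, x = 100, 37.5
print(ln_binom(n, x), lgamma(n + 1) - lgamma(n - x + 1) - lgamma(x + 1))
```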

The lead approximation, as other contributors have noted, involves the error function $${\rm erf}\,(x) = {2\over \sqrt{\pi}}\int_0^x e^{-t^2}\,dt$$ and we have $$S \equiv {1\over 2^n}\sum_{j=a}^b {n\choose j}= {1\over 2} \,{\rm erf}\, (y_b) - {1\over 2}\,{\rm erf}\,(y_a) + O(1/n)$$ The advantage of working with $S$ rather than the original sum is that $S$ is $O(1)$ as $n\to \infty$ (and $S=1$ for $a=0$ and $b=n$). We have made the definitions $$\eqalign{ \xi_a & = a-n/2-1/2 \cr y_a &= \xi_a/\sqrt{n/2}\cr} \qquad \eqalign{ \xi_b &= b-n/2+1/2\cr y_b &= \xi_b/\sqrt{n/2}\cr}$$

Using Maple${}^{\rm TM}$ to do the necessary integrations exactly and to organize the necessary algebra, we can find the higher terms as $$S \approx f(y_b)-f(y_a) + {\sqrt{2}\over 6\sqrt{\pi}}{1\over n^{3/2}}\big(h(\xi_b)-h(\xi_a)\big)$$ Here $$f(y) = {1\over 2}\,{\rm erf}\,(y) + {1\over\sqrt{\pi}}\sum_{j=1}^{\cdots} {e^{-y^2}L_j\over n^j} $$ is a sum whose coefficients $L_j$ are all $O(1)$ for large $n$, even if both $a$ and $b$ are $O(n)$. We also find $$h(\xi) = e^{-2\xi^2/n} \sum_{j=0}^{\cdots} {\xi M_j(\xi)\over n^j}$$ which is a more complicated sum; the coefficients $\xi M_j(\xi)$ can be large since we can have $\xi = O(n)$, but when $\xi$ is large the contribution to $h$ is heavily suppressed by the leading factor $e^{-2\xi^2/n}$, so that $h$ behaves numerically as a series whose terms diminish successively, though I do not know whether they fall as $O(1/n)$ or not. In practice keeping a few terms works well if $n$ is not too small.
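
To see how good the leading term already is, here is a minimal Python sketch of $S\approx \frac12\,{\rm erf}(y_b)-\frac12\,{\rm erf}(y_a)$, using the definitions of $\xi_a,\xi_b,y_a,y_b$ above; the test triples $(n,a,b)$ are arbitrary illustrative choices.

```python
from math import comb, erf, sqrt

def S_exact(n, a, b):
    """Exact normalized partial sum S = 2^{-n} sum_{j=a}^{b} C(n, j)."""
    return sum(comb(n, j) for j in range(a, b + 1)) / 2**n

def S_leading(n, a, b):
    """Leading term (1/2) erf(y_b) - (1/2) erf(y_a), with xi_a = a-n/2-1/2, xi_b = b-n/2+1/2."""
    y_a = (a - n / 2 - 0.5) / sqrt(n / 2)
    y_b = (b - n / 2 + 0.5) / sqrt(n / 2)
    return 0.5 * erf(y_b) - 0.5 * erf(y_a)

for n, a, b in [(100, 0, 40), (100, 45, 55), (1000, 0, 480)]:
    print(n, a, b, S_exact(n, a, b), S_leading(n, a, b))
```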

The first few corrections are $$\eqalign{ L_1 &= {1\over 12}\,(2y^2-3)y\cr L_2 &= -{1\over 1440}\,(40y^6-292y^4+410y^2-45)y\cr L_3 &={1\over 362880}\,(1120y^{10}-20048y^8+103248y^6-159768y^4+33390y^2+14175)y\cr L_4 &= -{1\over 87091200}(22400y^{14}-745920y^{12}+8413728y^{10}-38540496y^8\cr &\qquad\qquad\qquad\qquad\qquad\qquad+65383848y^6-19336212y^4-15416730y^2+893025)y\cr }$$ and $$\eqalign{ M_0(\xi) &=1\cr M_1(\xi) &=-{9\over 10} + {43\over 15}\lambda -{4\over 3}\lambda^2\cr M_2(\xi) &= +{25\over 224}-{1153\over 252}\lambda+{3296\over 315}\lambda^2-{268\over 45}\lambda^3+{8\over 9}\lambda^4\cr M_3(\xi) &=+{49\over 64}-{69\over 160}\lambda-{12329\over 600}\lambda^2+{62506\over 1575}\lambda^3-{112636\over 4725}\lambda^4+{728\over 135}\lambda^5-{32\over 81}\lambda^6\cr }$$ where $\xi$ and $\lambda = \xi^2/n$ can both be $O(n)$ if both $a$ and $b$ are $O(n)$. With the terms up to $L_4$ and $M_3$ included, the first term missing from the expansion is $O(1/n^5)$.
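
The tabulated corrections can be checked numerically by transcribing them directly; the sketch below is such a transcription into Python (not the original Maple code), with the polynomials copied from the display above and the test cases $(n,a,b)$ chosen arbitrarily.

```python
from math import comb, erf, exp, sqrt, pi

def L_terms(y):
    """Correction polynomials L_1..L_4, transcribed from the display above."""
    return [
        (2*y**2 - 3)*y / 12,
        -(40*y**6 - 292*y**4 + 410*y**2 - 45)*y / 1440,
        (1120*y**10 - 20048*y**8 + 103248*y**6 - 159768*y**4 + 33390*y**2 + 14175)*y / 362880,
        -(22400*y**14 - 745920*y**12 + 8413728*y**10 - 38540496*y**8
          + 65383848*y**6 - 19336212*y**4 - 15416730*y**2 + 893025)*y / 87091200,
    ]

def M_terms(xi, n):
    """Correction polynomials M_0..M_3, transcribed from the display above, with lambda = xi^2/n."""
    lam = xi**2 / n
    return [
        1.0,
        -9/10 + 43/15*lam - 4/3*lam**2,
        25/224 - 1153/252*lam + 3296/315*lam**2 - 268/45*lam**3 + 8/9*lam**4,
        49/64 - 69/160*lam - 12329/600*lam**2 + 62506/1575*lam**3
          - 112636/4725*lam**4 + 728/135*lam**5 - 32/81*lam**6,
    ]

def f(y, n):
    return 0.5*erf(y) + exp(-y*y)/sqrt(pi) * sum(L/n**j for j, L in enumerate(L_terms(y), 1))

def h(xi, n):
    return exp(-2*xi*xi/n) * sum(xi*M/n**j for j, M in enumerate(M_terms(xi, n)))

def S_series(n, a, b):
    xi_a, xi_b = a - n/2 - 0.5, b - n/2 + 0.5
    y_a, y_b = xi_a/sqrt(n/2), xi_b/sqrt(n/2)
    return f(y_b, n) - f(y_a, n) + sqrt(2)/(6*sqrt(pi)*n**1.5) * (h(xi_b, n) - h(xi_a, n))

for n, a, b in [(100, 0, 40), (100, 45, 55), (1000, 0, 480)]:
    exact = sum(comb(n, j) for j in range(a, b + 1)) / 2**n
    print(n, a, b, exact, S_series(n, a, b), abs(exact - S_series(n, a, b)))
```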

Asymptotic series have the characteristic numerical defect that if one adds too many terms, the partial sum of the series departs from the function being approximated. To check for that, I have evaluated the series including the terms out to $L_7$ and $M_6$, for which the first term missing from the series is $O(1/n^8)$; for $n$ as modest as $100$, the absolute error in $S$ improves steadily as more terms are included, reaching $<10^{-14}$ for $a=0$ and all $b$ with $0\le b\le n$. So if the series begins to depart from the true sum, it must do so at higher order than that. Evaluation is also fast, because approximating a sum requires just two calls to ${\rm erf}$ (which are fast), two calls to $\exp$, and one square root, besides the evaluation of a few polynomials.