Accuracy of approximation to inclusion-exclusion formula in prime sieve

Solution 1:

To help with the notation I will change your $k$ to an $m$, so we have

$$ A_n = n - \sum \lfloor \frac{n}{p_i} \rfloor + \sum \lfloor \frac{n}{p_i p_j} \rfloor - \sum \lfloor \frac{n}{p_i p_j p_k} \rfloor + \cdots, $$

where the summations are over distinct primes in our set $ \lbrace p_1,p_2,\ldots,p_m \rbrace ,$ and

$$ B_n = n \prod_{i=1}^m \left( 1- \frac{1}{p_i} \right). $$

Using $ x \ge \lfloor x \rfloor > x – 1 $ we have

$$ A_n \ge n - \sum \frac{n}{p_i} + \sum \frac{n}{p_i p_j} - \binom{m}{2} -
\sum \frac{n}{p_i p_j p_k} + \sum \frac{n}{p_i p_j p_k p_l} - \binom{m}{4} + \cdots.$$

Noting that $$\binom{m}{0}+\binom{m}{2}+\binom{m}{4} \cdots = 2^{m-1},$$

we obtain $A_n \ge B_n – 2^{m-1}.$ Similarly, we have $A_n \le B_n + 2^{m-1}$ and hence we can bound

$$ \left| \frac{A_n – B_n}{n} \right| \le \frac{2^{m-1}}{n} = O(1/n).$$

However, if we take $n \equiv -1 \textrm{ mod } \prod_{i=1}^m p_i$ then we have

$$A_n – B_n = \prod_{i=1}^m \left( 1- \frac{1}{p_i} \right) = \text{a constant.} \quad (1)$$

So the difference is a constant infinitely often.

EDIT: Since the differences between the $ \lfloor \frac{n}{p_i p_j \ldots} \rfloor $ and the $ \frac{n}{p_i p_j \ldots} $ are a maximum when $n \equiv -1 \textrm{ mod } \prod_{i=1}^m p_i$ it appears we have

$$A_n – B_n \le \prod_{i=1}^m \left( 1- \frac{1}{p_i} \right), \quad (2)$$

with equality when $n \equiv -1 \textrm{ mod } \prod_{i=1}^m p_i.$ I do not have a proof of this latter claim, although I have a proof for (1), which I hope to be able to modify to prove (2).

EDIT2: I could not find a proof of (2) because, sadly, it's not true! However, (1) is exact for the given $n.$

EDIT3:

Here's a vast improvement on my previous comments, which I believe solves your problem completely.

Suppose $n \equiv -r \textrm{ mod } \prod_{i=1}^m p_i, $ where $ 0 \le r < \prod_{i=1}^m p_i$ then we have

$$A_n – B_n \le r \prod_{i=1}^m \left( 1- \frac{1}{p_i} \right).$$

I proved this simply by expanding $A_n – B_n$ and doing the summations. Also, I believe the exact formula to be

$$A_n – B_n = \delta + r \prod_{i=1}^m \left( 1- \frac{1}{p_i} \right) - \Phi_P(r),$$

where $\delta = 1$ when $r$ is coprime to all the $p_i$ and $0$ otherwise, and $\Phi_P(r)$ is the number of positive integers $\le r$ which are coprime to the $p_i.$

Note that when $n=711,$ $r=9,$ $\delta = 0$ and $\Phi_P(9) = 2,$ $P=\lbrace 2,3,5 \rbrace,$ so the above formula gives $$A_{711}-B_{711} = 9 \left( 1 - \frac{1}{2} \right) \left( 1 - \frac{1}{3} \right) \left( 1 - \frac{1}{5} \right) – 2 = 0.4$$ which agrees with your $190-189.6 = 0.4$

When $n=107,$ $r=19,$ $\delta = 1$ and $\Phi_P(19) = 6,$ $P=\lbrace 2,3,7 \rbrace,$ so we have $$A_{107}-B_{107} = 1 + 19 \left( 1 - \frac{1}{2} \right) \left( 1 - \frac{1}{3} \right) \left( 1 - \frac{1}{7} \right) – 6 = 0.42857\ldots,$$

which is also in agreement with the rounded figures you've given in the question.

Sorry, I do not have time to type up the full proof.

Solution 2:

This might be slightly off tangent, but it's definitely in the spirit of your post, and is found at the beginning of most texts in sieve theory.

When your fixed set of primes is the first $k$ primes and $n = p_{k}^2$ then $B_{n}$ is the main term of the sieve of Eratosthenes-Legendre, which gives an approximation for the number of primes in the interval $[p_{k}, p_{k}^2]$. In this setting Merten's theorem gives an asymptotic for $B_{n}$. (It's called Merten's Third theorem in the Wikipedia page). However, an appeal to the prime number theorem shows that this is a reasonably bad approximation, and that $B_{n}$ overestimates the number of primes by a factor of $2e^{-\gamma}$ where $\gamma$ is the Euler-Mascheroni constant.

Terry Tao has a good blog posting on why repeated use of the inclusion-exclusion principle yields non-optimal results.