Unusual pattern in the distribution of odd primes
I have recently noticed an unusual pattern in the distribution of odd primes.
Each one of the following sets contains approximately half of all odd primes:
- $A_n=\{4k+1: 0\leq k\leq n\}=\{1,5,9,13,\dots,4n+1\}$
- $B_n=\{4k+3: 0\leq k\leq n\}=\{3,7,11,15,\dots,4n+3\}$
- $C_n=\{6k+1: 0\leq k\leq n\}=\{1,7,13,19,\dots,6n+1\}$
- $D_n=\{6k+5: 0\leq k\leq n\}=\{5,11,17,23,\dots,6n+5\}$
More precisely:
- Let $P(S)$ denote the number of odd primes in the set $S$
- Let $\pi(x)$ denote the number of odd primes smaller than $x$
Then for every sufficiently large value of $n$:
- ${P(A_n)}\approx{P(B_n)}\approx\frac12\pi(4n+4)$
- ${P(C_n)}\approx{P(D_n)}\approx\frac12\pi(6n+6)$
Now, all of this is pretty easy to observe (though probably not so easy to prove).
The following facts are subsequently obvious for every sufficiently large $n$ as well:
- ${P(A_n)}\leq{P(B_n)}\implies{P(A_n)}\leq\frac12\pi(4n+4)\leq{P(B_n)}$
- ${P(A_n)}\geq{P(B_n)}\implies{P(A_n)}\geq\frac12\pi(4n+4)\geq{P(B_n)}$
- ${P(C_n)}\leq{P(D_n)}\implies{P(C_n)}\leq\frac12\pi(6n+6)\leq{P(D_n)}$
- ${P(C_n)}\geq{P(D_n)}\implies{P(C_n)}\geq\frac12\pi(6n+6)\geq{P(D_n)}$
This is because $A_n$ and $B_n$ as well as $C_n$ and $D_n$ are "complementary" to each other:
- The set ${A_n}\cap{B_n}$ is empty, and the set ${A_n}\cup{B_n}$ contains all odd primes smaller than $4n+4$
- The set ${C_n}\cap{D_n}$ is empty, and the set ${C_n}\cup{D_n}$ contains all odd primes smaller than $6n+6$
Nevertheless, for almost every value of $n$:
- ${P(A_n)}\leq{P(B_n)}$
- ${P(C_n)}\leq{P(D_n)}$
The graphs and table below provide some empirical evidence:
range | odd primes | cases where either P(A)>P(B) or P(C)>P(D)
-----------|------------|-------------------------------------------
10000 | 1228 | 0
100000 | 9591 | 1
1000000 | 78497 | 239
10000000 | 664578 | 239
100000000 | 5761454 | 1940
I would expect primes to be equally distributed between $A_n$ and $B_n$ and between $C_n$ and $D_n$.
In other words, I would expect:
- [Number of primes of the form $4k+1$] $\approx$ [Number of primes of the form $4k+3$]
- [Number of primes of the form $6k+1$] $\approx$ [Number of primes of the form $6k+5$]
But since the empirical evidence above suggests otherwise, my questions are:
- Is this indeed the case, or do they become equally distributed on a larger range?
- If this is indeed the case, what research has been conducted attempting to explain it?
Thanks
The phenomenon you observe is real. This is known, the case for $4$, as Chebyshev's bias Another relevant keyword is Shanks–Rényi race problem.
"Prime number races" by Granville and Martin is a fantastic introduction to this circle of ideas.
But, let me include some basic information here (more-or-less self-plagiarizing an MO-answer).
On a rough scale the frequency counts of primes congruent $1$ and $3$ modulo $4$ are the same; both counting functions are asymptotic to $\frac{1}{2} \text{li}(x)$ with error terms essentially as commonly know from the prime counting function. This is the well-know Prime Number Theorem for arithmetic progressions (mentioned by Dietrich Burde).
However, if one compares the exact counts of primes congruent to $1$ and $3$ modulo $4$ respectively, let me call the respective counting functions $\pi_1(x)$ and $\pi_3(x)$, then one notes (at least at the start) that there are more congruent to $3$ than congruent to $1$, so $\pi_3(x) > \pi_1(x)$, an observation made by Chebyshev. However, Littlewood showed that the difference $\pi_3(x) - \pi_1(x)$ can also be negative, and even is infinitely often essentially as negative as it can get (under the assumption that both should not deviate from $\text{li}(x)/2$ by more than $\sqrt{x}$ and a little).
So, now one might think that one just came across a phenomenon of small numbers, as you said, with this initial bias however this is not so there is a bias in the distribution.
If one defines $P$ to be the set of all integers such that $\pi_3(x) > \pi_1(x)$ then these are not "half of the integers".
Rubinstein and Sarnak proved (under widely believe conjectures on zeros of L-functions, GRH and GSH) that the logarithmic density of this set, that is the limit of
$$
\frac{1}{\log x} \sum_{n \in P} \frac{1}{n}
$$
is $0.9959...$, so quite close to $1$ though not equal to $1$ so "almost all" is perhaps too strong.
The Dirichlet density of primes $p\equiv 1 \bmod 4$ and of primes $p\equiv 3 \bmod 4$ is both equal to $\frac{1}{2}$, but the number of primes up to $x$ in both classes can differ. This is what is meant by Prime number races ( see the answer of quid).