Unusual pattern in the distribution of odd primes

I have recently noticed an unusual pattern in the distribution of odd primes.

Each one of the following sets contains approximately half of all odd primes:

  • $A_n=\{4k+1: 0\leq k\leq n\}=\{1,5,9,13,\dots,4n+1\}$
  • $B_n=\{4k+3: 0\leq k\leq n\}=\{3,7,11,15,\dots,4n+3\}$
  • $C_n=\{6k+1: 0\leq k\leq n\}=\{1,7,13,19,\dots,6n+1\}$
  • $D_n=\{6k+5: 0\leq k\leq n\}=\{5,11,17,23,\dots,6n+5\}$

More precisely:

  • Let $P(S)$ denote the number of odd primes in the set $S$
  • Let $\pi(x)$ denote the number of odd primes smaller than $x$

Then for every sufficiently large value of $n$:

  • ${P(A_n)}\approx{P(B_n)}\approx\frac12\pi(4n+4)$
  • ${P(C_n)}\approx{P(D_n)}\approx\frac12\pi(6n+6)$

Now, all of this is pretty easy to observe (though probably not so easy to prove).

The following facts are subsequently obvious for every sufficiently large $n$ as well:

  • ${P(A_n)}\leq{P(B_n)}\implies{P(A_n)}\leq\frac12\pi(4n+4)\leq{P(B_n)}$
  • ${P(A_n)}\geq{P(B_n)}\implies{P(A_n)}\geq\frac12\pi(4n+4)\geq{P(B_n)}$
  • ${P(C_n)}\leq{P(D_n)}\implies{P(C_n)}\leq\frac12\pi(6n+6)\leq{P(D_n)}$
  • ${P(C_n)}\geq{P(D_n)}\implies{P(C_n)}\geq\frac12\pi(6n+6)\geq{P(D_n)}$

This is because $A_n$ and $B_n$ as well as $C_n$ and $D_n$ are "complementary" to each other:

  • The set ${A_n}\cap{B_n}$ is empty, and the set ${A_n}\cup{B_n}$ contains all odd primes smaller than $4n+4$
  • The set ${C_n}\cap{D_n}$ is empty, and the set ${C_n}\cup{D_n}$ contains all odd primes smaller than $6n+6$

Nevertheless, for almost every value of $n$:

  • ${P(A_n)}\leq{P(B_n)}$
  • ${P(C_n)}\leq{P(D_n)}$

The graphs and table below provide some empirical evidence:

enter image description hereenter image description hereenter image description here

 range     | odd primes | cases where either P(A)>P(B) or P(C)>P(D)
-----------|------------|-------------------------------------------
 10000     | 1228       | 0
 100000    | 9591       | 1
 1000000   | 78497      | 239
 10000000  | 664578     | 239
 100000000 | 5761454    | 1940

I would expect primes to be equally distributed between $A_n$ and $B_n$ and between $C_n$ and $D_n$.

In other words, I would expect:

  • [Number of primes of the form $4k+1$] $\approx$ [Number of primes of the form $4k+3$]
  • [Number of primes of the form $6k+1$] $\approx$ [Number of primes of the form $6k+5$]

But since the empirical evidence above suggests otherwise, my questions are:

  1. Is this indeed the case, or do they become equally distributed on a larger range?
  2. If this is indeed the case, what research has been conducted attempting to explain it?

Thanks


The phenomenon you observe is real. This is known, the case for $4$, as Chebyshev's bias Another relevant keyword is Shanks–Rényi race problem.

"Prime number races" by Granville and Martin is a fantastic introduction to this circle of ideas.

But, let me include some basic information here (more-or-less self-plagiarizing an MO-answer).

On a rough scale the frequency counts of primes congruent $1$ and $3$ modulo $4$ are the same; both counting functions are asymptotic to $\frac{1}{2} \text{li}(x)$ with error terms essentially as commonly know from the prime counting function. This is the well-know Prime Number Theorem for arithmetic progressions (mentioned by Dietrich Burde).

However, if one compares the exact counts of primes congruent to $1$ and $3$ modulo $4$ respectively, let me call the respective counting functions $\pi_1(x)$ and $\pi_3(x)$, then one notes (at least at the start) that there are more congruent to $3$ than congruent to $1$, so $\pi_3(x) > \pi_1(x)$, an observation made by Chebyshev. However, Littlewood showed that the difference $\pi_3(x) - \pi_1(x)$ can also be negative, and even is infinitely often essentially as negative as it can get (under the assumption that both should not deviate from $\text{li}(x)/2$ by more than $\sqrt{x}$ and a little).

So, now one might think that one just came across a phenomenon of small numbers, as you said, with this initial bias however this is not so there is a bias in the distribution.

If one defines $P$ to be the set of all integers such that $\pi_3(x) > \pi_1(x)$ then these are not "half of the integers". Rubinstein and Sarnak proved (under widely believe conjectures on zeros of L-functions, GRH and GSH) that the logarithmic density of this set, that is the limit of $$ \frac{1}{\log x} \sum_{n \in P} \frac{1}{n} $$
is $0.9959...$, so quite close to $1$ though not equal to $1$ so "almost all" is perhaps too strong.


The Dirichlet density of primes $p\equiv 1 \bmod 4$ and of primes $p\equiv 3 \bmod 4$ is both equal to $\frac{1}{2}$, but the number of primes up to $x$ in both classes can differ. This is what is meant by Prime number races ( see the answer of quid).