Generalized PNT in limit as numbers get large

Solution 1:

This is a great question. There has been a lot of work done regarding the distribution of the number of prime factors function, and the most famous of which is the Erdos Kac Theorem, which states that the number of prime factors is in fact normally distributed.

We can ask, what is the average number of prime factors for integers in the interval $[1,N]$? Lets define the function $\omega(n)=\sum_{p|n} 1$ to be the number of distinct prime factors of $n$.

In 1917, Hardy and Ramanujan used the circle method to prove that almost all integers in the interval $[1,N]$ asymptotically have $\log \log N$ prime factors. Specifically, if $$\mathcal{E}_N=\{n\leq N: |\omega(n)-\log \log N|>(\log \log N)^{3/4}\},$$ then $|\mathcal{E}_N|=o(N)$. In 1934 Turan gave an alternative proof of this by showing that both the mean and variance of $\omega(n)$ are equal to $\log \log N$. Specifically, Turan proved that $$\frac{1}{N}\sum_{n\leq N}\omega (n) =\log \log N(1+o(1))$$ and that $$\frac{1}{N}\sum_{n\leq N}(\omega (n)-\log \log N)^2 =\log \log N(1+o(1)).$$

Now, here is where things become really interesting. Since we know the mean and variance, lets normalize and look at the function $$\frac{\omega(n)-\log \log n}{\sqrt{\log \log n}}.$$ We can ask, how is this distributed? In 1940, Erdos and Kac proved that $\omega(n)$ is normally distributed. That is, the number of prime factors function behaves like the normal distribution with mean $\log \log N$ and variance $\log \log N$. Specifically, for any fixed real numbers $a,b$, we have that$$\frac{1}{N}\left|\left\{n\leq N:\ a\leq\frac{\omega(n)-\log \log n}{\sqrt{\log \log n}}\leq b\right\}\right|=\frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}dx +o(1).$$

Solution 2:

I am not quite sure that I understand exactly what you're asking, but I think that some of the following points may shed some light on the matter.

Suppose we fix $n$ and let $k$ vary. Studying $\pi_k(n)/n$ is then equivalent to looking at the distribution of the number of prime factors of integers up to $n$ (by the way, this turns out not to be very sensitive to the distinction between "with repetition" and "without repetition").

We have to be a bit careful as the asymptotic for $\pi_k(n)$ is intended for $k$ fixed and $n$ growing (but we can still use it as long as $k$ grows sufficiently slowly with $n$). With that caveat in mind, the approximate distribution given by GPNT can be rewritten as:

$$\frac{\pi_{k+1}(n)}{n} \approx \frac{\lambda^{k} e^{-\lambda}}{k!},\quad k=0,1,2,\ldots$$

where $\lambda = \log \log x$, and I have shifted the value of $k$ by 1. By rewriting it this way, we can recognize the right hand side exactly as a Poisson distribution with parameter $\lambda = \log \log x$.

Loosely interpreted, every number (above 1) is guaranteed to have at least one prime factor, but the additional primes beyond that behave Poissonly with a rate of $\log \log x$. The peak of any Poisson distribution always occurs at $\lambda$. You don't need to take derivatives to see this, just look at the ratio of the $k-1$ and $k$ terms:

$$\frac{\lambda^k e^{-\lambda}}{k!} \cdot \frac{(k-1)!}{\lambda^{k-1} e^{-\lambda}} = \frac{\lambda}{k}.$$

So clearly the Poisson distribution is increasing in $k$ for $k < \lambda$ and decreasing for $k > \lambda$, and the peak occurs at $\lfloor\lambda\rfloor$. Moreover, it is well-known (see the linked Wikipedia article) that the normal distribution with equal mean and variance $\lambda$ approximates the Poisson distribution when $\lambda$ is large. Perhaps this explains the bell curve you observed.

Finally, note that equal mean and variance $\log \log x$ is precisely the situation that arises from the Erdős-Kac theorem in Eric Naslund's answer, although let me reiterate that I am deliberately ignoring the relative sizes of $n$ and $k$ for which these separate facts are known to be true.