Given the first N integers, how many large prime factors can I disallow and still have half the set remaining?

Conjecture:
For $N$ sufficiently large, take the set of positive integers up to $N$. Then remove all numbers which have a prime factor larger than $\sqrt{N}$. More than half the set will remain.

Example:

Suppose I had the set of integers from 1 to 37. Now I keep only the multiples of 2, 3, and 5, disallowing any larger prime factors. I end up with the following set. $$\{2,3,4,5,6,8,9,10,12,15,16,18,20,24, 25,27,30,32,36\} $$

Just over half the set $19/37 = .\overline{513}$ remains.

Is my conjecture true?


Not an answer, but some progress: As a number that has a factor greater than $\sqrt N$ cannot have any other factor, each prime of interest removes a disjoint subset. The number removed is then $$\sum_{\substack{p \text { prime} \\ p \gt \sqrt N}}\left \lfloor \frac Np \right \rfloor$$ We want to prove this sum is less than $\frac N2$. Two natural things to do are to 1) ignore the floor function and 2) assume that the primes are "randomly distributed" and that each number $n$ has $\frac 1{\log n}$ chance of being prime. Then change the sum to an integral and get $$\sum_{\substack{p \text { prime} \\ p \gt \sqrt N}}\left \lfloor \frac Np \right \rfloor\approx N\int_{\sqrt N}^N\frac {dn}{n \log n}=N\left. \frac 1{\log(\log(n))}\right |_{\sqrt N}^N=N\log 2$$ which fails badly, so it appears ignoring the floor is too coarse. Just doing a numeric test in Alpha of $$\sum _{n=\sqrt N}^N \left \lfloor \frac N{n \log n}\right \rfloor$$ for for $N=10000$ gave $3954$, so I suspect the prime number theorem can prove it for large $N$. Then a computer search will clear small $N$


The conjecture is wrong. Asymptotically, the proportion of removed numbers is, as Ross Millikan found, $\log 2$.

We can see that also in a different way: For $k < \sqrt{N}$, we remove the numbers $k\cdot p$ for all primes $\sqrt{N} < p \leqslant \frac{N}{k}$, and since no number $\leqslant N$ can have two prime factors $> \sqrt{N}$, we count no number multiple times. So the count of removed numbers is

$$\sum_{k=1}^\sqrt{N} \left(\pi\left(\frac{N}{k}\right) - \pi(\sqrt{N})\right) \approx \sum_{k=1}^\sqrt{N} \pi\left(\frac{N}{k}\right) - \sqrt{N}\pi(\sqrt{N}).$$

$\sqrt{N}\pi(\sqrt{N}) \approx \frac{2N}{\log N}$, which for small $N$ is a sizable proportion of the numbers $\leqslant N$, but for large $N$ becomes insignificant.

For the sum, we underestimate $\pi(x)$ by using $\frac{x}{\log x}$, so

$$\begin{align} \sum_{k=1}^\sqrt{N} \pi\left(\frac{N}{k}\right) &\geqslant \sum_{k=1}^\sqrt{N} \frac{N}{k(\log N - \log k)}\\ &\sim N\int_1^{\sqrt{N}} \frac{dt}{t(\log N - \log t)}\\ &= N\cdot\log 2. \end{align}$$

Generally, if for $1 \geqslant\alpha \geqslant \frac12$, we remove the multiples of all primes $> N^\alpha$, the same argument shows that the count of removed numbers is

$$\sum_{k=1}^{N^{1-\alpha}} \left(\pi\left(\frac{N}{k}\right) - \pi(N^\alpha)\right) \approx \sum_{k=1}^{N^{1-\alpha}}\pi\left(\frac{N}{k}\right) - N^{1-\alpha}\pi(N^\alpha) \approx \sum_{k=1}^{N^{1-\alpha}}\pi\left(\frac{N}{k}\right) - \frac{N}{\alpha\log N},$$

and the sum is asymptotically

$$\sum_{k=1}^{N^{1-\alpha}}\pi\left(\frac{N}{k}\right) \sim N\int_1^{N^{1-\alpha}} \frac{dt}{t(\log N - \log t)} = N \int_0^{1-\alpha} \frac{dv}{1-v} = N\cdot\log \frac1\alpha,$$

so to keep half of the numbers $\leqslant N$, we must choose $\alpha \geqslant e^{-1/2}$.