Most efficient algorithm for nth prime, deterministic and probabilistic?

For large $n$, your best bet is to estimate the $n$th prime $x$, e.g. using the approximations at http://en.wikipedia.org/wiki/Prime_number_theorem#Approximations_for_the_nth_prime_number, and then use the sub-linear methods suggested by Daniel Fischer and others to calculate the exact number $\pi(x)$ of primes less than or equal to your estimate $x$. Then you have a choice to make. Either you zero in on the $n$th prime by primality-testing candidates starting from your initial guess, or you refine your estimate for the $n$th prime and recompute $\pi$ at the new estimate. Given that the error of the Wikipedia estimate for the $n$th prime is $O(n / (\log n)^2)$, and that calculating $\pi(x)$ takes about $O(x^{2/3})$ time or better, a good strategy is essentially a binary search: refine your choice of $x$ so that $\pi(x)$ gets closer and closer to $n$, until the gap between $\pi(x)$ and $n$, multiplied by the cost of primality testing numbers near $x$, drops below the cost of computing $\pi(x)$ again. Then you primality-test your way to the exact $n$th prime.
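
To make the loop concrete, here is a minimal Python sketch of this strategy. It uses sympy's primepi and isprime as stand-ins for a fast prime count and primality test (sympy's primepi is not a sub-linear implementation, so this is illustrative rather than fast at very large $n$; the helper names are my own):

```python
from math import log
from sympy import primepi, isprime

def nth_prime_estimate(n):
    # Second-order approximation from the PNT article:
    # p_n ~ n (ln n + ln ln n - 1 + (ln ln n - 2)/ln n)
    ln, lln = log(n), log(log(n))
    return int(n * (ln + lln - 1 + (lln - 2) / ln))

def nth_prime(n):
    if n <= 5:
        return [2, 3, 5, 7, 11][n - 1]
    x = nth_prime_estimate(n)
    count = primepi(x)          # exact count of primes <= x
    while count < n:            # estimate was low: walk forward
        x += 1
        if isprime(x):
            count += 1
    while count > n:            # estimate was high: walk backward
        if isprime(x):
            count -= 1
        x -= 1
    while not isprime(x):       # land exactly on the n-th prime
        x -= 1
    return x
```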


This is mostly elaborating on user2566092's good answer. First note that at this time we're looking at $n < 10^{22}$ or so, probably $n < 2^{64}$, as otherwise even the fast prime-counting methods start taking a very long time. Two things are important for performance: a good estimate and a good prime-count method. Having a fast way to close the final gap (by sieving or primality testing) matters too, but less than the other two. I don't believe doing more than one prime count should ever be necessary -- if doing so turns out faster, I believe that indicates the estimate or the sieving needs improvement.

For the estimate, the second-order Cipolla (1902) estimate from Wikipedia is not a bad start, but it's substantially worse than some other methods. You can add a small third-order correction to the Cipolla formula, which reduces the error to about a quarter of its size at $10^{15}$. Much better is inverting the Riemann R function, whose error is three orders of magnitude smaller at $10^{15}$ -- off by only 11 million vs. 37 billion. What works best for me is ${\rm Li}^{-1}(n) + {\rm Li}^{-1}(\sqrt{n})/4$, which nearly always underestimates (nice for sieving) and gets almost as close as inverse R, while being a little easier to calculate. Experiment with different estimates.
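
For comparison, here is a hedged sketch of two of these estimates: the second-order formula from the Wikipedia page, and the ${\rm Li}^{-1}$-based estimate above, inverting the offset logarithmic integral numerically with mpmath. Only mpmath.li and mpmath.findroot are library calls; the function names and the choice of starting point are mine, and it's meant for reasonably large $n$:

```python
from math import log, sqrt
import mpmath

def cipolla_estimate(n):
    # p_n ~ n (ln n + ln ln n - 1 + (ln ln n - 2)/ln n)
    ln, lln = log(n), log(log(n))
    return n * (ln + lln - 1 + (lln - 2) / ln)

def inverse_Li(y):
    # Solve Li(x) = y numerically, starting from the Cipolla guess.
    x0 = max(cipolla_estimate(y), 3.0)
    return mpmath.findroot(lambda x: mpmath.li(x, offset=True) - y, x0)

def nth_prime_estimate(n):
    # Li^{-1}(n) + Li^{-1}(sqrt(n))/4, rounded down
    return int(inverse_Li(n) + inverse_Li(sqrt(n)) / 4)
```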

Once we have an estimate, we run a fast prime count, e.g. Lehmer's method, LMO, or extended LMO. The latter has open-source implementations able to calculate $\pi(10^{14})$ in a couple of seconds and $\pi(10^{16})$ in half a minute (timings will of course vary by machine), with very little memory use.
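
To give a sense of what these combinatorial methods do, here is a sketch of Legendre's formula, the simplest ancestor of the Meissel/Lehmer/LMO family: $\pi(x) = \phi(x,a) + a - 1$ with $a = \pi(\sqrt{x})$. The open-source extended-LMO implementations alluded to above (e.g. the primecount project) are enormously faster; this only shows the recursion they build on:

```python
from functools import lru_cache
from math import isqrt
from sympy import primerange

def legendre_pi(x):
    if x < 2:
        return 0
    primes = list(primerange(2, isqrt(x) + 1))
    a = len(primes)                          # a = pi(sqrt(x))

    @lru_cache(maxsize=None)
    def phi(n, k):
        # integers in [1, n] with no prime factor among the first k primes
        if n == 0:
            return 0
        if k == 0:
            return n
        if primes[k - 1] >= n:
            return 1                         # only 1 itself survives
        return phi(n, k - 1) - phi(n // primes[k - 1], k - 1)

    return phi(x, a) + a - 1

# e.g. legendre_pi(10**6) == 78498
```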

Once we've done the prime count, we just need to count primes backwards or forwards to make up for the error. If the estimate was good, that gap should be quite small relative to the number we ran the prime count at. It also helps that there are fast deterministic primality tests for numbers under $2^{64}$: BPSW, a 7-base Miller-Rabin test, or a 3-base hashed Miller-Rabin test are all completely accurate for 64-bit inputs. As noted earlier, I use a tight but low estimate, so it rarely has to count backwards -- when it does, repeated prev_prime calls work fairly well. For the forward search, which is the usual case, I do segment sieves and count the resulting blocks of bits. There are a number of optimizations possible here.
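
As an illustration of the deterministic testing mentioned above, here is a sketch of a strong-pseudoprime (Miller-Rabin) test using published base sets: $\{2,3,5,7\}$ is known to suffice below 3,215,031,751, and the 7-base set below is the one reported at miller-rabin.appspot.com as covering everything under $2^{64}$ (taking that report at face value):

```python
def is_prime_64(n):
    # Deterministic for n < 2^64, assuming the published SPRP base sets.
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    if n < 3215031751:
        bases = (2, 3, 5, 7)
    else:
        bases = (2, 325, 9375, 28178, 450775, 9780504, 1795265022)
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:                     # strong Fermat test to base a
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True
```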

To give you an idea, it looks like nth_prime$(10^{12})$ takes about 0.8 seconds, and nth_prime$(10^{15})$ about 1 minute 2 seconds on my computer, without using any tables. That's substantially faster than sieving all primes up to ~$3.7 \times 10^{16}$.


If you were going to step through primes one at a time, a sieve would be the better choice. Even at large values, a traditional sieve run over segments can efficiently whittle down the candidates so that only a few need to be run through a primality test. In my opinion, the Sieve of Eratosthenes is superior to the Sieve of Atkin, even in the rare cases when the latter is properly implemented. Segmenting is important at large sizes. If you just want to get the job done rather than implement it yourself, look at primesieve. It's really fast, well maintained, and has nice interfaces for C and C++.
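
The segmented sieving can be sketched quickly: count primes in $[\mathrm{lo}, \mathrm{hi})$ using only the base primes up to $\sqrt{\mathrm{hi}}$, one segment at a time. This is the same basic structure primesieve uses, minus its wheel, bit packing, and cache tuning; the names and segment size here are arbitrary choices of mine:

```python
from math import isqrt
from sympy import primerange

def count_primes_in_range(lo, hi, segment_size=1 << 16):
    base_primes = list(primerange(2, isqrt(hi) + 1))
    count = 0
    for start in range(lo, hi, segment_size):
        end = min(start + segment_size, hi)
        flags = bytearray([1]) * (end - start)   # 1 = "still a candidate"
        for i in range(start, min(2, end)):      # 0 and 1 are not prime
            flags[i - start] = 0
        for p in base_primes:
            first = max(p * p, (start + p - 1) // p * p)
            for multiple in range(first, end, p):
                flags[multiple - start] = 0
        count += sum(flags)
    return count

# e.g. count_primes_in_range(0, 10**6) == 78498
```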


There is one other alternative: a hybrid table approach. This is used by the Nth Prime Page, and there was some discussion of putting it in SAGE years ago. Here, rather than using a prime-count function, we just store large tables of prime counts or nth-prime positions, do a lookup, then sieve the difference. The downside is that it doesn't scale well to large values (e.g. the Nth Prime Page is limited to $10^{12}$). I use a method like this for twin-prime counts, and even with only ~100 table values it can save a lot of time. If your values aren't too large and you don't want to implement all the machinery for a fast computed nth prime, this may be something to consider. You just need some tables (compiled in or loaded on demand) and fast segment sieving.
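
A minimal sketch of that lookup-then-sieve idea, with a toy checkpoint table (the values shown are genuine $n$th-prime values) and sympy's nextprime standing in for the fast segment sieve you would really use:

```python
import bisect
from sympy import nextprime

# (n, p_n) checkpoints; a real table would hold thousands of entries,
# spaced closely enough that the remaining walk stays short.
CHECKPOINTS = [
    (1, 2),
    (10**4, 104729),
    (10**6, 15485863),
    (10**8, 2038074743),
]

def nth_prime_from_table(n):
    idx = bisect.bisect_right([c[0] for c in CHECKPOINTS], n) - 1
    count, p = CHECKPOINTS[idx]
    while count < n:        # a segment sieve from p would be the fast way
        p = nextprime(p)
        count += 1
    return p
```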