Generate a list of primes up to a certain number

Solution 1:

That sieve posted by George Dontas is a good starting point. Here's a much faster version with running times for 1e6 primes of 0.095s as opposed to 30s for the original version.

sieve <- function(n)
{
   n <- as.integer(n)
   if(n > 1e8) stop("n too large")
   primes <- rep(TRUE, n)
   primes[1] <- FALSE
   last.prime <- 2L
   fsqr <- floor(sqrt(n))
   while (last.prime <= fsqr)
   {
      primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
      sel <- which(primes[(last.prime+1):(fsqr+1)])
      if(any(sel)){
        last.prime <- last.prime + min(sel)
      }else last.prime <- fsqr+1
   }
   which(primes)
}

Here are some alternate algorithms below coded about as fast as possible in R. They are slower than the sieve but a heck of a lot faster than the questioners original post.

Here's a recursive function that uses mod but is vectorized. It returns for 1e5 almost instantaneously and 1e6 in under 2s.

primes <- function(n){
    primesR <- function(p, i = 1){
        f <- p %% p[i] == 0 & p != p[i]
        if (any(f)){
            p <- primesR(p[!f], i+1)
        }
        p
    }
    primesR(2:n)
}

The next one isn't recursive and faster again. The code below does primes up to 1e6 in about 1.5s on my machine.

primest <- function(n){
    p <- 2:n
    i <- 1
    while (p[i] <= sqrt(n)) {
        p <-  p[p %% p[i] != 0 | p==p[i]]
        i <- i+1
    }
    p
}

BTW, the spuRs package has a number of prime finding functions including a sieve of E. Haven't checked to see what the speed is like for them.

And while I'm writing a very long answer... here's how you'd check in R if one value is prime.

isPrime <- function(x){
    div <- 2:ceiling(sqrt(x))
    !any(x %% div == 0)
}

Solution 2:

This is an implementation of the Sieve of Eratosthenes algorithm in R.

sieve <- function(n)
{
   n <- as.integer(n)
   if(n > 1e6) stop("n too large")
   primes <- rep(TRUE, n)
   primes[1] <- FALSE
   last.prime <- 2L
   for(i in last.prime:floor(sqrt(n)))
   {
      primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
      last.prime <- last.prime + min(which(primes[(last.prime+1):n]))
   }
   which(primes)
}

 sieve(1000000)

Solution 3:

Prime Numbers in R

The OP asked to generate all prime numbers below one billion. All of the answers provided thus far are either not capable of doing this, will take a long a time to execute, or currently not available in R (see the answer by @Charles). The package RcppAlgos (I am the author) is capable of generating the requested output in just over 1 second using only one thread. It is based off of the segmented sieve of Eratosthenes by Kim Walisch.

RcppAlgos

library(RcppAlgos)
system.time(primeSieve(1e9))  ## using 1 thread
  user  system elapsed 
 1.099   0.077   1.176

Using Multiple Threads

And in recent versions (i.e. >= 2.3.0), we can utilize multiple threads for even faster generation. For example, now we can generate the primes up to 1 billion in under half a second!

system.time(primeSieve(10^9, nThreads = 8))
  user  system elapsed 
 2.046   0.048   0.375

Summary of Available Packages in R for Generating Primes

library(schoolmath)
library(primefactr)
library(sfsmisc)
library(primes)
library(numbers)
library(spuRs)
library(randtoolbox)
library(matlab)
## and 'sieve' from @John

Before we begin, we note that the problems pointed out by @Henrik in schoolmath still exists. Observe:

## 1 is NOT a prime number
schoolmath::primes(start = 1, end = 20)
[1]  1  2  3  5  7 11 13 17 19   

## This should return 1, however it is saying that 52
##  "prime" numbers less than 10^4 are divisible by 7!!
sum(schoolmath::primes(start = 1, end = 10^4) %% 7L == 0)
[1] 52

The point is, don't use schoolmath for generating primes at this point (no offense to the author... In fact, I have filed an issue with the maintainer).

Let's look at randtoolbox as it appears to be incredibly efficient. Observe:

library(microbenchmark)
## the argument for get.primes is for how many prime numbers you need
## whereas most packages get all primes less than a certain number
microbenchmark(priRandtoolbox = get.primes(78498),
              priRcppAlgos = RcppAlgos::primeSieve(10^6), unit = "relative")
Unit: relative
          expr      min       lq     mean   median       uq       max neval
priRandtoolbox  1.00000  1.00000 1.000000 1.000000 1.000000 1.0000000   100
  priRcppAlgos 12.79832 12.55065 6.493295 7.355044 7.363331 0.3490306   100

A closer look reveals that it is essentially a lookup table (found in the file randtoolbox.c from the source code).

#include "primes.h"

void reconstruct_primes()
{
    int i;
    if (primeNumber[2] == 1)
        for (i = 2; i < 100000; i++)
            primeNumber[i] = primeNumber[i-1] + 2*primeNumber[i];
}

Where primes.h is a header file that contains an array of "halves of differences between prime numbers". Thus, you will be limited by the number of elements in that array for generating primes (i.e. the first one hundred thousand primes). If you are only working with smaller primes (less than 1,299,709 (i.e. the 100,000th prime)) and you are working on a project that requires the nth prime, randtoolbox is the way to go.

Below, we perform benchmarks on the rest of the packages.

Primes up to One Million

microbenchmark(priRcppAlgos = RcppAlgos::primeSieve(10^6),
               priNumbers = numbers::Primes(10^6),
               priSpuRs = spuRs::primesieve(c(), 2:10^6),
               priPrimes = primes::generate_primes(1, 10^6),
               priPrimefactr = primefactr::AllPrimesUpTo(10^6),
               priSfsmisc = sfsmisc::primes(10^6),
               priMatlab = matlab::primes(10^6),
               priJohnSieve = sieve(10^6),
               unit = "relative")
Unit: relative
          expr        min        lq      mean     median        uq       max neval
  priRcppAlgos   1.000000   1.00000   1.00000   1.000000   1.00000   1.00000   100
    priNumbers  21.550402  23.19917  26.67230  23.140031  24.56783  53.58169   100
      priSpuRs 232.682764 223.35847 233.65760 235.924538 236.09220 212.17140   100
     priPrimes  46.591868  43.64566  40.72524  39.106107  39.60530  36.47959   100
 priPrimefactr  39.609560  40.58511  42.64926  37.835497  38.89907  65.00466   100
    priSfsmisc   9.271614  10.68997  12.38100   9.761438  11.97680  38.12275   100
     priMatlab  21.756936  24.39900  27.08800  23.433433  24.85569  49.80532   100
  priJohnSieve  10.630835  11.46217  12.55619  10.792553  13.30264  38.99460   100

Primes up to Ten Million

microbenchmark(priRcppAlgos = RcppAlgos::primeSieve(10^7),
               priNumbers = numbers::Primes(10^7),
               priSpuRs = spuRs::primesieve(c(), 2:10^7),
               priPrimes = primes::generate_primes(1, 10^7),
               priPrimefactr = primefactr::AllPrimesUpTo(10^7),
               priSfsmisc = sfsmisc::primes(10^7),
               priMatlab = matlab::primes(10^7),
               priJohnSieve = sieve(10^7),
               unit = "relative", times = 20)
Unit: relative
          expr       min        lq      mean    median        uq       max neval
  priRcppAlgos   1.00000   1.00000   1.00000   1.00000   1.00000   1.00000    20
    priNumbers  30.57896  28.91780  31.26486  30.47751  29.81762  40.43611    20
      priSpuRs 533.99400 497.20484 490.39989 494.89262 473.16314 470.87654    20
     priPrimes 125.04440 114.71349 112.30075 113.54464 107.92360 103.74659    20
 priPrimefactr  52.03477  50.32676  52.28153  51.72503  52.32880  59.55558    20
    priSfsmisc  16.89114  16.44673  17.48093  16.64139  18.07987  22.88660    20
     priMatlab  30.13476  28.30881  31.70260  30.73251  32.92625  41.21350    20
  priJohnSieve  18.25245  17.95183  19.08338  17.92877  18.35414  32.57675    20

Primes up to One Hundred Million

For the next two benchmarks, we only consider RcppAlgos, numbers, sfsmisc, matlab, and the sieve function by @John.

microbenchmark(priRcppAlgos = RcppAlgos::primeSieve(10^8),
               priNumbers = numbers::Primes(10^8),
               priSfsmisc = sfsmisc::primes(10^8),
               priMatlab = matlab::primes(10^8),
               priJohnSieve = sieve(10^8),
               unit = "relative", times = 20)
Unit: relative
         expr      min       lq     mean   median       uq      max neval
 priRcppAlgos  1.00000  1.00000  1.00000  1.00000  1.00000  1.00000    20
   priNumbers 35.64097 33.75777 32.83526 32.25151 31.74193 31.95457    20
   priSfsmisc 21.68673 20.47128 20.01984 19.65887 19.43016 19.51961    20
    priMatlab 35.34738 33.55789 32.67803 32.21343 31.56551 31.65399    20
 priJohnSieve 23.28720 22.19674 21.64982 21.27136 20.95323 21.31737    20

Primes up to One Billion

N.B. We must remove the condition if(n > 1e8) stop("n too large") in the sieve function.

## See top section
## system.time(primeSieve(10^9))
##  user  system elapsed 
## 1.099   0.077   1.176      ## RcppAlgos single-threaded

## gc()
system.time(matlab::primes(10^9))
   user  system elapsed 
 31.780  12.456  45.549        ## ~39x slower than RcppAlgos

## gc()
system.time(numbers::Primes(10^9))
   user  system elapsed 
 32.252   9.257  41.441        ## ~35x slower than RcppAlgos

## gc()
system.time(sieve(10^9))
  user  system elapsed 
26.266   3.906  30.201         ## ~26x slower than RcppAlgos

## gc()
system.time(sfsmisc::primes(10^9))
  user  system elapsed 
24.292   3.389  27.710         ## ~24x slower than RcppAlgos

From these comparison, we see that RcppAlgos scales much better as n gets larger.

 _________________________________________________________
|            |   1e6   |   1e7    |   1e8     |    1e9    |
|            |---------|----------|-----------|-----------
| RcppAlgos  |   1.00  |   1.00   |    1.00   |    1.00   |
|   sfsmisc  |   9.76  |  16.64   |   19.66   |   23.56   |
| JohnSieve  |  10.79  |  17.93   |   21.27   |   25.68   |
|   numbers  |  23.14  |  30.48   |   32.25   |   34.86   |
|    matlab  |  23.43  |  30.73   |   32.21   |   38.73   |
 ---------------------------------------------------------

The difference is even more dramatic when we utilize multiple threads:

microbenchmark(ser = primeSieve(1e6),
               par = primeSieve(1e6, nThreads = 8), unit = "relative")
Unit: relative
expr      min       lq     mean   median       uq      max neval
 ser 1.741342 1.492707 1.481546 1.512804 1.432601 1.275733   100
 par 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000   100

microbenchmark(ser = primeSieve(1e7),
               par = primeSieve(1e7, nThreads = 8), unit = "relative")
Unit: relative
 expr      min      lq     mean   median       uq      max neval
  ser 2.632054 2.50671 2.405262 2.418097 2.306008 2.246153   100
  par 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000   100

microbenchmark(ser = primeSieve(1e8),
               par = primeSieve(1e8, nThreads = 8), unit = "relative", times = 20)
Unit: relative
 expr      min       lq     mean   median       uq      max neval
  ser 2.914836 2.850347 2.761313 2.709214 2.755683 2.438048    20
  par 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    20

microbenchmark(ser = primeSieve(1e9),
               par = primeSieve(1e9, nThreads = 8), unit = "relative", times = 10)
Unit: relative
 expr      min       lq     mean   median       uq      max neval
  ser 3.081841 2.999521 2.980076 2.987556 2.961563 2.841023    10
  par 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    10

And multiplying the table above by the respective median times for the serial results:

 _____________________________________________________________
|                |   1e6   |   1e7    |   1e8     |    1e9    |
|                |---------|----------|-----------|-----------
| RcppAlgos-Par  |   1.00  |   1.00   |    1.00   |    1.00   |
| RcppAlgos-Ser  |   1.51  |   2.42   |    2.71   |    2.99   |
|     sfsmisc    |  14.76  |  40.24   |   53.26   |   70.39   |
|   JohnSieve    |  16.32  |  43.36   |   57.62   |   76.72   |
|     numbers    |  35.01  |  73.70   |   87.37   |  104.15   |
|      matlab    |  35.44  |  74.31   |   87.26   |  115.71   |
 -------------------------------------------------------------

Primes Over a Range

microbenchmark(priRcppAlgos = RcppAlgos::primeSieve(10^9, 10^9 + 10^6),
               priNumbers = numbers::Primes(10^9, 10^9 + 10^6),
               priPrimes = primes::generate_primes(10^9, 10^9 + 10^6),
               unit = "relative", times = 20)
Unit: relative
         expr      min       lq    mean   median       uq      max neval
 priRcppAlgos   1.0000   1.0000   1.000   1.0000   1.0000   1.0000    20
   priNumbers 115.3000 112.1195 106.295 110.3327 104.9106  81.6943    20
    priPrimes 983.7902 948.4493 890.243 919.4345 867.5775 708.9603    20

Primes up to 10 billion in Under 6 Seconds

##  primes less than 10 billion
system.time(tenBillion <- RcppAlgos::primeSieve(10^10, nThreads = 8))
  user  system elapsed 
26.077   2.063   5.602

length(tenBillion)
[1] 455052511

## Warning!!!... Large object created
tenBillionSize <- object.size(tenBillion)
print(tenBillionSize, units = "Gb")
3.4 Gb

Primes Over a Range of Very Large Numbers:

Prior to version 2.3.0, we were simply using the same algorithm for numbers of every magnitude. This is okay for smaller numbers when most of the sieving primes have at least one multiple in each segment (Generally, the segment size is limited by the size of L1 Cache ~32KiB). However, when we are dealing with larger numbers, the sieving primes will contain many numbers that will have fewer than one multiple per segment. This situation creates a lot of overhead, as we are performing many worthless checks that pollutes the cache. Thus, we observe much slower generation of primes when the numbers are very large. Observe for version 2.2.0 (See Installing older version of R package):

## Install version 2.2.0
## packageurl <- "http://cran.r-project.org/src/contrib/Archive/RcppAlgos/RcppAlgos_2.2.0.tar.gz"
## install.packages(packageurl, repos=NULL, type="source")

system.time(old <- RcppAlgos::primeSieve(1e15, 1e15 + 1e9))
 user  system elapsed 
7.932   0.134   8.067

And now using the cache friendly improvement originally developed by Tomás Oliveira, we see drastic improvements:

## Reinstall current version from CRAN
## install.packages("RcppAlgos"); library(RcppAlgos)
system.time(cacheFriendly <- primeSieve(1e15, 1e15 + 1e9))
 user  system elapsed 
2.258   0.166   2.424   ## Over 3x faster than older versions

system.time(primeSieve(1e15, 1e15 + 1e9, nThreads = 8))
 user  system elapsed 
4.852   0.780   0.911   ##  Over 8x faster using multiple threads

Take Away

There are many great packages available for generating primes
If you are looking for speed in general, there is no match to RcppAlgos::primeSieve, especially for larger numbers.
If you are working with small primes, look no further than randtoolbox::get.primes.
If you need primes in a range, the packages numbers, primes, & RcppAlgos are the way to go.
The importance of good programming practices cannot be overemphasized (e.g. vectorization, using correct data types, etc.). This is most aptly demonstrated by the pure base R solution provided by @John. It is concise, clear, and very efficient.

Solution 4:

Best way that I know of to generate all primes (without getting into crazy math) is to use the Sieve of Eratosthenes.

It is pretty straightforward to implement and allows you calculate primes without using division or modulus. The only downside is that it is memory intensive, but various optimizations can be made to improve memory (ignoring all even numbers for instance).

Solution 5:

This method should be Faster and simpler.

allPrime <- function(n) {
  primes <- rep(TRUE, n)
  primes[1] <- FALSE
  for (i in 1:sqrt(n)) {
    if (primes[i]) primes[seq(i^2, n, i)] <- FALSE
  }
  which(primes)
}

0.12 second on my computer for n = 1e6

I implemented this in function AllPrimesUpTo in package primefactr.