How do Rpy2, pyrserve and PypeR compare?

I would like to access R from within a Python program. I am aware of Rpy2, pyrserve and PypeR.

What are the advantages or disadvantages of these three options?


I know one of the 3 better than the others, but in the order given in the question:

rpy2:

  • C-level interface between Python and R (R running as an embedded process)
  • R objects exposed to Python without the need to copy the data over
  • Conversely, Python's numpy arrays can be exposed to R without making a copy
  • Low-level interface (close to the R C-API) and high-level interface (for convenience); see the sketch after this list
  • In-place modification for vectors and arrays possible
  • R callback functions can be implemented in Python
  • Possible to have anonymous R objects with a Python label
  • Python pickling possible
  • Full customization of R's behavior with its console (so possible to implement a full R GUI)
  • Limited support for MS Windows
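
For instance, a minimal sketch of the high-level interface (the robjects layer); this assumes rpy2 and R are installed, and conversion details vary between rpy2 versions:

    # Minimal rpy2 sketch using the high-level robjects layer.
    import numpy as np
    import rpy2.robjects as robjects
    from rpy2.robjects import numpy2ri

    # Evaluate R code inside the embedded R process
    robjects.r('data(cars)')
    fit = robjects.r('lm(dist ~ speed, data = cars)')

    # Look up and call an R function directly from Python
    rnorm = robjects.r['rnorm']
    values = rnorm(5, mean=0, sd=1)     # an R vector referenced from Python

    # Make numpy arrays usable from R code
    numpy2ri.activate()
    robjects.globalenv['x'] = np.arange(10.0)
    print(robjects.r('sum(x)')[0])      # 45.0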

pyrserve:

  • native Python code (will/should/may work with CPython, Jython, IronPython)
  • uses R's Rserve (see the sketch after this list)
  • advantages and inconveniences linked to remote computation and to Rserve
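
A minimal sketch with pyRserve, assuming an Rserve instance has already been started (e.g. from R with library(Rserve); Rserve()) and is listening on the default port:

    # Minimal pyRserve sketch; assumes Rserve is listening on localhost:6311.
    import pyRserve

    conn = pyRserve.connect(host='localhost', port=6311)

    # Evaluate R code on the (possibly remote) Rserve process
    print(conn.eval('sum(1:10)'))       # 55.0

    # Push a Python value into the R session and read a derived result back
    conn.r.x = [1.0, 2.5, 4.0]
    print(conn.eval('mean(x)'))         # 2.5

    conn.close()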

pyper:

  • native Python code (will/should/may work with CPython, Jython, IronPython)
  • uses pipes to have Python communicate with R, with the advantages and inconveniences linked to them (see the sketch after this list)
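
A minimal sketch with pyper, assuming the R executable is on the PATH:

    # Minimal pyper sketch; R is started as a subprocess and driven over pipes.
    from pyper import R

    r = R()                       # launches the R interpreter via pipes
    r.assign('x', [1, 2, 3, 4])   # copy a Python object into R
    r('m <- mean(x)')             # run arbitrary R code
    print(r.get('m'))             # copy the result back into Python: 2.5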

edit: Windows support for rpy2


From the paper in the Journal of Statistical Software on PypeR:

RPy presents a simple and efficient way of accessing R from Python. It is robust and very convenient for frequent interaction operations between Python and R. This package allows Python programs to pass Python objects of basic data types to R functions and return the results in Python objects. Such features make it an attractive solution for the cases in which Python and R interact frequently. However, there are still limitations of this package as listed below.
Performance:
RPy may not behave very well for large-size data sets or for computation-intensive duties. A lot of time and memory are inevitably consumed in producing the Python copy of the R data because in every round of a conversation RPy converts the returned value of an R expression into a Python object of basic types or NumPy array. RPy2, a recently developed branch of RPy, uses Python objects to refer to R objects instead of copying them back into Python objects. This strategy avoids frequent data conversions and improves speed. However, memory consumption remains a problem. [...] When we were implementing WebArray (Xia et al. 2005), an online platform for microarray data analysis, a job consumed roughly one quarter more computational time if running R through RPy instead of through R's command-line user interface. Therefore, we decided to run R in Python through pipes in subsequent developments, e.g., WebArrayDB (Xia et al. 2009), which retained the same performance as achieved when running R independently. We do not know the exact reason for such a difference in performance, but we noticed that RPy directly uses the shared library of R to run R scripts. In contrast, running R through pipes means running the R interpreter directly.
Memory:
R has been denounced for its uneconomical use of memory. The memory used by large-size R objects is rarely released after these objects are deleted. Sometimes the only way to release memory from R is to quit R. The RPy module wraps R in a Python object; however, the R library will stay in memory even if the Python object is deleted. In other words, memory used by R cannot be released until the host Python script is terminated.
Portability:
As a module with extensions written in C, the RPy source package has to be compiled with a specific R version on POSIX (Portable Operating System Interface for Unix) systems, and R must be compiled with the shared library enabled. Also, the binary distributions for Windows are bound to specific combinations of different versions of Python/R, so it is quite frequent that a user has difficulty in finding a distribution that fits the user's software environment.
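
The pipe-based approach contrasted with RPy in the passage above amounts to driving a separate R interpreter over its standard input/output. A hand-rolled illustration of that general mechanism (not PypeR's actual implementation) might look like this:

    # Feed an R script to a stand-alone R interpreter and read its output.
    # Assumes the 'R' executable is on the PATH.
    import subprocess

    r_script = 'cat(mean(rnorm(1000)), "\\n")'
    proc = subprocess.run(
        ['R', '--vanilla', '--slave'],
        input=r_script, capture_output=True, text=True, check=True,
    )
    print(proc.stdout.strip())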