Virtual environment in R?
I've found several posts about best practice, reproducibility and workflow in R, for example:
- How to increase longer term reproducibility of research (particularly using R and Sweave)
- Complete substantive examples of reproducible research using R
One of the major preoccupations is ensuring portability of code, in the sense that moving it to a new machine (possibly running a different OS) is relatively straightforward and gives the same results.
Coming from a Python background, I'm used to the concept of a virtual environment. When coupled with a simple list of required packages, this goes some way to ensuring that the installed packages and libraries are available on any machine without too much fuss. Sure, it's no guarantee - different OSes have their own foibles and peculiarities - but it gets you 95% of the way there.
Does such a thing exist in R, even if it's less sophisticated? For example, simply maintaining a plain-text list of required packages and a script that installs any that are missing?
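To be concrete, this is roughly the kind of script I mean, as a minimal base-R sketch. The file name requirements.txt, its format (one package name per line, # for comments) and the helper names are my own assumptions. It also does not pin versions, which is exactly the part I suspect a proper tool would handle better.

```r
# Minimal sketch: install any packages listed in a plain-text file that are missing.
# Assumed (hypothetical) format: one CRAN package name per line; blank lines
# and lines starting with "#" are ignored. Package versions are NOT pinned.

read_requirements <- function(path = "requirements.txt") {
  pkgs <- trimws(readLines(path, warn = FALSE))
  pkgs[nzchar(pkgs) & !startsWith(pkgs, "#")]
}

missing_packages <- function(path = "requirements.txt") {
  # installed.packages() lists every package found on .libPaths(),
  # including base packages such as stats and utils.
  installed <- rownames(utils::installed.packages())
  setdiff(read_requirements(path), installed)
}

install_missing <- function(path = "requirements.txt") {
  pkgs <- missing_packages(path)
  if (length(pkgs) > 0) {
    # Installs whatever version the configured repository currently offers.
    utils::install.packages(pkgs)
  }
  invisible(pkgs)
}
```

Calling install_missing() from a setup script on a new machine would then install only what isn't already there.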
I'm about to start using R in earnest for the first time, probably in conjunction with Sweave, and would ideally like to start in the best way possible! Thanks for your thoughts.
Solution 1:
I'm turning the comment posted by @cboettig into an answer to resolve this question.
Packrat
Packrat is a dependency management system for R. It gives you three important advantages, all of them relevant to your portability needs:
Isolated: Installing a new or updated package for one project won’t break your other projects, and vice versa. That’s because Packrat gives each project its own private package library.
Portable: Easily transport your projects from one computer to another, even across different platforms. Packrat makes it easy to install the packages your project depends on.
Reproducible: Packrat records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.
What's next?
Walkthrough guide: http://rstudio.github.io/packrat/walkthrough.html
Most common commands: http://rstudio.github.io/packrat/commands.html
Using Packrat with RStudio: http://rstudio.github.io/packrat/rstudio.html
Limitations and caveats: http://rstudio.github.io/packrat/limitations.html
Update: Packrat has been soft-deprecated and is now superseded by renv, so you may want to use that package instead.
Solution 2:
The Anaconda package manager conda supports creating R environments:
conda create -n r-environment r-essentials r-base
conda activate r-environment
I have had a great experience using conda to maintain different Python installations, both user-specific ones and several versions for the same user. I have tested R with conda and Jupyter Notebook, and it works great, at least for my needs, which include RNA-sequencing analyses using DESeq2 and related packages, as well as data.table and dplyr. Many Bioconductor packages are available in conda via bioconda, and according to the comments on this SO question, it seems that install.packages() might work as well.
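If you do mix in install.packages(), it is worth checking that packages land inside the conda environment rather than in a user-wide library. A minimal sketch of such a check, assuming R was started from a shell where the environment was activated (conda activate sets CONDA_PREFIX); the helper names are hypothetical. It relies on install.packages() writing to .libPaths()[1] when no lib argument is given.

```r
# Minimal sketch: is this R session running from the active conda environment?
# Assumes `conda activate <env>` was run first, which sets CONDA_PREFIX.

in_conda_env <- function() {
  prefix <- Sys.getenv("CONDA_PREFIX", unset = "")
  if (!nzchar(prefix)) return(FALSE)
  # Trailing "/" avoids matching a sibling env such as ".../r-env2".
  prefix <- paste0(normalizePath(prefix, winslash = "/", mustWork = FALSE), "/")
  # R.home() is the installation actually running this session.
  home <- normalizePath(R.home(), winslash = "/", mustWork = FALSE)
  startsWith(home, prefix)
}

report_library_setup <- function() {
  cat("R home:", R.home(), "\n")
  cat("Library paths:\n", paste0("  ", .libPaths(), "\n"), sep = "")
  # install.packages() uses .libPaths()[1] unless `lib` is given.
  cat("install.packages() default target:", .libPaths()[1], "\n")
  cat("Running from CONDA_PREFIX:", in_conda_env(), "\n")
}
```

If report_library_setup() shows an R home and default target outside the environment, you are most likely running a system R rather than the conda one.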
Solution 3:
It looks like there is another option from the RStudio developers: renv. It's available on CRAN and supersedes Packrat.
In short, you use renv::init() to initialize your project library, and renv::snapshot() / renv::restore() to save and load the state of your library.
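In practice the loop is: renv::init() once per project, renv::snapshot() after installing or updating packages, commit renv.lock, then renv::restore() on the other machine. Here is a minimal sketch of the restore side as a script, assuming it runs from the project root. The name restore_project() is hypothetical; renv::restore() and its lockfile and prompt arguments are part of renv.

```r
# Minimal sketch: bring a fresh checkout up to date from its renv.lock.
# Assumes the working directory is the project root (hypothetical layout).
# One-time steps on the original machine (not run here):
#   renv::init()      # create the project library and renv.lock
#   renv::snapshot()  # re-run after installing or updating packages

restore_project <- function(lockfile = "renv.lock") {
  if (!file.exists(lockfile)) {
    stop("No lockfile at '", lockfile, "'; run renv::init() first.", call. = FALSE)
  }
  if (!requireNamespace("renv", quietly = TRUE)) {
    utils::install.packages("renv")
  }
  # Install the package versions recorded in the lockfile without asking
  # for confirmation (useful for non-interactive Rscript runs).
  renv::restore(lockfile = lockfile, prompt = FALSE)
}
```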
I prefer this option to conda R environments because the state of the project library is recorded in a single file, renv.lock, which can be committed to a Git repo and shared with the team.