How to organize large R programs?
Solution 1:
The standard answer is to use packages -- see the Writing R Extensions manual as well as the various tutorials on the web.
Packages give you
- a quasi-automatic way to organize your code by topic
- a strong incentive to write help files, which makes you think about the interface
- a lot of sanity checks via R CMD check
- a chance to add regression tests
- a means of using namespaces.
Just running source() over code works only for really short snippets. Everything else should be in a package, even if you do not plan to publish it: you can write internal packages for internal repositories.
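As a rough illustration of how little is needed to get started, here is a minimal sketch that uses package.skeleton() from the utils package to generate the directory layout. The package name mytools and the function add_one are hypothetical, chosen only for this example, and the package is written to a temporary directory.

```r
# Hypothetical function we want to ship in an internal package.
add_one <- function(x) x + 1

# Scratch directory so the example does not touch the working directory.
pkg_parent <- tempfile("pkgdemo")
dir.create(pkg_parent)

# package.skeleton() writes DESCRIPTION, NAMESPACE, R/ and man/ stubs.
# "mytools" is an assumed name, not something from the original answer.
utils::package.skeleton(
  name = "mytools",
  list = "add_one",
  environment = environment(),
  path = pkg_parent
)

# Inspect what was generated.
list.files(file.path(pkg_parent, "mytools"), recursive = TRUE)

# From a shell, the usual next steps are R CMD build and R CMD check
# on the package directory / resulting tarball.
```

The generated help-file stubs generally need to be filled in before R CMD check is satisfied, which is exactly the nudge toward documenting the interface mentioned above.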
As for the 'how to edit' part, the R Internals manual has excellent R coding standards in Section 6. Otherwise, I tend to use defaults in Emacs' ESS mode.
Update 2008-Aug-13: David Smith just blogged about the Google R Style Guide.
Solution 2:
I like putting each piece of functionality in its own file.
But I don't like R's package system. It's rather hard to use.
I prefer a lightweight alternative: placing a file's functions inside an environment (what every other language calls a "namespace") and attaching it. For example, I made a 'util' group of functions like so:
util = new.env()
util$bgrep = function [...]
util$timeit = function [...]
while("util" %in% search())
  detach("util")
attach(util)
This is all in a file util.R. When you source it, you get the environment 'util' so you can call util$bgrep() and such; but furthermore, the attach() call makes it so just bgrep() and such work directly. If you didn't put all those functions in their own environment, they'd pollute the interpreter's top-level namespace (the one that ls() shows).
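For anyone who wants to try the pattern, here is a runnable sketch of such a util.R. The function bodies are hypothetical stand-ins, since the real ones are elided above; only the environment, detach and attach steps mirror the original.

```r
# util.R (sketch): the function bodies are hypothetical stand-ins.
util <- new.env()

# Hypothetical body: return the elements of x that match a pattern.
util$bgrep <- function(pattern, x) grep(pattern, x, value = TRUE)

# Hypothetical body: elapsed seconds taken to evaluate an expression.
util$timeit <- function(expr) system.time(expr)[["elapsed"]]

# Detach any earlier copies so re-sourcing the file does not stack them
# on the search path.
while ("util" %in% search())
  detach("util")
attach(util)

fruits <- c("apple", "banana", "avocado")
util$bgrep("^a", fruits)  # qualified call through the environment
bgrep("^a", fruits)       # unqualified call, found via the search path
```

One thing to keep in mind is that attach() copies the environment's contents onto the search path, so a function added to util later is not visible unqualified until the file is sourced again.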
I was trying to simulate Python's system, where every file is a module. That would be better to have, but this seems OK.
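A somewhat closer approximation of "one file, one module" can be sketched with sys.source() from base R, which evaluates a file inside an environment you supply. The helper load_module and the file contents below are assumptions made up for this illustration, not part of the setup described above.

```r
# Hypothetical helper: evaluate a file in a fresh environment and return it.
load_module <- function(path) {
  module <- new.env(parent = globalenv())
  sys.source(path, envir = module)
  module
}

# Write a throwaway "module" file so the example is self-contained.
module_file <- tempfile(fileext = ".R")
writeLines("double_it <- function(x) 2 * x", module_file)

m <- load_module(module_file)
m$double_it(21)

# The function lives only in the module, not in the global environment.
exists("double_it", envir = globalenv(), inherits = FALSE)
```

With this variant nothing is attached, so every call has to be qualified as m$double_it(), much like module.function() in Python.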