R and object oriented programming

Object oriented programming in one way or another is very much possible in R. However, unlike for example Python, there are many ways to achieve object orientation:

  • The R.oo package
  • S3 and S4 classes
  • Reference classes
  • the proto package

My question is:

What major differences distinguish these ways of OO programming in R?

Ideally the answers here will serve as a reference for R programmers trying to decide which OO programming methods best suits their needs.

As such, I am asking for detail, presented in an objective manner, based on experience, and backed with facts and reference. Bonus points for clarifying how these methods map to standard OO practices.


Solution 1:

S3 classes

  • Not really objects, more of a naming convention
  • Based around the . syntax: E.g. for print, print calls print.lm print.anova, etc. And if not found,print.default

S4 classes

  • Can dispatch on multiple arguments
  • More complicated to implement than S3

Reference classes

  • Primarily useful to avoid making copies of large objects (pass by reference)
  • Description of reasons to use RefClasses

proto

  • ggplot2 was originally written in proto, but will eventually be rewritten using S3.
  • Neat concept (prototypes, not classes), but seems tricky in practice
  • Next version of ggplot2 seems to be moving away from it
  • Description of the concept and implementation

R6 classes

  • By-reference
  • Does not depend on S4 classes
  • "Creating an R6 class is similar to the reference class, except that there’s no need to separate the fields and methods, and you can’t specify the types of the fields."

Solution 2:

Edit on 3/8/12: The answer below responds to a piece of the originally posted question which has since been removed. I've copied it below, to provide context for my answer:

How do the different OO methods map to the more standard OO methods used in e.g. Java or Python?


My contribution relates to your second question, about how R's OO methods map to more standard OO methods. As I've thought about this in the past, I've returned again and again to two passages, one by Friedrich Leisch, and the other by John Chambers. Both do a good job of articulating why OO-like programming in R has a different flavor than in many other languages.

First, Friedrich Leisch, from "Creating R Packages: A Tutorial" (warning: PDF):

S is rare because it is both interactive and has a system for object-orientation. Designing classes clearly is programming, yet to make S useful as an interactive data analysis environment, it makes sense that it is a functional language. In "real" object-oriented programming (OOP) languages like C++ or Java class and method definitions are tightly bound together, methods are part of classes (and hence objects). We want incremental and interactive additions like user-defined methods for pre-defined classes. These additions can be made at any point in time, even on the fly at the command line prompt while we analyze a data set. S tries to make a compromise between object orientation and interactive use, and although compromises are never optimal with respect to all goals they try to reach, they often work surprisingly well in practice.

The other passage comes from John Chambers' superb book "Software for Data Analysis". (Link to quoted passage):

The OOP programming model differs from the S language in all but the first point, even though S and some other functional languages support classes and methods. Method definitions in an OOP system are local to the class; there is no requirement that the same name for a method means the same thing for an unrelated class. In contrast, method definitions in R do not reside in a class definition; conceptually, they are associated with the generic function. Class definitions enter in determining method selection, directly or through inheritance. Programmers used to the OOP model are sometimes frustrated or confused that their programming does not transfer to R directly, but it cannot. The functional use of methods is more complicated but also more attuned to having meaningful functions, and can't be reduced to the OOP version.

Solution 3:

S3 and S4 seem to be the official (i.e. built in) approaches for OO programming. I have begun using a combination of S3 with functions embedded in constructor function/method. My goal was to have a object$method() type syntax so that I have semi-private fields. I say semi-private because there is no way of really hiding them (as far as I know). Here is a simple example that doesn't actually do anything:

#' Constructor
EmailClass <- function(name, email) {
    nc = list(
        name = name,
        email = email,
        get = function(x) nc[[x]],
        set = function(x, value) nc[[x]] <<- value,
        props = list(),
        history = list(),
        getHistory = function() return(nc$history),
        getNumMessagesSent = function() return(length(nc$history))
    )
    #Add a few more methods
    nc$sendMail = function(to) {
        cat(paste("Sending mail to", to, 'from', nc$email))
        h <- nc$history
        h[[(length(h)+1)]] <- list(to=to, timestamp=Sys.time())
        assign('history', h, envir=nc)
    }
    nc$addProp = function(name, value) {
        p <- nc$props
        p[[name]] <- value
        assign('props', p, envir=nc)
    }
    nc <- list2env(nc)
    class(nc) <- "EmailClass"
    return(nc)
}

#' Define S3 generic method for the print function.
print.EmailClass <- function(x) {
    if(class(x) != "EmailClass") stop();
    cat(paste(x$get("name"), "'s email address is ", x$get("email"), sep=''))
}

And some test code:

    test <- EmailClass(name="Jason", "[email protected]")
    test$addProp('hello', 'world')
    test$props
    test
    class(test)
    str(test)
    test$get("name")
    test$get("email")
    test$set("name", "Heather")
    test$get("name")
    test
    test$sendMail("[email protected]")
    test$getHistory()
    test$sendMail("[email protected]")
    test$getNumMessagesSent()

    test2 <- EmailClass("Nobody", "[email protected]")
    test2
    test2$props
    test2$getHistory()
    test2$sendMail('[email protected]')

Here is a link to a blog post I wrote about this approach: http://bryer.org/2012/object-oriented-programming-in-r I would welcome comments, criticisms, and suggestions to this approach as I am not convinced myself if this is the best approach. However, for the problem I was trying to solve it has worked great. Specifically, for the makeR package (http://jbryer.github.com/makeR) I did not want users to change data fields directly because I needed to ensure that an XML file that represented my object's state would stay in sync. This worked perfectly as long as the users adhere to the rules I outline in the documentation.