How to define the subset operators for an S4 class?
I am having trouble figuring out the proper way to define the [, $, and [[ subset operators for an S4 class.
Can anyone provide me with a basic example of defining these three for an S4 class?
Discover the generic so that we know what we are aiming for:
> getGeneric("[")
standardGeneric for "[" defined from package "base"
function (x, i, j, ..., drop = TRUE)
standardGeneric("[", .Primitive("["))
<bytecode: 0x32e25c8>
<environment: 0x32d7a50>
Methods may be defined for arguments: x, i, j, drop
Use showMethods("[") for currently available ones.
Define a simple class:
setClass("A", representation=representation(slt="numeric"))
and implement a method:
setMethod("[", c("A", "integer", "missing", "ANY"),
    ## we won't support subsetting on j; dispatching on 'drop' doesn't
    ## make sense (to me), so in rebellion we'll quietly ignore it.
    function(x, i, j, ..., drop=TRUE)
{
    ## less clever: update slot, return instance
    ## x@slt = x@slt[i]
    ## x
    ## clever: by default initialize is a copy constructor, too
    initialize(x, slt=x@slt[i])
})
In action:
> a = new("A", slt=1:5)
> a[3:1]
An object of class "A"
Slot "slt":
[1] 3 2 1
There are different strategies for supporting the (implicitly) many signatures; for instance, you'd likely also want to support logical and character index values, possibly for both i and j. The most straightforward is a "facade" pattern, where each method does some preliminary coercion to a common type of subset index (e.g., integer, to allow for re-ordering and repetition of index entries) and then uses callGeneric to invoke a single method that does the work of subsetting the class.
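The facade idea might be sketched as follows. This is only a minimal sketch under stated assumptions: it reuses the class A from above, handles i only, and assumes a logical index already has the same length as the slot (no recycling). The logical and numeric methods are illustrative additions; each coerces its index to integer and hands off with callGeneric, passing the arguments explicitly.

```r
library(methods)

setClass("A", representation(slt = "numeric"))

## Worker: the only method that actually subsets the slot.
setMethod("[", c("A", "integer", "missing", "ANY"),
    function(x, i, j, ..., drop = TRUE)
{
    initialize(x, slt = x@slt[i])
})

## Facade: logical index -> integer positions, then re-dispatch.
setMethod("[", c("A", "logical", "missing", "ANY"),
    function(x, i, j, ..., drop = TRUE)
{
    i <- which(i)        # assumption: length(i) == length(x@slt)
    callGeneric(x, i)    # now dispatches to the "integer" worker
})

## Facade: double index (e.g. a[2]) -> integer, then re-dispatch.
setMethod("[", c("A", "numeric", "missing", "ANY"),
    function(x, i, j, ..., drop = TRUE)
{
    i <- as.integer(i)
    callGeneric(x, i)
})

a <- new("A", slt = c(10, 20, 30, 40, 50))
by_logical <- a[c(TRUE, FALSE, TRUE, FALSE, FALSE)]
by_double  <- a[c(3, 1)]
```

Because "integer" is a more specific match than "numeric", integer indices still go straight to the worker, so the numeric facade does not call itself again.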
There are no conceptual differences for [[, other than wanting to respect its semantics of returning the content rather than another instance of the object (which is what [ implies). For $ we have
> getGeneric("$")
standardGeneric for "$" defined from package "base"
function (x, name)
standardGeneric("$", .Primitive("$"))
<bytecode: 0x31fce40>
<environment: 0x31f12b8>
Methods may be defined for arguments: x
Use showMethods("$") for currently available ones.
and
setMethod("$", "A",
    function(x, name)
{
    ## 'name' is a character(1)
    slot(x, name)
})
with
> a$slt
[1] 1 2 3 4 5
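For [[, a minimal sketch along the same lines could look like this. It assumes the class A from above; accepting a single numeric position for i is an illustrative choice, not something the generic requires. The method returns the slot content itself rather than a new instance of A.

```r
library(methods)

setClass("A", representation(slt = "numeric"))

## The "[[" generic has formals (x, i, j, ...); unlike "[", there is no 'drop'.
setMethod("[[", c("A", "numeric", "missing"),
    function(x, i, j, ...)
{
    ## return the content, not another instance of "A"
    x@slt[[i]]
})

a <- new("A", slt = c(10, 20, 30))
second <- a[[2]]
```

Here second is a plain numeric value, which is the contrast with [ that the text describes.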
I would do as @Martin_Morgan suggested for the operators you mentioned. I would add a couple of points though:
1) I would be careful about defining a $ operator to access an S4 slot (unless you intend to access a column from a data frame which is stored in a specific slot?). The general suggestion is to write accessor functions like getMySlot() and setMySlot() to get the information you need. You can use the @ operator to access data from those slots, although get and set are best as a user interface. Using $ could be confusing for the user, who would probably expect a data.frame. See this S4 tutorial by Christophe Genolini for an in-depth discussion of these issues. If this is not how you intended to use $, disregard my suggestion (but the tutorial is still a great resource!).
2) If you are defining [ and [[ for a class that inherits from another class, like vector, you will also want to define el() (equivalent to [][[1L]], or the first element from a subset []) and length(). I am currently writing a class that inherits from numeric, and numeric methods will automatically try to use these functions from your class. If the class is for more limited use, or only for your own personal use, this may not be a problem.
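As a rough illustration of both points, accessor generics and a length() method might be sketched as below. The names getSlt and setSlt<- are hypothetical, chosen only to match the slot slt of the class A used in the other answer; they are not part of any package.

```r
library(methods)

setClass("A", representation(slt = "numeric"))

## Point 1: accessors instead of "$" (names are hypothetical).
setGeneric("getSlt", function(object) standardGeneric("getSlt"))
setMethod("getSlt", "A", function(object) object@slt)

setGeneric("setSlt<-", function(object, value) standardGeneric("setSlt<-"))
setReplaceMethod("setSlt", "A", function(object, value) {
    object@slt <- value
    validObject(object)    # re-check the object after modification
    object
})

## Point 2: length() reports the length of the underlying slot.
setMethod("length", "A", function(x) length(x@slt))

a <- new("A", slt = c(1, 2, 3))
setSlt(a) <- c(4, 5)
current <- getSlt(a)
n <- length(a)
```

With accessors like these, users never need to know the slot name, so the internal representation can change without breaking their code.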
I apologize, I would have left this as a comment, but I'm new to SO and I don't have the rep yet!