A comprehensive survey of the types of things in R; 'mode' and 'class' and 'typeof' are insufficient

Solution 1:

I agree that the type system in R is rather weird. The reason for it being that way is that it has evolved over (a long) time...

Note that you missed one more type-like function, storage.mode, and one more class-like function, oldClass.

So, mode and storage.mode are the old-style types (where storage.mode is more accurate), and typeof is the newer, even more accurate version.

mode(3L)                  # numeric
storage.mode(3L)          # integer
storage.mode(`identical`) # function
storage.mode(`if`)        # function
typeof(`identical`)       # closure
typeof(`if`)              # special

Then class is a whole different story. class is mostly just the class attribute of an object (that's exactly what oldClass returns). But when the class attribute is not set, the class function makes up a class from the object type and the dim attribute.

oldClass(3L) # NULL
class(3L) # integer
class(structure(3L, dim=1)) # array
class(structure(3L, dim=c(1,1))) # matrix
class(list()) # list
class(structure(list(1), dim=1)) # array
class(structure(list(1), dim=c(1,1))) # matrix
class(structure(list(1), dim=1, class='foo')) # foo

Finally, the class can return more than one string, but only if the class attribute is like that. The first string value is then kind of the main class, and the following ones are what it inherits from. The made-up classes are always of length 1.

# Here "A" inherits from "B", which inherits from "C"
class(structure(1, class=LETTERS[1:3])) # "A" "B" "C"

# an ordered factor:
class(ordered(3:1)) # "ordered" "factor"

Solution 2:

Here's some code to determine what the four type functions, class, mode, typeof, and storage.mode return for each of the kinds of R object.

library(methods)
library(dplyr)
library(xml2)

setClass("dummy", representation(x="numeric", y="numeric"))

types <- list(
  "logical vector" = logical(),
  "integer vector" = integer(),
  "numeric vector" = numeric(),
  "complex vector" = complex(),
  "character vector" = character(),
  "raw vector" = raw(),
  factor = factor(),
  "logical matrix" = matrix(logical()),
  "numeric matrix" = matrix(numeric()),
  "logical array" = array(logical(8), c(2, 2, 2)),
  "numeric array" = array(numeric(8), c(2, 2, 2)),
  list = list(),
  pairlist = .Options,
  "data frame" = data.frame(),
  "closure function" = identity,
  "builtin function" = `+`,
  "special function" = `if`,
  environment = new.env(),
  null = NULL,
  formula = y ~ x,
  expression = expression(),
  call = call("identity"),
  name = as.name("x"),
  "paren in expression" = expression((1))[[1]],
  "brace in expression" = expression({1})[[1]],
  "S3 lm object" = lm(dist ~ speed, cars),
  "S4 dummy object" = new("dummy", x = 1:10, y = rnorm(10)),
  "external pointer" = read_xml("<foo><bar /></foo>")$node
)

type_info <- Map(
  function(x, nm)
  {
    data_frame(
      "spoken type" = nm,
      class = class(x), 
      mode  = mode(x),
      typeof = typeof(x),
      storage.mode = storage.mode(x)
    )
  },
  types,
  names(types)
) %>% bind_rows

knitr::kable(type_info)

Here's the output:

|spoken type         |class       |mode        |typeof      |storage.mode |
|:-------------------|:-----------|:-----------|:-----------|:------------|
|logical vector      |logical     |logical     |logical     |logical      |
|integer vector      |integer     |numeric     |integer     |integer      |
|numeric vector      |numeric     |numeric     |double      |double       |
|complex vector      |complex     |complex     |complex     |complex      |
|character vector    |character   |character   |character   |character    |
|raw vector          |raw         |raw         |raw         |raw          |
|factor              |factor      |numeric     |integer     |integer      |
|logical matrix      |matrix      |logical     |logical     |logical      |
|numeric matrix      |matrix      |numeric     |double      |double       |
|logical array       |array       |logical     |logical     |logical      |
|numeric array       |array       |numeric     |double      |double       |
|list                |list        |list        |list        |list         |
|pairlist            |pairlist    |pairlist    |pairlist    |pairlist     |
|data frame          |data.frame  |list        |list        |list         |
|closure function    |function    |function    |closure     |function     |
|builtin function    |function    |function    |builtin     |function     |
|special function    |function    |function    |special     |function     |
|environment         |environment |environment |environment |environment  |
|null                |NULL        |NULL        |NULL        |NULL         |
|formula             |formula     |call        |language    |language     |
|expression          |expression  |expression  |expression  |expression   |
|call                |call        |call        |language    |language     |
|name                |name        |name        |symbol      |symbol       |
|paren in expression |(           |(           |language    |language     |
|brace in expression |{           |call        |language    |language     |
|S3 lm object        |lm          |list        |list        |list         |
|S4 dummy object     |dummy       |S4          |S4          |S4           |
|external pointer    |externalptr |externalptr |externalptr |externalptr  |

The types of objects available in R are discussed in the R Language Definition manual. There are a few types not mentioned here: you can't test for objects of type "promise", "...", and "ANY", and "bytecode" and "weakref" are only available at the C-level.

The table of available types in the R source is here.

Solution 3:

Does everything in R have (exactly one) class ?

Exactly one is definitely not right:

> x <- 3
> class(x) <- c("hi","low")
> class(x)
[1] "hi"  "low"

Everything has (at least one) class.

Does everything in R have (exactly one) mode ?

Not certain but I suspect so.

What, if anything, does 'typeof' tell us?

typeof gives the internal type of an object. Possible values according to ?typeof are:

The vector types "logical", "integer", "double", "complex", "character", "raw" and "list", "NULL", "closure" (function), "special" and "builtin" (basic functions and operators), "environment", "S4" (some S4 objects) and others that are unlikely to be seen at user level ("symbol", "pairlist", "promise", "language", "char", "...", "any", "expression", "externalptr", "bytecode" and "weakref").

mode relies on typeof. From ?mode:

Modes have the same set of names as types (see typeof) except that types "integer" and "double" are returned as "numeric". types "special" and "builtin" are returned as "function". type "symbol" is called mode "name". type "language" is returned as "(" or "call".

What other information is needed to fully describe an entity? (Where is the 'listness' stored, for example?)

A list has class list:

> y <- list(3)
> class(y)
[1] "list"

Do you mean vectorization? length should be sufficient for most purposes:

> z <- 3
> class(z)
[1] "numeric"
> length(z)
[1] 1

Think of 3 as a numeric vector of length 1, rather than as some primitive numeric type.

Conclusion

You can get by just fine with class and length. By the time you need the other stuff, you likely won't have to ask what they're for :-)

Solution 4:

Adding to one of your sub-questions :

  • What other information is needed to fully describe an entity?

In addition to class, mode, typeof, attributes, str, and so on, is() is also worth noting.

is(1)
[1] "numeric" "vector"

While useful, it is also unsatisfactory. In this example, 1 is more than just that; it is also atomic, finite, and a double. The following function should show all that an object is according to all available is.(...) functions:

what.is <- function(x, show.all=FALSE) {

  # set the warn option to -1 to temporarily ignore warnings
  op <- options("warn")
  options(warn = -1)
  on.exit(options(op))

  list.fun <- grep(methods(is), pattern = "<-", invert = TRUE, value = TRUE)
  result <- data.frame(test=character(), value=character(), 
                       warning=character(), stringsAsFactors = FALSE)

  # loop over all "is.(...)" functions and store the results
  for(fun in list.fun) {
    res <- try(eval(call(fun,x)),silent=TRUE)
    if(class(res)=="try-error") {
      next() # ignore tests that yield an error
    } else if (length(res)>1) {
      warn <- "*Applies only to the first element of the provided object"
      value <- paste(res,"*",sep="")
    } else {
      warn <- ""
      value <- res
    }
    result[nrow(result)+1,] <- list(fun, value, warn)
  }

  # sort the results
  result <- result[order(result$value,decreasing = TRUE),]
  rownames(result) <- NULL

  if(show.all)
    return(result)
  else
    return(result[which(result$value=="TRUE"),])
}

So now we get a more complete picture:

> what.is(1)
        test value warning
1  is.atomic  TRUE        
2  is.double  TRUE        
3  is.finite  TRUE        
4 is.numeric  TRUE        
5  is.vector  TRUE 

> what.is(CO2)
           test value warning
1 is.data.frame  TRUE        
2       is.list  TRUE        
3     is.object  TRUE        
4  is.recursive  TRUE 

You also get more information with the argument show.all=TRUE. I am not pasting any example here as the results are over 50 lines long.

Finally, this is meant as a complementary source of information, not as a replacement for any of the other functions mentionned earlier.

EDIT

To include even more "is" functions, as per @Erdogan's comment, you could add this bit to the function:

  # right after 
  # list.fun <- grep(methods(is), pattern = "<-", invert = TRUE, value = TRUE)
  list.fun.2 <- character()

  packs <- c('base', 'utils', 'methods') # include more packages if needed

  for (pkg in packs) {
    library(pkg, character.only = TRUE)
    objects <- grep("^is.+\\w$", ls(envir = as.environment(paste('package', pkg, sep = ':'))),
                    value = TRUE)
    objects <- grep("<-", objects, invert = TRUE, value = TRUE)
    if (length(objects) > 0) 
      list.fun.2 <- append(list.fun.2, objects[sapply(objects, function(x) class(eval(parse(text = x))) == "function")])
  }

  list.fun <- union(list.fun.1, list.fun.2)  

  # ...and continue with the rest
  result <- data.frame(test=character(), value=character(), 
                       warning=character(), stringsAsFactors = FALSE)
  # and so on...