A comprehensive survey of the types of things in R; 'mode' and 'class' and 'typeof' are insufficient
Solution 1:
I agree that the type system in R is rather weird. The reason for it being that way is that it has evolved over (a long) time...
Note that you missed one more type-like function, storage.mode, and one more class-like function, oldClass.
So, mode and storage.mode are the old-style types (where storage.mode is more accurate), and typeof is the newer, even more accurate version.
mode(3L) # numeric
storage.mode(3L) # integer
storage.mode(`identical`) # function
storage.mode(`if`) # function
typeof(`identical`) # closure
typeof(`if`) # special
Then class is a whole different story. class is mostly just the class attribute of an object (that's exactly what oldClass returns). But when the class attribute is not set, the class function makes up a class from the object type and the dim attribute.
oldClass(3L) # NULL
class(3L) # integer
class(structure(3L, dim=1)) # array
class(structure(3L, dim=c(1,1))) # matrix (since R 4.0.0: "matrix" "array")
class(list()) # list
class(structure(list(1), dim=1)) # array
class(structure(list(1), dim=c(1,1))) # matrix (since R 4.0.0: "matrix" "array")
class(structure(list(1), dim=1, class='foo')) # foo
Finally, class can return more than one string, but only if the class attribute itself holds more than one. The first string is then the main class, and the following ones are the classes it inherits from. The made-up classes are always of length 1, with one exception: since R 4.0.0 a matrix reports both "matrix" and "array".
# Here "A" inherits from "B", which inherits from "C"
class(structure(1, class=LETTERS[1:3])) # "A" "B" "C"
# an ordered factor:
class(ordered(3:1)) # "ordered" "factor"
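To make the inheritance reading of a multi-element class vector concrete, here is a small sketch. The generic describe and its methods are hypothetical names invented for this illustration; inherits, oldClass and UseMethod are base R.

```r
# Hypothetical S3 classes "A", "B", "C", used only for illustration
x <- structure(1, class = c("A", "B", "C"))

inherits(x, "B")                # TRUE: "B" appears in the class attribute
inherits(x, "B", which = TRUE)  # 2: position of "B" in the class vector
inherits(3L, "integer")         # TRUE: inherits() also sees the made-up class
is.null(oldClass(3L))           # TRUE: no class attribute is actually set

# S3 dispatch walks the class vector from left to right
describe <- function(obj) UseMethod("describe")
describe.B <- function(obj) "method for B"
describe.default <- function(obj) "default method"

describe(x)   # "method for B": there is no describe.A, so "B" is tried next
describe(3L)  # "default method"
```

This is why the order of the strings matters: the first class that has a method wins.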
Solution 2:
Here's some code to determine what the four type functions (class, mode, typeof, and storage.mode) return for each kind of R object.
library(methods)
library(dplyr)
library(xml2)

# a minimal S4 class, so that the list below can include an S4 object
setClass("dummy", representation(x="numeric", y="numeric"))

# one example object for each kind of thing
types <- list(
  "logical vector" = logical(),
  "integer vector" = integer(),
  "numeric vector" = numeric(),
  "complex vector" = complex(),
  "character vector" = character(),
  "raw vector" = raw(),
  factor = factor(),
  "logical matrix" = matrix(logical()),
  "numeric matrix" = matrix(numeric()),
  "logical array" = array(logical(8), c(2, 2, 2)),
  "numeric array" = array(numeric(8), c(2, 2, 2)),
  list = list(),
  pairlist = .Options,
  "data frame" = data.frame(),
  "closure function" = identity,
  "builtin function" = `+`,
  "special function" = `if`,
  environment = new.env(),
  null = NULL,
  formula = y ~ x,
  expression = expression(),
  call = call("identity"),
  name = as.name("x"),
  "paren in expression" = expression((1))[[1]],
  "brace in expression" = expression({1})[[1]],
  "S3 lm object" = lm(dist ~ speed, cars),
  "S4 dummy object" = new("dummy", x = 1:10, y = rnorm(10)),
  "external pointer" = read_xml("<foo><bar /></foo>")$node
)

# one row per object: its spoken name plus the result of each type function
type_info <- Map(
  function(x, nm)
  {
    tibble(   # tibble() replaces the deprecated data_frame()
      "spoken type" = nm,
      class = class(x),
      mode = mode(x),
      typeof = typeof(x),
      storage.mode = storage.mode(x)
    )
  },
  types,
  names(types)
) %>% bind_rows

knitr::kable(type_info)
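Here's the output (produced with a version of R before 4.0.0; since then, class() on a matrix also returns "array", which adds an extra row for each matrix):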
|spoken type |class |mode |typeof |storage.mode |
|:-------------------|:-----------|:-----------|:-----------|:------------|
|logical vector |logical |logical |logical |logical |
|integer vector |integer |numeric |integer |integer |
|numeric vector |numeric |numeric |double |double |
|complex vector |complex |complex |complex |complex |
|character vector |character |character |character |character |
|raw vector |raw |raw |raw |raw |
|factor |factor |numeric |integer |integer |
|logical matrix |matrix |logical |logical |logical |
|numeric matrix |matrix |numeric |double |double |
|logical array |array |logical |logical |logical |
|numeric array |array |numeric |double |double |
|list |list |list |list |list |
|pairlist |pairlist |pairlist |pairlist |pairlist |
|data frame |data.frame |list |list |list |
|closure function |function |function |closure |function |
|builtin function |function |function |builtin |function |
|special function |function |function |special |function |
|environment |environment |environment |environment |environment |
|null |NULL |NULL |NULL |NULL |
|formula |formula |call |language |language |
|expression |expression |expression |expression |expression |
|call |call |call |language |language |
|name |name |name |symbol |symbol |
|paren in expression |( |( |language |language |
|brace in expression |{ |call |language |language |
|S3 lm object |lm |list |list |list |
|S4 dummy object |dummy |S4 |S4 |S4 |
|external pointer |externalptr |externalptr |externalptr |externalptr |
The types of objects available in R are discussed in the R Language Definition manual. There are a few types not mentioned here: you can't test for objects of type "promise", "...", and "ANY", and "bytecode" and "weakref" are only available at the C-level.
The table of available types in the R source is here.
Solution 3:
Does everything in R have (exactly one) class?
Exactly one is definitely not right:
> x <- 3
> class(x) <- c("hi","low")
> class(x)
[1] "hi" "low"
Everything has (at least one) class.
Does everything in R have (exactly one) mode?
Not certain but I suspect so.
What, if anything, does 'typeof' tell us?
typeof gives the internal type of an object. Possible values according to ?typeof are:
The vector types "logical", "integer", "double", "complex", "character", "raw" and "list", "NULL", "closure" (function), "special" and "builtin" (basic functions and operators), "environment", "S4" (some S4 objects) and others that are unlikely to be seen at user level ("symbol", "pairlist", "promise", "language", "char", "...", "any", "expression", "externalptr", "bytecode" and "weakref").
mode relies on typeof. From ?mode:
Modes have the same set of names as types (see typeof) except that types "integer" and "double" are returned as "numeric". types "special" and "builtin" are returned as "function". type "symbol" is called mode "name". type "language" is returned as "(" or "call".
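The rules quoted from ?mode can be written out as a small function. This is only a sketch of the mapping, not the actual source of mode; the name mode_from_typeof is made up for this example, and the check at the end compares it against the real mode on a few sample objects.

```r
# Hypothetical helper: derive the mode from typeof(), following the quoted rules
mode_from_typeof <- function(x) {
  tx <- typeof(x)
  if (tx == "language") {
    # calls are "call", except a parenthesised expression, which is "("
    if (identical(x[[1L]], as.name("("))) "(" else "call"
  } else {
    switch(tx,
      integer = , double = "numeric",
      closure = , special = , builtin = "function",
      symbol = "name",
      tx)  # every other type keeps its name as its mode
  }
}

# a few sample objects, chosen arbitrarily for the comparison
samples <- list(3L, 3.5, "a", TRUE, sum, `if`, identity,
                as.name("x"), quote(f(y)), quote((1)), y ~ x, list(), NULL)

# does the sketch agree with base::mode on every sample?
agrees <- vapply(samples,
                 function(s) identical(mode_from_typeof(s), mode(s)),
                 logical(1))
all(agrees)
```

So mode adds no information beyond typeof; it only merges some types under older, coarser names.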
What other information is needed to fully describe an entity? (Where is the 'listness' stored, for example?)
A list has class list:
> y <- list(3)
> class(y)
[1] "list"
Do you mean vectorization? length should be sufficient for most purposes:
> z <- 3
> class(z)
[1] "numeric"
> length(z)
[1] 1
Think of 3 as a numeric vector of length 1, rather than as some primitive numeric type.
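A few quick checks illustrate the point that a single number is just a vector of length 1. The variable names z and z2 are arbitrary.

```r
z <- 3

is.vector(z)        # TRUE
length(z)           # 1
identical(z, c(3))  # TRUE: c(3) builds the same length-one vector
z[1]                # 3: indexing works as on any other vector

# appending elements gives a longer vector of the same class
z2 <- c(z, 4, 5)
class(z2)           # "numeric"
length(z2)          # 3
```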
Conclusion
You can get by just fine with class and length. By the time you need the other stuff, you likely won't have to ask what they're for :-)
Solution 4:
Adding to one of your sub-questions:
- What other information is needed to fully describe an entity?
In addition to class, mode, typeof, attributes, str, and so on, is() is also worth noting.
is(1)
[1] "numeric" "vector"
While useful, it is also unsatisfactory. In this example, 1 is more than just that; it is also atomic, finite, and a double. The following function should show all that an object is according to all available is.(...) functions:
what.is <- function(x, show.all=FALSE) {
  # set the warn option to -1 to temporarily ignore warnings
  op <- options("warn")
  options(warn = -1)
  on.exit(options(op))

  list.fun <- grep(methods(is), pattern = "<-", invert = TRUE, value = TRUE)
  result <- data.frame(test=character(), value=character(),
                       warning=character(), stringsAsFactors = FALSE)

  # loop over all "is.(...)" functions and store the results
  for(fun in list.fun) {
    res <- try(eval(call(fun,x)),silent=TRUE)
    # inherits() stays a single TRUE/FALSE even when res has several classes
    if(inherits(res, "try-error")) {
      next # ignore tests that yield an error
    } else if (length(res)>1) {
      warn <- "*Applies only to the first element of the provided object"
      value <- paste(res[1],"*",sep="") # keep only the first element
    } else {
      warn <- ""
      value <- res
    }
    result[nrow(result)+1,] <- list(fun, value, warn)
  }

  # sort the results
  result <- result[order(result$value,decreasing = TRUE),]
  rownames(result) <- NULL

  if(show.all)
    return(result)
  else
    return(result[which(result$value=="TRUE"),])
}
So now we get a more complete picture:
> what.is(1)
test value warning
1 is.atomic TRUE
2 is.double TRUE
3 is.finite TRUE
4 is.numeric TRUE
5 is.vector TRUE
> what.is(CO2)
test value warning
1 is.data.frame TRUE
2 is.list TRUE
3 is.object TRUE
4 is.recursive TRUE
You also get more information with the argument show.all=TRUE. I am not pasting any example here as the results are over 50 lines long.
Finally, this is meant as a complementary source of information, not as a replacement for any of the other functions mentioned earlier.
EDIT
To include even more "is" functions, as per @Erdogan's comment, you could add this bit to the function:
# right after
# list.fun <- grep(methods(is), pattern = "<-", invert = TRUE, value = TRUE)
list.fun.2 <- character()
packs <- c('base', 'utils', 'methods') # include more packages if needed
for (pkg in packs) {
  library(pkg, character.only = TRUE)
  objects <- grep("^is.+\\w$", ls(envir = as.environment(paste('package', pkg, sep = ':'))),
                  value = TRUE)
  objects <- grep("<-", objects, invert = TRUE, value = TRUE)
  if (length(objects) > 0)
    list.fun.2 <- append(list.fun.2, objects[sapply(objects, function(x) class(eval(parse(text = x))) == "function")])
}
# merge with the list obtained from methods(is)
list.fun <- union(list.fun, list.fun.2)
# ...and continue with the rest
result <- data.frame(test=character(), value=character(),
                     warning=character(), stringsAsFactors = FALSE)
# and so on...