The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe
R provides two different methods for accessing the elements of a list or data.frame: []
and [[]]
.
What is the difference between the two, and when should I use one over the other?
The R Language Definition is handy for answering these types of questions:
- http://cran.r-project.org/doc/manuals/R-lang.html#Indexing
R has three basic indexing operators, with syntax displayed by the following examples
x[i] x[i, j] x[[i]] x[[i, j]] x$a x$"a"For vectors and matrices the
[[
forms are rarely used, although they have some slight semantic differences from the[
form (e.g. it drops any names or dimnames attribute, and that partial matching is used for character indices). When indexing multi-dimensional structures with a single index,x[[i]]
orx[i]
will return thei
th sequential element ofx
.For lists, one generally uses
[[
to select any single element, whereas[
returns a list of the selected elements.The
[[
form allows only a single element to be selected using integer or character indices, whereas[
allows indexing by vectors. Note though that for a list, the index can be a vector and each element of the vector is applied in turn to the list, the selected component, the selected component of that component, and so on. The result is still a single element.
The significant differences between the two methods are the class of the objects they return when used for extraction and whether they may accept a range of values, or just a single value during assignment.
Consider the case of data extraction on the following list:
foo <- list( str='R', vec=c(1,2,3), bool=TRUE )
Say we would like to extract the value stored by bool from foo and use it inside an if()
statement. This will illustrate the differences between the return values of []
and [[]]
when they are used for data extraction. The []
method returns objects of class list (or data.frame if foo was a data.frame) while the [[]]
method returns objects whose class is determined by the type of their values.
So, using the []
method results in the following:
if( foo[ 'bool' ] ){ print("Hi!") }
Error in if (foo["bool"]) { : argument is not interpretable as logical
class( foo[ 'bool' ] )
[1] "list"
This is because the []
method returned a list and a list is not valid object to pass directly into an if()
statement. In this case we need to use [[]]
because it will return the "bare" object stored in 'bool' which will have the appropriate class:
if( foo[[ 'bool' ]] ){ print("Hi!") }
[1] "Hi!"
class( foo[[ 'bool' ]] )
[1] "logical"
The second difference is that the []
operator may be used to access a range of slots in a list or columns in a data frame while the [[]]
operator is limited to accessing a single slot or column. Consider the case of value assignment using a second list, bar()
:
bar <- list( mat=matrix(0,nrow=2,ncol=2), rand=rnorm(1) )
Say we want to overwrite the last two slots of foo with the data contained in bar. If we try to use the [[]]
operator, this is what happens:
foo[[ 2:3 ]] <- bar
Error in foo[[2:3]] <- bar :
more elements supplied than there are to replace
This is because [[]]
is limited to accessing a single element. We need to use []
:
foo[ 2:3 ] <- bar
print( foo )
$str
[1] "R"
$vec
[,1] [,2]
[1,] 0 0
[2,] 0 0
$bool
[1] -0.6291121
Note that while the assignment was successful, the slots in foo kept their original names.
Double brackets accesses a list element, while a single bracket gives you back a list with a single element.
lst <- list('one','two','three')
a <- lst[1]
class(a)
## returns "list"
a <- lst[[1]]
class(a)
## returns "character"