Row names & column names in R

Do the following function pairs generate exactly the same results?

Pair 1) names() & colnames()

Pair 2) rownames() & row.names()


Solution 1:

As Oscar Wilde said

Consistency is the last refuge of the unimaginative.

R is more of an evolved rather than designed language, so these things happen. names() and colnames() work on a data.frame but names() does not work on a matrix:

R> DF <- data.frame(foo=1:3, bar=LETTERS[1:3])
R> names(DF)
[1] "foo" "bar"
R> colnames(DF)
[1] "foo" "bar"
R> M <- matrix(1:9, ncol=3, dimnames=list(1:3, c("alpha","beta","gamma")))
R> names(M)
NULL
R> colnames(M)
[1] "alpha" "beta"  "gamma"
R> 

Solution 2:

Just to expand a little on Dirk's example:

It helps to think of a data frame as a list with equal length vectors. That's probably why names works with a data frame but not a matrix.

The other useful function is dimnames which returns the names for every dimension. You will notice that the rownames function actually just returns the first element from dimnames.

Regarding rownames and row.names: I can't tell the difference, although rownames uses dimnames while row.names was written outside of R. They both also seem to work with higher dimensional arrays:

>a <- array(1:5, 1:4)
> a[1,,,]
> rownames(a) <- "a"
> row.names(a)
[1] "a"
> a
, , 1, 1    
  [,1] [,2]
a    1    2

> dimnames(a)
[[1]]
[1] "a"

[[2]]
NULL

[[3]]
NULL

[[4]]
NULL

Solution 3:

I think that using colnames and rownames makes the most sense; here's why.

Using names has several disadvantages. You have to remember that it means "column names", and it only works with data frame, so you'll need to call colnames whenever you use matrices. By calling colnames, you only have to remember one function. Finally, if you look at the code for colnames, you will see that it calls names in the case of a data frame anyway, so the output is identical.

rownames and row.names return the same values for data frame and matrices; the only difference that I have spotted is that where there aren't any names, rownames will print "NULL" (as does colnames), but row.names returns it invisibly. Since there isn't much to choose between the two functions, rownames wins on the grounds of aesthetics, since it pairs more prettily withcolnames. (Also, for the lazy programmer, you save a character of typing.)

Solution 4:

And another expansion:

# create dummy matrix
set.seed(10)
m <- matrix(round(runif(25, 1, 5)), 5)
d <- as.data.frame(m)

If you want to assign new column names you can do following on data.frame:

# an identical effect can be achieved with colnames()   
names(d) <- LETTERS[1:5]
> d
  A B C D E
1 3 2 4 3 4
2 2 2 3 1 3
3 3 2 1 2 4
4 4 3 3 3 2
5 1 3 2 4 3

If you, however run previous command on matrix, you'll mess things up:

names(m) <- LETTERS[1:5]
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    3    2    4    3    4
[2,]    2    2    3    1    3
[3,]    3    2    1    2    4
[4,]    4    3    3    3    2
[5,]    1    3    2    4    3
attr(,"names")
 [1] "A" "B" "C" "D" "E" NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA 
[20] NA  NA  NA  NA  NA  NA 

Since matrix can be regarded as two-dimensional vector, you'll assign names only to first five values (you don't want to do that, do you?). In this case, you should stick with colnames().

So there...