Row names & column names in R
Do the following function pairs generate exactly the same results?
Pair 1) names()
& colnames()
Pair 2) rownames()
& row.names()
Solution 1:
As Oscar Wilde said
Consistency is the last refuge of the unimaginative.
R is more of an evolved rather than designed language, so these things happen. names()
and colnames()
work on a data.frame
but names()
does not work on a matrix:
R> DF <- data.frame(foo=1:3, bar=LETTERS[1:3])
R> names(DF)
[1] "foo" "bar"
R> colnames(DF)
[1] "foo" "bar"
R> M <- matrix(1:9, ncol=3, dimnames=list(1:3, c("alpha","beta","gamma")))
R> names(M)
NULL
R> colnames(M)
[1] "alpha" "beta" "gamma"
R>
Solution 2:
Just to expand a little on Dirk's example:
It helps to think of a data frame as a list with equal length vectors. That's probably why names
works with a data frame but not a matrix.
The other useful function is dimnames
which returns the names for every dimension. You will notice that the rownames
function actually just returns the first element from dimnames
.
Regarding rownames
and row.names
: I can't tell the difference, although rownames
uses dimnames
while row.names
was written outside of R. They both also seem to work with higher dimensional arrays:
>a <- array(1:5, 1:4)
> a[1,,,]
> rownames(a) <- "a"
> row.names(a)
[1] "a"
> a
, , 1, 1
[,1] [,2]
a 1 2
> dimnames(a)
[[1]]
[1] "a"
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL
Solution 3:
I think that using colnames
and rownames
makes the most sense; here's why.
Using names
has several disadvantages. You have to remember that it means "column names", and it only works with data frame, so you'll need to call colnames
whenever you use matrices. By calling colnames
, you only have to remember one function. Finally, if you look at the code for colnames
, you will see that it calls names
in the case of a data frame anyway, so the output is identical.
rownames
and row.names
return the same values for data frame and matrices; the only difference that I have spotted is that where there aren't any names, rownames
will print "NULL" (as does colnames
), but row.names
returns it invisibly. Since there isn't much to choose between the two functions, rownames
wins on the grounds of aesthetics, since it pairs more prettily withcolnames
. (Also, for the lazy programmer, you save a character of typing.)
Solution 4:
And another expansion:
# create dummy matrix
set.seed(10)
m <- matrix(round(runif(25, 1, 5)), 5)
d <- as.data.frame(m)
If you want to assign new column names you can do following on data.frame
:
# an identical effect can be achieved with colnames()
names(d) <- LETTERS[1:5]
> d
A B C D E
1 3 2 4 3 4
2 2 2 3 1 3
3 3 2 1 2 4
4 4 3 3 3 2
5 1 3 2 4 3
If you, however run previous command on matrix
, you'll mess things up:
names(m) <- LETTERS[1:5]
> m
[,1] [,2] [,3] [,4] [,5]
[1,] 3 2 4 3 4
[2,] 2 2 3 1 3
[3,] 3 2 1 2 4
[4,] 4 3 3 3 2
[5,] 1 3 2 4 3
attr(,"names")
[1] "A" "B" "C" "D" "E" NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[20] NA NA NA NA NA NA
Since matrix can be regarded as two-dimensional vector, you'll assign names only to first five values (you don't want to do that, do you?). In this case, you should stick with colnames()
.
So there...