For each row return the column name of the largest value
One option using your data (for future reference, use set.seed()
to make examples using sample
reproducible):
DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))
colnames(DF)[apply(DF,1,which.max)]
[1] "V3" "V1" "V2"
A faster solution than using apply
might be max.col
:
colnames(DF)[max.col(DF,ties.method="first")]
#[1] "V3" "V1" "V2"
...where ties.method
can be any of "random"
"first"
or "last"
This of course causes issues if you happen to have two columns which are equal to the maximum. I'm not sure what you want to do in that instance as you will have more than one result for some rows. E.g.:
DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(7,6,4))
apply(DF,1,function(x) which(x==max(x)))
[[1]]
V2 V3
2 3
[[2]]
V1
1
[[3]]
V2
2
If you're interested in a data.table
solution, here's one. It's a bit tricky since you prefer to get the id for the first maximum. It's much easier if you'd rather want the last maximum. Nevertheless, it's not that complicated and it's fast!
Here I've generated data of your dimensions (26746 * 18).
Data
set.seed(45)
DF <- data.frame(matrix(sample(10, 26746*18, TRUE), ncol=18))
data.table
answer:
require(data.table)
DT <- data.table(value=unlist(DF, use.names=FALSE),
colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
setkey(DT, colid, value)
t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]
Benchmarking:
# data.table solution
system.time({
DT <- data.table(value=unlist(DF, use.names=FALSE),
colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
setkey(DT, colid, value)
t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]
})
# user system elapsed
# 0.174 0.029 0.227
# apply solution from @thelatemail
system.time(t2 <- colnames(DF)[apply(DF,1,which.max)])
# user system elapsed
# 2.322 0.036 2.602
identical(t1, t2)
# [1] TRUE
It's about 11 times faster on data of these dimensions, and data.table
scales pretty well too.
Edit: if any of the max ids is okay, then:
DT <- data.table(value=unlist(DF, use.names=FALSE),
colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
setkey(DT, colid, value)
t1 <- DT[J(unique(colid)), rowid, mult="last"]