For each row return the column name of the largest value

One option using your data (for future reference, use set.seed() to make examples using sample reproducible):

DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(9,6,4))

colnames(DF)[apply(DF,1,which.max)]
[1] "V3" "V1" "V2"

A faster solution than using apply might be max.col:

colnames(DF)[max.col(DF,ties.method="first")]
#[1] "V3" "V1" "V2"

...where ties.method can be any of "random" "first" or "last"

This of course causes issues if you happen to have two columns which are equal to the maximum. I'm not sure what you want to do in that instance as you will have more than one result for some rows. E.g.:

DF <- data.frame(V1=c(2,8,1),V2=c(7,3,5),V3=c(7,6,4))
apply(DF,1,function(x) which(x==max(x)))

[[1]]
V2 V3 
 2  3 

[[2]]
V1 
 1 

[[3]]
V2 
 2

If you're interested in a data.table solution, here's one. It's a bit tricky since you prefer to get the id for the first maximum. It's much easier if you'd rather want the last maximum. Nevertheless, it's not that complicated and it's fast!

Here I've generated data of your dimensions (26746 * 18).

Data

set.seed(45)
DF <- data.frame(matrix(sample(10, 26746*18, TRUE), ncol=18))

`data.table` answer:

require(data.table)
DT <- data.table(value=unlist(DF, use.names=FALSE), 
            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
setkey(DT, colid, value)
t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]

Benchmarking:

# data.table solution
system.time({
DT <- data.table(value=unlist(DF, use.names=FALSE), 
            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
setkey(DT, colid, value)
t1 <- DT[J(unique(colid), DT[J(unique(colid)), value, mult="last"]), rowid, mult="first"]
})
#   user  system elapsed 
#  0.174   0.029   0.227 

# apply solution from @thelatemail
system.time(t2 <- colnames(DF)[apply(DF,1,which.max)])
#   user  system elapsed 
#  2.322   0.036   2.602 

identical(t1, t2)
# [1] TRUE

It's about 11 times faster on data of these dimensions, and data.table scales pretty well too.

Edit: if any of the max ids is okay, then:

DT <- data.table(value=unlist(DF, use.names=FALSE), 
            colid = 1:nrow(DF), rowid = rep(names(DF), each=nrow(DF)))
setkey(DT, colid, value)
t1 <- DT[J(unique(colid)), rowid, mult="last"]

For each row return the column name of the largest value

Data

`data.table` answer:

Benchmarking:

Edit: if any of the max ids is okay, then:

Related

Recent Posts

For each row return the column name of the largest value

Data

data.table answer:

Benchmarking:

Edit: if any of the max ids is okay, then:

Related

Recent Posts

`data.table` answer: