Select values from different columns based on a variable containing column names [duplicate]

Solution 1:

An excuse to use the obscure .BY:

DT[, newval := .SD[[.BY[[1]]]], by=new]

   col1 col2 col3  new newval
1:    1    4   55 col1      1
2:    2    3   44 col2      3
3:    3   34   35 col2     34
4:    4   44   87 col3     87

How it works. This splits the data into groups based on the strings in new. Within each group, that string is available as .BY[[1]] (call it newname). We use it to select the corresponding column of .SD via .SD[[newname]]. .SD stands for Subset of Data.

Alternatives. get(.BY[[1]]) should work just as well in place of .SD[[.BY[[1]]]]. According to a benchmark run by @David, the two ways are equally fast.
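To make the grouping step concrete, here is a minimal sketch that rebuilds the table shown above and runs both variants side by side. It assumes the data.table package is installed and that the columns are integers. The second column name, newval_get, is made up for the comparison.

```r
library(data.table)  # assumption: data.table is installed

# Toy data matching the printed output above (integer columns assumed)
DT <- data.table(
  col1 = 1:4,
  col2 = c(4L, 3L, 34L, 44L),
  col3 = c(55L, 44L, 35L, 87L),
  new  = c("col1", "col2", "col2", "col3")
)

# Within each group, .BY[[1]] is that group's value of `new`,
# so .SD[[.BY[[1]]]] pulls the column with that name for the group's rows
DT[, newval := .SD[[.BY[[1]]]], by = new]

# Same lookup via get(); stored under a hypothetical second column name
DT[, newval_get := get(.BY[[1]]), by = new]

print(DT)
```

Both columns should hold the same values, since each call looks up the same column name per group.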

Solution 2:

We can match the 'new' column against the column names of the dataset to get the column index, cbind it with the row index (1:nrow(df1)), and extract the corresponding elements of the dataset using that row/column index matrix. The result can be assigned to a new column.

df1$matched_value <- df1[-4][cbind(1:nrow(df1), match(df1$new, colnames(df1)))]
df1
#  col1 col2 col3  new matched_value
#1    1    4   55 col1             1
#2    2    3   44 col2             3
#3    3   34   35 col2            34
#4    4   44   87 col3            87
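
The one-liner above can be unpacked step by step. The following is a sketch in base R only. The intermediate names (value_cols, col_idx, idx, picked) are introduced here for illustration, and the integer column types are an assumption.

```r
# Toy data matching the printed output above (integer columns assumed)
df1 <- data.frame(
  col1 = 1:4,
  col2 = c(4L, 3L, 34L, 44L),
  col3 = c(55L, 44L, 35L, 87L),
  new  = c("col1", "col2", "col2", "col3"),
  stringsAsFactors = FALSE
)

# Columns that hold candidate values (everything except 'new')
value_cols <- setdiff(names(df1), "new")

# For each row, the position of the column named in 'new'
col_idx <- match(df1$new, value_cols)

# Two-column matrix: row number in column 1, column number in column 2
idx <- cbind(seq_len(nrow(df1)), col_idx)

# Indexing a data.frame with such a matrix picks one cell per matrix row
picked <- df1[value_cols][idx]

df1$matched_value <- picked
print(df1)
```

Matrix indexing goes through as.matrix(), so if the value columns have mixed types the result is coerced to a common type.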

NOTE: If the OP has a data.table, one option is to convert it to a data.frame, or to use with=FALSE while subsetting.

setDF(df1) # to convert to 'data.frame'
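
For the with=FALSE route, one possible sketch keeps the data.table as it is and does the matrix lookup on a copy of the value columns. It assumes the data.table package is installed. The names dt1, value_cols and vals are hypothetical.

```r
library(data.table)  # assumption: data.table is installed

# Toy data.table with the same content as df1 (integer columns assumed)
dt1 <- data.table(
  col1 = 1:4,
  col2 = c(4L, 3L, 34L, 44L),
  col3 = c(55L, 44L, 35L, 87L),
  new  = c("col1", "col2", "col2", "col3")
)

# with = FALSE makes data.table treat value_cols as a vector of column names
value_cols <- c("col1", "col2", "col3")
vals <- as.matrix(dt1[, value_cols, with = FALSE])

# Row/column matrix indexing on the plain matrix, assigned by reference
dt1[, matched_value := vals[cbind(seq_len(.N), match(new, colnames(vals)))]]

print(dt1)
```

This avoids converting the whole table with setDF, at the cost of copying the value columns into a matrix.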

Benchmarks

set.seed(45)
df2 <- data.frame(col1 = sample(1:9, 20e6, replace=TRUE),
                  col2 = sample(1:20, 20e6, replace=TRUE),
                  col3 = sample(1:40, 20e6, replace=TRUE),
                  col4 = sample(1:30, 20e6, replace=TRUE),
                  new = sample(paste0('col', 1:4), 20e6, replace=TRUE),
                  stringsAsFactors=FALSE)
system.time(df2$matched_value <- df2[-5][cbind(1:nrow(df2), match(df2$new, colnames(df2)))])
#   user  system elapsed 
#  2.54    0.37    2.92
