dplyr: how to reference columns by column index rather than column name using mutate?

Using dplyr, you can do something like this:

iris %>% head %>% mutate(sum=Sepal.Length + Sepal.Width) 
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
3          4.7         3.2          1.3         0.2  setosa 7.9
4          4.6         3.1          1.5         0.2  setosa 7.7
5          5.0         3.6          1.4         0.2  setosa 8.6
6          5.4         3.9          1.7         0.4  setosa 9.3

Above, I referenced the columns by name. How can I use the column indices 1 and 2 to achieve the same result?

I have the following, but it doesn't feel very elegant:

iris %>% head %>% mutate(sum=apply(select(.,1,2),1,sum))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
3          4.7         3.2          1.3         0.2  setosa 7.9
4          4.6         3.1          1.5         0.2  setosa 7.7
5          5.0         3.6          1.4         0.2  setosa 8.6
6          5.4         3.9          1.7         0.4  setosa 9.3

Solution 1:

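Inside a magrittr pipe, . refers to the data frame on the left-hand side, so .[[1]] and .[[2]] extract its first and second columns by position. You can try: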

iris %>% head %>% mutate(sum = .[[1]] + .[[2]])

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
3          4.7         3.2          1.3         0.2  setosa 7.9
4          4.6         3.1          1.5         0.2  setosa 7.7
5          5.0         3.6          1.4         0.2  setosa 8.6
6          5.4         3.9          1.7         0.4  setosa 9.3
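As a possible alternative, here is a minimal sketch that selects by position with across(), assuming dplyr 1.0.0 or later is installed. Unlike the . placeholder, which comes from magrittr and would not be available with the native |> pipe, across() is evaluated by mutate() itself. The variable name result is only for illustration.

```r
library(dplyr)

# across(1:2) selects the first two columns by position;
# rowSums() then adds the selected columns row by row.
result <- iris %>%
  head() %>%
  mutate(sum = rowSums(across(1:2)))

result$sum
```

This form also extends to more columns without writing out each term, for example across(1:4).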

Solution 2:

I'm a bit late to the game, but my strategy in cases like this is to write my own tidyverse-compliant function that does exactly what I want. By tidyverse-compliant, I mean that the first argument is a data frame and the output is a vector that can be added to that data frame.

# Add two columns of x; [[ accepts either a position or a name
sum_cols <- function(x, col1, col2) {
  x[[col1]] + x[[col2]]
}

iris %>%
  head %>%
  mutate(sum = sum_cols(x = ., col1 = 1, col2 = 2))
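Because [[ accepts a column name as well as a position, the same helper should work either way. The sketch below repeats the function so it runs on its own and compares the two calls; by_index and by_name are illustrative names, not part of the original answer.

```r
library(dplyr)

# Same helper as above, repeated so this snippet is self-contained
sum_cols <- function(x, col1, col2) {
  x[[col1]] + x[[col2]]
}

# Reference the columns by position ...
by_index <- iris %>%
  head() %>%
  mutate(sum = sum_cols(x = ., col1 = 1, col2 = 2))

# ... and by name
by_name <- iris %>%
  head() %>%
  mutate(sum = sum_cols(x = ., col1 = "Sepal.Length", col2 = "Sepal.Width"))

# Both calls add the same two columns
identical(by_index$sum, by_name$sum)
```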

Solution 3:

What do you think about this version?
Inspired by @SavedByJesus's answer.

applySum <- function(df, ...) {
  assertthat::assert_that(...length() > 0, msg = "one or more column indexes are required")
  mutate(df, Sum = apply(as.data.frame(df[, c(...)]), 1, sum))
}

iris %>%
  head(2) %>%
  applySum(1, 2)
#
### output
#
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
#
### you can select and sum more than two columns with the same function
#
iris %>%
  head(2) %>%
  applySum(1, 2, 3, 4)
#
### output
#
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species  Sum
1          5.1         3.5          1.4         0.2  setosa 10.2
2          4.9         3.0          1.4         0.2  setosa  9.5
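If the assertthat dependency or the row-wise apply() call is a concern, one possible variant uses base R's rowSums() and stop() instead. This is only a sketch: apply_row_sums is a hypothetical name, and it assumes all selected columns are numeric.

```r
library(dplyr)

# Hypothetical variant of applySum(): rowSums() replaces apply(..., 1, sum)
apply_row_sums <- function(df, ...) {
  cols <- c(...)
  if (length(cols) == 0) {
    stop("one or more column indexes are required")
  }
  # df[cols] always returns a data frame, even for a single column,
  # so no as.data.frame() wrapper is needed here
  mutate(df, Sum = rowSums(df[cols]))
}

out <- iris %>%
  head(2) %>%
  apply_row_sums(1, 2, 3, 4)

out$Sum
```

Called with columns 1 to 4 on the first two rows, this should reproduce the sums shown above (10.2 and 9.5).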