Sum across multiple columns with dplyr
My question involves summing up values across multiple columns of a data frame and creating a new column corresponding to this summation using dplyr
. The data entries in the columns are binary(0,1). I am thinking of a row-wise analog of the summarise_each
or mutate_each
function of dplyr
. Below is a minimal example of the data frame:
library(dplyr)
df=data.frame(
x1=c(1,0,0,NA,0,1,1,NA,0,1),
x2=c(1,1,NA,1,1,0,NA,NA,0,1),
x3=c(0,1,0,1,1,0,NA,NA,0,1),
x4=c(1,0,NA,1,0,0,NA,0,0,1),
x5=c(1,1,NA,1,1,1,NA,1,0,1))
> df
x1 x2 x3 x4 x5
1 1 1 0 1 1
2 0 1 1 0 1
3 0 NA 0 NA NA
4 NA 1 1 1 1
5 0 1 1 0 1
6 1 0 0 0 1
7 1 NA NA NA NA
8 NA NA NA 0 1
9 0 0 0 0 0
10 1 1 1 1 1
I could use something like:
df <- df %>% mutate(sumrow= x1 + x2 + x3 + x4 + x5)
but this would involve writing out the names of each of the columns. I have like 50 columns. In addition, the column names change at different iterations of the loop in which I want to implement this operation so I would like to try avoid having to give any column names.
How can I do that most efficiently? Any assistance would be greatly appreciated.
dplyr >= 1.0.0 using across
sum up each row using rowSums
(rowwise
works for any aggreation, but is slower)
df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(across(where(is.numeric))))
sum down each column
df %>%
summarise(across(everything(), ~ sum(., is.na(.), 0)))
dplyr < 1.0.0
sum up each row
df %>%
replace(is.na(.), 0) %>%
mutate(sum = rowSums(.[1:5]))
sum down each column using superseeded summarise_all
:
df %>%
replace(is.na(.), 0) %>%
summarise_all(funs(sum))
If you want to sum certain columns only, I'd use something like this:
library(dplyr)
df=data.frame(
x1=c(1,0,0,NA,0,1,1,NA,0,1),
x2=c(1,1,NA,1,1,0,NA,NA,0,1),
x3=c(0,1,0,1,1,0,NA,NA,0,1),
x4=c(1,0,NA,1,0,0,NA,0,0,1),
x5=c(1,1,NA,1,1,1,NA,1,0,1))
df %>% select(x3:x5) %>% rowSums(na.rm=TRUE) -> df$x3x5.total
head(df)
This way you can use dplyr::select
's syntax.