Take Sum of a Variable if Combination of Values in Two Other Columns are Unique [duplicate]
Solution 1:
We could use a base R method by first sorting the first two columns within each row. We use apply with MARGIN=1 to do that, transpose the output, and convert it to a 'data.frame' to create 'df1'. We then use the formula method of aggregate to get the sum of 'num_email' grouped by the first two columns of the transformed dataset.
df1 <- data.frame(t(apply(df[1:2], 1, sort)), df[3])
aggregate(num_email~., df1, FUN=sum)
# X1 X2 num_email
# 1 Beth Mable 2
# 2 Beth Susan 3
# 3 Mable Susan 1
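To make the base R steps reproducible, here is a minimal sketch on made-up input. The rows of 'df' below are hypothetical, chosen only so that the totals match the output shown above; the original data is not part of this answer. The column names 'X1' and 'X2' are set explicitly here so the result does not depend on the names that apply returns.

```r
# Hypothetical input: each row is a sender/receiver pair with an email count
df <- data.frame(
  senders   = c("Beth", "Mable", "Beth", "Susan", "Susan"),
  receivers = c("Mable", "Beth", "Susan", "Beth", "Mable"),
  num_email = c(1, 1, 2, 1, 1)
)

# Sort the two names within each row, so that (Mable, Beth) and
# (Beth, Mable) become the same pair; t() restores one row per record
sorted_pairs <- t(apply(df[1:2], 1, sort))
colnames(sorted_pairs) <- c("X1", "X2")  # explicit names (an addition in this sketch)

df1 <- data.frame(sorted_pairs, df[3])

# Sum num_email for every unique (X1, X2) combination
res <- aggregate(num_email ~ ., df1, FUN = sum)
print(res)
```

With this input, the Beth/Mable pair sums to 2, Beth/Susan to 3, and Mable/Susan to 1.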
Or using data.table, we convert the first two columns to character class, use unname so that the first two columns get the default names 'V1' and 'V2', and convert to 'data.table'. Using the lexicographic ordering of character columns, we create the logical index for i (V1 > V2), assign (:=) the two columns in reversed order (.(V2, V1)) for the rows that meet the condition, and get the sum of 'num_email' grouped by 'V1', 'V2'.
library(data.table)
dt = do.call(data.table, c(lapply(unname(df[1:2]), as.character), df[3]))
dt[V1 > V2, c("V1", "V2") := .(V2, V1)]
dt[, .(num_email = sum(num_email)), by= .(V1, V2)]
# V1 V2 num_email
# 1: Beth Mable 2
# 2: Beth Susan 3
# 3: Mable Susan 1
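The swap driven by the V1 > V2 index can be illustrated in plain base R on two character vectors. This is only a sketch of the logic, using hypothetical values; it is not the data.table code itself. In data.table the right-hand side of := is evaluated before assignment, so no temporary variable is needed there.

```r
# Hypothetical name pairs, some of them out of alphabetical order
V1 <- c("Beth", "Mable", "Beth", "Susan", "Susan")
V2 <- c("Mable", "Beth", "Susan", "Beth", "Mable")

# Character comparison is lexicographic: TRUE where V1 sorts after V2
swap <- V1 > V2

# Exchange the two values only in the rows flagged by the index
tmp <- V1[swap]
V1[swap] <- V2[swap]
V2[swap] <- tmp

# Every pair is now stored with the alphabetically smaller name first
print(data.frame(V1, V2))
```

After the swap, rows that described the same two people in opposite directions hold identical (V1, V2) values, which is what lets the grouped sum treat them as one combination.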
Or using dplyr, we use mutate_each to convert the columns to character class, then put each pair in a consistent order with pmin and pmax (the alphabetically smaller name goes to 'V1', the larger to 'V2'), group by 'V1', 'V2' and get the sum of 'num_email'.
library(dplyr)
df %>%
  mutate_each(funs(as.character), senders, receivers) %>%
  mutate(V1 = pmin(senders, receivers),
         V2 = pmax(senders, receivers)) %>%
  group_by(V1, V2) %>%
  summarise(num_email = sum(num_email))
# V1 V2 num_email
# (chr) (chr) (dbl)
# 1 Beth Mable 2
# 2 Beth Susan 3
# 3 Mable Susan 1
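mutate_each and funs have since been deprecated in dplyr. A possible equivalent for newer releases is sketched below; it assumes dplyr 1.0.0 or later (for across and the .groups argument) and uses hypothetical input rows, stored as factors to show why the as.character conversion is there: pmin and pmax do not work on unordered factors.

```r
library(dplyr)

# Hypothetical input, with the name columns deliberately stored as factors
df <- data.frame(
  senders   = c("Beth", "Mable", "Beth", "Susan", "Susan"),
  receivers = c("Mable", "Beth", "Susan", "Beth", "Mable"),
  num_email = c(1, 1, 2, 1, 1),
  stringsAsFactors = TRUE
)

res <- df %>%
  # across() replaces mutate_each(funs(...)): convert both columns to character
  mutate(across(c(senders, receivers), as.character)) %>%
  # Smaller name of each pair goes to V1, larger to V2
  mutate(V1 = pmin(senders, receivers),
         V2 = pmax(senders, receivers)) %>%
  group_by(V1, V2) %>%
  # .groups = "drop" returns an ungrouped result
  summarise(num_email = sum(num_email), .groups = "drop")

print(res)
```

With this input the three pairs sum to 2 (Beth/Mable), 3 (Beth/Susan) and 1 (Mable/Susan), matching the totals above.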
NOTE: The data.table solution was updated by @Frank.