Sum of rows based on column value
I want to sum rows that have the same value in one column:
> df <- data.frame("1"=c("a","b","a","c","c"), "2"=c(1,5,3,6,2), "3"=c(3,3,4,5,2))
> df
X1 X2 X3
1 a 1 3
2 b 5 3
3 a 3 4
4 c 6 5
5 c 2 2
For one column (X2), I can aggregate with ddply from the plyr package to get the sum of all rows that share the same X1 value:
> ddply(df, .(X1), summarise, X2=sum(X2))
X1 X2
1 a 4
2 b 5
3 c 8
How do I do the same for X3 and an arbitrary number of other columns except X1?
This is the result I want:
X1 X2 X3
1 a 4 7
2 b 5 3
3 c 8 7
Solution 1:
ddply(df, "X1", numcolwise(sum))
See ?numcolwise for details and examples.
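A minimal, self-contained sketch of the call above, assuming the plyr package is installed (it provides both ddply and numcolwise). numcolwise(sum) wraps sum so that it is applied to every numeric column of each group, which is why X1 itself is left alone. The name totals is only illustrative.

```r
# Assumes plyr is installed: install.packages("plyr")
library(plyr)

# Same data as in the question, with the column names written out.
df <- data.frame(X1 = c("a", "b", "a", "c", "c"),
                 X2 = c(1, 5, 3, 6, 2),
                 X3 = c(3, 3, 4, 5, 2))

# Split df by X1, then sum every numeric column within each group.
totals <- ddply(df, "X1", numcolwise(sum))
print(totals)
```

Because the columns are picked by type rather than by name, the same call should keep working if more numeric columns are added to df.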
Solution 2:
aggregate can easily do this with the formula interface:
aggregate(. ~ X1, data=df, FUN=sum)
## X1 X2 X3
## 1 a 4 7
## 2 b 5 3
## 3 c 8 7
Equivalently:
aggregate(cbind(X2, X3) ~ X1, data=df, FUN=sum)
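To make the two forms easy to compare, here is a small base-R sketch that runs both on the data from the question. The names by_dot and by_name are illustrative only. The "." on the left-hand side stands for every column that is not named on the right-hand side, so here it expands to X2 and X3.

```r
# Same data as in the question, with the column names written out.
df <- data.frame(X1 = c("a", "b", "a", "c", "c"),
                 X2 = c(1, 5, 3, 6, 2),
                 X3 = c(3, 3, 4, 5, 2))

# "." means: all columns other than the grouping column X1.
by_dot <- aggregate(. ~ X1, data = df, FUN = sum)

# Naming the columns explicitly gives the same groups and the same sums.
by_name <- aggregate(cbind(X2, X3) ~ X1, data = df, FUN = sum)

print(by_dot)
print(by_name)
```

The dot form assumes that every column other than X1 can be summed. If df also holds a non-numeric column, listing the wanted columns with cbind is probably the safer choice.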