Aggregate multiple columns at once [duplicate]
I have a data-frame likeso:
x <-
id1 id2 val1 val2 val3 val4
1 a x 1 9
2 a x 2 4
3 a y 3 5
4 a y 4 9
5 b x 1 7
6 b y 4 4
7 b x 3 9
8 b y 2 8
I wish to aggregate the above by id1 & id2. I want to be able to get the means for val1, val2, val3, val4 at the same time.
How do i do this?
This is what i currently have but it works just for 1 column:
agg <- aggregate(x$val1, list(id11 = x$id1, id2= x$id2), mean)
names(agg)[3] <- c("val1") # Rename the column
Also, how do i rename the columns which are outputted as means in the same statement given above
Solution 1:
We can use the formula method of aggregate
. The variables on the 'rhs' of ~
are the grouping variables while the .
represents all other variables in the 'df1' (from the example, we assume that we need the mean
for all the columns except the grouping), specify the dataset and the function (mean
).
aggregate(.~id1+id2, df1, mean)
Or we can use summarise_each
from dplyr
after grouping (group_by
)
library(dplyr)
df1 %>%
group_by(id1, id2) %>%
summarise_each(funs(mean))
Or using summarise
with across
(dplyr
devel version - ‘0.8.99.9000’
)
df1 %>%
group_by(id1, id2) %>%
summarise(across(starts_with('val'), mean))
Or another option is data.table
. We convert the 'data.frame' to 'data.table' (setDT(df1)
, grouped by 'id1' and 'id2', we loop through the subset of data.table (.SD
) and get the mean
.
library(data.table)
setDT(df1)[, lapply(.SD, mean), by = .(id1, id2)]
data
df1 <- structure(list(id1 = c("a", "a", "a", "a", "b", "b",
"b", "b"
), id2 = c("x", "x", "y", "y", "x", "y", "x", "y"),
val1 = c(1L,
2L, 3L, 4L, 1L, 4L, 3L, 2L), val2 = c(9L, 4L, 5L, 9L, 7L, 4L,
9L, 8L)), .Names = c("id1", "id2", "val1", "val2"),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8"))
Solution 2:
You could try:
agg <- aggregate(list(x$val1, x$val2, x$val3, x$val4), by = list(x$id1, x$id2), mean)