dplyr - using mutate() like rowMeans()
I can't find the answer anywhere.
I would like to calculate a new variable in a data frame that is the mean of several columns within each row.
For example:
data <- data.frame(id=c(101,102,103), a=c(1,2,3), b=c(2,2,2), c=c(3,3,3))
I want to use mutate to create a variable d that is the mean of a, b and c. I would like to be able to do that by listing the columns, as in d=mean(a,b,c), and I also need to be able to use a range of variables (as in dplyr), as in d=mean(a:c).
And of course
mutate(data, c=mean(a,b))
or
mutate(data, c=rowMeans(a,b))
don't work.
Can you give me a tip?
Regards
Solution 1:
You're looking for
library(dplyr)
data %>%
  rowwise() %>%
  mutate(c=mean(c(a,b)))
#      id     a     b     c
#   (dbl) (dbl) (dbl) (dbl)
# 1   101     1     2   1.5
# 2   102     2     2   2.0
# 3   103     3     2   2.5
or
library(purrr)
data %>%
rowwise() %>%
mutate(c=lift_vd(mean)(a,b))
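Since the question also asks for a column range like a:c, here is a minimal sketch of two ways that could be written. It assumes dplyr 1.0.0 or later for c_across(); the result names by_rowmeans and by_rowwise are only illustrative and do not come from the answer above.

```r
library(dplyr)

data <- data.frame(id = c(101, 102, 103), a = c(1, 2, 3), b = c(2, 2, 2), c = c(3, 3, 3))

# Vectorised: select() picks the range a:c, rowMeans() averages each row.
by_rowmeans <- mutate(data, d = rowMeans(select(data, a:c)))

# Row-wise: c_across() gathers the selected columns of the current row
# into one vector (assumes dplyr >= 1.0.0).
by_rowwise <- data %>%
  rowwise() %>%
  mutate(d = mean(c_across(a:c))) %>%
  ungroup()

by_rowmeans$d
# [1] 2.000000 2.333333 2.666667
```

Both give the same d. The rowMeans() form avoids the per-row overhead of rowwise(), which can matter on larger data frames.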
Solution 2:
dplyr is badly suited to operate on this kind of data because it assumes tidy data format and, for the problem in question, your data is untidy.
You can of course tidy it first:
tidy_data = tidyr::gather(data, name, value, -id)
Which looks like this:
   id name value
1 101    a     1
2 102    a     2
3 103    a     3
4 101    b     2
5 102    b     2
6 103    b     2
…
And then:
tidy_data %>% group_by(id) %>% summarize(mean = mean(value))
     id     mean
  (dbl)    (dbl)
1   101 2.000000
2   102 2.333333
3   103 2.666667
Of course this discards the original data. You could use mutate instead of summarize to avoid this. Finally, you can then un-tidy your data again:
tidy_data %>%
group_by(id) %>%
mutate(mean = mean(value)) %>%
tidyr::spread(name, value)
     id     mean     a     b     c
  (dbl)    (dbl) (dbl) (dbl) (dbl)
1   101 2.000000     1     2     3
2   102 2.333333     2     2     3
3   103 2.666667     3     2     3
Alternatively, you could summarise and then merge the result with the original table:
tidy_data %>%
group_by(id) %>%
summarize(mean = mean(value)) %>%
inner_join(data, by = 'id')
The result is the same in either case. I conceptually prefer the second variant.
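For readers on a recent tidyr: gather() and spread() were superseded by pivot_longer() and pivot_wider() in tidyr 1.0.0. A rough equivalent of the tidy, mutate, un-tidy pipeline might look like the sketch below. It assumes tidyr 1.0.0 or later, and the name with_mean is only illustrative.

```r
library(dplyr)
library(tidyr)

data <- data.frame(id = c(101, 102, 103), a = c(1, 2, 3), b = c(2, 2, 2), c = c(3, 3, 3))

with_mean <- data %>%
  # Tidy: one row per (id, name) pair.
  pivot_longer(cols = -id, names_to = "name", values_to = "value") %>%
  group_by(id) %>%
  # Per-id mean, repeated on every row of the group.
  mutate(mean = mean(value)) %>%
  ungroup() %>%
  # Un-tidy: spread name/value back into the columns a, b, c.
  pivot_wider(names_from = name, values_from = value)

with_mean$mean
# [1] 2.000000 2.333333 2.666667
```

The resulting columns are id, mean, a, b, c, matching the spread() output above.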