dplyr summarise_each with na.rm
Is there a way to instruct dplyr to use summarise_each with na.rm=TRUE? I would like to take the mean of variables with summarise_each("mean") but I don't know how to tell it to ignore missing values.
Following the links in the doc, it seems you can use funs(mean(., na.rm = TRUE)):
library(dplyr)
by_species <- iris %>% group_by(Species)
by_species %>% summarise_each(funs(mean(., na.rm = TRUE)))
update
The current dplyr version strongly suggests using across instead of the more specialized functions summarise_all etc.
Translating the syntax from the older answer below (naming the functions in a named list) into across could look like this:
library(dplyr)
ggplot2::msleep %>%
  select(vore, sleep_total, sleep_rem) %>%
  group_by(vore) %>%
  summarise(across(everything(), .fns = list(mean = mean, max = max, sd = sd), na.rm = TRUE))
#> # A tibble: 5 x 7
#> vore sleep_total_mean sleep_total_max sleep_total_sd sleep_rem_mean
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 carni 10.4 19.4 4.67 2.29
#> 2 herbi 9.51 16.6 4.88 1.37
#> 3 inse~ 14.9 19.9 5.92 3.52
#> 4 omni 10.9 18 2.95 1.96
#> 5 <NA> 10.2 13.7 3.00 1.88
#> # ... with 2 more variables: sleep_rem_max <dbl>, sleep_rem_sd <dbl>
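As far as I know, from dplyr 1.1.0 onwards passing extra arguments such as na.rm through the ... of across is itself deprecated, and the suggested pattern is to wrap each function in a lambda. A minimal sketch of the same summary written that way, assuming dplyr >= 1.0.0 and that ggplot2 is installed for the msleep data (the name res is only for illustration):

```r
library(dplyr)

res <- ggplot2::msleep %>%
  select(vore, sleep_total, sleep_rem) %>%
  group_by(vore) %>%
  summarise(across(
    everything(),
    # each lambda sets na.rm itself instead of relying on across(...)
    list(
      mean = ~ mean(.x, na.rm = TRUE),
      max  = ~ max(.x, na.rm = TRUE),
      sd   = ~ sd(.x, na.rm = TRUE)
    )
  ))

res
```

The column names follow the same pattern as above (sleep_total_mean, sleep_rem_sd, and so on), because across names the results after the column and the list entry.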
older answer
summarise_each is deprecated now; here is an option with summarise_all.
- One can still specify na.rm = TRUE within the funs argument (cf. @flodel's answer: just replace summarise_each with summarise_all).
- But you can also add na.rm = TRUE after the funs argument.
That is useful when you want to call more than one function, e.g.:
edit
funs() is now soft-deprecated (thanks to @Mikko's comment for pointing this out). One can use the alternatives suggested by the warning shown in the code below. na.rm can still be specified as an additional argument within summarise_all.
I used ggplot2::msleep because it contains NAs and shows this better.
library(dplyr)
ggplot2::msleep %>%
  select(vore, sleep_total, sleep_rem) %>%
  group_by(vore) %>%
  summarise_all(funs(mean, max, sd), na.rm = TRUE)
#> Warning: funs() is soft deprecated as of dplyr 0.8.0
#> Please use a list of either functions or lambdas:
#>
#> # Simple named list:
#> list(mean = mean, median = median)
#>
#> # Auto named with `tibble::lst()`:
#> tibble::lst(mean, median)
#>
#> # Using lambdas
#> list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
Take, for instance, the mtcars data set:
library(dplyr)
You can always use summarise
to avoid long syntax:
mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg, na.rm = TRUE),
            sd_mpg = sd(mpg, na.rm = TRUE))
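Since mtcars has no missing values, na.rm makes no visible difference in that example. A small sketch with a made-up data frame (toy and its columns g and x are hypothetical) shows what the argument changes:

```r
library(dplyr)

# hypothetical data: group "a" contains one missing value
toy <- tibble(
  g = c("a", "a", "b", "b"),
  x = c(1, NA, 3, 5)
)

res_toy <- toy %>%
  group_by(g) %>%
  summarise(
    mean_default = mean(x),               # NA whenever the group has an NA
    mean_na_rm   = mean(x, na.rm = TRUE)  # NAs dropped before averaging
  )

res_toy
```

For group "a" the default mean is NA while the na.rm version is 1; group "b" gives 4 either way.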