How to parametrize function calls in dplyr 0.7?
The release of dplyr 0.7 includes a major overhaul of programming with dplyr. I read this document carefully, and I am trying to understand how it will impact my use of dplyr.
Here is a common idiom I use when building reporting and aggregation functions with dplyr:
my_report <- function(data, grouping_vars) {
data %>%
group_by_(.dots=grouping_vars) %>%
summarize(x_mean=mean(x), x_median=median(x), ...)
}
Here, grouping_vars is a vector of strings.
I like this idiom because I can pass in string vectors from other places, say a file or a Shiny app's reactive UI, but it's also not too bad for interactive work either.
However, in the new programming with dplyr vignette, I see no examples of how something like this can be done with the new dplyr. I only see examples of how passing strings is no longer the correct approach, and I have to use quosures instead.
I'm happy to adopt quosures, but how exactly do I get from strings to the quosures expected by dplyr here? It doesn't seem feasible to expect the entire R ecosystem to provide quosures to dplyr - lots of times we're going to get strings and they'll have to be converted.
Here is an example showing what you're now supposed to do, and how my old idiom doesn't work:
library(dplyr)
grouping_vars <- quo(am)
mtcars %>%
group_by(!!grouping_vars) %>%
summarise(mean_cyl=mean(cyl))
#> # A tibble: 2 × 2
#> am mean_cyl
#> <dbl> <dbl>
#> 1 0 6.947368
#> 2 1 5.076923
grouping_vars <- "am"
mtcars %>%
group_by(!!grouping_vars) %>%
summarise(mean_cyl=mean(cyl))
#> # A tibble: 1 × 2
#> `"am"` mean_cyl
#> <chr> <dbl>
#> 1 am 6.1875
Solution 1:
dplyr now provides a specialized version of group_by, group_by_at, to deal with multiple grouping variables. It is much easier to use this new member of the _at family:
# using the pre-release 0.6.0
cols <- c("am","gear")
mtcars %>%
group_by_at(.vars = cols) %>%
summarise(mean_cyl=mean(cyl))
# Source: local data frame [4 x 3]
# Groups: am [?]
#
# am gear mean_cyl
# <dbl> <dbl> <dbl>
# 1 0 3 7.466667
# 2 0 4 5.000000
# 3 1 4 4.500000
# 4 1 5 6.000000
The .vars argument accepts a character vector, a numeric vector, or column names generated by vars:
.vars: A list of columns generated by vars(), or a character vector of column names, or a numeric vector of column positions.
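To make the three accepted forms of .vars concrete, here is a small sketch on mtcars. The result names (by_vars, by_chr, by_pos) are only for illustration, and the positions are looked up with match() rather than hard-coded.

```r
library(dplyr)

# vars() takes bare column names, much like select()
by_vars <- mtcars %>%
  group_by_at(vars(am, gear)) %>%
  summarise(mean_cyl = mean(cyl))

# a character vector names the same columns
by_chr <- mtcars %>%
  group_by_at(c("am", "gear")) %>%
  summarise(mean_cyl = mean(cyl))

# a numeric vector gives column positions instead
pos <- match(c("am", "gear"), names(mtcars))
by_pos <- mtcars %>%
  group_by_at(pos) %>%
  summarise(mean_cyl = mean(cyl))
```

All three calls should produce the same four groups, so the character form is the natural fit when the column names arrive as strings.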
Solution 2:
Here's the quick and dirty reference I wrote for myself.
# install.packages("rlang")
library(tidyverse)
dat <- data.frame(cat = sample(LETTERS[1:2], 50, replace = TRUE),
cat2 = sample(LETTERS[3:4], 50, replace = TRUE),
value = rnorm(50))
Representing column names with strings
Convert strings to symbol objects using rlang::sym and rlang::syms.
summ_var <- "value"
group_vars <- c("cat", "cat2")
summ_sym <- rlang::sym(summ_var) # capture a single symbol
group_syms <- rlang::syms(group_vars) # creates list of symbols
dat %>%
group_by(!!!group_syms) %>% # splice list of symbols into a function call
summarize(summ = sum(!!summ_sym)) # unquote single symbol into call
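If you use !! or !!! outside of functions that support tidy evaluation (such as the dplyr verbs), they are not treated as unquoting operators. R reads them as ordinary negation, and you will typically get an error.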
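If you want to check what the unquoting will produce before running a pipeline, one option is rlang::expr(), which supports !! and !!! and returns the resulting call without evaluating it. This is only an inspection aid; df here is just a placeholder name inside the unevaluated call.

```r
library(rlang)

summ_sym <- sym("value")
group_syms <- syms(c("cat", "cat2"))

# expr() builds the call without evaluating it,
# so it shows what !! and !!! expand to
summ_call <- expr(sum(!!summ_sym))
group_call <- expr(group_by(df, !!!group_syms))

summ_call
#> sum(value)
group_call
#> group_by(df, cat, cat2)
```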
The usage of rlang::sym and rlang::syms is identical inside functions.
summarize_by <- function(df, summ_var, group_vars) {
summ_sym <- rlang::sym(summ_var)
group_syms <- rlang::syms(group_vars)
df %>%
group_by(!!!group_syms) %>%
summarize(summ = sum(!!summ_sym))
}
We can then call summarize_by with string arguments.
summarize_by(dat, "value", c("cat", "cat2"))
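As a possible alternative for string inputs, the .data pronoun can look up a column by name, which avoids creating symbols by hand. The sketch below assumes dplyr 0.7 or later and uses mtcars so the result is reproducible; summarize_by_data is a hypothetical name, not part of the answer above.

```r
library(dplyr)

# hypothetical variant of summarize_by that never builds symbols
summarize_by_data <- function(df, summ_var, group_vars) {
  df %>%
    group_by_at(group_vars) %>%               # strings go straight to group_by_at
    summarize(summ = sum(.data[[summ_var]]))  # look up the column by its name
}

res <- summarize_by_data(mtcars, "cyl", c("am", "gear"))
res
```

Whether this reads better than sym and !! is a matter of taste; both take the same string arguments.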
Using non-standard evaluation for column/variable names
summ_quo <- quo(value) # capture a single variable for NSE
group_quos <- quos(cat, cat2) # capture list of variables for NSE
dat %>%
group_by(!!!group_quos) %>% # use !!! with both quos and rlang::syms
summarize(summ = sum(!!summ_quo)) # use !! with both quo and rlang::sym
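Inside functions, use enquo rather than quo to capture an argument. quos(...) still works inside functions, because the dots carry along the expressions the caller supplied.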
summarize_by <- function(df, summ_var, ...) {
summ_quo <- enquo(summ_var) # captures a single argument
group_quos <- quos(...) # captures everything passed through ..., also inside functions
df %>%
group_by(!!!group_quos) %>%
summarize(summ = sum(!!summ_quo))
}
And then our function call is
summarize_by(dat, value, cat, cat2)
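To check that the string-based and the NSE-based versions really do the same thing, here is a sketch that defines both side by side and compares them on mtcars. The names summarize_by_str and summarize_by_nse are hypothetical, chosen only so the two versions can coexist.

```r
library(dplyr)

# string interface: convert names to symbols, then unquote
summarize_by_str <- function(df, summ_var, group_vars) {
  df %>%
    group_by(!!!rlang::syms(group_vars)) %>%
    summarize(summ = sum(!!rlang::sym(summ_var)))
}

# NSE interface: capture the caller's expressions, then unquote
summarize_by_nse <- function(df, summ_var, ...) {
  summ_quo <- enquo(summ_var)
  group_quos <- quos(...)
  df %>%
    group_by(!!!group_quos) %>%
    summarize(summ = sum(!!summ_quo))
}

res_str <- summarize_by_str(mtcars, "cyl", c("am", "gear"))
res_nse <- summarize_by_nse(mtcars, cyl, am, gear)

# compare the two results
all.equal(as.data.frame(res_str), as.data.frame(res_nse))
```

The string version is the one to reach for when names come from a file or a Shiny input; the NSE version is more convenient for interactive use.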
Solution 3:
If you want to group by possibly more than one column, you can use quos:
grouping_vars <- quos(am, gear)
mtcars %>%
group_by(!!!grouping_vars) %>%
summarise(mean_cyl=mean(cyl))
# am gear mean_cyl
# <dbl> <dbl> <dbl>
# 1 0 3 7.466667
# 2 0 4 5.000000
# 3 1 4 4.500000
# 4 1 5 6.000000
Right now, it doesn't seem like there's a great way to turn strings into quosures. Here's one way that does work, though (as far as I know, later rlang releases rename this function to parse_quos):
cols <- c("am","gear")
grouping_vars <- rlang::parse_quosures(paste(cols, collapse=";"))
mtcars %>%
group_by(!!!grouping_vars) %>%
summarise(mean_cyl=mean(cyl))
# am gear mean_cyl
# <dbl> <dbl> <dbl>
# 1 0 3 7.466667
# 2 0 4 5.000000
# 3 1 4 4.500000
# 4 1 5 6.000000
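One caveat with any parsing-based conversion is that the strings are read as R code, so a column name that is not syntactically valid will not parse. The sketch below shows this with base parse() and a made-up data frame (df and its "my col" column are invented for the example), and contrasts it with rlang::syms, which treats each string as a name.

```r
library(dplyr)

# made-up data with a column name that is not valid R syntax
df <- data.frame(`my col` = c("a", "a", "b"), x = 1:3, check.names = FALSE)
cols <- "my col"

# parsing the string as code fails on the space
parse_ok <- tryCatch({ parse(text = cols); TRUE }, error = function(e) FALSE)

# syms() turns the string into a name directly, with no parsing involved
res <- df %>%
  group_by(!!!rlang::syms(cols)) %>%
  summarise(x_sum = sum(x))
res
```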