Non-standard evaluation (NSE) in dplyr's filter_ & pulling data from MySQL

I'd like to pull some data from a sql server with a dynamic filter. I'm using the great R package dplyr in the following way:

#Create the filter
filter_criteria = ~ column1 %in% some_vector
#Connect to the database
connection <- src_mysql(dbname <- "mydbname", 
             user <- "myusername", 
             password <- "mypwd", 
             host <- "myhost") 
#Get data
data <- connection %>%
 tbl("mytable") %>% #Specify which table
 filter_(.dots = filter_criteria) %>% #non standard evaluation filter
 collect() #Pull data

This piece of code works fine but now I'd like to loop it somehow on all the columns of my table, thus I'd like to write the filter as:

#Dynamic filter
i <- 2 #With a loop on this i for instance
which_column <- paste0("column",i)
filter_criteria <- ~ which_column %in% some_vector

And then reapply the first code with the updated filter.

Unfortunately this approach doesn't give the expected results. In fact it does not give any error but doesn't even pull any result into R. In particular, I looked a bit into the SQL query generated by the two pieces of code and there is one important difference.

While the first, working, code generates a query of the form:

SELECT ... FROM ... WHERE 
`column1` IN ....

(` sign in the column name), the second one generates a query of the form:

SELECT ... FROM ... WHERE 
'column1' IN ....

(' sign in the column name)

Does anyone have any suggestion on how to formulate the filtering condition to make it work?


It's not really related to SQL. This example in R does not work either:

df <- data.frame(
     v1 = sample(5, 10, replace = TRUE),
     v2 = sample(5,10, replace = TRUE)
)
df %>% filter_(~ "v1" == 1)

It does not work because you need to pass to filter_ the expression ~ v1 == 1 — not the expression ~ "v1" == 1.

To solve the problem, simply use the quoting operator quo and the dequoting operator !!

library(dplyr)
which_column = quot(v1)
df %>% filter(!!which_column == 1)

An alternative solution, with dplyr version 0.5.0 (probably implemented earlier than that), it is possible to pass a composed string as the .dots argument, which I find more readable than the lazyeval::interp solution:

df <- data.frame(
     v1 = sample(5, 10, replace = TRUE),
     v2 = sample(5,10, replace = TRUE)
)

which_col <- "v1"
which_val <- 1
df %>% filter_(.dots= paste0(which_col, "== ", which_val))

  v1 v2
1  1  1
2  1  2
3  1  4

UPDATE for dplyr 0.6 and later:

packageVersion("dplyr")
# [1] ‘0.5.0.9004’

df %>% filter(UQ(rlang::sym(which_col))==which_val)
#OR
df %>% filter((!!rlang::sym(which_col))==which_val)

(Similar to @Matthew 's response for dplyr 0.6, but I assume that which_col is a string variable.)

2nd UPDATE: Edwin Thoen created a nice cheatsheet for tidy evaluation: https://edwinth.github.io/blog/dplyr-recipes/


Here's a slightly less verbose solution and one which uses the typical behavior of the extract function, '[' in selecting a column by character value rather than converting it to a language element:

df %>% filter(., '['(., which_column)==1 )

set.seed(123)
df <- data.frame(
      v1 = sample(5, 10, replace = TRUE),
      v2 = sample(5,10, replace = TRUE)
 )
which_column <- "v1"
df %>% filter(., '['(., which_column)==1)
#  v1 v2
#1  1  5