dplyr lazy query mutate column using str_extract

Solution 1:

As @IceCreamToucan states, str_extract is not on dbplyr's list of translations, so dbplyr cannot translate this code into SQL and run it on the database. (I assume you are using dbplyr, as it is the main package for translating dplyr commands into SQL.)

We can test this as follows:

library(dbplyr)
library(dplyr)
library(stringr)

data(starwars)

# pick one simulated connection type (there are many options, not just those shown here)
# uncomment the one you want; each assignment overwrites the previous one
# remote_df = tbl_lazy(starwars, con = simulate_mssql())
# remote_df = tbl_lazy(starwars, con = simulate_mysql())
remote_df = tbl_lazy(starwars, con = simulate_postgres())

remote_df %>%
  mutate(substring_col = str_extract(name, "Luke")) %>%
  show_query()

show_query() should return the SQL that our mutate has been translated into. Instead, I receive a clear message: "Error: str_extract() is not available in this SQL variant". This confirms that no translation is defined for it.

However, there is a translation defined for grep and grepl (etc.) so the following should work:

remote_df %>%
  mutate(substring_col = grepl("Luke", name)) %>%
  show_query()

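Note that the output is different: grepl returns TRUE/FALSE depending on whether the pattern matched, whereas str_extract returns the matched substring itself (or NA when there is no match).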
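To see how the two results differ, here is a minimal sketch on a small local data frame (not a database table). The names local_df, result, extracted, detected and rebuilt are made up for illustration. The rebuilt column shows one possible way to recover the str_extract result from grepl, which only works because the pattern here is a fixed string rather than a real regular expression.

```r
library(dplyr)
library(stringr)

# Small local example (hypothetical data, evaluated in R rather than SQL)
local_df <- tibble(name = c("Luke Skywalker", "Darth Vader"))

result <- local_df %>%
  mutate(
    # str_extract returns the matched text, or NA if there is no match
    extracted = str_extract(name, "Luke"),
    # grepl only reports whether the pattern matched
    detected = grepl("Luke", name),
    # for a fixed pattern, the extracted value can be rebuilt from the match flag
    rebuilt = ifelse(detected, "Luke", NA_character_)
  )

print(result)
```

The same ifelse(grepl(...), ...) expression could be tried inside the lazy mutate and checked with show_query(). Whether it translates depends on your backend and dbplyr version, so treat it as something to test rather than a guaranteed fix.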