Dplyr join on by=(a = b), where a and b are variables containing strings?

You can use

library(dplyr)

myfn <- function(xname, yname) {
    data(iris)
    # setNames(value, name): yname becomes the value, xname the name
    inner_join(iris, iris, by=setNames(yname, xname))
}

The suggested syntax in the ?inner_join documentation of

by = c("a"="b")   # same as by = c(a="b")

is slightly misleading, because only one of those strings is actually a character value. What you're really creating is a named character vector: "b" is the value and "a" is the element's name. Setting the left-hand side dynamically therefore works differently from setting the right-hand side, because c() takes argument names literally instead of evaluating them. You can use setNames() to set the names of the vector dynamically.
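A quick base-R check of that difference. xname and yname are example variables here, holding iris column names as placeholders:

```r
xname <- "Sepal.Length"
yname <- "Sepal.Width"

# Argument names in c() are never evaluated, so this name is literally "xname"
literal <- c(xname = yname)
names(literal)

# setNames(object, nm) evaluates both arguments: value from yname, name from xname
dynamic <- setNames(yname, xname)
names(dynamic)
unname(dynamic)
```

Only the setNames() version produces the c("Sepal.Length" = "Sepal.Width") shape that by expects.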


I know I'm late to the party, but how about:

library(dplyr)

myfn <- function(byvar) {
  data(iris)
  inner_join(iris, iris, by=byvar)
}

This way you can do what you want with:

myfn(c("Sepal.Length"="Sepal.Width"))

I like MrFlick's answer and fber's addendum, but I prefer structure(). To me, setNames() feels like something for the end of a pipe, not an on-the-fly constructor. Either way, both setNames() and structure() let you use variables in the function call.
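For the multi-column case the two constructors produce the same object. This base-R sketch uses hypothetical xnames/ynames vectors with two column pairs each:

```r
# Hypothetical multi-column join keys held in variables
xnames <- c("Sepal.Length", "Petal.Length")
ynames <- c("Sepal.Width", "Petal.Width")

# structure() attaches the names attribute directly
via_structure <- structure(ynames, names = xnames)

# setNames() does the same thing through names<-
via_setnames <- setNames(ynames, xnames)

identical(via_structure, via_setnames)
via_structure
```

So the choice between them is mostly a matter of taste, as described above.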

library(dplyr)

myfn <- function(xnames, ynames) {
  data(iris)
  inner_join(iris, iris, by = structure(names = xnames, .Data = ynames))
}

x <- "Sepal.Length"

myfn(x, "Sepal.Width")

A named vector argument runs into problems when the name is stored in a variable:

myfn <- function(byvars) {
  data(iris)
  inner_join(iris, iris, by = byvars)
}

x <- "Sepal.Length"

myfn(c(x = "Sepal.Width"))   # fails: the name is the literal "x", not "Sepal.Length"

You could solve that, though, by using setNames or structure in the function call.
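A minimal sketch of that fix, assuming dplyr is installed. data(iris) is dropped here because iris is already available from the datasets package that R attaches by default:

```r
library(dplyr)

myfn <- function(byvars) {
  inner_join(iris, iris, by = byvars)
}

x <- "Sepal.Length"

# Build the named vector at the call site: setNames() evaluates `x`,
# whereas c(x = ...) would use the literal name "x"
res <- myfn(setNames("Sepal.Width", x))

# Equivalent call using structure()
res2 <- myfn(structure("Sepal.Width", names = x))
```

Since the keys are numeric measurements, the result only contains rows where a Sepal.Length value happens to equal a Sepal.Width value.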


I faced a challenge nearly identical to @Peter's, but I needed to pass several different sets of by = join parameters at once. I chose to use the map() function from purrr, a tidyverse package.

This is the subset of the tidyverse that I used.

library(magrittr)
library(dplyr)
library(rlang)
library(purrr)

First, I adapted myfn to use map() for the case Peter posted. 42's comment and Felipe Gerard's answer made it clear that the by argument can take a named vector. map() iterates over the elements of its input, so the named vector is wrapped in a list to keep it intact as a single by specification.

myfn_2 <- function(xname, yname) {
  by_names <- list(setNames(nm = xname, yname))

  data(iris)

  # map() returns a single-element list; index into it to retrieve the data frame.

  map(.x = by_names,
      .f = ~inner_join(x = iris,
                       y = iris,
                       by = .x)) %>%
    `[[`(1)
}

myfn_2("Sepal.Length", "Sepal.Width")
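The list() wrapper in myfn_2 matters once the by vector has more than one element. This sketch uses base lapply(), which, like purrr's map() as far as I know, hands each element of its input to the function separately and drops that element's name on the way; by_vec is a hypothetical two-column example:

```r
# Hypothetical two-column join specification
by_vec <- c(Sepal.Length = "Sepal.Width", Petal.Length = "Petal.Width")

# Iterating over the bare vector visits each pair separately and loses the names
per_element <- lapply(by_vec, function(b) b)
length(per_element)
names(per_element[[1]])

# Wrapping the vector in a list keeps the whole specification together
per_spec <- lapply(list(by_vec), function(b) b)
length(per_spec)
identical(per_spec[[1]], by_vec)
```

Without the wrapper, each iteration would receive an unnamed string such as "Sepal.Width", which inner_join() would then treat as a same-named key in both tables.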

I found that I didn't need quo_name() / !! in building the function, because by takes an ordinary character vector rather than a tidy-evaluated expression.

Then, I adapted the function to take a list of by parameters. Each by_i in by_grps holds an x vector and a y vector; extending both vectors adds more column pairs on which to join.

by_grps <- list(  by_1 = list(x = c("Sepal.Length"), y = c("Sepal.Width")), 
                  by_2 = list(x = c("Sepal.Width"), y = c("Petal.Width"))
                )

myfn_3 <- function(by_grps_list, nm_dataset) {
  by_named_vectors_list <- lapply(by_grps_list, 
                                  function(by_grp) setNames(object = by_grp$y,
                                                            nm = by_grp$x))
  map(.x = by_named_vectors_list, 
      .f = ~inner_join(nm_dataset, nm_dataset, by = .x))
}

myfn_3(by_grps, iris)
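To see what myfn_3 actually passes to inner_join(), the named vectors can be built on their own. This base-R sketch repeats that lapply() step and adds a hypothetical by_3 group with two columns on each side, showing how extending x and y becomes a multi-column join key:

```r
by_grps <- list(by_1 = list(x = c("Sepal.Length"), y = c("Sepal.Width")),
                by_2 = list(x = c("Sepal.Width"),  y = c("Petal.Width")),
                # Hypothetical group joining on two column pairs at once
                by_3 = list(x = c("Sepal.Length", "Petal.Length"),
                            y = c("Sepal.Width",  "Petal.Width")))

# Same construction as inside myfn_3: y supplies the values, x the names
by_named_vectors_list <- lapply(by_grps,
                                function(by_grp) setNames(object = by_grp$y,
                                                          nm = by_grp$x))
str(by_named_vectors_list)
```

Because map() keeps the names of its input, myfn_3's result should likewise be a list with one joined data frame per group, named after the entries of by_grps.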