How to detect free variable names in R functions

Suppose I have a function:

f <- function() {
  x + 1
}

Here x is a free variable since its value is not defined within function f. Is there a way that I can obtain the variable name, say x, from a defined function, say f?

I am asking because I maintain other people's old R code. It uses a lot of free variables, which makes debugging hard.

Any other suggestions are welcome as well.


Solution 1:

The codetools package has functions for this purpose, e.g. findGlobals:

library(codetools)
findGlobals(f, merge=FALSE)[['variables']]
# [1] "x"

If we redefine the function so that x is a named argument, no variables are returned.

f2 <- function(x){
  x+1
}
findGlobals(f2, merge=FALSE)[['variables']]
# character(0)
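
For a legacy code base with many functions, the same call can be applied to every function in an environment. The following is only a sketch: report_free_vars and the toy legacy environment are hypothetical names made up for illustration, and it assumes the old code has already been loaded into an environment (for example with source()).

```r
library(codetools)

# Hypothetical helper: list the free variables of every closure in `env`.
report_free_vars <- function(env) {
  result <- list()
  for (nm in ls(env)) {
    obj <- get(nm, envir = env)
    if (typeof(obj) == "closure") {
      result[[nm]] <- findGlobals(obj, merge = FALSE)$variables
    }
  }
  # keep only the functions that reference at least one free variable
  Filter(length, result)
}

# Toy environment standing in for the sourced legacy code (assumption)
legacy <- new.env()
legacy$add_rate  <- function(amount) amount * (1 + rate)
legacy$double_it <- function(v) v * 2

report <- report_free_vars(legacy)
print(report)
```

Here only add_rate should be reported, with rate as its free variable; double_it uses nothing but its own argument.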

Solution 2:

This is a rough stab at it.

find_vars <- function(f, vars=list(found=character(), defined=names(formals(f)))) {
    if( is.function(f) ) {
        # function, begin search on body
        return(find_vars(body(f), vars))
    } else if (is.call(f) && deparse(f[[1]]) == "<-") {
        # assignment with <- operator
        if (is.recursive(f[[2]])) {
           if (is.call(f[[2]]) && deparse(f[[2]][[1]]) == "$") {
               # for a$b <- value, record the object "a" (f[[2]][[2]]), not the "$" operator
               vars$defined <- unique( c(vars$defined, deparse(f[[2]][[2]])) )
           } else {
               warning(paste("unable to determine assigned variable in", deparse(f[[2]])))
           }
        } else {
            vars$defined <- unique( c(vars$defined, deparse(f[[2]])) )  
        }
        vars <- find_vars(f[[3]], vars)
    } else if (is.call(f) && deparse(f[[1]]) == "$") {
        # assume "b" is ok in a$b
        vars <- find_vars(f[[2]], vars)
    } else if (is.call(f) && deparse(f[[1]]) == "~") {
        #skip formulas
    } else if (is.recursive(f)) {
        # compound object, iterate through sub-parts
        v <- lapply(as.list(f)[-1], find_vars, vars)
        vars$defined <- unique( c(vars$defined, unlist(sapply(v, `[[`, "defined"))) )
        vars$found <- unique( c(vars$found, unlist(sapply(v, `[[`, "found"))) )
    } else if (is.name(f)) {
        # standard variable name/symbol
        vars$found <- unique( c(vars$found, deparse(f)))
    }
    vars
}

find_free <- function(f) {
    r <- find_vars(f)
    return(setdiff(r$found, r$defined))
}

Then you could use it like

f <- function() {
  z <- x + 1
  z
}
find_free(f)
# [1] "x"

I'm sure there are many ways this can produce false positives, and I didn't do any special coding for functions that use non-standard evaluation. For example

g <- function(df) {
  with(df, mpg + disp)
}
g(head(mtcars))
# [1] 181.0 181.0 130.8 279.4 378.7 243.1

but

find_free(g)
# [1] "mpg"  "disp"

I already put in a special branch for the $ operator and for formulas; you could add a similar branch for functions that use non-standard evaluation, such as with() or subset(), or whatever else you like. It depends on what your code ends up looking like.
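
One way such a branch might look is sketched below. The names nse_funs and is_nse_call are hypothetical, and the sketch assumes the data argument is always passed first by position, which will not hold for every call.

```r
# Hypothetical list of functions whose later arguments are evaluated
# inside a data frame rather than in the calling environment.
nse_funs <- c("with", "subset")

# TRUE if `e` is a call whose head is a plain symbol listed in nse_funs
is_nse_call <- function(e) {
  is.call(e) && is.name(e[[1]]) && as.character(e[[1]]) %in% nse_funs
}

# Inside find_vars(), a branch placed before the generic is.recursive()
# case could then search only the first (data) argument:
#   } else if (is_nse_call(f)) {
#       vars <- find_vars(f[[2]], vars)
#   }

is_nse_call(quote(with(df, mpg + disp)))  # TRUE
is_nse_call(quote(mean(x)))               # FALSE
```

With a branch like this, column names such as mpg and disp would no longer be searched, at the cost of also missing genuine free variables used inside the with() expression.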

This assumes all assignment happens via the standard <- operator. Other ways of assigning variables (e.g., assign()) would go undetected. It also ignores the names of called functions. So if you call myfun(1), it will not report myfun as a free variable, even though it may be a "free function" defined elsewhere in the code.
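
If the called functions matter too, findGlobals from codetools (see Solution 1) reports them separately. This is a rough sketch rather than part of find_vars: the function k and the filtering rule (drop anything that is a function in the base environment) are assumptions chosen for illustration, and functions from other attached packages would still show up.

```r
library(codetools)

# Example function calling an undefined helper (myfun is hypothetical)
k <- function(v) myfun(v) + 1

called <- findGlobals(k, merge = FALSE)$functions

# Drop names that are functions in the base environment (such as `+`),
# leaving candidates that must be defined somewhere else.
in_base <- vapply(called, exists, logical(1),
                  envir = baseenv(), mode = "function", inherits = FALSE)
unknown <- called[!in_base]
print(unknown)
```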

So this may not be perfect, but it should act as a decent screen for potential problems.