Is there an R function to escape a string for regex characters

I've written an R version of Perl's quotemeta function:

library(stringr)
quotemeta <- function(string) {
  str_replace_all(string, "(\\W)", "\\\\\\1")
}

I always use the perl flavor of regexps, so this works for me. I don't know whether it works for the "normal" regexps in R.

Edit: I found the source explaining why this works. It's in the Quoting Metacharacters section of the perlre manpage:

This was once used in a common idiom to disable or quote the special meanings of regular expression metacharacters in a string that you want to use for a pattern. Simply quote all non-"word" characters:

$pattern =~ s/(\W)/\\$1/g;

As you can see, the R code above is a direct translation of this same substitution (after a trip through backslash hell). The manpage also says (emphasis mine):

Unlike some other regular expression languages, there are no backslashed symbols that aren't alphanumeric.

which reinforces my point that this solution is only guaranteed for PCRE.


Apparently there is a function called escapeRegex in the Hmisc package. The function itself has the following definition for an input value of 'string':

gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", string)

My previous answer:

I'm not sure if there is a built in function but you could make one to do what you want. This basically just creates a vector of the values you want to replace and a vector of what you want to replace them with and then loops through those making the necessary replacements.

re.escape <- function(strings){
    vals <- c("\\\\", "\\[", "\\]", "\\(", "\\)", 
              "\\{", "\\}", "\\^", "\\$","\\*", 
              "\\+", "\\?", "\\.", "\\|")
    replace.vals <- paste0("\\\\", vals)
    for(i in seq_along(vals)){
        strings <- gsub(vals[i], replace.vals[i], strings)
    }
    strings
}

Some output

> test.strings <- c("What the $^&(){}.*|?", "foo[bar]")
> re.escape(test.strings)
[1] "What the \\$\\^&\\(\\)\\{\\}\\.\\*\\|\\?"
[2] "foo\\[bar\\]"  

An easier way than @ryanthompson function is to simply prepend \\Q and postfix \\E to your string. See the help file ?base::regex.


Use the rex package

These days, I write all my regular expressions using rex. For your specific example, rex does exactly what you want:

library(rex)
library(assertthat)
x = "foo[bar]"
y = rex(x)
assert_that(y == "foo\\[bar\\]")

But of course, rex does a lot more than that. The question mentions building a regex, and that's exactly what rex is designed for. For example, suppose we wanted to match the exact string in x, with nothing before or after:

x = "foo[bar]"
y = rex(start, x, end)

Now y is ^foo\[bar\]$ and will only match the exact string contained in x.