Escaped Periods In R Regular Expressions

Unless I am missing something, this regex seems pretty straightforward:

grepl("Processor\.[0-9]+\..*Processor\.Time", names(web02))

However, R doesn't like the escaped periods (\.), which I intend to match a literal period:

Error: '\.' is an unrecognized escape in character string starting "Processor\."

What am I misunderstanding about this regex syntax?


My R-Fu is weak to the point of being non-existent but I think I know what's up.

The string handling part of the R processor has to peek inside the strings to convert \n and related escape sequences into their character equivalents. R doesn't know what \. means so it complains. You want to get the escaped dot down into the regex engine so you need to get a single \ past the string mangler. The usual way of doing that sort of thing is to escape the escape:

grepl("Processor\\.[0-9]+\\..*Processor\\.Time", names(web02))

Embedding one language (regular expressions) inside another language (R) is usually a bit messy and more so when both languages use the same escaping syntax.
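
To make the two layers concrete, here is a minimal sketch. The column names are hypothetical stand-ins for names(web02), which isn't shown in the question, and the raw-string line at the end assumes R 4.0.0 or later.

```r
# A minimal sketch; col_names is a hypothetical stand-in for names(web02).
escaped <- "Processor\\.[0-9]+\\..*Processor\\.Time"

# R's parser turns each "\\" into one backslash, so the regex engine sees "\."
cat(escaped, "\n")   # prints: Processor\.[0-9]+\..*Processor\.Time
nchar("\\.")         # 2: one backslash plus one period

col_names <- c("Processor.0....Processor.Time",
               "Memory.Available.MBytes",
               "ProcessorX0XXProcessorXTime")

escaped_hits <- grepl(escaped, col_names)
# TRUE FALSE FALSE

# Without the escapes, "." matches any character, so the third name also matches
unescaped_hits <- grepl("Processor.[0-9]+..*Processor.Time", col_names)
# TRUE FALSE TRUE

# R >= 4.0.0 only: a raw string passes the backslashes through untouched
raw_hits <- grepl(r"(Processor\.[0-9]+\..*Processor\.Time)", col_names)
```

The third name is the reason the escapes matter: an unescaped period quietly matches strings that contain no period at all.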


Instead of

\.

Try

\\.

You need to escape the backslash first.


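You can also avoid backslashes entirely by putting the period inside a bracket expression, where it is taken literally. (Note that "[:.:]" is not a special notation: it is an ordinary bracket expression containing a colon and a period, so it matches ":" as well. POSIX classes such as [:punct:] only work inside a second pair of brackets, as in "[[:punct:]]".) For example:

grepl("[.]", ".")
# [1] TRUE
grepl("[.]", "a")
# [1] FALSE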

From the docs (?regex):

The metacharacters in extended regular expressions are . \ | ( ) [ { ^ $ * + ?, but note that whether these have a special meaning depends on the context.

[:punct:] Punctuation characters: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.
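
As a rough illustration of how these backslash-free options differ, the sketch below runs three of them over the same small test vector. The vector x is made up for the example. fixed = TRUE is a standard grepl argument that turns off regex interpretation altogether.

```r
# Hypothetical test inputs: a period, a letter, and two other punctuation marks
x <- c(".", "a", ":", "!")

# A period inside a bracket expression is literal
bracket_hits <- grepl("[.]", x)              # TRUE FALSE FALSE FALSE

# The POSIX class matches any punctuation character, not just a period
punct_hits <- grepl("[[:punct:]]", x)        # TRUE FALSE TRUE TRUE

# fixed = TRUE treats the whole pattern as a plain string
fixed_hits <- grepl(".", x, fixed = TRUE)    # TRUE FALSE FALSE FALSE
```

Of the three, "[.]" fits best when the rest of the pattern still needs regex features such as [0-9]+. fixed = TRUE only suits patterns that are entirely literal.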