Escaped Periods In R Regular Expressions
Unless I am missing something, this regex seems pretty straightforward:
grepl("Processor\.[0-9]+\..*Processor\.Time", names(web02))
However, R doesn't like the escaped periods, \., which I intend to match a literal period:
Error: '\.' is an unrecognized escape in character string starting "Processor\."
What am I misunderstanding about this regex syntax?
My R-Fu is weak to the point of being non-existent but I think I know what's up.
The string-handling part of the R parser has to look inside string literals to convert \n and related escape sequences into the characters they stand for. R doesn't know what \. means, so it complains. You want the escaped dot to reach the regex engine, so you need to get a single \ past the string mangler. The usual way of doing that sort of thing is to escape the escape:
grepl("Processor\\.[0-9]+\\..*Processor\\.Time", names(web02))
Embedding one language (regular expressions) inside another language (R) is usually a bit messy and more so when both languages use the same escaping syntax.
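To make the two layers visible, here is a small sketch. The column names in cols are hypothetical stand-ins for names(web02), which isn't shown in the question; the pattern is the corrected one from above.

```r
# Hypothetical column names standing in for names(web02)
cols <- c("Processor.0.Pct.Processor.Time",
          "Processor.12.x.Processor.Time",
          "ProcessorX0XfooXProcessorXTime",
          "Memory.Available.Bytes")

pattern <- "Processor\\.[0-9]+\\..*Processor\\.Time"

# Prints the string as the regex engine receives it (single backslashes)
writeLines(pattern)

# "\\." is two characters once parsed: a backslash and a period
n_chars <- nchar("\\.")  # 2

# Only names with literal periods in the right places match
hits <- grepl(pattern, cols)
print(hits)
```

The third name has no literal periods, so it fails even though an unescaped . would have matched its X characters.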
Instead of
\.
Try
\\.
You need to escape the backslash first.
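If you are on R 4.0.0 or later, raw strings should let you skip the doubling altogether, since backslashes inside r"(...)" are not treated as string escapes. A minimal sketch, assuming a recent R version and using made-up test inputs:

```r
# Raw string syntax requires R >= 4.0.0
escaped <- "Processor\\.[0-9]+"
raw     <- r"(Processor\.[0-9]+)"

# Both literals produce the same string
same <- identical(escaped, raw)

# Hypothetical inputs: one with a literal period, one without
res <- grepl(raw, c("Processor.7", "ProcessorX7"))
print(same)
print(res)
```

On older R versions the r"(...)" form is a syntax error, so the doubled backslash remains the portable choice.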
Another R-friendly way of doing this avoids backslashes entirely: put the period inside a bracket expression, where it is always literal. For example:
grepl("[.]", ".")
# [1] TRUE
grepl("[.]", "a")
# [1] FALSE
From the docs (?regex):
The metacharacters in extended regular expressions are . \ | ( ) [ { ^ $ * + ?, but note that whether these have a special meaning depends on the context.
[:punct:] Punctuation characters: ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~.
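One detail worth spelling out: a named class such as [:punct:] has to sit inside a bracket expression of its own, so it is written [[:punct:]]. The sketch below, using a made-up test vector, contrasts it with a bracket expression holding just a period.

```r
# Made-up test vector
x <- c(".", "a", "!", ":")

# A bracket expression with only a period matches the literal period
only_dot <- grepl("[.]", x)

# The punctuation class matches any punctuation character
any_punct <- grepl("[[:punct:]]", x)

print(only_dot)
print(any_punct)
```

For matching a period specifically, [.] or \\. is the tighter choice; [[:punct:]] is broader and also accepts characters like ! and :.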