Why do regex constructors need to be double escaped?
In the regex below, \s denotes a whitespace character. I imagine the regex parser goes through the string, sees \, and knows that the next character is special. But this is not the case, as double escapes are required. Why is this?
var res = new RegExp('(\\s|^)' + foo).test(moo);
Is there a concrete example of how a single escape could be mis-interpreted as something else?
You are constructing the regular expression by passing a string to the RegExp constructor. \ is an escape character in string literals. The \ is consumed by the string literal parsing…
const foo = "foo";
const string = '(\s|^)' + foo;
console.log(string);
… so the data you pass to the RegExp compiler is a plain s and not \s. You need to escape the \ (writing \\) so that it is expressed as data instead of acting as an escape character itself.
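To make the difference observable, here is a minimal sketch that compares the two strings and the regular expressions built from them. The inputs 'a foo' and 'sfoo' and the variable names are assumptions chosen only for illustration; they are not from the question.

```javascript
// Single backslash: the string literal parser treats \s as an unknown
// escape and keeps only the letter s.
const single = '(\s|^)';
// Double backslash: the parser turns \\ into one real backslash.
const double = '(\\s|^)';

console.log(single); // (s|^)
console.log(double); // (\s|^)

// Hypothetical inputs, chosen only for this illustration.
const foo = 'foo';
const withSpace = 'a foo';
const withLetterS = 'sfoo';

const reSingle = new RegExp(single + foo); // same pattern as /(s|^)foo/
const reDouble = new RegExp(double + foo); // same pattern as /(\s|^)foo/

console.log(reSingle.test(withSpace));   // false
console.log(reDouble.test(withSpace));   // true
console.log(reSingle.test(withLetterS)); // true
console.log(reDouble.test(withLetterS)); // false
```

With the single escape, the pattern looks for the letter s rather than whitespace, so it silently matches the wrong input instead of throwing an error.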
Inside the code where you're creating a string, the backslash is a JavaScript escape character first, which means escape sequences like \t, \n, \", etc. will be translated into their JavaScript counterparts (tab, newline, quote, etc.), and the result becomes part of the string. A double backslash represents a single backslash in the actual string itself, so if you want a backslash in the string, you escape it first.
So when you generate a string by saying var someString = '(\\s|^)', what you're really doing is creating an actual string with the value (\s|^).
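As a concrete case of a single escape being interpreted as something else, consider \b: in a string literal it is the backspace character, while in a regular expression it is a word boundary. The sketch below assumes a hypothetical sample sentence and variable names, purely for illustration.

```javascript
// In a string literal, \b is translated to a backspace (U+0008).
const boundarySingle = '\bcat\b';
// With \\b, the string holds a real backslash followed by the letter b.
const boundaryDouble = '\\bcat\\b';

console.log(boundarySingle.charCodeAt(0)); // 8
console.log(boundarySingle.length);        // 5
console.log(boundaryDouble.length);        // 7

// Hypothetical sample text for this illustration.
const sentence = 'a cat sat';

// The pattern contains literal backspace characters, so it only matches
// text that itself contains backspace, "cat", backspace.
console.log(new RegExp(boundarySingle).test(sentence)); // false

// The pattern contains \b word boundaries, as intended.
console.log(new RegExp(boundaryDouble).test(sentence)); // true
```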
The RegExp constructor needs a string representation of \s, which in JavaScript can be produced using the literal "\\s".
Here's a live example to illustrate why "\s" is not enough:
alert("One backslash: \s\nDouble backslashes: \\s");
Note how an extra \ before \s changes the output.
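Since the doubling comes from string literal parsing and not from the regex engine, forms that skip that parsing step should not need it. The following is a small sketch, assuming a hypothetical value for foo, that compares the escaped string with a String.raw template and a regex literal.

```javascript
// Hypothetical value, standing in for the variable from the question.
const foo = 'foo';

// Ordinary string literal: the backslash must be doubled.
const fromString = new RegExp('(\\s|^)' + foo);

// String.raw leaves backslashes untouched, so one is enough.
const fromRaw = new RegExp(String.raw`(\s|^)` + foo);

// A regex literal is not parsed as a string, so one is enough here too.
// It cannot take the dynamic part, so foo is written out by hand.
const fromLiteral = /(\s|^)foo/;

// All three end up with the same pattern text.
console.log(fromString.source === fromRaw.source);     // true
console.log(fromString.source === fromLiteral.source); // true
console.log(fromString.test('a foo'));                 // true
```

The constructor is still the usual choice when part of the pattern, such as foo, is only known at run time.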