Extra backslash needed in PHP regexp pattern
When testing an answer for another user's question I found something I don't understand. The problem was to replace all literal \t, \n, and \r sequences in a string with a single space.
Now, the first pattern I tried was:
/(?:\\[trn])+/
which surprisingly didn't work. I tried the same pattern in Perl and it worked fine. After some trial and error I found that PHP wants 3 or 4 backslashes for that pattern to match, as in:
/(?:\\\\[trn])+/
or
/(?:\\\[trn])+/
To my surprise, both of these patterns work. Why are these extra backslashes necessary?
You need 4 backslashes to represent 1 in the regex because there are two layers of unescaping:
- the PHP string parser turns every 2 backslashes into 1 ("\\\\" -> \\)
- the regex engine then turns the remaining 2 backslashes into 1 literal backslash (\\ -> \)
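The two layers can be checked separately. This is a minimal sketch, assuming single-quoted strings (which treat \\ the same way as double-quoted ones); the variable names $pattern and $subject are only illustrative.

```php
<?php
// Layer 1: the PHP string parser turns each \\ pair into one backslash.
$pattern = '/\\\\/';          // four backslashes in the source code
echo $pattern, "\n";          // the regex engine receives /\\/
echo strlen('\\\\'), "\n";    // 2: only two backslashes remain in the string

// Layer 2: the regex engine reads \\ as one literal backslash.
$subject = 'a\\b';            // the three characters a, \, b
echo preg_match($pattern, $subject), "\n"; // 1: the backslash was found
```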
From the PHP doc,
escaping any other character will result in the backslash being printed too
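Hence for \\\[:
- the PHP string parser turns the first 2 backslashes into 1, and the third one stays because \[ is not a valid escape sequence ("\\\[" -> \\[)
- the regex engine then reads \\ as 1 literal backslash, and the [ after it opens the character class (\\[ -> \ followed by [)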
Yes, it works, but it is not good practice, since it relies on PHP leaving the unrecognised escape \[ untouched.
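A quick way to see this is to compare the strings PHP actually builds. The sketch below assumes single-quoted strings, where \[ is likewise left untouched; $two, $three and $four are illustrative names, and 'a\\tb' stands for the text a, backslash, t, b.

```php
<?php
// Three and four backslashes produce the same PHP string, because \[ is not
// a recognised escape and its backslash is kept as-is.
$three = '/(?:\\\[trn])+/';
$four  = '/(?:\\\\[trn])+/';
var_dump($three === $four);            // bool(true)
echo $four, "\n";                      // /(?:\\[trn])+/

// Two backslashes leave the regex engine with \[, an escaped literal bracket.
$two = '/(?:\\[trn])+/';
echo $two, "\n";                       // /(?:\[trn])+/
var_dump(preg_match($two, 'a\\tb'));   // int(0): the subject has no "[trn]" text
var_dump(preg_match($four, 'a\\tb'));  // int(1): backslash followed by t
```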
It works in Perl because you pass the pattern directly as a regex literal:
/(?:\\[trn])+/
In PHP, however, you have to pass the pattern as a string, so the backslash itself needs an extra level of escaping:
"/(?:\\\\[trn])+/"
The regex \\ that matches a single backslash becomes '/\\\\/' as a PHP preg string.
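Applied to the original problem, the four-backslash pattern can be used with preg_replace roughly as follows. This is only a sketch: the sample text in $input is made up for illustration, and it contains literal backslash sequences rather than real control characters.

```php
<?php
// Subject holds literal backslash sequences, not real tabs or newlines.
$input = 'one\\ttwo\\r\\nthree';
echo $input, "\n";   // one\ttwo\r\nthree

// Four backslashes in source -> \\ for the regex engine -> one literal backslash.
// The + collapses a run such as \r\n into a single space.
$output = preg_replace('/(?:\\\\[trn])+/', ' ', $input);
echo $output, "\n";  // one two three
```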