Right way to escape backslash [ \ ] in PHP regex?

// PHP 5.4.1

// Either three or four \ can be used to match a '\'.
echo preg_match( '/\\\/', '\\' );        // 1
echo preg_match( '/\\\\/', '\\' );       // 1

// Match two backslashes `\\`.
echo preg_match( '/\\\\\\/', '\\\\' );   // Warning: No ending delimiter '/' found
echo preg_match( '/\\\\\\\/', '\\\\' );  // 1
echo preg_match( '/\\\\\\\\/', '\\\\' ); // 1

// Match one backslash using a character class.
echo preg_match( '/[\\]/', '\\' );       // Warning: Compilation failed: missing terminating ] for character class
echo preg_match( '/[\\\]/', '\\' );      // 1  
echo preg_match( '/[\\\\]/', '\\' );     // 1

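When three backslashes are used to match a '\' and the next item in the pattern is \s, the pattern below (three backslashes followed by \s) is interpreted as "match a '\' followed by a literal 's'", not as a '\' followed by whitespace.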

echo preg_match( '/\\\\s/', '\\ ' );    // 0  
echo preg_match( '/\\\\s/', '\\s' );    // 1  

When four backslashes are used to match a '\' and the next item in the pattern is \s, the pattern below (four backslashes followed by \s) is interpreted as "match a '\' followed by a whitespace character".

echo preg_match( '/\\\\\s/', '\\ ' );   // 1
echo preg_match( '/\\\\\s/', '\\s' );   // 0

The same applies inside a character class.

echo preg_match( '/[\\\\s]/', ' ' );   // 0 
echo preg_match( '/[\\\\\s]/', ' ' );  // 1 

None of the above results are affected by enclosing the strings in double instead of single quotes.
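
A quick way to check the quoting claim is to compare the literals directly. This is only a sketch, and it holds for these patterns because \/, \] and \s are not escape sequences in double-quoted PHP strings, so PHP leaves them untouched. Sequences such as \n or \$ would behave differently in double quotes.

```php
<?php
// Each comparison checks that the single-quoted and double-quoted
// literals produce the same string, i.e. the same pattern for PCRE.
var_dump('/\\\/' === "/\\\/");       // bool(true)
var_dump('/[\\\]/' === "/[\\\]/");   // bool(true)
var_dump('/\\\\\s/' === "/\\\\\s/"); // bool(true)
```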

Conclusions:
Whether inside or outside a bracketed character class, a literal backslash can be matched using just three backslashes '\\\' unless the next character in the pattern is also backslashed, in which case the literal backslash must be matched using four backslashes.
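
To see why this happens, it can help to print the pattern text that PCRE actually receives once PHP has processed the string literal. A minimal sketch (the variable names are just for illustration):

```php
<?php
// In single quotes PHP only treats \\ and \' as escapes; any other
// backslash is kept as is.
$three = '/\\\/';   // three backslashes in the source
$four  = '/\\\\/';  // four backslashes in the source

echo $three, "\n";          // prints /\\/
var_dump($three === $four); // bool(true): PCRE gets the same pattern

$threeThenS = '/\\\\s/';    // written as three backslashes followed by \s
$fourThenS  = '/\\\\\s/';   // written as four backslashes followed by \s

echo $threeThenS, "\n";     // prints /\\s/  (a backslash, then a literal s)
echo $fourThenS, "\n";      // prints /\\\s/ (a backslash, then whitespace)
```

In the three-backslash case the third backslash gets paired with the backslash of \s, so the \s is lost before the regex engine ever sees it.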

Recommendation:
Always use four backslashes '\\\\' in a regex pattern when seeking to match a backslash.

Escape sequences.


To avoid this kind of unclear code, you can use \x5c (the hex escape for a backslash), like this :)

echo preg_replace( '/\x5c\w+\.php$/i', '<b>${0}</b>', __FILE__ );
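
Since __FILE__ only contains backslashes on Windows, here is the same idea as a sketch with a fixed, made-up path ($path is a hypothetical value, not something from the question). It also shows that the pattern should stay in single quotes: in double quotes PHP converts \x5c itself, and the regex engine then sees a backslash glued to the following \w.

```php
<?php
// Hypothetical Windows-style path, used only for illustration.
$path = 'C:\\www\\site\\index.php';

// Single quotes: PHP leaves \x5c alone, so PCRE reads it as a backslash.
echo preg_replace('/\x5c\w+\.php$/i', '<b>${0}</b>', $path), "\n";
// prints C:\www\site<b>\index.php</b>

// Double quotes: PHP turns \x5c into a backslash itself, so PCRE sees \\w+
// (a literal backslash followed by one or more 'w') and nothing matches.
var_dump(preg_match("/\x5c\w+\.php$/i", $path)); // int(0)
```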

The thing is, you're using a character class, [], so it doesn't matter how many literal backslashes are embedded in it: they'll be treated as a single backslash.

e.g. the following two regexes:

/[a]/
/[aa]/

are for all intents and purposes identical as far as the regex engine is concerned. Character classes take a list of characters and "collapse" them down to match a single character, along the lines of "for the current character being considered, is it any of the characters listed inside the []?". If you list two backslashes in the class, then it'll be "is the char a backslash or is it a backslash?".
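
A small sketch of that collapsing, assuming anchored patterns so that only a one-character subject can match (the variable names are made up for the example):

```php
<?php
$one = '\\';    // subject: a single backslash
$two = '\\\\';  // subject: two backslashes

// One backslash listed in the class vs. two listed in the class.
$single  = '/^[\\\\]$/';     // PCRE sees ^[\\]$
$doubled = '/^[\\\\\\\\]$/'; // PCRE sees ^[\\\\]$

var_dump(preg_match($single,  $one)); // int(1)
var_dump(preg_match($doubled, $one)); // int(1)
var_dump(preg_match($single,  $two)); // int(0)
var_dump(preg_match($doubled, $two)); // int(0)
```

Listing the backslash twice does not make the class match two characters; both classes match exactly one backslash.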


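I studied this years ago. It works because, in the PHP string, the 1st backslash escapes the 2nd one and together they form one 'true backslash' character in the pattern. The 3rd backslash is left as is, since the character after it is not one that a PHP string escapes. In the regex, the true backslash then escapes the 3rd one. So it magically makes 3 backslashes work.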

However, the usual suggestion is to use 4 backslashes instead of the ambiguous 3.

If I'm wrong about anything, please feel free to correct me.