What does the "[^][]" regex mean?
`[^][]` is a character class that matches any character except `[` and `]`.
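Python's built-in `re` module accepts the same unescaped syntax, so a small Python check shows what the class matches. The sample string is made up for illustration.

```python
import re

# [^][] : one character that is neither "]" nor "[", written without escaping.
# The "+" turns it into runs of non-bracket characters.
NON_BRACKET_RUN = re.compile(r"[^][]+")

text = "a[bc]d[[e]]"
print(NON_BRACKET_RUN.findall(text))  # ['a', 'bc', 'd', 'e']
```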
You can avoid escaping the special characters `[` and `]` here because the result is not ambiguous for PCRE, the regex engine used by the `preg_` functions.
Since `[^]` cannot be a (empty) character class in PCRE, the only way to parse the regex is to treat that `]` as a literal inside a character class that will be closed later. The same goes for the `[` that follows: a character class cannot be reopened inside a character class (the only exception is a POSIX class such as `[:alnum:]`). The last `]` is then unambiguous: it closes the character class. However, a `[` outside a character class must be escaped, since it is parsed as the start of a character class.
In the same way, you can write `[]]`, `[[]` or `[^[]` without escaping the `[` or `]` inside the character class.
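The same kind of Python check covers two of these forms. `[[]` is left out because CPython 3.7 and later emit a `FutureWarning` ("Possible nested set") when a `[` directly follows the opening bracket; it still compiles, but the warning would make the example noisy.

```python
import re

# []] : a "]" right after the opening "[" is a literal, so the class is just "]".
CLOSE_ONLY = re.compile(r"[]]")
# [^[] : any single character except "[".
NOT_OPEN = re.compile(r"[^[]")

print(bool(CLOSE_ONLY.fullmatch("]")))  # True
print(bool(CLOSE_ONLY.fullmatch("[")))  # False
print(bool(NOT_OPEN.fullmatch("]")))    # True
print(bool(NOT_OPEN.fullmatch("[")))    # False
```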
Note: since PHP 7.3, you can use the inline `xx` modifier, which makes the engine ignore blank characters (spaces and tabs) even inside character classes. This lets you write these classes in a less ambiguous way: `(?xx) [^ ][ ] [ ] ] [ [ ] [^ [ ]`.
You can use this syntax with several regex flavours: PCRE (PHP, R), Perl, Python, Java, .NET, Go, awk, Tcl (if you delimit your pattern with curly braces; thanks to Donal Fellows), ...
But not with Ruby, JavaScript (except IE < 9), ...
As m.buettner noted, `[^]]` is not ambiguous because `]` is the first character, but `[^a]]` is seen as "any character that is not an `a`, followed by a `]`". To exclude both `a` and `]`, you must write `[^a\]]` or `[^]a]`.
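Python's `re` reads these three patterns the same way, so a short Python sketch makes the difference concrete:

```python
import re

# [^a]] : the class [^a] followed by a literal "]" (two characters in total).
NOT_A_THEN_CLOSE = re.compile(r"[^a]]")
# [^a\]] and [^]a] : one character that is neither "a" nor "]".
NEITHER_ESCAPED = re.compile(r"[^a\]]")
NEITHER_FIRST = re.compile(r"[^]a]")

print(bool(NOT_A_THEN_CLOSE.fullmatch("b]")))  # True: two characters
print(bool(NOT_A_THEN_CLOSE.fullmatch("b")))   # False: the "]" is missing
for pattern in (NEITHER_ESCAPED, NEITHER_FIRST):
    print([c for c in "ab]" if pattern.fullmatch(c)])  # ['b']
```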
In the particular case of JavaScript, the specification allows `[]` as a regex token that never matches (in other words, `[]` always fails) and `[^]` as a token that matches any character. So `[^]]` is seen as any character followed by a `]`. Actual implementations vary, but modern browsers generally stick to the definition in the specification.
Pattern details for `\[(?:[^][]|(?R))*\]`:
\[ # literal [
(?: # open a non-capturing group
[^][] # a character that is not a ] or a [
| # OR
(?R) # the whole pattern (here is the recursion)
)* # repeat zero or more times
\] # a literal ]
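Python's built-in `re` module has no `(?R)`, so this pattern cannot be run there as-is. As an illustration only, here is a hand translation of its structure into a recursive Python function: each regex piece becomes one step, and `(?R)` becomes a recursive call. The function name `match_brackets` is hypothetical. The sketch never backtracks; that is enough here because the two alternatives start with different characters (`[` versus anything that is not a bracket).

```python
from typing import Optional


def match_brackets(s: str, i: int = 0) -> Optional[int]:
    r"""Hypothetical hand translation of \[(?:[^][]|(?R))*\].

    Tries to match one bracket group starting at index i and returns the
    index just past its closing "]", or None if there is no match there.
    """
    # \[ : a literal "["
    if i >= len(s) or s[i] != "[":
        return None
    i += 1
    # (?: [^][] | (?R) )* : repeat zero or more times
    while i < len(s) and s[i] != "]":
        if s[i] != "[":
            i += 1  # [^][] : one character that is neither "]" nor "["
        else:
            end = match_brackets(s, i)  # (?R) : recurse into the whole pattern
            if end is None:
                return None
            i = end
    # \] : a literal "]"
    if i >= len(s):
        return None
    return i + 1


print(match_brackets("[a[b]c]"))  # 7
print(match_brackets("[a[b]c"))   # None
print(match_brackets("x[a]", 1))  # 4
```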
In your example pattern, you don't need to escape the last `]`.
But you can do the same with this slightly optimized pattern, which is also more useful because it can be reused as a subpattern (thanks to `(?-1)`): `(\[(?:[^][]+|(?-1))*+])`
( # open the capturing group
\[ # a literal [
(?: # open a non-capturing group
[^][]+ # one or more characters other than ] or [
| # OR
(?-1) # the last opened capturing group (recursion)
# (the capture group where you are)
)*+ # repeat the group zero or more times (possessive)
] # literal ] (no need to escape)
) # close the capturing group
Or better, `(\[[^][]*(?:(?-1)[^][]*)*+])`, which avoids the cost of an alternation.
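The unrolled form can be sketched the same way in Python. Again this is only a hand translation (Python's built-in `re` supports no `(?-1)`, and possessive quantifiers only since 3.11): the `[^][]*` runs are matched with `re`, and each `(?-1)` becomes a recursive call. The name `match_unrolled` is hypothetical. The shape shows why no alternation is needed: after a run of ordinary characters, the next character can only be `[` (recurse), `]` (close), or the end of the string.

```python
import re
from typing import Optional

# [^][]* : a (possibly empty) run of characters that are neither "]" nor "[".
RUN = re.compile(r"[^][]*")


def match_unrolled(s: str, i: int = 0) -> Optional[int]:
    r"""Hypothetical hand translation of (\[[^][]*(?:(?-1)[^][]*)*+]).

    Returns the index just past the matched group, or None.
    """
    # \[ : a literal "["
    if not s.startswith("[", i):
        return None
    # [^][]* : leading run of ordinary characters (always matches, maybe empty)
    i = RUN.match(s, i + 1).end()
    # (?:(?-1)[^][]*)*+ : each nested group is followed by another run
    while s.startswith("[", i):
        end = match_unrolled(s, i)  # (?-1) : recurse into the capturing group
        if end is None:
            return None
        i = RUN.match(s, end).end()
    # ] : the closing bracket (no escape needed)
    return i + 1 if s.startswith("]", i) else None


print(match_unrolled("[a[b][c]d]"))  # 10
print(match_unrolled("[a[b]"))       # None
```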