preg_match_all for nested elements

This is a kind of BBCode. Is there a way to match all elements like [LI]text[/LI] and [UL]text[/UL]?

preg_match_all("/(\[UL].*\[\/UL])|(\[LI].*\[\/LI])/", '[UL][LI]sadas[/LI][/UL]', $match);

I want to get something like:

0 => "[UL][LI]sadas[/LI][/UL]"
1 => "[UL][LI]sadas[/LI][/UL]"
2 => "[LI]sadas[/LI]"    <--- This is not captured now.

Basically the question is: how do I get the [LI]text[/LI] part without losing the [UL]text[/UL] part?


To do that, you need two things:

  • a recursive subpattern (a subpattern inside a capture group that refers to that same group)
  • to put this recursive pattern inside a lookahead assertion (an assertion doesn't consume characters, so with this trick the same substrings can be matched several times)
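The two ingredients can be tried separately before combining them. This is a minimal sketch; the sample strings and the bracket-matching pattern are illustrative and not part of the answer. The first part shows (?1) re-entering capture group 1 to match an arbitrarily nested [...] structure; the second shows that wrapping a pattern in (?=...) lets preg_match_all report overlapping substrings, because the zero-width match lets the engine retry at the very next position.

```php
<?php
// 1) Recursion: (?1) re-enters capture group 1, so the group can match
//    a balanced, arbitrarily nested [...] structure.
//    [^][] is a character class meaning "neither ] nor [".
preg_match('~(\[(?:[^][]++|(?1))*\])~', 'x[a[b[c]]d]y', $m);
echo $m[1], "\n"; // [a[b[c]]d]

// 2) Lookahead: (?=...) consumes nothing, so after each match the engine
//    moves on by one position and captured substrings may overlap.
preg_match_all('~\w\w~', 'abcd', $plain);       // consuming pattern
preg_match_all('~(?=(\w\w))~', 'abcd', $overlap); // zero-width pattern
echo implode(', ', $plain[0]), "\n";   // ab, cd
echo implode(', ', $overlap[1]), "\n"; // ab, bc, cd
```

With the consuming pattern, "bc" is never seen because "b" was already used by the first match; with the lookahead, the real data lives in capture group 1 and the full matches in $overlap[0] are empty strings.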

~(?=(\[(\w+)]([^[]*(?:(?1)[^[]*)*?)\[/\2]))~

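(?=...) is the lookahead assertion (the current position is followed by ...).
(\[(\w+)]([^[]*(?:(?1)[^[]*)*?)\[/\2]) is capture group 1: a whole element, from its opening tag to its closing tag.
(?1) is the recursion: it re-enters the subpattern of capture group 1, so a nested element is matched as part of its parent's content.
\2 is a backreference to the text matched by capture group 2 (the tag name), so the closing tag must have the same name as the opening one.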
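Putting it together in PHP might look like the following minimal sketch; the variable names $pattern and $bb are illustrative. Since every match is zero-width, $m[0] only contains empty strings, and the elements themselves come from capture group 1, with the tag name in group 2 and the element's content in group 3.

```php
<?php
// Group 1 = whole [TAG]...[/TAG] element, group 2 = tag name,
// group 3 = the element's content. The surrounding (?=...) makes each
// match zero-width, so nested elements are reported as separate matches.
$pattern = '~(?=(\[(\w+)]([^[]*(?:(?1)[^[]*)*?)\[/\2]))~';

$bb = '[UL][LI]sadas[/LI][/UL]';
preg_match_all($pattern, $bb, $m);

// $m[0] holds only empty strings; the useful data is in groups 1-3.
foreach ($m[1] as $i => $element) {
    printf("%d: %s (tag %s, content \"%s\")\n", $i, $element, $m[2][$i], $m[3][$i]);
}
```

For the sample string this prints:

0: [UL][LI]sadas[/LI][/UL] (tag UL, content "[LI]sadas[/LI]")
1: [LI]sadas[/LI] (tag LI, content "sadas")

The sketch assumes well-formed input where opening and closing tags have the same case and no attributes: \[(\w+)] will not match an opening tag such as [URL=...].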
