preg_match_all for nested element
This is a kind of BBCode. Any idea how to match all elements like [LI]text[/LI] and [UL]text[/UL], including nested ones?
preg_match_all("/(\[UL].*\[\/UL])|(\[LI].*\[\/LI])/", '[UL][LI]sadas[/LI][/UL]', $match);
I want to receive something like:
0 => "[UL][LI]sadas[/LI][/UL]"
1 => "[UL][LI]sadas[/LI][/UL]"
2 => "[LI]sadas[/LI]" <--- This is not captured now.
Basically, the question is: how do I get the [LI]text[/LI] part without losing the [UL]text[/UL] part?
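For reference, this is why the original pattern returns only one element: preg_match_all() reports non-overlapping matches. Once the [UL]...[/UL] alternative has consumed the whole string at offset 0, the search resumes after the end of that match, so the [LI]...[/LI] part inside it is never tried. A minimal check using the pattern and input from the question:

```php
<?php
// Pattern from the question: two alternatives, each with a greedy .*
$pattern = '/(\[UL].*\[\/UL])|(\[LI].*\[\/LI])/';
$input   = '[UL][LI]sadas[/LI][/UL]';

$count = preg_match_all($pattern, $input, $match);

// The [UL] alternative matches the whole string at offset 0, and matching
// continues from the end of that match, so the inner [LI] is never reached.
echo $count, "\n";        // 1
echo $match[0][0], "\n";  // [UL][LI]sadas[/LI][/UL]
```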
To do that, you need two things:
- a recursive subpattern (a capture group whose subpattern refers to itself)
- to put this recursive pattern inside a lookahead assertion (an assertion doesn't consume characters, so with this trick the same substrings can be matched several times)
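The two ingredients can be seen in isolation on small made-up inputs: a recursive call (?1) matching balanced parentheses as one unit, and a capture group inside a lookahead returning overlapping matches that a plain pattern cannot. This is only an illustration of the techniques, not part of the BBCode solution itself:

```php
<?php
// 1. Recursion: (?1) re-enters capture group 1, so a nested "(b)" is
//    matched inside the outer parentheses as one balanced unit.
preg_match('~(\((?:[^()]++|(?1))*\))~', 'x (a (b) c) y', $m);
echo $m[1], "\n"; // (a (b) c)

// 2. Lookahead: (?=...) consumes nothing, so the engine tries again at the
//    next offset and the captured substrings are allowed to overlap.
preg_match_all('~aa~', 'aaaa', $plain);          // non-overlapping matches
preg_match_all('~(?=(aa))~', 'aaaa', $overlap);  // overlapping captures
echo count($plain[0]), "\n";   // 2
echo count($overlap[1]), "\n"; // 3
```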
~(?=(\[(\w+)]([^[]*(?:(?1)[^[]*)*?)\[/\2]))~
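Applied to the string from the question, the elements end up in capture group 1, i.e. $match[1], in the order their opening tags appear. Because the whole pattern is a zero-width lookahead, $match[0] only holds empty strings. A minimal sketch (variable names are just for illustration):

```php
<?php
$pattern = '~(?=(\[(\w+)]([^[]*(?:(?1)[^[]*)*?)\[/\2]))~';
$input   = '[UL][LI]sadas[/LI][/UL]';

preg_match_all($pattern, $input, $match);

// Group 1 holds each complete element; the overall match ($match[0])
// is empty because the lookahead consumes no characters.
print_r($match[1]);
// Array
// (
//     [0] => [UL][LI]sadas[/LI][/UL]
//     [1] => [LI]sadas[/LI]
// )
```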
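- (?=...) is the lookahead assertion (the current position is followed by ...).
- (\[(\w+)]([^[]*(?:(?1)[^[]*)*?)\[/\2]) is capture group 1.
- ([^[]*(?:(?1)[^[]*)*?) is capture group 3, the content between the opening and closing tags.
- (?1) refers to the subpattern inside capture group 1, so it recursively matches a nested element.
- \2 refers to the match of capture group 2, (\w+), the tag name, so the closing tag must repeat the opening tag's name.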
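Groups 2 and 3 are also useful on their own. Here is a sketch using the PREG_SET_ORDER flag on a made-up list with two items, collecting each element's tag name (group 2) and inner content (group 3); $elements is a hypothetical name used only for illustration:

```php
<?php
$pattern = '~(?=(\[(\w+)]([^[]*(?:(?1)[^[]*)*?)\[/\2]))~';
$input   = '[UL][LI]one[/LI][LI]two[/LI][/UL]';

// PREG_SET_ORDER returns one array per match instead of one array per group.
preg_match_all($pattern, $input, $sets, PREG_SET_ORDER);

$elements = [];
foreach ($sets as $set) {
    // $set[2] is the tag name, $set[3] the text between the tags.
    $elements[] = [$set[2], $set[3]];
    echo $set[2], ': ', $set[3], "\n";
}
// UL: [LI]one[/LI][LI]two[/LI]
// LI: one
// LI: two
```

Note that because of [^[]*, the content may not contain a literal [ that does not start a nested tag; such an element is simply not matched.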