Any decent PHP parser written in PHP? [closed]

I do lots of work manipulating and analyzing PHP code. Normally I just use the Tokenizer to do this. For most applications this is sufficient. But sometimes parsing using a lexer just isn't reliable enough (obviously).
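
To illustrate where the token stream stops being enough, here is a minimal sketch that uses only the built-in `token_get_all()`. The helper name `findFunctionNames` is made up for this example; it is not taken from any library.

```php
<?php
// Hypothetical helper: list the names of declared functions by scanning
// the token stream only, with no parser involved.
function findFunctionNames(string $code): array
{
    $names  = [];
    $tokens = token_get_all($code);
    $count  = count($tokens);

    for ($i = 0; $i < $count; $i++) {
        // Tokens are either [id, text, line] arrays or single-character strings.
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }

        // Look at the first non-whitespace token after the "function" keyword.
        for ($j = $i + 1; $j < $count; $j++) {
            $next = $tokens[$j];
            if (is_array($next) && $next[0] === T_WHITESPACE) {
                continue;
            }
            if (is_array($next) && $next[0] === T_STRING) {
                $names[] = $next[1]; // named function or method
            }
            break; // anything else (e.g. "(" for a closure) has no name
        }
    }

    return $names;
}

$code = '<?php function foo() {} $f = function () {}; class A { function bar() {} }';
print_r(findFunctionNames($code)); // foo and bar
```

The sketch finds `foo` and `bar` and skips the closure, but it cannot tell that `bar` is a method of `A` rather than a free function, and it misses by-reference declarations such as `function &baz()`. Answering those questions means tracking nesting and context by hand, which is exactly the work a real parser does for you.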

Thus I am looking for a PHP parser written in PHP. I found hnw/PhpParser and kumatch/stagehand-php-parser. Both were created by automatically converting zend_language_parser.y to a .y file with PHP actions instead of C (which is then compiled to a LALR(1) parser). But the output of this automated conversion is practically impossible to work with.

So, is there any decent PHP parser written in PHP? (I need one for PHP 5.2 and one for 5.3. But just one of them would be a good starting point, too.)


Solution 1:

Since no complete and stable parser turned up here, I decided to write one myself. Here is the result:

PHP-Parser: A PHP parser written in PHP

The project supports parsing code written for any PHP version between PHP 5.2 and PHP 8.0.

Apart from the parser itself the library provides some related components:

  • Compilation of the AST back to PHP ("pretty printing")
  • Infrastructure for traversing and changing the AST
  • Serialization to and from XML (as well as dumping in a human readable form)
  • Resolution of namespaced names (aliases etc.)

For a usage overview, see the "Usage of basic components" section of the documentation.

Solution 2:

This isn't going to be a great option for you, as it violates the pure-PHP constraint, but:

A while ago, the php-internals folks decided that they would switch to Lemon as their parsing technology. There's a branch in the PHP svn repo that contains the required changes.

They decided not to continue with this, as they found that their Lemon solution was about 10-15% slower. The branch is still there, though.

There's an older Lemon parser written as a PHP extension. You might be able to work with it. There's also this PEAR package. There's also this other lemon package (via this blog post about PGN).

Of course, even if you get it working, I'm not sure what you'd do with the data, or what the data even looks like.

Another wacky option would be to peek at Quercus, a PHP implementation in Java. They must have written a parser, so it might be worth investigating.