Is there a good Python library that can parse C++? [closed]
Not an answer as such, but just to demonstrate how hard parsing C++ correctly actually is. My favorite demo:
template<bool> struct a_t;
template<> struct a_t<true> {
    template<int> struct b {};
};
template<> struct a_t<false> {
    enum { b };
};
typedef a_t<sizeof(void*)==sizeof(int)> a;
enum { c, d };
int main() {
    a::b<c>d; // declaration or expression?
}
This is perfectly valid, standard-compliant C++, but the exact meaning of the commented line depends on your implementation. If sizeof(void*)==sizeof(int) (typical on 32-bit platforms), it is a declaration of a local variable d of type a::b<c>. If the condition doesn't hold, it is a no-op expression ((a::b < c) > d). Adding a constructor to a::b will actually let you expose the difference via the presence or absence of side effects.
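To make the point concrete, here is a toy Python sketch (not a real parser) of why the token stream alone is not enough. The function classify and both symbol tables are hypothetical names invented for illustration. A real front end would have to evaluate sizeof and instantiate the template to learn what a::b names.

```python
# Toy model: the same tokens parse two ways depending on what `a::b` names.
TOKENS = ["a::b", "<", "c", ">", "d", ";"]


def classify(tokens, symbol_kinds):
    """Decide how a statement of the shape  NAME < X > Y ;  should be read.

    symbol_kinds maps a qualified name to "template" or "value". This dict
    is a stand-in (assumption) for the semantic analysis a compiler does.
    """
    name, lt, arg, gt, trailing, semi = tokens
    assert (lt, gt, semi) == ("<", ">", ";")
    if symbol_kinds[name] == "template":
        # `<` opens a template argument list: a::b<c> is a type, d is declared.
        return ("declaration", f"{name}<{arg}>", trailing)
    # `<` and `>` are comparison operators and the result is discarded.
    return ("expression", f"(({name} < {arg}) > {trailing})")


# Hypothetical symbol tables for the two cases described above.
pointer_is_int_sized = {"a::b": "template"}  # sizeof(void*) == sizeof(int)
pointer_is_wider = {"a::b": "value"}         # the condition does not hold

print(classify(TOKENS, pointer_is_int_sized))
print(classify(TOKENS, pointer_is_wider))
```

The only input that changes between the two calls is the symbol table, which is exactly the information a standalone grammar-based parser does not have.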
C++ is notoriously hard to parse. Most people who try to do this properly end up taking apart a compiler. In fact this is (in part) why Clang, the C/C++ front end of the LLVM project, was started: Apple needed a way to parse C++ for use in Xcode that matched the way the compiler parsed it.
That's why there are projects like GCC-XML, which you could combine with a Python XML library.
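As a rough sketch of that combination, the snippet below reads an XML dump with the standard library's xml.etree.ElementTree. The SAMPLE string is hand-written in the general shape of GCC-XML output and is an assumption made for illustration. Real output contains many more elements and attributes. The helper name summarize is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample (assumption): a tiny subset of what a real dump holds.
SAMPLE = """\
<GCC_XML>
  <Namespace id="_1" name="::"/>
  <Function id="_2" name="main" returns="_4" context="_1"/>
  <Class id="_3" name="Widget" context="_1"/>
  <FundamentalType id="_4" name="int"/>
</GCC_XML>
"""


def summarize(xml_text):
    """Return ([(function name, return type name)], [class names])."""
    root = ET.fromstring(xml_text)
    # Declarations refer to each other by id, so index them first.
    by_id = {el.get("id"): el for el in root}
    functions = [
        (el.get("name"), by_id[el.get("returns")].get("name"))
        for el in root.iter("Function")
    ]
    classes = [el.get("name") for el in root.iter("Class")]
    return functions, classes


functions, classes = summarize(SAMPLE)
print(functions)
print(classes)
```

The hard part (deciding what each name means) has already been done by the compiler front end. The Python side only walks the resulting declarations.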
Some non-compiler projects that seem to do a pretty good job at parsing C++ are:
- Eclipse CDT
- OpenGrok
- Doxygen
For many years I've been using pygccxml, which is a very nice Python wrapper around GCC-XML. It's a full-featured package that forms the basis of some well-used code-generation tools, such as Py++, which is from the same author.
You won't find a drop-in Python library to do this. Parsing C++ is fiddly, and few parsers have been written that aren't part of a compiler. You can find a good summary of the issues here.
The best bet might be Clang, as its C++ support is well-established. Though this is not a Python solution, it sounds as though it would be amenable to reuse within a Python wrapper, given the emphasis on encapsulation and good design in its development.
pycparser is a complete and functional parser for ANSI C. Perhaps you can extend it to C++ :-)