Get a human-readable AST from C++ code

To get a better understanding of some of the details of the C++ language and grammar, I would love to be able to write a small C++ program and see the AST that a compiler generates from it.

It looks like clang had this feature in the past (-emit-asm), but it has been removed.

Is there an easy way to do this today?


Solution 1:

Here are two examples, a simple one and a nasty one (C++'s "most vexing parse").

A simple Fibonacci program from http://talkbinary.com/programming/c/fibonacci-in-c/ parsed as C++ code:

int fib(int n) {
    if ( n == 0 || n == 1 )
        return n;

    int fib1 = 0;
    int fib2 = 1;
    int fib = 0;

    for ( int i = 2; i < n; i++ )
    {
        fib = fib1 + fib2;
        fib1 = fib2;
        fib2 = fib;
    }

    return fib;
}

Our DMS Software Reengineering Toolkit (with full C++11/17 parser) produces the following AST:

C:\DMS\Domains\Cpp\GCC4\Tools\Parser\Source>run ..\DomainParser ++AST "C:\temp\fibonacci.cpp"
Cpp~GCC4 Domain Parser Version 2.5.15
Copyright (C) 1996-2013 Semantic Designs, Inc; All Rights Reserved; SD Confidential
Powered by DMS (R) Software Reengineering Toolkit
118 tree nodes in tree.
(translation_unit@Cpp~GCC4=2#4dc3a00^0 Line 1 Column 1 File C:/temp/fibonacci.cpp
 (function_definition@Cpp~GCC4=1615#4dc39e0^1#4dc3a00:1 Line 1 Column 1 File C:/temp/fibonacci.cpp
  (function_head@Cpp~GCC4=1627#4d89de0^1#4dc39e0:1 Line 1 Column 1 File C:/temp/fibonacci.cpp
   (simple_type_specifier@Cpp~GCC4=1109#4d89920^1#4d89de0:1 Line 1 Column 1 File C:/temp/fibonacci.cpp
   |('int'@Cpp~GCC4=2760#4d898c0^1#4d89920:1[Keyword:0] Line 1 Column 1 File C:/temp/fibonacci.cpp)'int'
   )simple_type_specifier
   (noptr_declarator@Cpp~GCC4=1401#4d89a80^1#4d89de0:2 Line 1 Column 5 File C:/temp/fibonacci.cpp
   |(IDENTIFIER@Cpp~GCC4=2645#4d89900^1#4d89a80:1[`fib'] Line 1 Column 5 File C:/temp/fibonacci.cpp)IDENTIFIER
   |('('@Cpp~GCC4=2885#4d89ac0^1#4d89a80:2[Keyword:0] Line 1 Column 8 File C:/temp/fibonacci.cpp)'('
   |(parameter_declaration@Cpp~GCC4=1590#4d89e00^1#4d89a80:3 Line 1 Column 9 File C:/temp/fibonacci.cpp
   | (simple_type_specifier@Cpp~GCC4=1109#4d89b20^1#4d89e00:1 Line 1 Column 9 File C:/temp/fibonacci.cpp
   |  ('int'@Cpp~GCC4=2760#4d89d60^1#4d89b20:1[Keyword:0] Line 1 Column 9 File C:/temp/fibonacci.cpp)'int'
   | )simple_type_specifier
   | (IDENTIFIER@Cpp~GCC4=2645#4d89b00^1#4d89e00:2[`n'] Line 1 Column 13 File C:/temp/fibonacci.cpp)IDENTIFIER
   |)parameter_declaration
   |(')'@Cpp~GCC4=2886#4d89bc0^1#4d89a80:4[Keyword:0] Line 1 Column 14 File C:/temp/fibonacci.cpp)')'
   |(function_qualifiers@Cpp~GCC4=1417#4d89dc0^1#4d89a80:5 Line 1 Column 16 File C:/temp/fibonacci.cpp)function_qualifiers
   )noptr_declarator
  )function_head
  (compound_statement@Cpp~GCC4=872#4dc39c0^1#4dc39e0:2 Line 1 Column 16 File C:/temp/fibonacci.cpp
   ('{'@Cpp~GCC4=2938#4d89f20^1#4dc39c0:1[Keyword:0] Line 1 Column 16 File C:/temp/fibonacci.cpp)'{'
   (statement_seq@Cpp~GCC4=876#4dc3060^1#4dc39c0:2 Line 2 Column 5 File C:/temp/fibonacci.cpp
   |(statement_seq@Cpp~GCC4=876#4dc3920^1#4dc3060:1 Line 2 Column 5 File C:/temp/fibonacci.cpp
   | (statement_seq@Cpp~GCC4=876#4dc2880^1#4dc3920:1 Line 2 Column 5 File C:/temp/fibonacci.cpp
   |  (statement_seq@Cpp~GCC4=876#4dc2700^1#4dc2880:1 Line 2 Column 5 File C:/temp/fibonacci.cpp
   |   (statement_seq@Cpp~GCC4=876#4dc2640^1#4dc2700:1 Line 2 Column 5 File C:/temp/fibonacci.cpp
   |   |(selection_statement@Cpp~GCC4=892#4dc25c0^1#4dc2640:1 Line 2 Column 5 File C:/temp/fibonacci.cpp
   |   | ('if'@Cpp~GCC4=2753#4d89f40^1#4dc25c0:1[Keyword:0] Line 2 Column 5 File C:/temp/fibonacci.cpp)'if'
   |   | ('('@Cpp~GCC4=2885#4d89f60^1#4dc25c0:2[Keyword:0] Line 2 Column 8 File C:/temp/fibonacci.cpp)'('
   |   | (logical_or_expression@Cpp~GCC4=763#4dc2220^1#4dc25c0:3 Line 2 Column 10 File C:/temp/fibonacci.cpp
   |   |  (equality_expression@Cpp~GCC4=696#4d89fa0^1#4dc2220:1 Line 2 Column 10 File C:/temp/fibonacci.cpp
   |   |   (IDENTIFIER@Cpp~GCC4=2645#4d89d80^1#4d89fa0:1[`n'] Line 2 Column 10 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   |   ('=='@Cpp~GCC4=2918#4d89da0^1#4d89fa0:2[Keyword:0] Line 2 Column 12 File C:/temp/fibonacci.cpp)'=='
   |   |   (INT_LITERAL@Cpp~GCC4=2809#4d89fe0^1#4d89fa0:3[0] Line 2 Column 15 File C:/temp/fibonacci.cpp)INT_LITERAL
   |   |  )equality_expression
   |   |  ('||'@Cpp~GCC4=2922#4d89f80^1#4dc2220:2[Keyword:0] Line 2 Column 17 File C:/temp/fibonacci.cpp)'||'
   |   |  (equality_expression@Cpp~GCC4=696#4dc2200^1#4dc2220:3 Line 2 Column 20 File C:/temp/fibonacci.cpp
   |   |   (IDENTIFIER@Cpp~GCC4=2645#4dc2180^1#4dc2200:1[`n'] Line 2 Column 20 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   |   ('=='@Cpp~GCC4=2918#4dc21a0^1#4dc2200:2[Keyword:0] Line 2 Column 22 File C:/temp/fibonacci.cpp)'=='
   |   |   (INT_LITERAL@Cpp~GCC4=2809#4dc21c0^1#4dc2200:3[1] Line 2 Column 25 File C:/temp/fibonacci.cpp)INT_LITERAL
   |   |  )equality_expression
   |   | )logical_or_expression
   |   | (')'@Cpp~GCC4=2886#4dc21e0^1#4dc25c0:4[Keyword:0] Line 2 Column 27 File C:/temp/fibonacci.cpp)')'
   |   | (jump_statement@Cpp~GCC4=983#4dc2440^1#4dc25c0:5 Line 3 Column 9 File C:/temp/fibonacci.cpp
   |   |  ('return'@Cpp~GCC4=2778#4dc2340^1#4dc2440:1[Keyword:0] Line 3 Column 9 File C:/temp/fibonacci.cpp)'return'
   |   |  (IDENTIFIER@Cpp~GCC4=2645#4dc2360^1#4dc2440:2[`n'] Line 3 Column 16 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   |  (';'@Cpp~GCC4=2937#4dc2400^1#4dc2440:3[Keyword:0] Line 3 Column 17 File C:/temp/fibonacci.cpp)';'
   |   | )jump_statement
   |   |)selection_statement
   |   |(simple_declaration@Cpp~GCC4=1033#4dc2620^1#4dc2640:2 Line 5 Column 5 File C:/temp/fibonacci.cpp
   |   | (simple_type_specifier@Cpp~GCC4=1109#4dc23c0^1#4dc2620:1 Line 5 Column 5 File C:/temp/fibonacci.cpp
   |   |  ('int'@Cpp~GCC4=2760#4dc2520^1#4dc23c0:1[Keyword:0] Line 5 Column 5 File C:/temp/fibonacci.cpp)'int'
   |   | )simple_type_specifier
   |   | (init_declarator@Cpp~GCC4=1380#4dc23a0^1#4dc2620:2 Line 5 Column 9 File C:/temp/fibonacci.cpp
   |   |  (IDENTIFIER@Cpp~GCC4=2645#4dc2460^1#4dc23a0:1[`fib1'] Line 5 Column 9 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   |  (initializer@Cpp~GCC4=1639#4dc2380^1#4dc23a0:2 Line 5 Column 14 File C:/temp/fibonacci.cpp
   |   |   ('='@Cpp~GCC4=2893#4dc23e0^1#4dc2380:1[Keyword:0] Line 5 Column 14 File C:/temp/fibonacci.cpp)'='
   |   |   (INT_LITERAL@Cpp~GCC4=2809#4dc2680^1#4dc2380:2[0] Line 5 Column 16 File C:/temp/fibonacci.cpp)INT_LITERAL
   |   |  )initializer
   |   | )init_declarator
   |   | (';'@Cpp~GCC4=2937#4dc26a0^1#4dc2620:3[Keyword:0] Line 5 Column 17 File C:/temp/fibonacci.cpp)';'
   |   |)simple_declaration
   |   )statement_seq
   |   (simple_declaration@Cpp~GCC4=1033#4dc25a0^1#4dc2700:2 Line 6 Column 5 File C:/temp/fibonacci.cpp
   |   |(simple_type_specifier@Cpp~GCC4=1109#4dc26c0^1#4dc25a0:1 Line 6 Column 5 File C:/temp/fibonacci.cpp
   |   | ('int'@Cpp~GCC4=2760#4dc2600^1#4dc26c0:1[Keyword:0] Line 6 Column 5 File C:/temp/fibonacci.cpp)'int'
   |   |)simple_type_specifier
   |   |(init_declarator@Cpp~GCC4=1380#4dc2560^1#4dc25a0:2 Line 6 Column 9 File C:/temp/fibonacci.cpp
   |   | (IDENTIFIER@Cpp~GCC4=2645#4dc2660^1#4dc2560:1[`fib2'] Line 6 Column 9 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   | (initializer@Cpp~GCC4=1639#4dc2540^1#4dc2560:2 Line 6 Column 14 File C:/temp/fibonacci.cpp
   |   |  ('='@Cpp~GCC4=2893#4dc26e0^1#4dc2540:1[Keyword:0] Line 6 Column 14 File C:/temp/fibonacci.cpp)'='
   |   |  (INT_LITERAL@Cpp~GCC4=2809#4dc2740^1#4dc2540:2[1] Line 6 Column 16 File C:/temp/fibonacci.cpp)INT_LITERAL
   |   | )initializer
   |   |)init_declarator
   |   |(';'@Cpp~GCC4=2937#4dc2760^1#4dc25a0:3[Keyword:0] Line 6 Column 17 File C:/temp/fibonacci.cpp)';'
   |   )simple_declaration
   |  )statement_seq
   |  (simple_declaration@Cpp~GCC4=1033#4dc2860^1#4dc2880:2 Line 7 Column 5 File C:/temp/fibonacci.cpp
   |   (simple_type_specifier@Cpp~GCC4=1109#4dc27c0^1#4dc2860:1 Line 7 Column 5 File C:/temp/fibonacci.cpp
   |   |('int'@Cpp~GCC4=2760#4dc2580^1#4dc27c0:1[Keyword:0] Line 7 Column 5 File C:/temp/fibonacci.cpp)'int'
   |   )simple_type_specifier
   |   (init_declarator@Cpp~GCC4=1380#4dc2820^1#4dc2860:2 Line 7 Column 9 File C:/temp/fibonacci.cpp
   |   |(IDENTIFIER@Cpp~GCC4=2645#4dc2720^1#4dc2820:1[`fib'] Line 7 Column 9 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   |(initializer@Cpp~GCC4=1639#4dc2800^1#4dc2820:2 Line 7 Column 13 File C:/temp/fibonacci.cpp
   |   | ('='@Cpp~GCC4=2893#4dc27e0^1#4dc2800:1[Keyword:0] Line 7 Column 13 File C:/temp/fibonacci.cpp)'='
   |   | (INT_LITERAL@Cpp~GCC4=2809#4dc28c0^1#4dc2800:2[0] Line 7 Column 15 File C:/temp/fibonacci.cpp)INT_LITERAL
   |   |)initializer
   |   )init_declarator
   |   (';'@Cpp~GCC4=2937#4dc28e0^1#4dc2860:3[Keyword:0] Line 7 Column 16 File C:/temp/fibonacci.cpp)';'
   |  )simple_declaration
   | )statement_seq
   | (iteration_statement@Cpp~GCC4=941#4dc3980^1#4dc3920:2 Line 9 Column 5 File C:/temp/fibonacci.cpp
   |  ('for'@Cpp~GCC4=2749#4dc2840^1#4dc3980:1[Keyword:0] Line 9 Column 5 File C:/temp/fibonacci.cpp)'for'
   |  ('('@Cpp~GCC4=2885#4dc28a0^1#4dc3980:2[Keyword:0] Line 9 Column 9 File C:/temp/fibonacci.cpp)'('
   |  (simple_declaration@Cpp~GCC4=1033#4dc2a20^1#4dc3980:3 Line 9 Column 11 File C:/temp/fibonacci.cpp
   |   (simple_type_specifier@Cpp~GCC4=1109#4dc2940^1#4dc2a20:1 Line 9 Column 11 File C:/temp/fibonacci.cpp
   |   |('int'@Cpp~GCC4=2760#4dc2900^1#4dc2940:1[Keyword:0] Line 9 Column 11 File C:/temp/fibonacci.cpp)'int'
   |   )simple_type_specifier
   |   (init_declarator@Cpp~GCC4=1380#4dc29e0^1#4dc2a20:2 Line 9 Column 15 File C:/temp/fibonacci.cpp
   |   |(IDENTIFIER@Cpp~GCC4=2645#4dc2920^1#4dc29e0:1[`i'] Line 9 Column 15 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   |(initializer@Cpp~GCC4=1639#4dc29c0^1#4dc29e0:2 Line 9 Column 17 File C:/temp/fibonacci.cpp
   |   | ('='@Cpp~GCC4=2893#4dc2960^1#4dc29c0:1[Keyword:0] Line 9 Column 17 File C:/temp/fibonacci.cpp)'='
   |   | (INT_LITERAL@Cpp~GCC4=2809#4dc2a40^1#4dc29c0:2[2] Line 9 Column 19 File C:/temp/fibonacci.cpp)INT_LITERAL
   |   |)initializer
   |   )init_declarator
   |   (';'@Cpp~GCC4=2937#4dc29a0^1#4dc2a20:3[Keyword:0] Line 9 Column 20 File C:/temp/fibonacci.cpp)';'
   |  )simple_declaration
   |  (relational_expression@Cpp~GCC4=684#4dc2b60^1#4dc3980:4 Line 9 Column 22 File C:/temp/fibonacci.cpp
   |   (IDENTIFIER@Cpp~GCC4=2645#4dc2a00^1#4dc2b60:1[`i'] Line 9 Column 22 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   ('<'@Cpp~GCC4=2899#4dc2a80^1#4dc2b60:2[Keyword:0] Line 9 Column 24 File C:/temp/fibonacci.cpp)'<'
   |   (IDENTIFIER@Cpp~GCC4=2645#4dc2aa0^1#4dc2b60:3[`n'] Line 9 Column 26 File C:/temp/fibonacci.cpp)IDENTIFIER
   |  )relational_expression
   |  (';'@Cpp~GCC4=2937#4dc2b40^1#4dc3980:5[Keyword:0] Line 9 Column 27 File C:/temp/fibonacci.cpp)';'
   |  (postfix_expression@Cpp~GCC4=406#4dc2b20^1#4dc3980:6 Line 9 Column 29 File C:/temp/fibonacci.cpp
   |   (IDENTIFIER@Cpp~GCC4=2645#4dc2e60^1#4dc2b20:1[`i'] Line 9 Column 29 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   ('++'@Cpp~GCC4=2897#4dc2ae0^1#4dc2b20:2[Keyword:0] Line 9 Column 30 File C:/temp/fibonacci.cpp)'++'
   |  )postfix_expression
   |  (')'@Cpp~GCC4=2886#4dc2b00^1#4dc3980:7[Keyword:0] Line 9 Column 33 File C:/temp/fibonacci.cpp)')'
   |  (compound_statement@Cpp~GCC4=872#4dc38e0^1#4dc3980:8 Line 10 Column 5 File C:/temp/fibonacci.cpp
   |   ('{'@Cpp~GCC4=2938#4dc2b80^1#4dc38e0:1[Keyword:0] Line 10 Column 5 File C:/temp/fibonacci.cpp)'{'
   |   (statement_seq@Cpp~GCC4=876#4dc3900^1#4dc38e0:2 Line 11 Column 9 File C:/temp/fibonacci.cpp
   |   |(statement_seq@Cpp~GCC4=876#4dc2f20^1#4dc3900:1 Line 11 Column 9 File C:/temp/fibonacci.cpp
   |   | (expression_statement@Cpp~GCC4=869#4dc3080^1#4dc2f20:1 Line 11 Column 9 File C:/temp/fibonacci.cpp
   |   |  (assignment_expression@Cpp~GCC4=809#4dc33a0^1#4dc3080:1 Line 11 Column 9 File C:/temp/fibonacci.cpp
   |   |   (IDENTIFIER@Cpp~GCC4=2645#4dc2ba0^1#4dc33a0:1[`fib'] Line 11 Column 9 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   |   ('='@Cpp~GCC4=2893#4dc2bc0^1#4dc33a0:2[Keyword:0] Line 11 Column 13 File C:/temp/fibonacci.cpp)'='
   |   |   (additive_expression@Cpp~GCC4=593#4dc3000^1#4dc33a0:3 Line 11 Column 15 File C:/temp/fibonacci.cpp
   |   |   |(IDENTIFIER@Cpp~GCC4=2645#4dc2be0^1#4dc3000:1[`fib1'] Line 11 Column 15 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   |   |('+'@Cpp~GCC4=2902#4dc2e80^1#4dc3000:2[Keyword:0] Line 11 Column 20 File C:/temp/fibonacci.cpp)'+'
   |   |   |(IDENTIFIER@Cpp~GCC4=2645#4dc2ea0^1#4dc3000:3[`fib2'] Line 11 Column 22 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   |   )additive_expression
   |   |  )assignment_expression
   |   |  (';'@Cpp~GCC4=2937#4dc2ec0^1#4dc3080:2[Keyword:0] Line 11 Column 26 File C:/temp/fibonacci.cpp)';'
   |   | )expression_statement
   |   | (expression_statement@Cpp~GCC4=869#4dc2f00^1#4dc2f20:2 Line 12 Column 9 File C:/temp/fibonacci.cpp
   |   |  (assignment_expression@Cpp~GCC4=809#4dc3740^1#4dc2f00:1 Line 12 Column 9 File C:/temp/fibonacci.cpp
   |   |   (IDENTIFIER@Cpp~GCC4=2645#4dc2ee0^1#4dc3740:1[`fib1'] Line 12 Column 9 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   |   ('='@Cpp~GCC4=2893#4dc3400^1#4dc3740:2[Keyword:0] Line 12 Column 14 File C:/temp/fibonacci.cpp)'='
   |   |   (IDENTIFIER@Cpp~GCC4=2645#4dc3340^1#4dc3740:3[`fib2'] Line 12 Column 16 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   |  )assignment_expression
   |   |  (';'@Cpp~GCC4=2937#4dc2fa0^1#4dc2f00:2[Keyword:0] Line 12 Column 20 File C:/temp/fibonacci.cpp)';'
   |   | )expression_statement
   |   |)statement_seq
   |   |(expression_statement@Cpp~GCC4=869#4dc2fe0^1#4dc3900:2 Line 13 Column 9 File C:/temp/fibonacci.cpp
   |   | (assignment_expression@Cpp~GCC4=809#4dc3940^1#4dc2fe0:1 Line 13 Column 9 File C:/temp/fibonacci.cpp
   |   |  (IDENTIFIER@Cpp~GCC4=2645#4dc38a0^1#4dc3940:1[`fib2'] Line 13 Column 9 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   |  ('='@Cpp~GCC4=2893#4dc37a0^1#4dc3940:2[Keyword:0] Line 13 Column 14 File C:/temp/fibonacci.cpp)'='
   |   |  (IDENTIFIER@Cpp~GCC4=2645#4dc3700^1#4dc3940:3[`fib'] Line 13 Column 16 File C:/temp/fibonacci.cpp)IDENTIFIER
   |   | )assignment_expression
   |   | (';'@Cpp~GCC4=2937#4dc38c0^1#4dc2fe0:2[Keyword:0] Line 13 Column 19 File C:/temp/fibonacci.cpp)';'
   |   |)expression_statement
   |   )statement_seq
   |   ('}'@Cpp~GCC4=2939#4dc2fc0^1#4dc38e0:3[Keyword:0] Line 14 Column 5 File C:/temp/fibonacci.cpp)'}'
   |  )compound_statement
   | )iteration_statement
   |)statement_seq
   |(jump_statement@Cpp~GCC4=983#4dc3040^1#4dc3060:2 Line 16 Column 5 File C:/temp/fibonacci.cpp
   | ('return'@Cpp~GCC4=2778#4dc3960^1#4dc3040:1[Keyword:0] Line 16 Column 5 File C:/temp/fibonacci.cpp)'return'
   | (IDENTIFIER@Cpp~GCC4=2645#4dc3500^1#4dc3040:2[`fib'] Line 16 Column 12 File C:/temp/fibonacci.cpp)IDENTIFIER
   | (';'@Cpp~GCC4=2937#4dc3540^1#4dc3040:3[Keyword:0] Line 16 Column 15 File C:/temp/fibonacci.cpp)';'
   |)jump_statement
   )statement_seq
   ('}'@Cpp~GCC4=2939#4dc3560^1#4dc39c0:3[Keyword:0] Line 17 Column 1 File C:/temp/fibonacci.cpp)'}'
  )compound_statement
 )function_definition
)translation_unit
Exiting with final status 0.

C:\DMS\Domains\Cpp\GCC4\Tools\Parser\Source>

DMS can produce an XML version of this.

For a truly vexing example of C++:

template<bool> struct a_t;

template<> struct a_t<true> {
    template<int> struct b {};
};

template<> struct a_t<false> {
   enum { b };
};

typedef a_t<sizeof(void*)==sizeof(int)> a;

enum { c, d };
int main() {
    a::b<c>d; // declaration or expression?
}

... DMS produces the following tree, with multiple interpretations of the last statement (drawn with Dot):

[Image: Parse of difficult C++ program]
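
To make the two interpretations concrete, here is a minimal sketch that pins the template argument instead of relying on sizeof(void*)==sizeof(int). The function names reads_as_declaration, reads_as_expression and check_both_readings are made up for this illustration and are not part of DMS or of the original program; the a_t, b, c and d declarations are copied from the example above.

```cpp
#include <cassert>
#include <type_traits>

template<bool> struct a_t;

template<> struct a_t<true> {
    template<int> struct b {};   // here b names a class template
};

template<> struct a_t<false> {
    enum { b };                  // here b names an enumerator with value 0
};

enum { c, d };                   // c == 0, d == 1

// Reading 1: with a_t<true>, the tokens declare a local variable named d.
bool reads_as_declaration() {
    typedef a_t<true> a;
    a::b<c>d;                    // declaration of d, of type a_t<true>::b<0>
    (void)d;                     // silence unused-variable warnings
    return std::is_same<decltype(d), a_t<true>::b<0> >::value;
}

// Reading 2: with a_t<false>, the same tokens are the expression (a::b < c) > d.
bool reads_as_expression() {
    typedef a_t<false> a;
    auto result = a::b<c>d;      // two comparisons; d is the global enumerator
    return std::is_same<decltype(result), bool>::value;
}

// Hypothetical helper: both readings hold for their respective specialization.
void check_both_readings() {
    assert(reads_as_declaration());
    assert(reads_as_expression());
}
```

Calling check_both_readings() from main should pass on either kind of platform, because each function fixes which specialization a refers to. Compilers may warn about the second reading, since it compares enumerators of unrelated unnamed enums; that is inherent in the original example.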

The examples shown happen to be for the GCC dialect of C++. DMS can also parse Microsoft Visual Studio C++ (as of June 2015 it handles C++14 and all of MSVC 2013; we're checking compatibility with MSVC 2015).

[Edit June 2018: Now handles C++17. The number of dark corners in the standard is just stunning.]

Getting a parse tree by itself isn't generally very useful. You also need other artifacts such as a symbol table and control/data flow facts. The DMS C++ front end produces all of this as tool-accessible data attributes of the AST. (Using those symbol tables, the C++ front end resolves which of the sub-parses in the above example is valid, and removes the invalid subtree.)

Sometimes it is useful to parse (well-formed) substrings of a language. The DMS parsing engines can do that; you don't have to parse an entire compilation unit.

Solution 2:

clang still has that functionality:

The options are -ast-dump and -ast-dump-xml.

Note: -ast-dump-xml will only work when you build clang in debug mode.

http://clang.llvm.org/docs/IntroductionToTheClangAST.html

For example:

## cat test.cpp 
int main()
{
return 0;
}

## clang++ -cc1 -ast-dump-xml test.cpp
<TranslationUnit ptr="0x4e42660">
 <Typedef ptr="0x4e42bd0" name="__builtin_va_list" typeptr="0x0">
  <PointerType ptr="0x4e42b90" canonical="0x4e42b90">
   <BuiltinType ptr="0x4e426f0" canonical="0x4e426f0"/>
  </PointerType>
 </Typedef>
 <Function ptr="0x4e42c70" name="main" returnzero="true" prototype="true">
  <FunctionProtoType ptr="0x4e42c20" canonical="0x4e42c20">
   <BuiltinType ptr="0x4e42750" canonical="0x4e42750"/>
   <parameters/>
  </FunctionProtoType>
  <Stmt>
CompoundStmt 0x4e42d78 <test.cpp:2:1, line:4:1>
`-ReturnStmt 0x4e42d58 <line:3:1, col:8>
  `-IntegerLiteral 0x4e42d38 <col:8> 'int' 0

  </Stmt>
 </Function>
</TranslationUnit>
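
The indented lines inside <Stmt> use a text tree notation in which a backtick-dash marks the last child of a node. As a rough aid for reading that notation, here is a toy sketch that prints a hand-built tree the same way. Node, dump, dump_to_string and sample_tree are hypothetical names for this illustration only; this is not clang's implementation, and the "|-" marker for non-last children is an assumption that does not appear in the output above.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Toy tree node: a label plus child nodes (not a clang data structure).
struct Node {
    std::string label;
    std::vector<Node> children;
};

// Print each child on its own line; the last child gets "`-", others "|-".
void dump_children(const Node& node, const std::string& indent, std::ostream& out) {
    for (std::size_t i = 0; i < node.children.size(); ++i) {
        const bool last = (i + 1 == node.children.size());
        out << indent << (last ? "`-" : "|-") << node.children[i].label << '\n';
        // Below a non-last child, keep a vertical bar so siblings stay connected.
        dump_children(node.children[i], indent + (last ? "  " : "| "), out);
    }
}

void dump(const Node& root, std::ostream& out) {
    out << root.label << '\n';
    dump_children(root, "", out);
}

std::string dump_to_string(const Node& root) {
    std::ostringstream out;
    dump(root, out);
    return out.str();
}

// The statement tree from the output above, without addresses and locations.
Node sample_tree() {
    Node literal{"IntegerLiteral 'int' 0", {}};
    Node ret{"ReturnStmt", {literal}};
    return Node{"CompoundStmt", {ret}};
}

// Hypothetical self-check: the sample renders in the same shape as the dump.
void check_sample_dump() {
    assert(dump_to_string(sample_tree()) ==
           "CompoundStmt\n"
           "`-ReturnStmt\n"
           "  `-IntegerLiteral 'int' 0\n");
}
```

Read this way, the dump says that the body of main is a CompoundStmt whose only child is a ReturnStmt, whose only child is the integer literal 0.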

Solution 3:

Probably the best way is to write your own program using libclang. Check the libclang API documentation, especially the C++ AST introspection part.