Compiling a simple parser with Boost.Spirit
Part of a simple skeleton utility I'm hacking on I have a grammar for triggering substitutions in text. I thought it a wonderful way to get comfortable with Boost.Spirit, but the template errors are a joy of a unique kind.
Here is the code in its entirety:
#include <iostream>
#include <iterator>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace bsq = boost::spirit::qi;
namespace {
template<typename Iterator>
struct skel_grammar : public bsq::grammar<Iterator> {
skel_grammar();
private:
bsq::rule<Iterator> macro_b;
bsq::rule<Iterator> macro_e;
bsq::rule<Iterator, bsq::ascii::space_type> id;
bsq::rule<Iterator> macro;
bsq::rule<Iterator> text;
bsq::rule<Iterator> start;
};
template<typename Iterator>
skel_grammar<Iterator>::skel_grammar() : skel_grammar::base_type(start)
{
text = bsq::no_skip[+(bsq::char_ - macro_b)[bsq::_val += bsq::_1]];
macro_b = bsq::lit("<<");
macro_e = bsq::lit(">>");
macro %= macro_b >> id >> macro_e;
id %= -(bsq::ascii::alpha | bsq::char_('_'))
>> +(bsq::ascii::alnum | bsq::char_('_'));
start = *(text | macro);
}
} // namespace
int main(int argc, char* argv[])
{
std::string input((std::istreambuf_iterator<char>(std::cin)),
std::istreambuf_iterator<char>());
skel_grammar<std::string::iterator> grammar;
bool r = bsq::parse(input.begin(), input.end(), grammar);
std::cout << std::boolalpha << r << '\n';
return 0;
}
What's wrong with this code?
Solution 1:
Mmm. I feel that we have discussed a few more details in chat than have been reflected in the question as it is.
Let me entertain you with my 'toy' implementation, complete with test cases, of a grammar that will recognize <<macros>>
like this, including nested expansion of the same.
Notable features:
- Expansion is done using a callback (
process()
), giving you maximum flexibility (you could use a look up table, cause parsing to fail depending on the macro content, or even have sideeffects independent of the output - the parser is optimized to favour streaming mode. Look at
spirit::istream_iterator
on how to parse input in streaming mode (Stream-based Parsing Made Easy). This has the obvious benefits if your input stream is 10 GB, and contains only 4 macros - it is the difference between crawling performance (or running out of memory) and just scaling.- note that the demo still writes to a string buffer (via
oss
). You could, however, easily, hook the output directly tostd::cout
or, say, anstd::ofstream
instance
- note that the demo still writes to a string buffer (via
- Expansion is done eagerly, so you can have nifty effects using indirect macros. See the testcases
- I even demoed a simplistic way to support escaping the
<<
or>>
delimiters (#define SUPPORT_ESCAPES
)
Without further ado:
The Code
Note due to laziness, I require -std==c++0x
, but only when SUPPORT_ESCAPES
is defined
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
namespace qi = boost::spirit::qi;
namespace phx= boost::phoenix;
namespace fsn= boost::fusion;
namespace
{
#define SUPPORT_ESCAPES
static bool process(std::string& macro)
{
if (macro == "error") {
return false; // fail the parse
}
if (macro == "hello") {
macro = "bye";
} else if (macro == "bye") {
macro = "We meet again";
} else if (macro == "sideeffect") {
std::cerr << "this is a side effect while parsing\n";
macro = "(done)";
} else if (std::string::npos != macro.find('~')) {
std::reverse(macro.begin(), macro.end());
macro.erase(std::remove(macro.begin(), macro.end(), '~'));
} else {
macro = std::string("<<") + macro + ">>"; // this makes the unsupported macros appear unchanged
}
return true;
}
template<typename Iterator, typename OutIt>
struct skel_grammar : public qi::grammar<Iterator>
{
struct fastfwd {
template<typename,typename> struct result { typedef bool type; };
template<typename R, typename O>
bool operator()(const R&r,O& o) const
{
#ifndef SUPPORT_ESCAPES
o = std::copy(r.begin(),r.end(),o);
#else
auto f = std::begin(r), l = std::end(r);
while(f!=l)
{
if (('\\'==*f) && (l == ++f))
break;
*o++ = *f++;
}
#endif
return true; // false to fail the parse
}
} copy;
skel_grammar(OutIt& out) : skel_grammar::base_type(start)
{
using namespace qi;
#ifdef SUPPORT_ESCAPES
rawch = ('\\' >> char_) | char_;
#else
# define rawch qi::char_
#endif
macro = ("<<" >> (
(*(rawch - ">>" - "<<") [ _val += _1 ])
% macro [ _val += _1 ] // allow nests
) >>
">>")
[ _pass = phx::bind(process, _val) ];
start =
raw [ +(rawch - "<<") ] [ _pass = phx::bind(copy, _1, phx::ref(out)) ]
% macro [ _pass = phx::bind(copy, _1, phx::ref(out)) ]
;
BOOST_SPIRIT_DEBUG_NODE(start);
BOOST_SPIRIT_DEBUG_NODE(macro);
# undef rawch
}
private:
#ifdef SUPPORT_ESCAPES
qi::rule<Iterator, char()> rawch;
#endif
qi::rule<Iterator, std::string()> macro;
qi::rule<Iterator> start;
};
}
int main(int argc, char* argv[])
{
std::string input =
"Greeting is <<hello>> world!\n"
"Side effects are <<sideeffect>> and <<other>> vars are untouched\n"
"Empty <<>> macros are ok, as are stray '>>' pairs.\n"
"<<nested <<macros>> (<<hello>>?) work>>\n"
"The order of expansion (evaluation) is _eager_: '<<<<hello>>>>' will expand to the same as '<<bye>>'\n"
"Lastly you can do algorithmic stuff too: <<!esrever ~ni <<hello>>>>\n"
#ifdef SUPPORT_ESCAPES // bonus: escapes
"You can escape \\<<hello>> (not expanded to '<<hello>>')\n"
"Demonstrate how it <<avoids <\\<nesting\\>> macros>>.\n"
#endif
;
std::ostringstream oss;
std::ostream_iterator<char> out(oss);
skel_grammar<std::string::iterator, std::ostream_iterator<char> > grammar(out);
std::string::iterator f(input.begin()), l(input.end());
bool r = qi::parse(f, l, grammar);
std::cout << "parse result: " << (r?"success":"failure") << "\n";
if (f!=l)
std::cout << "unparsed remaining: '" << std::string(f,l) << "'\n";
std::cout << "Streamed output:\n\n" << oss.str() << '\n';
return 0;
}
The Test Output
this is a side effect while parsing
parse result: success
Streamed output:
Greeting is bye world!
Side effects are (done) and <<other>> vars are untouched
Empty <<>> macros are ok, as are stray '>>' pairs.
<<nested <<macros>> (bye?) work>>
The order of expansion (evaluation) is _eager_: 'We meet again' will expand to the same as 'We meet again'
Lastly you can do algorithmic stuff too: eyb in reverse!
You can escape <<hello>> (not expanded to 'bye')
Demonstrate how it <<avoids <<nesting>> macros>>.
There is quite a lot of functionality hidden there to grok. I suggest you look at the test cases and the process()
callback alongside each other to see what is going on.
Cheers & HTH :)