Generating Spirit parser expressions from a variadic list of alternative parser expressions

Solution 1:

Thank you for a quick hint! I've just tried your code and unless I do something wrong ... I get this output:

Syntax error:abc 8.81
Parsed:-a atoken
Syntax error:-b btoken
Syntax error:-c ctoken
Syntax error:-d dtoken

– G. Civardi 2 hours ago

Okay, so, I couldn't leave it alone :/

Turns out there was Undefined Behaviour involved, because of the way the parser expressions were passed to expandBitwise and copied: Boost Proto expression templates aren't designed to be copied, as they may hold references to temporaries whose lifetime ends at the end of the containing full-expression.

For more background, see the discussion at "Zero to 60 MPH in 2 seconds!".
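To make the lifetime problem concrete, here is a minimal sketch using a toy expression template. It is not Boost.Proto: Lit, SumRef, SumVal and this deep_copy are hypothetical names made up for illustration. It only mirrors the general idea that nodes built by an operator hold their operands by reference, and that a deep copy taken inside the same full-expression turns them into values.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical toy expression template (not Boost.Proto).
struct Lit {
    int v;
    int eval() const { return v; }
};

// Node as built by operator+: operands are held by reference.
template <typename L, typename R>
struct SumRef {
    L const& lhs;
    R const& rhs;
    int eval() const { return lhs.eval() + rhs.eval(); }
};

// Node after a deep copy: operands are held by value.
template <typename L, typename R>
struct SumVal {
    L lhs;
    R rhs;
    int eval() const { return lhs.eval() + rhs.eval(); }
};

inline SumRef<Lit, Lit> operator+(Lit const& l, Lit const& r) { return {l, r}; }

template <typename L, typename R>
SumRef<SumRef<L, R>, Lit> operator+(SumRef<L, R> const& l, Lit const& r) { return {l, r}; }

// Leaves are copied as they are.
inline Lit deep_copy(Lit const& e) { return e; }

// Inner nodes are rebuilt recursively with by-value operands.
template <typename L, typename R>
auto deep_copy(SumRef<L, R> const& e)
    -> SumVal<decltype(deep_copy(e.lhs)), decltype(deep_copy(e.rhs))> {
    return {deep_copy(e.lhs), deep_copy(e.rhs)};
}

inline void example() {
    Lit a{1}, b{2}, c{3};
    // auto dangling = a + b + c;  // the outer node would refer to the destroyed temporary (a + b)
    auto safe = deep_copy(a + b + c);  // copied while the temporary (a + b) is still alive
    static_assert(std::is_same<decltype(safe), SumVal<SumVal<Lit, Lit>, Lit>>::value,
                  "the deep copy stores every operand by value");
    assert(safe.eval() == 6);
}
```

The commented-out line is the toy equivalent of storing a Spirit expression in an auto variable without copying it deeply: it compiles, but using it afterwards reads a dead temporary.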

After a long (long) time of tweaking rule_.alias() and boost::proto::deep_copy, I reached the following solution (which, incidentally, no longer needs a helper function at all):

template<typename ...Tail>
void mparse(const std::string& line,Tail& ...tail)
{
    auto parser = boost::fusion::fold(
                boost::tie(ph::bind(&TStruct::rule_, arg1)(tail)...),
                qi::eps(false),
                deepcopy_(arg2 | arg1)
            );

    auto f=begin(line), l=end(line);

    if( qi::phrase_parse(f, l, parser, ascii::space ) )
        std::cout << "Parsed:" << line << std::endl;
    else
        std::cout << "Syntax error:" << line << std::endl;

    if (f!=l)
        std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
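The fold in mparse can be pictured without Spirit. The following is only a sketch under assumptions: Parser, never, alt, option and fold_alternatives are hypothetical stand-ins (std::function instead of parser expressions), with never() playing the role of qi::eps(false) and alt() the role of operator|. Each new element is put in front of the accumulated state, mirroring arg2 | arg1, which would also be consistent with a debug trace that tries the last option first.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical toy parser: returns true when the input matches.
using Parser = std::function<bool(std::string const&)>;

// Toy counterpart of qi::eps(false): never matches, so it is a neutral seed.
inline Parser never() {
    return [](std::string const&) { return false; };
}

// Toy counterpart of `a | b`. Both operands are captured by value,
// so the combined parser owns copies and cannot dangle.
inline Parser alt(Parser a, Parser b) {
    return [a, b](std::string const& in) { return a(in) || b(in); };
}

// Matches "<name> <something>", e.g. "-a atoken".
inline Parser option(std::string name) {
    return [name](std::string const& in) {
        return in.size() > name.size() + 1 &&
               in.compare(0, name.size() + 1, name + " ") == 0;
    };
}

// Left fold over the variadic list: state = alt(element, state).
inline Parser fold_alternatives(Parser state) { return state; }

template <typename... Tail>
Parser fold_alternatives(Parser state, Parser const& head, Tail const&... tail) {
    return fold_alternatives(alt(head, state), tail...);
}

inline void example() {
    Parser p = fold_alternatives(never(), option("-a"), option("-b"), option("-c"));
    assert(p("-a atoken"));
    assert(p("-c ctoken"));
    assert(!p("abc 8.81"));
}
```

Capturing by value in alt() is the toy analogue of the deep copy: every intermediate state is self-contained before the next step of the fold uses it.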

The protection against UB is the deepcopy_() invocation, which is a trivial polymorphic callable adaptor for boost::proto::deep_copy:

struct DeepCopy
{
    template<typename E> struct result { typedef typename boost::proto::result_of::deep_copy<E>::type type; };

    template<typename E>
        typename result<E>::type
        operator()(E const& expr) const {
            return boost::proto::deep_copy(expr);
        }
};

static const ph::function<DeepCopy> deepcopy_;

With this code, lo and behold, the output becomes:

Syntax error:abc 8.81
Remaining unparsed: 'abc 8.81'
Parsed:-a atoken
Parsed:-b btoken
Parsed:-c ctoken
Parsed:-d dtoken
Bye

As a bonus, the code now lets you use Spirit's built-in debug() facility (uncomment the debug(rule_) line in the TStruct constructor):

<-d>
  <try>abc 8.81</try>
  <fail/>
</-d>
<-c>
  <try>abc 8.81</try>
  <fail/>
</-c>
<-b>
  <try>abc 8.81</try>
  <fail/>
</-b>
<-a>
  <try>abc 8.81</try>
  <fail/>
</-a>
Syntax error:abc 8.81
Remaining unparsed: 'abc 8.81'

Tested with

  • Boost 1_54_0
  • GCC 4.7.2, 4.8.x, Clang 3.2
  • Note the #defines which are significant.

FULL CODE

#define BOOST_RESULT_OF_USE_DECLTYPE
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/fusion/adapted/boost_tuple.hpp>
#include <boost/fusion/include/fold.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
#include <string>

namespace qi    = boost::spirit::qi;
namespace ph    = boost::phoenix;
namespace ascii = boost::spirit::ascii;
using namespace ph::arg_names;

typedef qi::rule<std::string::const_iterator,ascii::space_type> mrule_t;
typedef qi::rule<std::string::const_iterator,std::string() >    wrule_t;

struct TStruct
{
    mrule_t     rule_;
    template<typename T,typename R>
    TStruct( T& rVar,const std::string&name, R& rule ) :
        rule_( qi::lit(name) >> rule[ ph::ref(rVar) = qi::_1 ] )
    { 
        rule_.name(name);
        // debug(rule_);
    }
};

struct DeepCopy
{
    template<typename E> struct result { typedef typename boost::proto::result_of::deep_copy<E>::type type; };

    template<typename E>
        typename result<E>::type
        operator()(E const& expr) const {
            return boost::proto::deep_copy(expr);
        }
};

static const ph::function<DeepCopy> deepcopy_;

template<typename ...Tail>
void mparse(const std::string& line,Tail& ...tail)
{
    auto parser = boost::fusion::fold(
                boost::tie(ph::bind(&TStruct::rule_, arg1)(tail)...),
                qi::eps(false),
                deepcopy_(arg2 | arg1)
            );

    auto f=begin(line), l=end(line);

    if( qi::phrase_parse(f, l, parser, ascii::space ) )
        std::cout << "Parsed:" << line << std::endl;
    else
        std::cout << "Syntax error:" << line << std::endl;

    if (f!=l)
        std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}

int main()
{
    wrule_t rword=+~ascii::space;

    std::string par1,par2,par3,par4;

    TStruct r1( par1, "-a", rword );
    TStruct r2( par2, "-b", rword );
    TStruct r3( par3, "-c", rword );
    TStruct r4( par4, "-d", rword );

    mparse("abc 8.81"   ,r1,r2,r3,r4);
    mparse("-a atoken"  ,r1,r2,r3,r4);
    mparse("-b btoken"  ,r1,r2,r3,r4);
    mparse("-c ctoken"  ,r1,r2,r3,r4);
    mparse("-d dtoken"  ,r1,r2,r3,r4);

    std::cout << "Bye\n";
}

Solution 2:

You accidentally returned the TStruct type from the expandBitwise helper. Fix it like so:

template<typename T>
auto expandBitwise(T const& t) -> decltype(t.rule_)
{
    return t.rule_;
}

template<typename T,typename ...Tail>
auto expandBitwise(T const& t,Tail const&... tail) -> decltype(t.rule_)
{
    return t.rule_ | expandBitwise(tail...);
}

If you want to expose attributes, the return type deduction rules become more involved. Basically, what you're doing is replicating the EDSL part of Spirit.
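Why the trailing return type in the helpers above yields a rule by value can be checked in isolation. This is a small sketch with hypothetical stand-in types (ToyRule for the rule type, ToyOption for TStruct, firstRule for the single-argument helper): an unparenthesized member access inside decltype gives the declared type of the member, without const or reference, so the function returns a copy.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct ToyRule   { std::string name; };  // hypothetical stand-in for the rule type
struct ToyOption { ToyRule rule_; };     // hypothetical stand-in for TStruct

// Same shape as the single-argument helper: returns the member by value.
template <typename T>
auto firstRule(T const& t) -> decltype(t.rule_) {
    return t.rule_;
}

// decltype(t.rule_) is the declared member type...
static_assert(std::is_same<decltype(firstRule(std::declval<ToyOption const&>())),
                           ToyRule>::value,
              "unparenthesized member access yields the declared type");

// ...whereas the parenthesized form would be a const reference.
static_assert(std::is_same<decltype((std::declval<ToyOption const&>().rule_)),
                           ToyRule const&>::value,
              "parenthesized member access yields an lvalue reference");

inline void example() {
    ToyOption opt{{"-a"}};
    ToyRule copy = firstRule(opt);
    opt.rule_.name = "changed";
    assert(copy.name == "-a");  // the returned object is independent of opt
}
```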


Let's swap stories...

Clippy: It looks like you are trying to write a commandline argument parser. Would you like help with that?

Implementing the DSL mechanics for your option parser could be done more systematically by creating a new Proto Domain and actually creating the terminals. This would somehow appeal to me now.

Alternatively you could take this from another angle completely, using the Nabialek Trick. This happens to be an approach I played with just a few weeks ago, and I'll share with you the design I had come up with: https://gist.github.com/sehe/2a556a8231606406fe36#file-test-cpp

The important part is where the grammar is "fixed":

start    = -argument % '\0';
unparsed = as_string  [ +~nul ] [ std::cerr << phx::val("ignoring unparsed argument: '") << _1 << "'\n" ];
argument = ('-' >> +shortopt) | ("--" >> longopt) >> -unparsed | unparsed;

The trick being in:

shortopt = shortNames [_a = _1] >> lazy(_a);
longopt  = longNames  [_a = _1] >> lazy(_a);

Where shortNames and longNames are qi::symbols tables of parsers, built dynamically, based on a variadic list of CliOptions and CliFlags (I pass them as a tuple, because I wanted to store the result inside the CliOption struct as well).

The qi::lazy(_a) invokes the parser that was stored in the symbol table.
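Stripped of Spirit, the idea can be sketched as a lookup table whose values are themselves parsers. Everything below is a hypothetical simplification and not the code from the gist: OptionTable stands in for the qi::symbols table, the stored std::function stands in for the parser kept in _a, and calling it stands in for lazy(_a). Unlike qi::symbols, this sketch splits the key off at the first space instead of matching a prefix.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stored "parser": checks the text that follows the option name.
using ArgParser = std::function<bool(std::string const&)>;

struct OptionTable {
    std::map<std::string, ArgParser> names;  // role of the symbols table

    // Step 1: look up the option name. Step 2: run the parser stored for it.
    bool parse(std::string const& input) const {
        std::string::size_type split = input.find(' ');
        std::string key  = input.substr(0, split);
        std::string rest = split == std::string::npos ? "" : input.substr(split + 1);

        auto it = names.find(key);
        if (it == names.end())
            return false;         // unknown option
        return it->second(rest);  // the "lazy" invocation of the stored parser
    }
};

// A flag takes no argument.
inline bool no_argument(std::string const& rest) { return rest.empty(); }

// An option with a value: here, a non-empty run of decimal digits.
inline bool unsigned_number(std::string const& rest) {
    if (rest.empty()) return false;
    for (char c : rest)
        if (c < '0' || c > '9') return false;
    return true;
}

inline OptionTable make_table() {
    OptionTable table;
    table.names["--verbose"]   = no_argument;
    table.names["--increment"] = unsigned_number;
    return table;
}

inline void example() {
    OptionTable table = make_table();
    assert(table.parse("--verbose"));
    assert(table.parse("--increment 3"));
    assert(!table.parse("--increment x"));
    assert(!table.parse("--unknown"));
}
```

The grammar itself never changes; adding an option only adds an entry to the table, which is what makes the approach work for a list of options built at runtime.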

As a bonus, my CliOptions parser has a feature to generate "Usage" information as well. The builders for parse expressions as well as usage information are extensible.

int main(int argc, char* argv[])
{
    using CliParsing::make_option;

    typedef std::string::const_iterator It;

    auto config = std::make_tuple(
        make_option('a', "absolutely", "absolutely"),
        make_option('b', "borked"    , "borked")    ,
        make_option('c', "completion", "completion"),
        make_option('d', "debug",      "turn on debugging"),
        make_option('e', "",           "no long name")  ,
        //make_option('f', "flungeons" , "flungeons") ,
        //make_option('g', "goofing"   , "")   ,
        //make_option('m', "monitor",    "monitoring level"),
        make_option('t', "testing"   , "testing flags"),
        make_option('\0',"file"      , "with a filename (no short name)"),

        make_option('y', "assume-yes", "always assume yes"),
        make_option('v', "verbose",    "increase verbosity level"),
        make_option('i', "increment",  "stepsize to increment with", 5)
        );

    CliParsing::OptionGrammar<It> parser(config);

    using namespace phx::arg_names;
    const auto cmdline = std::accumulate(argv+1, argv+argc, std::string(), arg1 + arg2 + '\0');

    bool ok = qi::parse(begin(cmdline), end(cmdline), parser);

    std::cout << "Parse success " << std::boolalpha << ok << "\n";
    std::cout << parser.getUsage();

    return ok? 0 : 255;
}
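The cmdline string in main() is built by appending a NUL character after every argument, which is the separator the start rule splits on. As a sketch of just that step, assuming a plain lambda in place of the Phoenix expression and a hypothetical helper name join_arguments:

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Appends each argument followed by '\0' to one string.
inline std::string join_arguments(std::vector<std::string> const& args) {
    return std::accumulate(args.begin(), args.end(), std::string(),
                           [](std::string acc, std::string const& arg) {
                               return acc + arg + '\0';
                           });
}

inline void example() {
    std::string joined = join_arguments({"-i", "3"});
    assert(joined.size() == 5);  // "-i", NUL, "3", NUL
    assert(joined[2] == '\0' && joined[4] == '\0');
}
```

Using NUL as the separator keeps arguments that contain spaces intact, since NUL cannot occur inside a command line argument.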

When invoked with some random arguments (-i 3 --completion -t --file=SOME.TXT -b huh?), it prints:

short form option --increment parsed
ignoring unparsed argument: '3'
long form switch --completion parsed
short form switch --testing parsed
long form switch --file parsed
ignoring unparsed argument: '=SOME.TXT'
short form switch --borked parsed
ignoring unparsed argument: 'huh?'

Parse success true
 --absolutely (-a)
    absolutely (flag)
 --borked (-b)
    borked (flag)
 --completion (-c)
    completion (flag)
 --debug (-d)
    turn on debugging (flag)
 -e
    no long name (flag)
 --testing (-t)
    testing flags (flag)
 --file
    with a filename (no short name) (flag)
 --assume-yes (-y)
    always assume yes (flag)
 --verbose (-v)
    increase verbosity level (flag)
 --increment (-i)
    stepsize to increment with (option with value; default '5')

As you can see, not all options have been implemented yet (notably, -- to mark the end of the option list).