boost::spirit::qi duplicate parsing on the output

I have this very simple parser using Boost::Spirit:

rule<std::string::iterator, std::string()> zeroTo255 = (string("25") >> char_('0', '5'))
    | (char_('2') >> char_('0', '4') >> digit)
    | (char_('1') >> repeat[2](digit))
    | (char_('1', '9') >> digit) | digit;

When I try to parse

std::string o{"1"};
std::string s;
parse(o.begin(), o.end(), zeroTo255, s);
std::cout << o << ": " << s << std::endl;

I have as output

1: 111

I'm obviously doing something wrong, but what?


Solution 1:

qi::hold is one way about it, as correctly mentioned by @Andrzej

I think I have a few observations that might help, as well as a better solution.


The point is that Spirit will not require 'temp' storage for attributes by design. In fact, it can't really assume the attribute be copyable in the first place. This is the reason here (imagine parsing everything into a single std::vector<> and copying for each parser step?).

On a more essential level, it looks to me as if it is not the attribute handling that is backwards here, but the parser expression itself: It fails to state the intent, and incurs all kinds of complexity dealing with number representations when... really it shouldn't.

My take on it would be

rule<std::string::iterator, std::string()> zeroTo255, alternatively;

alternatively %= raw [ uint_ [ _pass = (_1 <= 255) ] ];

You see: you let Spirit parse a number, and indeed just verify the range, which is what you wanted to do in the first place.

The second thing that strikes me as a-typical, is the fact that the rule exposes a std::string attribute, instead of unsigned char e.g. Why is that?

Assuming this was a conscious design decision, you can have it your way by judicious use of

  • negative lookahead (!parser) - which doesn't affect attributes
  • positive lookahead (&parser) - which doesn't affect attributes
  • Get acquainted with qi::as_string, qi::raw, qi::lexeme and qi::no_skip
  • semantic actions (don't rely on automatic rules)

Here's what minimal change to your original rule would have worked:

zeroTo255 = raw [ 
          ("25" >> char_("0-5"))
        | ('2' >> char_("0-4") >> digit)
        | ('1' >> digit >> digit)
        | (char_("1-9") >> digit) 
        | digit
    ];

This has roughly the same effect as the code using qi::hold but not the performance drawback of _hold_ing attribute values.

Hope this helps.

Full sample: Live on http://liveworkspace.org/code/4v4CQW$0:

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace qi = boost::spirit::qi;

int main()
{
    using namespace qi;
    rule<std::string::iterator, std::string()> zeroTo255, alternatively;

    zeroTo255 = raw [ 
              ("25" >> char_("0-5"))
            | ('2' >> char_("0-4") >> digit)
            | ('1' >> digit >> digit)
            | (char_("1-9") >> digit) 
            | digit
        ];

    alternatively %= raw [ uint_ [ _pass = (_1 <= 255) ] ];

    for (auto& input : std::vector<std::string> { "255", "249", "178", "30", "4" })
    {
        std::string output;
        std::cout << "zeroTo255:\t" << std::boolalpha 
                  << parse(std::begin(input), std::end(input), zeroTo255, output) 
                  << ": " << output << std::endl;

        output.clear();
        std::cout << "alternatively:\t" << std::boolalpha 
                  << parse(std::begin(input), std::end(input), alternatively, output) 
                  << ": " << output << std::endl;
    }

}

Output

zeroTo255:      true: 255
alternatively:  true: 255
zeroTo255:      true: 249
alternatively:  true: 249
zeroTo255:      true: 178
alternatively:  true: 178
zeroTo255:      true: 30
alternatively:  true: 30
zeroTo255:      true: 4
alternatively:  true: 4

Solution 2:

I faced a similar problem once. This is the particular way the alternative operator in Spirit works. Your example should work if you use additional directive "hold".

rule<std::string::iterator, std::string()> zeroTo255 
= hold[string("25") >> char_('0', '5')]
| hold[char_('2') >> char_('0', '4') >> digit]
| hold[char_('1') >> repeat[2](digit)]
| hold[char_('1', '9') >> digit] | digit;

For details of this behavior see this thread.