How does {m}{n} ("exactly n times" twice) work?

Solution 1:

IEEE-Standard 1003.1 says:

The behavior of multiple adjacent duplication symbols ('*' and intervals) produces undefined results.

So every implementation can do as it pleases; just don't rely on any specific behavior.

Solution 2:

When I enter your regex in RegexBuddy using the Java regex syntax, it displays the following message:

Quantifiers must be preceded by a token that can be repeated «{2}»

Changing the regex to use an explicit group, ^(\d{1}){2}, resolves that error and works as you expect.
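A minimal sketch to reproduce the difference with java.util.regex, assuming a recent OpenJDK. The class and method names are hypothetical. It uses Matcher.matches(), which requires the whole input to match.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares the doubled-quantifier pattern with the grouped one.
final class GroupedQuantifierDemo {

    // True if the entire input matches the regex (matches() anchors both ends).
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] regexes = {"^\\d{1}{2}", "^(\\d{1}){2}"};
        String[] inputs = {"1", "12"};
        for (String regex : regexes) {
            for (String input : inputs) {
                // Print each regex/input pair and whether it is a full match.
                System.out.printf("%-14s %-3s -> %b%n", regex, input, fullMatch(regex, input));
            }
        }
    }
}
```

Only the grouped form accepts "12"; the ungrouped form accepts just "1", which is the behavior described in the question.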


I assume that the Java regex engine simply ignores the error (the extra quantifier) and works with what has been compiled so far.

Edit

The reference to the IEEE-Standard in @piet.t's answer seems to support that assumption.

Edit 2 (kudos to @fncomp)

For completeness, one would typically use a non-capturing group (?:...) to avoid capturing. The complete regex then becomes ^(?:\d{1}){2}
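A small sketch, assuming java.util.regex and hypothetical class/method names, showing that the capturing and non-capturing forms accept the same inputs while only the capturing one reports a group via Matcher.groupCount().

```java
import java.util.regex.Pattern;

// Hypothetical demo: same matching behavior, different number of capture groups.
final class NonCapturingDemo {

    // Number of capturing groups declared in the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // True if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String regex : new String[] {"^(\\d{1}){2}", "^(?:\\d{1}){2}"}) {
            System.out.println(regex + ": groups=" + groupCount(regex)
                    + ", matches \"12\"=" + fullMatch(regex, "12"));
        }
    }
}
```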

Solution 3:

Scientific approach:
Each pattern below can be tried on regexplanet.com (use the green Java button).

  • You've already shown that \d{1}{2} matches "1" and doesn't match "12", so we know it isn't interpreted as (?:\d{1}){2}.
  • Still, 1 is a boring number, and {1} might be optimized away, so let's try something more interesting:
    \d{2}{3}. This still matches only two characters (not six); the {3} is ignored.
  • OK. There's an easy way to see what a regex engine does: does it capture?
    Let's try (\d{1})({2}). Oddly, this works. The second group, $2, captures the empty string.
  • So why do we need the first group? How about ({1})? Still works.
  • And just {1}? No problem there.
    It looks like Java is being a little weird here.
  • Great! So {1} is valid. We know Java expands * and + to {0,0x7FFFFFFF} and {1,0x7FFFFFFF}, so will * or + work? No:

    Dangling meta character '+' near index 0
    +
    ^

    The validation must come before * and + are expanded.
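The experiments in the list can be reproduced offline with a small harness like the one below. It is a sketch assuming a recent OpenJDK's java.util.regex; the class and helper names are hypothetical, and other JDK versions may behave differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical harness for the experiments above.
final class QuantifierExperiments {

    // True if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // All groups of a full match (index 0 = whole match), or null if there is no match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    // The PatternSyntaxException description, or null if the regex compiles.
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // \d{2}{3}: only two digits are consumed; the {3} does not multiply the {2}.
        System.out.println("\\d{2}{3} vs 12:     " + fullMatch("\\d{2}{3}", "12"));
        System.out.println("\\d{2}{3} vs 123456: " + fullMatch("\\d{2}{3}", "123456"));

        // (\d{1})({2}): compiles, and $2 captures the empty string.
        String[] g = groups("(\\d{1})({2})", "1");
        if (g != null) {
            System.out.println("$1=\"" + g[1] + "\", $2=\"" + g[2] + "\"");
        }

        // ({1}) and a bare {1} both compile and match the empty string.
        System.out.println("({1}) vs \"\": " + fullMatch("({1})", ""));
        System.out.println("{1} vs \"\":   " + fullMatch("{1}", ""));

        // A leading + or * is rejected when the pattern is compiled.
        System.out.println("+ : " + compileError("+"));
        System.out.println("* : " + compileError("*"));
    }
}
```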

I didn't find anything in the spec that explains this. It looks like a quantifier such as * or + must come at least after a character, brackets, or parentheses.
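To probe that placement rule, a sketch like the following checks which patterns compile, assuming java.util.regex; the class name and pattern list are hypothetical choices for illustration. Here + follows a character, a bracket expression, or a group, or has nothing before it, and {1} is tried with nothing before it.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical checker: which quantifier placements does java.util.regex accept?
final class QuantifierPlacement {

    // True if Pattern.compile accepts the regex.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] regexes = {
            "a+", "[a]+", "(a)+",    // + after a character, brackets, parentheses
            "+", "*", "(+)", "a|+",  // + or * with nothing to repeat
            "{1}", "a|{1}"           // {n} with nothing before it
        };
        for (String regex : regexes) {
            System.out.printf("%-6s compiles: %b%n", regex, compiles(regex));
        }
    }
}
```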

Most of these patterns are considered invalid by other regex flavors, and for good reason: they do not make sense.