What is the difference between `Greedy` and `Reluctant` regular expression quantifiers?

From the Pattern javadocs:

Greedy quantifiers:
X?      X, once or not at all  
X*      X, zero or more times  
X+      X, one or more times  
X{n}    X, exactly n times  
X{n,}   X, at least n times  
X{n,m}  X, at least n but not more than m times

Reluctant quantifiers:
X??     X, once or not at all  
X*?     X, zero or more times  
X+?     X, one or more times  
X{n}?   X, exactly n times  
X{n,}?  X, at least n times  
X{n,m}? X, at least n but not more than m times

The description of what they do is the same...so, what is the difference?

I would really appreciate some examples.

I am coding in Java, but I hear this concept is the same for most modern regex implementations.

A greedy operator always try to "grab" as much of the input as possible, while a reluctant quantifier will match as little of the input as possible and still create a match.

Example:

"The red fox jumped over the red fence"
/(.*)red/ => \1 = "The red fox jumped over the "
/(.*?)red/ => \1 = "The "

"aaa"
/a?a*/ => \1 = "a", \2 = "aa"
/a??a*/ => \1 = "", \2 = "aaa"

"Mr. Doe, John"
/^(?:Mrs?.)?.*\b(.*)$/ => \1 = "John"
/^(?:Mrs?.)?.*?\b(.*)$/ => \1 = "Doe, John"

From this link, where the tutorial author acknowledges the spirit of your question:

At first glance it may appear that the quantifiers X?, X?? and X?+ do exactly the same thing, since they all promise to match "X, once or not at all". There are subtle implementation differences which will be explained near the end of this section.

They go on to put together examples and offer the explanation:

Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from. Depending on the quantifier used in the expression, the last thing it will try matching against is 1 or 0 characters.

The reluctant quantifiers, however, take the opposite approach: They start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.

And for extra credit, the possessive explanation:

Finally, the possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.

What is the difference between `Greedy` and `Reluctant` regular expression quantifiers?

Related

Recent Posts