My regular expression matches too much. How can I tell it to match the smallest possible pattern?
I have this RegEx:
('.+')
It has to match character literals like in C. For example, if I have 'a' b 'a', it should match the a's and the apostrophes around them.
However, it also matches the b (it should not), probably because it is, strictly speaking, also between two apostrophes.
I use this for syntax highlighting, and that is where it goes wrong: everything from the first ' to the last ' gets highlighted as a single literal.
I'm fairly new to regular expressions. How can I tell the regex not to match this?
Solution 1:
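The .+ quantifier is greedy: it matches from the first apostrophe to the last one, including everything in between.
The following pattern instead allows only characters that aren't apostrophes between the quotes, so it stops at the first closing apostrophe: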
('[^']+')
Another option is a non-greedy (lazy) quantifier, which takes as few characters as it can:
('.+?')
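To see the difference side by side, here is a minimal sketch in Python (the question does not name a regex engine, so the choice of Python's re module is an assumption; the patterns themselves are the ones from this thread):

```python
import re

text = "'a' b 'a'"

# Greedy: .+ first runs to the end of the string, then backs off
# only as far as the last apostrophe.
greedy = re.findall(r"('.+')", text)

# Negated class: [^']+ cannot cross an apostrophe, so each literal
# is matched separately.
negated = re.findall(r"('[^']+')", text)

# Lazy: .+? grows one character at a time and stops at the first
# apostrophe that lets the whole pattern succeed.
lazy = re.findall(r"('.+?')", text)

print(greedy)   # ["'a' b 'a'"]
print(negated)  # ["'a'", "'a'"]
print(lazy)     # ["'a'", "'a'"]
```

For this input the negated class and the lazy quantifier give the same result; the greedy version swallows the b.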
Solution 2:
Have you tried a non-greedy version, e.g. ('.+?')?
There are usually two modes of matching (or two sets of quantifiers): maximal (greedy) and minimal (non-greedy). Starting from the leftmost position where a match is possible, the first gives the longest possible match and the second the shortest. You can read about it (although in a Perl context) in the Perl Cookbook (Section 6.15).
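As a small illustration of the two modes, again assuming Python's re module, the sketch below runs the greedy and non-greedy patterns on the same input. The second input is a made-up example showing that "shortest" is measured from the leftmost starting position, not over the whole string:

```python
import re

text = "'a' b 'c'"

# Maximal (greedy): the longest match from the first apostrophe.
greedy_match = re.search(r"'.+'", text).group()

# Minimal (non-greedy): the shortest match from the first apostrophe.
lazy_match = re.search(r"'.+?'", text).group()

# The engine still starts at the leftmost possible position, so the
# non-greedy pattern returns 'long' even though 'x' is shorter.
leftmost = re.search(r"'.+?'", "'long' 'x'").group()

print(greedy_match)  # 'a' b 'c'
print(lazy_match)    # 'a'
print(leftmost)      # 'long'
```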
Solution 3:
Try:
('[^']+')
The ^ at the start of a character class negates it, so [^'] matches any single character except the ones listed in the square brackets. This way, it won't match 'a' b 'a' as a whole, because there's a ' in between; instead it'll give both instances of 'a'.
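A quick way to check this, sketched here with Python's re module (an assumption, since the thread does not say which engine is in use), is to list where each match starts and ends and to confirm that the pattern cannot cover the whole string:

```python
import re

text = "'a' b 'a'"
pattern = re.compile(r"'[^']+'")

# Each match is reported with its (start, end) offsets in the text.
spans = [(m.group(), m.span()) for m in pattern.finditer(text)]

# The whole string cannot match, because [^']+ refuses to step over
# the apostrophes in the middle.
whole = pattern.fullmatch(text)

print(spans)  # [("'a'", (0, 3)), ("'a'", (6, 9))]
print(whole)  # None
```

The offsets are the kind of information a syntax highlighter would typically need in order to color each literal separately.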