Regular expression to match a word or its prefix
Solution 1:
Square brackets are meant for character class, and you're actually trying to match any one of: s
, |
, s
(again), e
, a
, s
(again), o
and n
.
Use parentheses instead for grouping:
(s|season)
or non-capturing group:
(?:s|season)
Note: Non-capture groups tell the engine that it doesn't need to store the match, while the other one (capturing group does). For small stuff, either works, for 'heavy duty' stuff, you might want to see first if you need the match or not. If you don't, better use the non-capture group to allocate more memory for calculation instead of storing something you will never need to use.
Solution 2:
Use this live online example to test your pattern:
Above screenshot taken from this live example: https://regex101.com/r/cU5lC2/1
Matching any whole word on the commandline.
I'll be using the phpsh interactive shell on Ubuntu 12.10 to demonstrate the PCRE regex engine through the method known as preg_match
Start phpsh, put some content into a variable, match on word.
el@apollo:~/foo$ phpsh
php> $content1 = 'badger'
php> $content2 = '1234'
php> $content3 = '$%^&'
php> echo preg_match('(\w+)', $content1);
1
php> echo preg_match('(\w+)', $content2);
1
php> echo preg_match('(\w+)', $content3);
0
The preg_match method used the PCRE engine within the PHP language to analyze variables: $content1
, $content2
and $content3
with the (\w)+
pattern.
$content1 and $content2 contain at least one word, $content3 does not.
Match a specific words on the commandline without word bountaries
el@apollo:~/foo$ phpsh
php> $gun1 = 'dart gun';
php> $gun2 = 'fart gun';
php> $gun3 = 'darty gun';
php> $gun4 = 'unicorn gun';
php> echo preg_match('(dart|fart)', $gun1);
1
php> echo preg_match('(dart|fart)', $gun2);
1
php> echo preg_match('(dart|fart)', $gun3);
1
php> echo preg_match('(dart|fart)', $gun4);
0
Variables gun1
and gun2
contain the string dart
or fart
which is correct, but gun3 contains darty
and still matches, that is the problem. So onto the next example.
Match specific words on the commandline with word boundaries:
Word Boundaries can be force matched with \b
, see:
Regex Visual Image acquired from http://jex.im/regulex and https://github.com/JexCheng/regulex Example:
el@apollo:~/foo$ phpsh
php> $gun1 = 'dart gun';
php> $gun2 = 'fart gun';
php> $gun3 = 'darty gun';
php> $gun4 = 'unicorn gun';
php> echo preg_match('(\bdart\b|\bfart\b)', $gun1);
1
php> echo preg_match('(\bdart\b|\bfart\b)', $gun2);
1
php> echo preg_match('(\bdart\b|\bfart\b)', $gun3);
0
php> echo preg_match('(\bdart\b|\bfart\b)', $gun4);
0
The \b
asserts that we have a word boundary, making sure " dart " is matched, but " darty " isn't.