How should I write a regex to match a specific word?
I've been trying to get a specific regex working but I can't get it to do what I need.
Basically, I want it to look for ROCKET. The regex should match ROCKET in upper or lower case, with or without punctuation, but not when it is part of another word. So the regex would trigger on any of these:
rocket
RoCKEt
hi Rocket
This is a rocket.
ROCKET's engine
but NOT trigger on ROCKET when it is found in something like
Rocketeer
Sprocket
I've been trying to get it right using a regex generator online but I can't get it to match exactly.
I suggest bookmarking the MSDN Regular Expression Quick Reference.
You want to achieve a case-insensitive match for the word "rocket" surrounded by non-alphanumeric characters. A regex that would work would be:
\W*((?i)rocket(?-i))\W*
What it will do is look for zero or more (*) non-alphanumeric (\W) characters, followed by a case-insensitive version of rocket ( (?i)rocket(?-i) ), followed again by zero or more (*) non-alphanumeric characters (\W). The extra parentheses around the rocket-matching term assign the match to a separate group. The word rocket will thus be in match group 1.
UPDATE 1:
Matt said in the comment that this regex is to be used in Python. Python has a slightly different syntax. To achieve the same result in Python, use this regex and pass the re.IGNORECASE option to the compile or match function.
\W*(rocket)\W*
On Regex101 this can be simulated by entering "i" in the textbox next to the regex input.
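A minimal sketch of the Python form, assuming the standard re module; the sample strings come from the question and the variable names are only illustrative. It shows that group 1 holds the word, and also that this pattern on its own does not reject longer words.

```python
import re

# Pattern from the answer, compiled with the ignore-case flag.
pattern = re.compile(r"\W*(rocket)\W*", re.IGNORECASE)

# search() scans the whole string; group(1) is the parenthesised word itself.
m = pattern.search("This is a rocket.")
word = m.group(1)    # 'rocket'
whole = m.group(0)   # ' rocket.' (adjacent non-word characters are included)

# \W* may match zero characters, so the pattern also fires inside longer words.
inside = pattern.search("Sprocket")
```

Note that re.match only tries at the start of the string, so search is used here to look anywhere in the text.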
UPDATE 2:
Ismael has mentioned that the regex is not quite correct, as it might match "1rocket1". He posted a much better solution, namely
(?:^|\W)rocket(?:$|\W)
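As a rough check of that pattern against the examples in the question, assuming Python's re module with the re.IGNORECASE flag (the list names are illustrative only):

```python
import re

# The word must be preceded by start-of-string or a non-word character,
# and followed by end-of-string or a non-word character.
pattern = re.compile(r"(?:^|\W)rocket(?:$|\W)", re.IGNORECASE)

should_match = ["rocket", "RoCKEt", "hi Rocket", "This is a rocket.", "ROCKET's engine"]
should_not_match = ["Rocketeer", "Sprocket", "1rocket1"]

matched = [s for s in should_match if pattern.search(s)]
rejected = [s for s in should_not_match if not pattern.search(s)]

# The groups consume the neighbouring character, so it is part of the match text.
text = pattern.search("hi Rocket").group(0)   # ' Rocket'
```

One consequence of consuming the neighbouring character is that the matched text includes the surrounding space or punctuation, so it may need trimming or a capture group around rocket.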
I think the look-aheads are overkill in this case, and you would be better off using word boundaries with the ignorecase option:
\brocket\b
In other words, in Python:
>>> import re
>>> x = "rocket's"
>>> y = "rocket1."
>>> c = re.compile(r"\brocket\b", re.I)  # with the ignorecase option
>>> c.findall(y)
[]
>>> c.findall(x)
['rocket']
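The same word-boundary pattern can be run over every example from the question. This is only a sketch; the helper name has_rocket is made up for illustration.

```python
import re

ROCKET = re.compile(r"\brocket\b", re.IGNORECASE)

def has_rocket(text):
    """Return True if 'rocket' appears as a whole word (hypothetical helper)."""
    return ROCKET.search(text) is not None

# Examples the question wants to trigger on.
for sample in ["rocket", "RoCKEt", "hi Rocket", "This is a rocket.", "ROCKET's engine"]:
    assert has_rocket(sample)

# Examples the question wants to skip.
for sample in ["Rocketeer", "Sprocket"]:
    assert not has_rocket(sample)

# \b is zero-width, so findall returns only the word and adjacent words both match.
both = ROCKET.findall("Rocket rocket")
```

Because \b treats digits and underscores as word characters, "rocket1" and "rocket_fuel" are not matched either.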
With grep and sed, you can use \<rocket\>.
With grep, the -i option will make it case-insensitive (ignore case):
grep -i '\<rocket\>'
I don't know any way to make all sed regexes case-insensitive, but there's always the caveman way:
sed -n '/\<[Rr][Oo][Cc][Kk][Ee][Tt]\>/p'