Match a whole word in a string using dynamic regex
Why not use a word boundary?
match_string = r'\b' + word + r'\b'
match_string = r'\b{}\b'.format(word)
match_string = rf'\b{word}\b' # f-strings require Python 3.6+
If you have a list of words (say, in a words variable) to be matched as whole words, use
match_string = r'\b(?:{})\b'.format('|'.join(words))
match_string = rf'\b(?:{"|".join(words)})\b' # f-strings require Python 3.6+
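In this case, the word is matched only when it is not adjacent to another word character (a letter, digit, or underscore). Also note that \b
matches at the start and end of the string when the adjacent character is a word character, so there is no need to add three separate alternatives for those positions.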
Sample code:
import re
strn = "word hereword word, there word"
search = "word"
print(re.findall(r"\b" + search + r"\b", strn))
And we found our 3 matches:
['word', 'word', 'word']
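The list-of-words variant can be checked the same way. This is a minimal sketch; the words list and the sample text are made up for illustration only:

```python
import re

# Hypothetical sample data, chosen only for illustration
words = ["cat", "dog"]
text = "cat catalog dog, hotdog dogs cat"

# Join the alternatives inside a non-capturing group so that
# both \b anchors apply to every word in the list
pattern = r'\b(?:{})\b'.format('|'.join(words))

matches = re.findall(pattern, text)
print(matches)  # ['cat', 'dog', 'cat']
```

The non-capturing group matters here: without it, the first \b would bind only to the first alternative and the last \b only to the last one.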
NOTE ON "WORD" BOUNDARIES
When the "words" are in fact chunks of arbitrary characters, you should re.escape
them before building the regex pattern:
match_string = r'\b{}\b'.format(re.escape(word)) # a single escaped "word" string passed
match_string = r'\b(?:{})\b'.format("|".join(map(re.escape, words))) # words list is escaped
match_string = rf'\b(?:{"|".join(map(re.escape, words))})\b' # Same as above with an f-string (Python 3.6+)
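To see why escaping matters, here is a small sketch comparing an unescaped and an escaped pattern. The "word" a.b and the sample text are assumptions picked to expose the difference:

```python
import re

word = "a.b"            # hypothetical "word" containing a regex metacharacter
text = "a.b axb a-b"    # hypothetical sample text

unescaped = r'\b{}\b'.format(word)             # "." still means "any character"
escaped = r'\b{}\b'.format(re.escape(word))    # "." now matches a literal dot

unescaped_matches = re.findall(unescaped, text)
escaped_matches = re.findall(escaped, text)

print(unescaped_matches)  # ['a.b', 'axb', 'a-b']
print(escaped_matches)    # ['a.b']
```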
If the words to be matched as whole words may start or end with special (non-word) characters, \b
won't work reliably; use unambiguous word boundaries instead:
match_string = r'(?<!\w){}(?!\w)'.format(re.escape(word))
match_string = r'(?<!\w)(?:{})(?!\w)'.format("|".join(map(re.escape, words)))
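The difference shows up with a "word" such as C++, which ends in non-word characters. The sketch below uses made-up sample text and reports match start offsets so the two results are easy to tell apart:

```python
import re

word = "C++"              # hypothetical "word" ending in non-word characters
text = "C++ (C++) C++x"   # hypothetical sample text

with_b = r'\b{}\b'.format(re.escape(word))
with_lookarounds = r'(?<!\w){}(?!\w)'.format(re.escape(word))

# The trailing \b needs a word character on one side, so it only
# succeeds between "+" and "x", which is the one place we do NOT want
b_starts = [m.start() for m in re.finditer(with_b, text)]

# The lookarounds only require that no word character is adjacent
lookaround_starts = [m.start() for m in re.finditer(with_lookarounds, text)]

print(b_starts)           # [10]
print(lookaround_starts)  # [0, 5]
```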
If the word boundaries are whitespace chars or start/end of string, use whitespace boundaries, (?<!\S)...(?!\S):
match_string = r'(?<!\S){}(?!\S)'.format(re.escape(word))
match_string = r'(?<!\S)(?:{})(?!\S)'.format("|".join(map(re.escape, words)))
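A short sketch of how whitespace boundaries differ from \b, again with invented sample data. Only occurrences delimited by whitespace or by the string start/end are accepted:

```python
import re

word = "foo"                          # hypothetical search word
text = "foo foo, (foo) foo.bar foo"   # hypothetical sample text

ws_pattern = r'(?<!\S){}(?!\S)'.format(re.escape(word))
b_pattern = r'\b{}\b'.format(re.escape(word))

# Punctuation next to the word blocks a whitespace-boundary match
ws_starts = [m.start() for m in re.finditer(ws_pattern, text)]

# \b treats punctuation as a boundary, so every "foo" is found
b_starts = [m.start() for m in re.finditer(b_pattern, text)]

print(ws_starts)  # [0, 23]
print(b_starts)   # [0, 4, 10, 15, 23]
```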