Find substring in string but only if whole words?
What is an elegant way to look for a string within another string in Python, but only if the substring is within whole words, not part of a word?
Perhaps an example will demonstrate what I mean:
string1 = "ADDLESHAW GODDARD"
string2 = "ADDLESHAW GODDARD LLP"
assert string_found(string1, string2) # this is True
string1 = "ADVANCE"
string2 = "ADVANCED BUSINESS EQUIPMENT LTD"
assert not string_found(string1, string2) # this should be False
How can I best write a function called string_found that will do what I need? I thought perhaps I could fudge it with something like this:
def string_found(string1, string2):
if string2.find(string1 + " "):
return True
return False
But that doesn't feel very elegant, and also wouldn't match string1 if it was at the end of string2. Maybe I need a regex? (argh regex fear)
You can use regular expressions and the word boundary special character \b
(highlight by me):
Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that
\b
is defined as the boundary between\w
and\W
, so the precise set of characters deemed to be alphanumeric depends on the values of theUNICODE
andLOCALE
flags. Inside a character range,\b
represents the backspace character, for compatibility with Python’s string literals.
def string_found(string1, string2):
if re.search(r"\b" + re.escape(string1) + r"\b", string2):
return True
return False
Demo
If word boundaries are only whitespaces for you, you could also get away with pre- and appending whitespaces to your strings:
def string_found(string1, string2):
string1 = " " + string1.strip() + " "
string2 = " " + string2.strip() + " "
return string2.find(string1)
The simplest and most pythonic way, I believe, is to break the strings down into individual words and scan for a match:
string = "My Name Is Josh"
substring = "Name"
for word in string.split():
if substring == word:
print("Match Found")
For a bonus, here's a oneliner:
any([substring == word for word in string.split()])
Here's a way to do it without a regex (as requested) assuming that you want any whitespace to serve as a word separator.
import string
def find_substring(needle, haystack):
index = haystack.find(needle)
if index == -1:
return False
if index != 0 and haystack[index-1] not in string.whitespace:
return False
L = index + len(needle)
if L < len(haystack) and haystack[L] not in string.whitespace:
return False
return True
And here's some demo code (codepad is a great idea: Thanks to Felix Kling for reminding me)