How to search if every word in string starts with any of the word in list using python
I am trying to filter sentences from my pandas data-frame having 50 million records using keyword search. If any words in sentence starts with any of these keywords.
WordsToCheck=['hi','she', 'can']
text_string1="my name is handhit and cannary"
text_string2="she can play!"
If I do something like this:
if any(key in text_string1 for key in WordsToCheck):
print(text_string1)
I get False positive
as handhit
as hit
in the last part of word.
How can I smartly avoid all such False positives from my result set?
Secondly, is there any faster way to do it in python? I am using apply
function currently.
I am following this link so that my question is not a duplicate: How to check if a string contains an element from a list in Python
If the case is important you can do something like this:
def any_word_starts_with_one_of(sentence, keywords):
for kw in keywords:
match_words = [word for word in sentence.split(" ") if word.startswith(kw)]
if match_words:
return kw
return None
keywords = ["hi", "she", "can"]
sentences = ["Hi, this is the first sentence", "This is the second"]
for sentence in sentences:
if any_word_starts_with_one_of(sentence, keywords):
print(sentence)
If case is not important replace line 3 with something like this:
match_words = [word for word in sentence.split(" ") if word.lower().startswith(kw.lower())]