Extract words from a sentence that contain a substring

I want to extract the full phrase (one or multiple words) that contains a specific substring. The substring can consist of one or multiple words, and it can start or end in the middle of a word in test_string ('break'/'split' it), but the desired output is the full phrase/word from test_string. For example:

test_string = 'this is an example of the text that I have, and I want to by amplifier and lamp'
substring1 = 'he text th'
substring2 = 'amp'

if substring1 in test_string:
    print("substring1 found")
    
if substring2 in test_string:
    print("substring2 found")

My desired output is:

[the text that]
[example, amplifier, lamp]

FYI

The substring can be at the beginning, middle, or end of a word; it does not matter.


Solution 1:

If you want something robust, I would do something like this:

import re
re.findall(r"((?:\w+)?" + re.escape(substring2) + r"(?:\w+)?)", test_string)

This way the substring can contain any characters, because re.escape escapes regex metacharacters in it.

Explanation of the regex:

'(?:\w+)'   Non-capturing group matching one or more word characters
'?'         Zero or one occurrence of that group

I have done this at the beginning and at the end of your substring, as the missing part of the word can be before or after it.
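For illustration, here is a minimal runnable sketch that applies the pattern above to both substrings from the question. The helper name find_phrases is hypothetical and only used here to avoid repeating the pattern.

```python
import re

def find_phrases(substring, text):
    # Hypothetical helper: optional run of word characters on each side
    # of the (escaped) substring, so partial words are completed.
    pattern = r"((?:\w+)?" + re.escape(substring) + r"(?:\w+)?)"
    return re.findall(pattern, text)

test_string = 'this is an example of the text that I have, and I want to by amplifier and lamp'

print(find_phrases('he text th', test_string))  # ['the text that']
print(find_phrases('amp', test_string))         # ['example', 'amplifier', 'lamp']
```

Note that '(?:\w+)?' matches the same text as '\w*', so either spelling should work here.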

To answer the latest comment about how to capture the punctuation as well: I would do something like this, using string.punctuation:

import re
import string
pattern = r"(?:[\w" + re.escape(string.punctuation) + r"]+)?"
re.findall("(" + pattern + re.escape(substring2) + pattern + ")", test_string)

Doing so will also match any punctuation attached to the beginning or the end of the match, for example: [I love you.., I love you!!, I love you!?, ?I love you!, ...]
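As a quick check of the punctuation variant, here is a small sketch on a made-up sentence. The function name find_with_punctuation and the sample text are assumptions for illustration, not part of the original question.

```python
import re
import string

def find_with_punctuation(substring, text):
    # Optional run of word characters and ASCII punctuation on each side.
    # re.escape keeps characters such as ']' or '\' literal inside the class.
    edge = r"(?:[\w" + re.escape(string.punctuation) + r"]+)?"
    return re.findall(edge + re.escape(substring) + edge, text)

sample = 'I love you!! ?I love you! I love you..'
print(find_with_punctuation('I love you', sample))
# ['I love you!!', '?I love you!', 'I love you..']
```

Because the character class contains no whitespace, each match still stops at the surrounding spaces.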

Solution 2:

This is a job for regex. You could do:

import re
substring2 = 'amp'
test_string = 'this is an example of the text that I have'

print("matches for substring 1:", re.findall(r"(\w+he text th\w+)", test_string))
print("matches for substring 2:", re.findall(r"(\w+amp\w+)", test_string))

Output:

matches for substring 1: ['the text that']
matches for substring 2: ['example']
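The quantifier around the substring matters here: '\w+' requires at least one word character on that side, whereas '\w*' also allows none, which is what the question asks for when the substring sits at the start or end of a word. A small comparison on the full test string from the question, as a sketch (the variable names strict and loose are just for illustration):

```python
import re

test_string = 'this is an example of the text that I have, and I want to by amplifier and lamp'

# \w+ on both sides: the substring must be strictly inside a word.
strict = re.findall(r"\w+amp\w+", test_string)
# \w* on both sides: the substring may also start or end a word.
loose = re.findall(r"\w*amp\w*", test_string)

print(strict)  # ['example']
print(loose)   # ['example', 'amplifier', 'lamp']
```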