Using replace() to remove full sentences from text data in Python [duplicate]

I have a pandas DataFrame whose rows are paragraphs of text, and I want to remove the same three sentences from every row. For example:

import pandas as pd

df_1 = pd.DataFrame({"text": ["the dog is red. He goes outside and runs.", 
                              "i like dogs because they are fun. i don't like that dogs bark at mailmen", 
                              "dogs bark at mailmen and i think its funny."]})
    
custom_stopwords = ["the dog is red", "i like dogs", "dogs bark at mailmen"]
 
for i in custom_stopwords: 
    df_1['text'] = df_1['text'].str.replace(i, '')

This method works on the example above, but not on my actual data. The data is quite large, but I don't see why that would matter here. Some of my sentences are removed and others are not. For example, I am unable to remove the word "installation(s)" without escaping the parentheses.

In pandas versions before 2.0, pandas.Series.str.replace defaults to regex=True, which means the pattern is treated as a regular expression (which is how your "installation(s)" gets interpreted). From pandas 2.0 on, the default is regex=False. You're trying to replace string literals, not patterns, so passing regex=False explicitly should work on any version:

for i in custom_stopwords: 
    df_1['text'] = df_1['text'].str.replace(i, '', regex=False)
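To make the difference concrete, here is a minimal sketch that runs the same pattern both ways. The Series contents and the variable names (s, as_regex, as_literal) are made up for illustration and are not from the question's real data.

```python
import pandas as pd

# Hypothetical sample rows, chosen to show both behaviours.
s = pd.Series(["check the installation(s) first", "two installations done"])

# As a regex, "(s)" is a group matching the letter "s", so the pattern
# matches "installations" and leaves the literal "installation(s)" alone.
as_regex = s.str.replace("installation(s)", "", regex=True)

# As a plain string, the parentheses are matched literally.
as_literal = s.str.replace("installation(s)", "", regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```

With regex=True only the second row changes; with regex=False only the first row changes, which is the behaviour the question is after.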

Use str.replace with the argument regex=False. As a regular expression, (s) is a capture group that matches the character s, so the pattern "installation(s)" matches "installations" rather than the literal text "installation(s)".
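If regex matching is still wanted, for instance to remove every phrase in a single pass, one possible alternative is to escape each phrase with re.escape from the standard library. This is only a sketch: the phrase list mixes the question's stopwords with "installation(s)", and the sample text is invented.

```python
import re

# Illustrative phrase list; "installation(s)" contains regex metacharacters.
custom_stopwords = ["the dog is red", "installation(s)", "dogs bark at mailmen"]

# re.escape backslash-escapes characters that are special in a regex,
# so each phrase is matched literally inside the combined pattern.
pattern = "|".join(re.escape(phrase) for phrase in custom_stopwords)

text = "the dog is red. Check the installation(s)."  # made-up sample text
cleaned = re.sub(pattern, "", text)

print(cleaned)
```

The same pattern string could be passed to Series.str.replace with regex=True, replacing the loop with one call per column.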
