Remove low frequency words
Solution 1:
Let's try using a Counter here:
- Split sentences into words
- Compute global word frequency
- Filter words based on computed frequencies
- Join and re-assign
from collections import Counter
from itertools import chain
# split words into lists
v = df['Col2'].str.split().tolist() # [s.split() for s in df['Col2'].tolist()]
# compute global word frequency
c = Counter(chain.from_iterable(v))
# filter, join, and re-assign
df['Col2'] = [' '.join([j for j in i if c[j] > 1]) for i in v]
df
   Col1                Col2
0     1  how to remove word
1     5  how to remove word
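For a self-contained run, here is a minimal sketch that wraps the same Counter idea into a helper function; the sample DataFrame contents and the min_count parameter are assumptions for illustration only, not taken from the original question.
from collections import Counter
from itertools import chain
import pandas as pd

def drop_low_frequency_words(series, min_count=2):
    # keep only words whose global count across the whole column is >= min_count
    rows = series.str.split().tolist()
    counts = Counter(chain.from_iterable(rows))
    return pd.Series([' '.join(w for w in row if counts[w] >= min_count) for row in rows],
                     index=series.index)

# hypothetical sample data, chosen only to demonstrate the behaviour
df = pd.DataFrame({'Col1': [1, 5],
                   'Col2': ['how to remove a word quickly', 'how to remove every word']})
df['Col2'] = drop_low_frequency_words(df['Col2'])
print(df)
# words that appear only once ('a', 'quickly', 'every') are dropped,
# leaving 'how to remove word' in both rows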
Solution 2:
A method using str.get_dummies:
# one indicator column per word, indexed by Col1
s = df.set_index('Col1').Col2.str.get_dummies(sep=' ')
# keep only the words present in every row, then rebuild each sentence
(s.loc[:, s.all()].stack().reset_index(level=1)
   .groupby('Col1')['level_1'].apply(' '.join).reset_index(name='Col2'))
   Col1                Col2
0     1  how remove to word
1     5  how remove to word
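One difference worth noting: str.get_dummies sorts the indicator columns alphabetically (hence "how remove to word" rather than the original word order), and s.all() keeps only words that occur in every row, while the Counter version keeps any word whose global count is greater than 1 and preserves the original order. A small sketch of the intermediate dummies matrix, again on hypothetical data:
import pandas as pd

# hypothetical input, only to show the intermediate steps
df = pd.DataFrame({'Col1': [1, 5],
                   'Col2': ['how to remove a word quickly', 'how to remove every word']})

s = df.set_index('Col1').Col2.str.get_dummies(sep=' ')
print(s)
#       a  every  how  quickly  remove  to  word
# Col1
# 1     1      0    1        1       1   1     1
# 5     0      1    1        0       1   1     1

# s.all() is True only for columns that contain a 1 in every row
print(s.loc[:, s.all()].columns.tolist())   # ['how', 'remove', 'to', 'word']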