Remove low frequency words
Solution 1:
Let's try using a Counter here:
- Split sentences into words
- Compute global word frequency
- Filter words based on computed frequencies
- Join and re-assign
from collections import Counter
from itertools import chain
# split words into lists
v = df['Col2'].str.split().tolist() # [s.split() for s in df['Col2'].tolist()]
# compute global word frequency
c = Counter(chain.from_iterable(v))
# filter, join, and re-assign
df['Col2'] = [' '.join([j for j in i if c[j] > 1]) for i in v]
df
   Col1                Col2
0     1  how to remove word
1     5  how to remove word
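For a self-contained run, here is a minimal sketch that wraps the same Counter idea into a helper function; the sample DataFrame contents and the min_count parameter are assumptions for illustration only, not taken from the original question.
from collections import Counter
from itertools import chain
import pandas as pd

def drop_low_frequency_words(series, min_count=2):
    # keep only words whose global count across the whole column is >= min_count
    rows = series.str.split().tolist()
    counts = Counter(chain.from_iterable(rows))
    return pd.Series([' '.join(w for w in row if counts[w] >= min_count) for row in rows],
                     index=series.index)

# hypothetical sample data, chosen only to demonstrate the behaviour
df = pd.DataFrame({'Col1': [1, 5],
                   'Col2': ['how to remove a word quickly', 'how to remove every word']})
df['Col2'] = drop_low_frequency_words(df['Col2'])
print(df)
# words that appear only once ('a', 'quickly', 'every') are dropped,
# leaving 'how to remove word' in both rows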
Solution 2:
A method using str.get_dummies:
# one indicator column per word, indexed by Col1
s = df.set_index('Col1').Col2.str.get_dummies(sep=' ')
# keep only the words present in every row, then rebuild each sentence
(s.loc[:, s.all()].stack().reset_index(level=1)
   .groupby('Col1')['level_1'].apply(' '.join).reset_index(name='Col2'))
   Col1                Col2
0     1  how remove to word
1     5  how remove to word
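One difference worth noting: str.get_dummies sorts the indicator columns alphabetically (hence "how remove to word" rather than the original word order), and s.all() keeps only words that occur in every row, while the Counter version keeps any word whose global count is greater than 1 and preserves the original order. A small sketch of the intermediate dummies matrix, again on hypothetical data:
import pandas as pd

# hypothetical input, only to show the intermediate steps
df = pd.DataFrame({'Col1': [1, 5],
                   'Col2': ['how to remove a word quickly', 'how to remove every word']})

s = df.set_index('Col1').Col2.str.get_dummies(sep=' ')
print(s)
#       a  every  how  quickly  remove  to  word
# Col1
# 1     1      0    1        1       1   1     1
# 5     0      1    1        0       1   1     1

# s.all() is True only for columns that contain a 1 in every row
print(s.loc[:, s.all()].columns.tolist())   # ['how', 'remove', 'to', 'word']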