Apply fuzzy matching across a dataframe column and save results in a new column

Solution 1:

I couldn't tell what you were doing. This is how I would do it.

import pandas as pd
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

Create a series of tuples to compare:

compare = pd.MultiIndex.from_product([df1['Company'],
                                      df2['FDA Company']]).to_series()

Create a special function to calculate fuzzy metrics and return a series.

def metrics(tup):
    # tup is one (Company, FDA Company) pair; score it two ways
    return pd.Series([fuzz.ratio(*tup),
                      fuzz.token_sort_ratio(*tup)],
                     ['ratio', 'token'])
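
If it helps to see what the two metrics are getting at, here is a rough standard-library sketch. The names simple_ratio and simple_token_sort_ratio are made up for illustration and are not part of fuzzywuzzy; the real token_sort_ratio also cleans up punctuation, which this skips, and real scores may differ somewhat.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of two strings on a 0-100 scale (rough stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def simple_token_sort_ratio(a, b):
    """Same score, but after lowercasing and sorting the whitespace-separated tokens."""
    def norm(s):
        return " ".join(sorted(s.lower().split()))
    return simple_ratio(norm(a), norm(b))


# Hypothetical pair: same words, different order.
pair = ("Pfizer Inc", "Inc Pfizer")
print(simple_ratio(*pair), simple_token_sort_ratio(*pair))
```

The point is that the token-sorted score ignores word order, so reordered names still count as a perfect match, while the plain ratio penalizes them.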

Apply metrics to the compare series:

compare.apply(metrics)

There are a bunch of ways to do this next part:

For each 'FDA Company' in df2, get the closest 'Company' from df1 (one column per metric):

compare.apply(metrics).unstack().idxmax().unstack(0)

For each 'Company' in df1, get the closest 'FDA Company' from df2 (one column per metric):

compare.apply(metrics).unstack(0).idxmax().unstack(0)
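
In case the unstack/idxmax chain is hard to follow, the same "pick the highest-scoring candidate" idea can be sketched in plain Python. This is only an illustration: the sample names are invented, score is a difflib-based stand-in rather than fuzz.ratio (and it lowercases first, which fuzz.ratio does not), and best_match is a hypothetical helper.

```python
from difflib import SequenceMatcher

# Hypothetical sample data standing in for df1['Company'] and df2['FDA Company'].
companies = ["Pfizer Inc", "Merck & Co", "Novartis AG"]
fda_companies = ["PFIZER INC.", "Novartis", "Merck and Co"]


def score(a, b):
    """Case-insensitive similarity on a 0-100 scale (stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def best_match(name, candidates):
    """Return the candidate with the highest score against name."""
    return max(candidates, key=lambda c: score(name, c))


# Closest FDA name for each company, and the reverse direction.
closest_fda = {c: best_match(c, fda_companies) for c in companies}
closest_company = {f: best_match(f, companies) for f in fda_companies}
print(closest_fda)
print(closest_company)
```

Each dictionary maps a name from one list to its best match in the other, which is what each of the two idxmax results holds per metric.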
