How to merge pandas on string contains?

Solution 1:

New Answer

Here is one approach based on pandas/numpy.

rhs = (df1.column_common
          .apply(lambda x: df2[df2.column_common.str.find(x).ge(0)]['column_b'])
          .bfill(axis=1)
          .iloc[:, 0])

(pd.concat([df1.column_a, rhs], axis=1, ignore_index=True)
 .rename(columns={0: 'column_a', 1: 'column_b'}))

  column_a column_b
0     John    Moore
1  Michael    Cohen
2      Dan    Smith
3   George      NaN
4     Adam    Faber

Old Answer

Here's a solution for left-join behaviour, as in it doesn't keep column_a values that do not match any column_b values. This is slower than the above numpy/pandas solution because it uses two nested iterrows loops to build a python list.

tups = [(a1, a2) for i, (a1, b1) in df1.iterrows() 
                 for j, (a2, b2) in df2.iterrows()
        if b1 in b2]

(pd.DataFrame(tups, columns=['column_a', 'column_b'])
   .drop_duplicates('column_a')
   .reset_index(drop=True))

  column_a column_b
0     John    Moore
1  Michael    Cohen
2      Dan    Smith
3     Adam    Faber

Solution 2:

My solution involves applying a function to the common column. I can't imagine it holds up well when df2 is large but perhaps someone more knowledgeable than I can suggest an improvement.

def strmerge(strcolumn):
    for i in df2['column_common']:
        if strcolumn in i:
            return df2[df2['column_common'] == i]['column_b'].values[0]

df1['column_b'] = df1['column_common'].apply(strmerge)

df1
    column_a    column_common   column_b
0   John        code            Moore
1   Michael     other           Cohen
2   Dan         ome             Smith
3   George      no match        None
4   Adam        word            Faber

How to merge pandas on string contains?

Solution 1:

New Answer

Old Answer

Solution 2:

Related

Recent Posts