Append pairwise distances to dataframe without mismatch
For clarity, let's work with vector indices. Since df1 and df2 are dictionaries, we first build small dataframes that hold only the vector indices (plus the ids, where available) and then, importantly, merge them with how='cross':
dfi1 = pd.DataFrame(df1).reset_index()[['index']]
dfi2 = pd.DataFrame(df2).reset_index()[['index', 'id']]
dfm = dfi1.merge(dfi2, how = 'cross')
dfm looks like this:
      index_x    index_y  id
--  ---------  ---------  -----------------
 0        832        774  A4060454751516272
 1        832       6570  A4060463723916275
 2        832      11466  A4060454576016272
 3        832       7394  A4050494816516277
 4       1623        774  A4060454751516272
 5       1623       6570  A4060463723916275
 6       1623      11466  A4060454576016272
 7       1623       7394  A4050494816516277
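As a minimal sketch of the cross merge on its own, the snippet below uses made-up toy dictionaries in place of the real df1 and df2 (the vectors and ids are assumptions for illustration only). Every index from the first frame is paired with every row of the second, and because both frames have a column named index, pandas appends the _x and _y suffixes. Note that how='cross' requires pandas 1.2 or newer.

```python
import pandas as pd

# Toy stand-ins for df1 and df2 (hypothetical data, shaped as {column: {index: value}})
df1 = {"vec1": {832: [0.0, 1.0], 1623: [2.0, 3.0]}}
df2 = {
    "vec2": {774: [1.0, 1.0], 6570: [0.0, 0.0], 11466: [5.0, 5.0]},
    "id": {774: "a", 6570: "b", 11466: "c"},
}

# Keep only the index (and id) columns, then build every (index_x, index_y) pair
dfi1 = pd.DataFrame(df1).reset_index()[["index"]]
dfi2 = pd.DataFrame(df2).reset_index()[["index", "id"]]
dfm = dfi1.merge(dfi2, how="cross")

print(dfm)  # 2 x 3 = 6 rows, columns index_x, index_y, id
```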
Now we can apply custom_distance to the pairs of vectors indexed by index_x and index_y:
dfm['dist'] = dfm.apply(
    lambda r: custom_distance(
        np.array(df1['vec1'][r['index_x']]),
        np.array(df2['vec2'][r['index_y']]),
    ),
    axis=1,
)
Now dfm looks like this:
      index_x    index_y  id                     dist
--  ---------  ---------  -----------------  --------
 0        832        774  A4060454751516272  12.742
 1        832       6570  A4060463723916275   4.58371
 2        832      11466  A4060454576016272  13.8423
 3        832       7394  A4050494816516277   9.21726
 4       1623        774  A4060454751516272  14.9185
 5       1623       6570  A4060463723916275  12.5773
 6       1623      11466  A4060454576016272  16.7649
 7       1623       7394  A4050494816516277  11.1383
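To check that each distance lands on the right pair, here is a small self-contained sketch. The custom_distance below is only a hypothetical stand-in (plain Euclidean distance), and the toy vectors are chosen so the expected distances are easy to verify by hand; swap in your own function and data.

```python
import numpy as np
import pandas as pd

def custom_distance(a, b):
    # Hypothetical stand-in for the real custom_distance: Euclidean distance
    return float(np.linalg.norm(a - b))

# Toy data (assumed for illustration), in the same dict-of-dicts shape as above
df1 = {"vec1": {832: [0.0, 0.0], 1623: [3.0, 4.0]}}
df2 = {"vec2": {774: [3.0, 4.0], 6570: [0.0, 0.0]}, "id": {774: "a", 6570: "b"}}

dfi1 = pd.DataFrame(df1).reset_index()[["index"]]
dfi2 = pd.DataFrame(df2).reset_index()[["index", "id"]]
dfm = dfi1.merge(dfi2, how="cross")

# Look each vector up by its own index, so rows cannot get mismatched
dfm["dist"] = dfm.apply(
    lambda r: custom_distance(
        np.array(df1["vec1"][r["index_x"]]),
        np.array(df2["vec2"][r["index_y"]]),
    ),
    axis=1,
)

print(dfm)
```

Because the lookup goes through index_x and index_y rather than row position, the result does not depend on how the two sources happen to be ordered.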
If you also want the vectors in the same dataframe, you can then do:
dfm['vec1'] = dfm['index_x'].map(df1['vec1'])
dfm['vec2'] = dfm['index_y'].map(df2['vec2'])
# drop returns a new dataframe, so assign the result to keep it
dfm = dfm.drop(columns = ['index_x', 'index_y'])
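For reference, a tiny sketch of how this last step behaves, using a made-up vec1 dictionary and dataframe (both are assumptions for illustration). Series.map with a dictionary looks up each index value in that dictionary, and drop returns a new dataframe rather than modifying the original in place.

```python
import pandas as pd

# Hypothetical index -> vector mapping and a small frame that refers to it
vec1 = {832: [0.0, 1.0], 1623: [2.0, 3.0]}
dfm = pd.DataFrame({"index_x": [832, 832, 1623], "dist": [1.0, 2.0, 3.0]})

# Each index_x value is replaced by the vector stored under that key
dfm["vec1"] = dfm["index_x"].map(vec1)

# drop leaves dfm untouched and returns a copy without the column
result = dfm.drop(columns=["index_x"])

print(result)
```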