How to compare different columns from two DataFrames in Python
Solution 1:
You could extract the information into native Python data structures and then merge it back into your original DataFrame.
To do this, I would first make pairs out of the Sender and Receiver columns in df2:
def make_pairs(row):
    # Strip the surrounding brackets, then split the string on commas
    senders = row['Sender'].replace("[", "").replace("]", "").split(",")
    receivers = row['Receiver'].replace("[", "").replace("]", "").split(",")
    # Pair every sender with every receiver
    pairs = [(s, r) for s in senders for r in receivers]
    return pairs

send_receive_combinations = df2.apply(make_pairs, axis=1).to_dict()
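To see what make_pairs produces, here is a minimal sketch that calls it on a single row. A plain dict stands in for the DataFrame row (an assumption made only to keep the example free of pandas); the values are those of the first row of df2 as printed at the end of this answer.

```python
def make_pairs(row):
    # Strip the surrounding brackets, then split the string on commas
    senders = row['Sender'].replace("[", "").replace("]", "").split(",")
    receivers = row['Receiver'].replace("[", "").replace("]", "").split(",")
    # Every sender is paired with every receiver (a Cartesian product)
    return [(s, r) for s in senders for r in receivers]

# A dict stands in for one DataFrame row here
row = {'Sender': '[A900,A200]', 'Receiver': '[A500,A220]'}
pairs = make_pairs(row)
print(pairs)
# [('A900', 'A500'), ('A900', 'A220'), ('A200', 'A500'), ('A200', 'A220')]
```

If the real strings contain spaces after the commas, each piece would need a .strip() before pairing, otherwise the tuples will not match the keys built from df1.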
Then map each (IDA, IDB) combination from df1 to its relationship in a dictionary (the unpacking below assumes df1 has exactly these three columns, in this order):
rels = {(ida, idb): rel for ida, idb, rel in df1.values}
A dict comprehension (or even a simple for loop) can then be used to keep only the pairs that appear in rels:
rel_pairs = {key: rels[pair] for key, combination in send_receive_combinations.items() for pair in combination if pair in rels}
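Note that the comprehension above keeps a single value per row: if more than one pair from the same row appears in rels, the later pair overwrites the earlier one. If every match matters, one possible variant collects them into a list instead. The sample data and the name all_rel_pairs below are made up for illustration.

```python
# Hypothetical stand-ins for the two dictionaries built above
send_receive_combinations = {
    0: [('A900', 'A500'), ('A900', 'A220'), ('A200', 'A500'), ('A200', 'A220')],
    1: [('A150', 'A400'), ('A100', 'A400')],
}
rels = {('A900', 'A500'): 'Spouse', ('A200', 'A220'): 'Sibling'}

# Original form: one value per row, the last matching pair wins
rel_pairs = {key: rels[pair]
             for key, combination in send_receive_combinations.items()
             for pair in combination if pair in rels}
print(rel_pairs)      # {0: 'Sibling'}

# Variant: keep every matching relationship for a row
all_rel_pairs = {}
for key, combination in send_receive_combinations.items():
    matches = [rels[pair] for pair in combination if pair in rels]
    if matches:  # rows without any match are left out, as in the original
        all_rel_pairs[key] = matches
print(all_rel_pairs)  # {0: ['Spouse', 'Sibling']}
```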
And finally, we can merge this dict with df2:
df2['relationship'] = df2.index
df2['relationship'] = df2['relationship'].map(rel_pairs)
print(df2)
#        Sender     Receiver  relationship
#0  [A900,A200]  [A500,A220]        Spouse
#1  [A150,A100]       [A400]           NaN
#2  [A400,A112]       [A500]           NaN
#3  [A700,A112]  [A111,A001]        Parent
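For reference, the steps can be combined into one runnable script. df2 is rebuilt from the output above; the question's df1 is not reproduced in this answer, so the df1 below (including the column name Relationship and which ID pairs carry which label) is an assumption chosen to be consistent with that output.

```python
import pandas as pd

# df2 as it appears in the printed output; the bracketed values are plain strings
df2 = pd.DataFrame({
    'Sender': ['[A900,A200]', '[A150,A100]', '[A400,A112]', '[A700,A112]'],
    'Receiver': ['[A500,A220]', '[A400]', '[A500]', '[A111,A001]'],
})

# Assumed df1: exactly three columns, ordered IDA, IDB, relationship label
df1 = pd.DataFrame({
    'IDA': ['A900', 'A700', 'A300'],
    'IDB': ['A500', 'A111', 'A999'],
    'Relationship': ['Spouse', 'Parent', 'Friend'],
})

def make_pairs(row):
    senders = row['Sender'].replace("[", "").replace("]", "").split(",")
    receivers = row['Receiver'].replace("[", "").replace("]", "").split(",")
    return [(s, r) for s in senders for r in receivers]

# Row index -> list of (sender, receiver) pairs
send_receive_combinations = df2.apply(make_pairs, axis=1).to_dict()

# (IDA, IDB) -> relationship label
rels = {(ida, idb): rel for ida, idb, rel in df1.values}

# Row index -> relationship, only for rows with a pair found in rels
rel_pairs = {key: rels[pair]
             for key, combination in send_receive_combinations.items()
             for pair in combination if pair in rels}

# Map the row index through rel_pairs; rows without a match become NaN
df2['relationship'] = df2.index
df2['relationship'] = df2['relationship'].map(rel_pairs)
print(df2)
```

With this assumed df1, rows 0 and 3 receive Spouse and Parent, and rows 1 and 2 are left as NaN.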