pandas left join and update existing column
I am new to pandas and can't seem to get this to work with merge function:
>>> left >>> right
a b c a c d
0 1 4 9 0 1 7 13
1 2 5 10 1 2 8 14
2 3 6 11 2 3 9 15
3 4 7 12
With a left join on column a, I would like to update common columns BY THE JOINED KEYS. Note last value in column c is from LEFT table since there is no match.
>>> final
a b c d
0 1 4 7 13
1 2 5 8 14
2 3 6 9 15
3 4 7 12 NAN
How should I do this with Pandas merge function? Thank you.
You can use merge()
between left
and right
with how='left'
on 'a'
column.
In [74]: final = left.merge(right, on='a', how='left')
In [75]: final
Out[75]:
a b c_x c_y d
0 1 4 9 7 13
1 2 5 10 8 14
2 3 6 11 9 15
3 4 7 12 NaN NaN
Replace NaN
value from c_y
with c_x
value
In [76]: final['c'] = final['c_y'].fillna(final['c_x'])
In [77]: final
Out[77]:
a b c_x c_y d c
0 1 4 9 7 13 7
1 2 5 10 8 14 8
2 3 6 11 9 15 9
3 4 7 12 NaN NaN 12
Drop unwanted columns, and you have the result
In [79]: final.drop(['c_x', 'c_y'], axis=1)
Out[79]:
a b d c
0 1 4 13 7
1 2 5 14 8
2 3 6 15 9
3 4 7 NaN 12
One way to do this is to set the a column as the index and update
:
In [11]: left_a = left.set_index('a')
In [12]: right_a = right.set_index('a')
Note: update
only does a left join (not merges), so as well as set_index you also need to include the additional columns not present in left_a
.
In [13]: res = left_a.reindex(columns=left_a.columns.union(right_a.columns))
In [14]: res.update(right_a)
In [15]: res.reset_index(inplace=True)
In [16]: res
Out[16]:
a b c d
0 1 4 7 13
1 2 5 8 14
2 3 6 9 15
3 4 7 12 NaN
One other way is to use pd.merge like so:
>>> import pandas as pd
>>> final = pd.merge(left=right, right=left,
how='outer',
left_index=True,
right_index=True,
on=('a', 'c')
).sort_index(axis=1)
>>> final
a b c d
0 1 4 7 13.0
1 2 5 8 14.0
2 3 6 9 15.0
3 4 7 12 NaN
Provide the intersection of both dataframe's columns to the 'on=' parameter of the function.
This does not create unwanted columns that have to be dropped like with Zero's solution.
The NaN value might change integers to floats in the same column.
Edit: This works for Pandas versions <= 1.1.5
Here's a way to do it with join
:
In [632]: t = left.set_index('a').join(right.set_index('a'), rsuffix='_right')
In [633]: t
Out[633]:
b c c_right d
a
1 4 9 7 13
2 5 10 8 14
3 6 11 9 15
4 7 12 NaN NaN
Now, we want to set null values of c_right
(which is from the right
dataframe) with values from c
column from the left
dataframe. Updated the below process with a method taking from @John Galt's answer
In [657]: t['c_right'] = t['c_right'].fillna(t['c'])
In [658]: t
Out[658]:
b c c_right d
a
1 4 9 7 13
2 5 10 8 14
3 6 11 9 15
4 7 12 12 NaN
In [659]: t.drop('c_right', axis=1)
Out[659]:
b c d
a
1 4 9 13
2 5 10 14
3 6 11 15
4 7 12 NaN