pandas left join and update existing column

I am new to pandas and can't seem to get this to work with merge function:

>>> left       >>> right
   a  b   c       a  c   d 
0  1  4   9    0  1  7  13
1  2  5  10    1  2  8  14
2  3  6  11    2  3  9  15
3  4  7  12

With a left join on column a, I would like to update common columns BY THE JOINED KEYS. Note last value in column c is from LEFT table since there is no match.

>>> final       
   a  b   c   d 
0  1  4   7   13
1  2  5   8   14
2  3  6   9   15
3  4  7   12  NAN

How should I do this with Pandas merge function? Thank you.

You can use merge() between left and right with how='left' on 'a' column.

In [74]: final = left.merge(right, on='a', how='left')

In [75]: final
Out[75]:
   a  b  c_x  c_y   d
0  1  4    9    7  13
1  2  5   10    8  14
2  3  6   11    9  15
3  4  7   12  NaN NaN

Replace NaN value from c_y with c_x value

In [76]: final['c'] = final['c_y'].fillna(final['c_x'])

In [77]: final
Out[77]:
   a  b  c_x  c_y   d   c
0  1  4    9    7  13   7
1  2  5   10    8  14   8
2  3  6   11    9  15   9
3  4  7   12  NaN NaN  12

Drop unwanted columns, and you have the result

In [79]: final.drop(['c_x', 'c_y'], axis=1)
Out[79]:
   a  b   d   c
0  1  4  13   7
1  2  5  14   8
2  3  6  15   9
3  4  7 NaN  12

One way to do this is to set the a column as the index and update:

In [11]: left_a = left.set_index('a')

In [12]: right_a = right.set_index('a')

Note: update only does a left join (not merges), so as well as set_index you also need to include the additional columns not present in left_a.

In [13]: res = left_a.reindex(columns=left_a.columns.union(right_a.columns))

In [14]: res.update(right_a)

In [15]: res.reset_index(inplace=True)

In [16]: res
Out[16]:
   a   b   c   d
0  1   4   7  13
1  2   5   8  14
2  3   6   9  15
3  4   7  12 NaN

One other way is to use pd.merge like so:

 >>> import pandas as pd

 >>> final = pd.merge(left=right, right=left, 
                      how='outer',
                      left_index=True,
                      right_index=True,
                      on=('a', 'c')
                     ).sort_index(axis=1)

 >>> final       
    a  b   c   d 
 0  1  4   7   13.0
 1  2  5   8   14.0
 2  3  6   9   15.0
 3  4  7   12  NaN

Provide the intersection of both dataframe's columns to the 'on=' parameter of the function.

This does not create unwanted columns that have to be dropped like with Zero's solution.

The NaN value might change integers to floats in the same column.

Edit: This works for Pandas versions <= 1.1.5

Here's a way to do it with join:

In [632]: t = left.set_index('a').join(right.set_index('a'), rsuffix='_right')

In [633]: t
Out[633]: 
   b   c  c_right   d
a                    
1  4   9        7  13
2  5  10        8  14
3  6  11        9  15
4  7  12      NaN NaN

Now, we want to set null values of c_right (which is from the right dataframe) with values from c column from the left dataframe. Updated the below process with a method taking from @John Galt's answer

In [657]: t['c_right'] = t['c_right'].fillna(t['c'])

In [658]: t
Out[658]: 
   b   c  c_right   d
a                    
1  4   9        7  13
2  5  10        8  14
3  6  11        9  15
4  7  12       12 NaN

In [659]: t.drop('c_right', axis=1)
Out[659]: 
   b   c   d
a           
1  4   9  13
2  5  10  14
3  6  11  15
4  7  12 NaN

pandas left join and update existing column

Related

Recent Posts