Pandas Merge - How to avoid duplicating columns
Solution 1:
You can work out the columns that are only in one DataFrame and use this to select a subset of columns in the merge.
cols_to_use = df2.columns.difference(df.columns)
Then perform the merge (note this is an index object but it has a handy tolist()
method).
dfNew = merge(df, df2[cols_to_use], left_index=True, right_index=True, how='outer')
This will avoid any columns clashing in the merge.
Solution 2:
I use the suffixes
option in .merge()
:
dfNew = df.merge(df2, left_index=True, right_index=True,
how='outer', suffixes=('', '_y'))
dfNew.drop(dfNew.filter(regex='_y$').columns.tolist(),axis=1, inplace=True)
Thanks @ijoseph