How to create new column dynamically in dataframe in python
Use DataFrame.stack
for possible compare all levels columns first
with second
, create new column in DataFrame.assign
and reshape back by DataFrame.unstack
with DataFrame.swaplevel
and DataFrame.reindex
for original order:
#original ordering
orig = df1.columns[1:].tolist()
print (orig)
['empname', 'empcity']
df_final = (df_all.stack()
.assign(comparions=lambda x: x['first'].eq(x['second']))
.unstack()
.swaplevel(axis = 'columns')
.reindex(orig, axis=1, level=0))
print (df_final)
empname empcity
first second comparions first second comparions
empid
1 a a True aa aa True
2 b b True bb bb True
3 c m False cc cc True
4 d d True dd ddd False
5 e n False ee ee True
6 f f True ff fff False
Directly comparing 2 dataframes with ==
You can do this with a simple ==
between two dataframes that you need to compare. Let's start with the original 2 dataframes df1
and df2
-
first = df1.set_index('empid')
second = df2.set_index('empid')
comparisons = first==second #<---
output = pd.concat([first, second, comparisons], axis=1,keys=['first','second', 'comparisons'])
#Swapping level and reindexing, borrowed from Jezrael's excellent answer
output = output.swaplevel(axis=1).reindex(first.columns, axis=1, level=0)
print(output)
empname empcity
first second comparisons first second comparisons
empid
1 a a True aa aa True
2 b b True bb bb True
3 c m False cc cc True
4 d d True dd ddd False
5 e n False ee ee True
6 f f True ff fff False
Alternate approach with pandas groupby
In addition to the excellent answer by jezrael, I am adding an alternate way of doing this using pandas groupby.
- Tranpose to get columns as row indexes
- Groupby on first level which contains empcity and empname
- Apply comparison between the 2 rows
- Transpose back to columns
- Add multi index columns by product of original columns and "comparisons"
- Combine the two dataframes (original one and one with comparisons)
- Use swaplevel and reindex to get the order of columns that you need
#create comparisons
comparisons = (df_all.T
.groupby(level=-1)
.apply(lambda x: x.iloc[0]==x.iloc[1])
.T)
#add multi index columns
comparisons.columns = pd.MultiIndex.from_product([['comparison'],comparisons.columns])
#concatenate with original data
df_final = pd.concat([df_all, comparisons], axis='columns')
#Swapping level and reindexing, borrowed from Jezrael's excellent answer
df_final = (df_final.swaplevel(axis = 'columns')
.reindex(df1.set_index('empid')
.columns, axis=1, level=0))
print(df_final)
empname empcity
first second comparison first second comparison
empid
1 a a True aa aa True
2 b b True bb bb True
3 c m False cc cc True
4 d d True dd ddd False
5 e n False ee ee True
6 f f True ff fff False