python pandas DataFrame - compare two identically indexed and labeled dataframes cell by cell
I have two dataframe with exactly same indices and columns. Some of their values are different, and I want to generate a new dataframe with the same index and column but each cell indicating the result of comparison between the two dataframes.
Dataframe 1:
2018 2019 2020
A 1 1 1
B 0 1 2
C 2 0 0
Dataframe 2:
2018 2019 2020
A 1 0 0
B 0 1 1
C 1 1 1
Essentially, I am trying to apply the following function to each pair of cells from the two dataframe with same index and same column, and then map the result of comparison to a new dataframe at the same position:
def compare_two(cell_1, cell_2):
if cell_1 == cell_2:
return "same"
elif cell_1 == 0 and cell_2 > 0:
return "rise"
elif cell_2 == 0 and cell_1 > 0:
return "fall"
elif cell_1 * cell_2 == 2:
return "change"
The result dataframe should look like:
2018 2019 2020
A same fall fall
B same same change
C change rise rise
How should I achieve this?
One way using numpy.select
:
data = np.select([df1.eq(df2),
df1.eq(0) & df2.gt(0),
df2.eq(0) & df1.gt(0),
df1.mul(df2).eq(2)],
["same", "rise", "fall", "change"])
new_df = pd.DataFrame(data, columns = df1.columns, index=df1.index)
print(new_df)
Output:
2018 2019 2020
A same fall fall
B same same change
C change rise rise
other way you can use np.where
with np.logic
function to replace your function
result_data = np.where(df1==df2,'Same',
np.where(np.logical_and(df1==0 ,df2>0),"rise",
np.where(np.logical_and(df2==0 , df1>0),"fall",
np.where(df1*df2==2,"change",""))))
result = pd.DataFrame(result_data,index=df1.index,columns=df1.columns)
output
2018 2019 2020
A Same fall fall
B Same Same change
C change rise rise