How to calculate Mean Absolute Error (MAE) and Mean Signed Error (MSE) using the pandas/numpy/Python math library?
Edit
I think I understand now; let me know if this is what you want.
MAE:
df['MAE'] = df[['M1_ABS_Error','M2_ABS_Error']].mean(axis = 1)
df
produces
    date      Thermometer  True_Temperature  Method_1  Method_2  M1_ABS_Error  M2_ABS_Error  MAE
--  --------  -----------  ----------------  --------  --------  ------------  ------------  -----
0   1/1/2021  red          0.2               0.2       0.5       0             0.3           0.15
1   1/1/2021  red          0.6               0.6       0.3       0             0.3           0.15
2   1/1/2021  red          0.4               0.6       0.23      0.2           0.17          0.185
3   1/1/2021  green        0.2               0.4       nan       0.2           nan           0.2
4   1/1/2021  green        1                 1         0.23      0             0.77          0.385
5   1/1/2021  yellow       0.4               0.4       0.32      0             0.08          0.04
6   1/1/2021  yellow       0.1               nan       0.4       nan           0.3           0.3
7   1/1/2021  yellow       1.3               0.5       0.54      0.8           0.76          0.78
8   1/1/2021  yellow       1.5               0.5       0.43      1             1.07          1.035
9   1/1/2021  yellow       1.5               0.5       0.43      1             1.07          1.035
10  1/1/2021  blue         0.4               0.3       nan       0.1           nan           0.1
11  1/1/2021  blue         0.8               0.2       0.11      0.6           0.69          0.645
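For reference, here is a self-contained sketch of the row-wise MAE. It assumes the `M1_ABS_Error`/`M2_ABS_Error` columns are the absolute differences between each method's reading and `True_Temperature` (the code that created them is not shown here, but the printed values match that). The data is re-typed from the table above, with `date` left out.

```python
import numpy as np
import pandas as pd

# Data re-typed from the table above (the date column is left out)
df = pd.DataFrame({
    "Thermometer": ["red", "red", "red", "green", "green", "yellow",
                    "yellow", "yellow", "yellow", "yellow", "blue", "blue"],
    "True_Temperature": [0.2, 0.6, 0.4, 0.2, 1.0, 0.4, 0.1, 1.3, 1.5, 1.5, 0.4, 0.8],
    "Method_1": [0.2, 0.6, 0.6, 0.4, 1.0, 0.4, np.nan, 0.5, 0.5, 0.5, 0.3, 0.2],
    "Method_2": [0.5, 0.3, 0.23, np.nan, 0.23, 0.32, 0.4, 0.54, 0.43, 0.43, np.nan, 0.11],
})

# Assumed definition: absolute error of each method; NaN readings stay NaN
for m, col in [("M1", "Method_1"), ("M2", "Method_2")]:
    df[f"{m}_ABS_Error"] = (df[col] - df["True_Temperature"]).abs()

# Row-wise MAE: mean() skips NaN by default, so a row with one
# missing reading just uses the other method's error
df["MAE"] = df[["M1_ABS_Error", "M2_ABS_Error"]].mean(axis=1)
print(df[["Thermometer", "M1_ABS_Error", "M2_ABS_Error", "MAE"]].round(3))
```

Because `mean(axis=1)` skips NaN by default, rows 3, 6 and 10 get the error of the single method that has a reading; pass `skipna=False` if such rows should be NaN instead.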
And for the MSE (mean signed error), take the row-wise mean of the two methods' readings and subtract the true temperature:
df["MSE"]= df[['Method_1','Method_2']].mean(axis = 1)- df['True_Temperature']
produces
    date      Thermometer  True_Temperature  Method_1  Method_2  M1_ABS_Error  M2_ABS_Error  MAE    MSE
--  --------  -----------  ----------------  --------  --------  ------------  ------------  -----  ------
0   1/1/2021  red          0.2               0.2       0.5       0             0.3           0.15   0.15
1   1/1/2021  red          0.6               0.6       0.3       0             0.3           0.15   -0.15
2   1/1/2021  red          0.4               0.6       0.23      0.2           0.17          0.185  0.015
3   1/1/2021  green        0.2               0.4       nan       0.2           nan           0.2    0.2
4   1/1/2021  green        1                 1         0.23      0             0.77          0.385  -0.385
5   1/1/2021  yellow       0.4               0.4       0.32      0             0.08          0.04   -0.04
6   1/1/2021  yellow       0.1               nan       0.4       nan           0.3           0.3    0.3
7   1/1/2021  yellow       1.3               0.5       0.54      0.8           0.76          0.78   -0.78
8   1/1/2021  yellow       1.5               0.5       0.43      1             1.07          1.035  -1.035
9   1/1/2021  yellow       1.5               0.5       0.43      1             1.07          1.035  -1.035
10  1/1/2021  blue         0.4               0.3       nan       0.1           nan           0.1    -0.1
11  1/1/2021  blue         0.8               0.2       0.11      0.6           0.69          0.645  -0.645
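Subtracting the true value from the row mean of the readings gives the same number as averaging each method's signed error (reading minus truth), as long as NaN readings are skipped in both cases. A minimal sketch that checks this on a few rows re-typed from the table above; the `M1_Signed_Error`/`M2_Signed_Error` column names are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# A few rows re-typed from the table above, including NaN readings
df = pd.DataFrame({
    "True_Temperature": [0.2, 0.6, 0.2, 0.1, 1.5],
    "Method_1": [0.2, 0.6, 0.4, np.nan, 0.5],
    "Method_2": [0.5, 0.3, np.nan, 0.4, 0.43],
})

# Approach from the answer: mean of the readings minus the true value
mse_a = df[["Method_1", "Method_2"]].mean(axis=1) - df["True_Temperature"]

# Equivalent approach: signed error per method (hypothetical column names),
# then the row mean; NaN is skipped in both approaches
signed = df[["Method_1", "Method_2"]].sub(df["True_Temperature"], axis=0)
signed.columns = ["M1_Signed_Error", "M2_Signed_Error"]
mse_b = signed.mean(axis=1)

# Positive = the methods read high on average, negative = they read low
print(pd.DataFrame({"mean_minus_true": mse_a, "mean_signed_error": mse_b}).round(3))
```

Unlike the MAE, errors of opposite sign cancel out here: in row 2 of the full table, the MAE is 0.185 but the signed error is only 0.015.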
Original answer
It is not entirely clear what you want, so I am guessing a bit here: is this what you are after? If you `groupby` the thermometer color and apply `mean` to the `ABS` columns within each group
data.groupby('Thermometer', sort = False)[['M1_ABS_Error','M2_ABS_Error']].mean()
you get this
             M1_ABS_Error  M2_ABS_Error
Thermometer
red              0.066667      0.256667
green            0.100000      0.770000
yellow           0.700000      0.656000
blue             0.350000      0.690000
Here, for example, the top-left number `0.066667` is the average of the `M1_ABS_Error` column for the thermometers that are `red`; the other cells work the same way. NaNs are skipped within each color/column.
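To see how many readings each per-thermometer mean is actually based on once NaNs are skipped, a minimal sketch that pairs `mean()` with `count()` on the same groupby. The absolute errors are re-typed from the first table.

```python
import numpy as np
import pandas as pd

# Absolute errors re-typed from the first table
data = pd.DataFrame({
    "Thermometer": ["red", "red", "red", "green", "green", "yellow",
                    "yellow", "yellow", "yellow", "yellow", "blue", "blue"],
    "M1_ABS_Error": [0, 0, 0.2, 0.2, 0, 0, np.nan, 0.8, 1, 1, 0.1, 0.6],
    "M2_ABS_Error": [0.3, 0.3, 0.17, np.nan, 0.77, 0.08, 0.3, 0.76, 1.07, 1.07, np.nan, 0.69],
})

cols = ["M1_ABS_Error", "M2_ABS_Error"]
# sort=False keeps the colors in order of first appearance
grouped = data.groupby("Thermometer", sort=False)[cols]

# Per-thermometer MAE; NaNs are skipped, so each mean only uses the
# readings that exist for that color/method
mae = grouped.mean()
# Number of non-NaN values behind each mean
n = grouped.count()
print(mae.round(6))
print(n)
```

For example, the `yellow`/`M1_ABS_Error` mean of 0.7 is 2.8 divided by 4, not by 5, because one of the five yellow readings is NaN.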
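To get the MSE (which normally means Mean Squared Error, so I assume that is what you are after), square each method's error and average it per thermometer. Note that taking the square root at the end turns this into the root-mean-squared error (RMSE), which is in the same units as the temperatures; drop `np.sqrt` if you want the plain MSE. You can do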
import numpy as np
# Squared error of each method; missing readings stay NaN
data["M1_Sqr_Error"] = (data["True_Temperature"] - data["Method_1"])**2
data["M2_Sqr_Error"] = (data["True_Temperature"] - data["Method_2"])**2
# Mean per thermometer (NaNs skipped), then the square root -> RMSE
np.sqrt(data.groupby('Thermometer', sort=False)[['M1_Sqr_Error', 'M2_Sqr_Error']].mean())
to get
             M1_Sqr_Error  M2_Sqr_Error
Thermometer
red              0.115470      0.263881
green            0.141421      0.770000
yellow           0.812404      0.769909
blue             0.430116      0.690000
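If you want both numbers, a minimal sketch that keeps the MSE and the RMSE per thermometer as separate steps (squared errors, group mean, then `np.sqrt`). The data is re-typed from the question's table, with `date` left out.

```python
import numpy as np
import pandas as pd

# Data re-typed from the question's table (the date column is left out)
data = pd.DataFrame({
    "Thermometer": ["red", "red", "red", "green", "green", "yellow",
                    "yellow", "yellow", "yellow", "yellow", "blue", "blue"],
    "True_Temperature": [0.2, 0.6, 0.4, 0.2, 1.0, 0.4, 0.1, 1.3, 1.5, 1.5, 0.4, 0.8],
    "Method_1": [0.2, 0.6, 0.6, 0.4, 1.0, 0.4, np.nan, 0.5, 0.5, 0.5, 0.3, 0.2],
    "Method_2": [0.5, 0.3, 0.23, np.nan, 0.23, 0.32, 0.4, 0.54, 0.43, 0.43, np.nan, 0.11],
})

# Squared error per method; missing readings stay NaN
data["M1_Sqr_Error"] = (data["True_Temperature"] - data["Method_1"]) ** 2
data["M2_Sqr_Error"] = (data["True_Temperature"] - data["Method_2"]) ** 2

grouped = data.groupby("Thermometer", sort=False)[["M1_Sqr_Error", "M2_Sqr_Error"]]
mse = grouped.mean()   # mean squared error per thermometer (NaNs skipped)
rmse = np.sqrt(mse)    # root-mean-squared error, same units as the temperatures
print(mse.round(6))
print(rmse.round(6))
```

`np.sqrt` applied to a DataFrame returns a DataFrame with the same index and columns, so `rmse` matches the layout of the output above.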