What is the easiest way to do this transformation in Python?
I have a table in the format below. I want to calculate a group-wise weighted average such that, if value_KPI is null for a particular KPI, its weightage is distributed equally among the other KPIs in the same group.
Groups | KPIs | Weightages | value_KPI |
---|---|---|---|
G1 | KP1 | 30% | 45 |
G1 | KP2 | 30% | |
G1 | KP3 | 40% | |
G2 | KP4 | 30% | 34 |
G2 | KP5 | 30% | |
G2 | KP6 | 20% | 90 |
G2 | KP7 | 20% | 45 |
The result should look something like this:
Groups | KPIs | Weightages | value_KPI |
---|---|---|---|
G1 | KP1 | 100% | 45 |
G1 | KP2 | | |
G1 | KP3 | | |
G2 | KP4 | 40% | 34 |
G2 | KP5 | | |
G2 | KP6 | 30% | 90 |
G2 | KP7 | 30% | 45 |
Please help me with Python code to do this.
Let's define a simple helper function:
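import numpy as np

def distribute(g):
    g = g.copy()  # work on a copy rather than mutating the group in place
    nans = g['value_KPI'].isna()
    # Split the total weight of the missing KPIs equally among the present ones
    g.loc[~nans, 'Weightages'] += g.loc[nans, 'Weightages'].sum() / (~nans).sum()
    g.loc[nans, 'Weightages'] = np.nan  # a real NaN, not the string 'NaN'
    return g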
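The helper does arithmetic on Weightages, so it presumably expects fractions such as 0.3 rather than strings such as '30%'. Here is a minimal sketch of building the sample table and converting the column, assuming the weightages arrive as percent strings and the blank value_KPI cells are read as NaN (both are assumptions about how the data is loaded):

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank value_KPI cells are assumed to be NaN
df = pd.DataFrame({
    "Groups": ["G1", "G1", "G1", "G2", "G2", "G2", "G2"],
    "KPIs": ["KP1", "KP2", "KP3", "KP4", "KP5", "KP6", "KP7"],
    "Weightages": ["30%", "30%", "40%", "30%", "30%", "20%", "20%"],
    "value_KPI": [45, np.nan, np.nan, 34, np.nan, 90, 45],
})

# Turn '30%' into 0.3 so the weights can be added and divided
df["Weightages"] = df["Weightages"].str.rstrip("%").astype(float) / 100
```

If the column is already numeric, the conversion step can be skipped.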
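Now we apply it to each group after a groupby (group_keys=False keeps the original row index; newer pandas versions otherwise prepend the group key to the index):
df.groupby('Groups', group_keys=False).apply(distribute)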
output:
    Groups    KPIs      Weightages    value_KPI
--  --------  ------  ------------  -----------
 0  G1        KP1              1             45
 1  G1        KP2            nan            nan
 2  G1        KP3            nan            nan
 3  G2        KP4              0.4           34
 4  G2        KP5            nan            nan
 5  G2        KP6              0.3           90
 6  G2        KP7              0.3           45
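The same redistribution can also be done without apply, and the adjusted weights then give the group-wise weighted average the question asks for. This is only a sketch of one possible alternative, assuming Weightages is already numeric; the names `lost`, `n_present`, `result` and `weighted_avg` are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Groups": ["G1", "G1", "G1", "G2", "G2", "G2", "G2"],
    "KPIs": ["KP1", "KP2", "KP3", "KP4", "KP5", "KP6", "KP7"],
    "Weightages": [0.3, 0.3, 0.4, 0.3, 0.3, 0.2, 0.2],
    "value_KPI": [45, np.nan, np.nan, 34, np.nan, 90, 45],
})

missing = df["value_KPI"].isna()

# Per group: total weight held by KPIs with no value, broadcast to every row
lost = df["Weightages"].where(missing, 0.0).groupby(df["Groups"]).transform("sum")
# Per group: number of KPIs that do have a value, broadcast to every row
n_present = (~missing).groupby(df["Groups"]).transform("sum")

# Give each present KPI an equal share of the lost weight; blank out the rest
adjusted = (df["Weightages"] + lost / n_present).where(~missing)
result = df.assign(Weightages=adjusted)

# Weighted average per group; rows with NaN are skipped by sum()
weighted_avg = (result["Weightages"] * result["value_KPI"]).groupby(result["Groups"]).sum()
```

For the sample data, `result` should carry the same weights as the output above, and `weighted_avg` comes out to about 45 for G1 and about 54.1 for G2 (34 × 0.4 + 90 × 0.3 + 45 × 0.3).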