Pandas - Sum of rows between two indexes in diffrent columns
The logic for grouping is complicated enough so that we need to write a function to do it, so we start with that:
def groups_from_conditions(cond1, cond2):
'''
assign a unique non-NaN integer to each group as defined by the rules
'''
n = len(cond1)
group_idx = -1
groups = np.zeros(n)
curr_state = 0 # 0 = not in a group, 1 = in a group
for n in range(n):
if curr_state == 0:
# Currently not in a group
if cond1[n] == 1:
# Detected start of a group. so:
# switch the state to 1 ie in a group
curr_state = 1
# get a new group_idx
group_idx = group_idx + 1
# assign it to the output for element n
groups[n] = group_idx
else:
# no start of the group detected, we are not in a group so mark as NaN
groups[n] = np.NaN
else:
# current_state == 1 so we are in a group
if cond2[n] == 1:
# detected end of group -- switch state to 0
curr_state = 0
# as we are in a group assign current group_idx. Note that this happens for the element
# for which cond2[n] == 1 as well, ie this element is included
groups[n] = group_idx
return groups
To see that it is doing the right thing let's apply it to the first 25 rows and observe the results
# calculate cond1 and cond2
df2 = analyse(dfA)
# Test our logic
test_cond1 = df2.loc[:25, 'cond1'].values
test_cond2 = df2.loc[:25, 'cond2'].values
test_groups = groups_from_conditions(test_cond1, test_cond2)
# stick in test_df for ease of printing
test_df = pd.DataFrame({'cond1' : test_cond1, 'cond2' : test_cond2, 'groups' : test_groups})
test_df
we obtain
cond1 cond2 groups
-- ------- ------- --------
0 0 0 nan
1 0 0 nan
2 0 0 nan
3 1 0 0
4 0 0 0
5 0 0 0
6 0 0 0
7 1 0 0
8 0 0 0
9 0 1 0
10 0 0 nan
11 1 0 1
12 0 0 1
13 0 0 1
14 0 0 1
15 1 1 1
16 0 0 nan
17 1 0 2
18 1 1 2
19 1 0 3
20 0 0 3
21 0 0 3
22 0 1 3
23 0 0 nan
24 0 0 nan
we see that a unique integer id (in groups
) is assigned to each group and it seems to satisfy the requirements
Now we can just groupby
on the output of our function and sum
the right column
df2.groupby(groups_from_conditions(df2['cond1'].values, df2['cond2'].values))['change'].sum().to_frame()
output
change
-- ----------
0 0.21502
1 0.173852
2 0.138442
3 0.00701182
4 0.209301
5 0.175492
6 0.0209937
7 0.146636
8 0.290243
Edit -- better output
This also returns the start and the end index for each group:
df2.reset_index().groupby(groups_from_conditions(df2['cond1'].values, df2['cond2'].values)).agg({'change':sum, 'index':['first','last']})
output
change index
sum first last
0 0.215020 4 10
1 0.173852 12 16
2 0.138442 18 19
3 0.007012 20 23
4 0.209301 45 46
5 0.175492 50 56
6 0.020994 57 61
7 0.146636 62 71
8 0.290243 85 99
It took me some time to understand the logic, but transform
can do a good job here.
If would group the dataframe by blocs ending with a 1 in the cond2 column, and transform each bloc using an auxiliary function to give the result the sum of the changes column starting with the first 1 value in cond1.
The auxiliary function is:
def process(f):
x = df.loc[f.index, 'cond1'].cummax()
val = f[x > 0].sum()
return np.where(x > 0, val, 0)
We can now directly get the expected result with:
df = analyse(dfA)
df['changes_sum'] = df.groupby(df['cond2'].shift().fillna(0).cumsum()
)['change'].transform(process)
to get:
A B change cond1 cond2 changes_sum
1 0.000357 9.565909 -0.105097 0 0 0.000000
2 -0.360938 10.326678 0.079529 0 0 0.000000
3 0.968598 9.536587 -0.076510 0 0 0.000000
4 -1.144810 9.565296 0.003010 1 0 0.215020
5 1.107228 10.781904 0.127190 0 0 0.215020
6 0.113930 9.230490 -0.143891 0 0 0.215020
7 -0.186869 8.754208 -0.051599 0 0 0.215020
8 -2.016947 8.509005 -0.028010 1 0 0.215020
9 1.324123 10.621020 0.248209 0 0 0.215020
10 -0.021689 11.259448 0.060110 0 1 0.215020
11 -0.303271 9.845053 -0.125618 0 0 0.000000
12 -1.274105 9.730489 -0.011637 1 0 0.173852
13 -0.299103 9.541319 -0.019441 0 0 0.173852
14 -0.374649 10.755250 0.127229 0 0 0.173852
15 0.426690 10.450911 -0.028297 0 0 0.173852
16 -2.347033 11.558679 0.105997 1 1 0.173852
17 0.213736 9.951042 -0.139085 0 0 0.000000
18 -1.701469 10.251330 0.030176 1 0 0.138442
19 -1.485369 11.361190 0.108265 1 1 0.138442
20 -1.311128 10.514855 -0.074494 1 0 0.007012
21 1.158399 8.753032 -0.167556 0 0 0.007012
22 0.065564 9.997387 0.142163 0 0 0.007012
23 2.517158 11.066090 0.106898 0 1 0.007012
24 -0.388263 8.438005 -0.237490 0 0 0.000000
25 0.822394 8.924752 0.057685 0 0 0.000000
26 0.122491 10.624278 0.190428 0 0 0.000000
27 -0.882316 9.728327 -0.084330 0 0 0.000000
28 -0.245772 10.567543 0.086265 0 0 0.000000
29 0.172045 10.855364 0.027236 0 0 0.000000
30 1.709001 9.422214 -0.132022 0 0 0.000000
31 -0.280294 9.770575 0.036972 0 0 0.000000
32 -0.691827 9.080614 -0.070616 0 0 0.000000
33 -0.321549 10.590952 0.166326 0 0 0.000000
34 -0.155032 9.512700 -0.101809 0 0 0.000000
35 -0.052168 9.782063 0.028316 0 0 0.000000
36 0.951612 7.770738 -0.205614 0 0 0.000000
37 0.725763 11.399600 0.466991 0 1 0.000000
38 0.793190 10.718898 -0.059713 0 0 0.000000
39 2.080533 10.018371 -0.065354 0 0 0.000000
40 2.890031 11.147476 0.112703 0 1 0.000000
41 1.563451 9.316422 -0.164257 0 0 0.000000
42 -0.358973 8.371704 -0.101404 0 0 0.000000
43 -0.042927 10.443485 0.247474 0 0 0.000000
44 0.438877 9.734462 -0.067891 0 0 0.000000
45 -1.369854 10.522775 0.080982 1 0 0.209301
46 -1.260196 11.873045 0.128319 1 1 0.209301
47 -0.012729 9.267605 -0.219442 0 0 0.000000
48 0.603590 9.903244 0.068587 0 0 0.000000
49 -0.120610 10.041392 0.013950 0 0 0.000000
50 -1.372217 9.180103 -0.085774 1 0 0.175492
51 -0.068498 9.098150 -0.008927 0 0 0.175492
52 0.125410 9.040123 -0.006378 0 0 0.175492
53 0.874425 8.371506 -0.073961 0 0 0.175492
54 -0.442511 9.988561 0.193162 0 0 0.175492
55 -0.621700 9.558664 -0.043039 0 0 0.175492
56 -1.120992 11.474312 0.200410 1 1 0.175492
57 -1.458678 10.187339 -0.112161 1 0 0.020994
58 0.114353 9.922086 -0.026038 0 0 0.020994
59 -0.254377 8.413193 -0.152074 0 0 0.020994
60 -0.754028 10.927496 0.298852 0 0 0.020994
61 0.028300 11.063152 0.012414 0 1 0.020994
62 -1.768822 11.431818 0.033324 1 1 0.033324
63 0.742857 9.080574 -0.205675 0 0 0.000000
64 -1.040765 9.573166 0.054247 1 0 0.318988
65 -0.510999 10.168282 0.062165 0 0 0.318988
66 0.438022 10.404902 0.023270 0 0 0.318988
67 -0.359517 9.077677 -0.127558 0 0 0.318988
68 -0.730178 8.614026 -0.051076 0 0 0.318988
69 -0.449042 10.532863 0.222757 0 0 0.318988
70 1.003614 9.469792 -0.100929 0 0 0.318988
71 1.386934 11.705711 0.236111 0 1 0.318988
72 0.296448 11.640889 -0.005538 0 1 0.000000
73 -0.549103 11.163379 -0.041020 0 1 0.000000
74 -0.389459 10.764772 -0.035707 0 0 0.000000
75 -0.857943 8.109918 -0.246624 0 0 0.000000
76 -0.623137 10.847840 0.337602 0 0 0.000000
77 -0.397094 10.518781 -0.030334 0 0 0.000000
78 -0.874785 9.509236 -0.095975 0 0 0.000000
79 0.930555 9.490074 -0.002015 0 0 0.000000
80 -0.800944 8.843778 -0.068102 0 0 0.000000
81 -0.374924 9.377653 0.060367 0 0 0.000000
82 -0.509361 9.875481 0.053087 0 0 0.000000
83 -0.063740 10.099272 0.022661 0 0 0.000000
84 0.254932 8.432778 -0.165011 0 0 0.000000
85 -2.104728 8.578837 0.017320 1 0 0.290243
86 -0.355120 9.887398 0.152534 0 0 0.290243
87 -1.911956 9.824299 -0.006382 1 0 0.290243
88 -0.116750 10.173219 0.035516 0 0 0.290243
89 1.389123 10.281146 0.010609 0 0 0.290243
90 -0.441094 10.642809 0.035177 0 0 0.290243
91 0.686238 10.029521 -0.057625 0 0 0.290243
92 0.983907 8.541020 -0.148412 0 0 0.290243
93 -0.486963 9.527907 0.115547 0 0 0.290243
94 0.250544 10.741613 0.127384 0 0 0.290243
95 0.929991 9.916856 -0.076781 0 0 0.290243
96 0.674956 9.617204 -0.030216 0 0 0.290243
97 -0.533244 7.702358 -0.199106 0 0 0.290243
98 1.973100 10.727567 0.392764 0 0 0.290243
99 1.460717 9.889898 -0.078086 0 0 0.290243