A loop that makes multi-conditional summations
The following code doesn't use any packages. Starting from Python 3.7, all dicts are insertion-ordered; the code relies on this so that the final result keeps the order in which elements first appear. If for some reason your Python is older than 3.7, tell me and I'll modify the code to do the ordering explicitly instead of relying on this language feature.
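df = [["john","2019","30.2"], ["john","2019","40"], ["john","2020","50.3"],
      ["amy","2019","60"], ["amy","2019","20"], ["amy","2020","40.1"]]
r = {}
for *a, b in df:
    # a = every column except the last (the grouping key), b = the value to add
    a = tuple(a)
    if a not in r:
        r[a] = 0
    r[a] += float(b)
# rebuild rows: key columns followed by the summed value as a string
r = [list(k) + [str(v)] for k, v in r.items()]
print(r)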
Output:
[['john', '2019', '70.2'], ['john', '2020', '50.3'], ['amy', '2019', '80.0'], ['amy', '2020', '40.1']]
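For Python older than 3.7, one way to make the ordering explicit is collections.OrderedDict, which always remembers insertion order. The sketch below is an assumption about how that modification could look, not the answer author's own version; it also slices with row[:-1] instead of the *a, b starred unpacking.

```python
from collections import OrderedDict

df = [["john","2019","30.2"], ["john","2019","40"], ["john","2020","50.3"],
      ["amy","2019","60"], ["amy","2019","20"], ["amy","2020","40.1"]]

# OrderedDict keeps keys in insertion order regardless of Python version,
# so groups come out in the order they first appear in df.
r = OrderedDict()
for row in df:
    key = tuple(row[:-1])   # every column except the last forms the key
    r[key] = r.get(key, 0.0) + float(row[-1])

r = [list(k) + [str(v)] for k, v in r.items()]
print(r)
```

This prints the same list as the output shown above.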
Since you are using df as a variable name, I am assuming you are familiar with pandas.
You can easily do this in pandas: convert your list into a DataFrame, then group by the columns whose unique values you want and select the last row of each group:
df.groupby(['col_a', 'col_b'], as_index=False).last()
If you have any custom logic, sort the DataFrame before calling groupby.
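Note that .last() keeps only the final row of each group rather than adding the values together. For the summation itself, a sketch that swaps in .sum() might look like this; it assumes pandas is installed and uses hypothetical column names name, year and value, since the list in the question has no header.

```python
import pandas as pd

df = [["john","2019","30.2"], ["john","2019","40"], ["john","2020","50.3"],
      ["amy","2019","60"], ["amy","2019","20"], ["amy","2020","40.1"]]

# Hypothetical column names; the question's list has none.
frame = pd.DataFrame(df, columns=["name", "year", "value"])
# The values arrive as strings, so convert them before summing.
frame["value"] = frame["value"].astype(float)

# sort=False keeps groups in order of first appearance (john before amy).
summed = frame.groupby(["name", "year"], as_index=False, sort=False)["value"].sum()
print(summed.values.tolist())
```

The totals come back as floats; with sort=False the groups keep the same order as in the plain-Python answer.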
Here's a way to do it using defaultdict:
from collections import defaultdict
sums = defaultdict(lambda: defaultdict(float))
for item in df:
    sums[item[0]][item[1]] += float(item[2])
lst = [[key, inner_key, value] for key in sums for inner_key, value in sums[key].items()]
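The snippet above uses the df list from the question. A self-contained run might look like the following; it unpacks each row into illustrative names (name, year, value) and adds a hypothetical as_strings step that turns the float totals back into strings to match the input format.

```python
from collections import defaultdict

df = [["john","2019","30.2"], ["john","2019","40"], ["john","2020","50.3"],
      ["amy","2019","60"], ["amy","2019","20"], ["amy","2020","40.1"]]

# Outer key: name; inner key: year; missing inner entries start at 0.0.
sums = defaultdict(lambda: defaultdict(float))
for name, year, value in df:
    sums[name][year] += float(value)

# Flatten the nested dicts back into [name, year, total] rows.
lst = [[key, inner_key, value] for key in sums for inner_key, value in sums[key].items()]
print(lst)

# Optional: convert the totals back to strings like the input rows.
as_strings = [[k, ik, str(v)] for k, ik, v in lst]
print(as_strings)
```

On Python 3.7+ both dict levels preserve insertion order, so the rows come out in the same order as in the first answer.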