Get value from previous column for each group in groupby

This is my df -

Site	Product	Period	Outflow	Production	Opening Inventory	New Opening Inventory
California	Apples	1	3226	4300	1213	1213
California	Apples	2	3279	3876	0	0
California	Apples	3	4390	4530	0	0
California	Apples	4	4281	3870	0	0
California	Apples	5	4421	4393	0	0
California	Oranges	1	505	400	0	0
California	Oranges	2	278	505	0	0
California	Oranges	3	167	278	0	0
California	Oranges	4	124	167	0	0
California	Oranges	5	106	124	0	0
Montreal	Maple Syrup	1	445	465	293	293
Montreal	Maple Syrup	2	82	398	0	0
Montreal	Maple Syrup	3	745	346	0	0
Montreal	Maple Syrup	4	241	363	0	0
Montreal	Maple Syrup	5	189	254	0	0

As seen, there are three groups when grouped by Site and Product. For each of the three groups I want to do the following (for periods 2 to 5) -

Set New Opening Inventory to Closing Inventory of previous period
Calculate Closing Inventory for next period using the formula, Closing Inventory = Production + Inflow + New Opening Inventory - Outflow

I am trying to solve this using a combination of groupby and for loop

Here is what I have so far -

If df is a single group I can simply do

# calculate closing inventory of period 1
df['Closing Inventory'] = np.where(df['PeriodNo']==1, <formula>, 0)

for i in range(1, len(df)):
    df.loc[i, 'New Opening Inventory'] = df.loc[i-1, 'Closing Inventory']
    df.loc[i, 'Closing Inventory'] = df.loc[i, 'Production'] + df.loc[i, 'Inflow'] + df.loc[i, 'New Opening Inventory'] - df.loc[i, 'Outflow']

and when I try to nest this for loop in a loop over groups

# calculate closing inventory of period 1 for all groups
df['Closing Inventory'] = np.where(df['PeriodNo']==1, <formula>, 0)

g = df.groupby(['Site', 'Product']

alist = []

for k in g.groups.keys():
    temp = g.get_group(k).reset_index(drop=True)
    for i in range(1, len(temp)):
        temp.loc[i, 'New Opening Inventory'] = temp.loc[i-1, 'Closing Inventory']
        temp.loc[i, 'Closing Inventory'] = temp.loc[i, 'Production'] + temp.loc[i, 'Inflow'] + temp.loc[i, 'New Opening Inventory'] - temp.loc[i, 'Outflow']
    alist.append(temp)

df2 = pd.concat(alist, ignore_index=True)
df2

This solution works but seems very inefficient with the nested loop. Is there a better way to do this?

Solution 1:

Your New Opening Inventory is always previous Closing Inventory.

So I can modify this formula

Closing Inventory = Production + Inflow + New Opening Inventory - Outflow

Closing Inventory = Production + Inflow + Previous Closing Inventory - Outflow

For the first row, you do not have the Closing Inventory. But from the 2nd row, you calculate the Closing Inventory and you carry over the Closing Inventory to the next row.

Before obtaining the Closing Inventory, at first, calculate "Production" + "Inflow" - "Overflow" using list comprehension. The list comprehension performs better than for loop.

df['Closing Inventory'] = [x + y - z if p > 1 else 0 for p, x, y, z in zip(df['Period'], df['Production'], df['Inflow'], df['Outflow'])]

# df[['Site', 'Product', 'Closing Inventory']]
#         Site  Product Closing Inventory
# 0 California  Apples                  0
# 1 California  Apples                597
# 2 California  Apples                140
# 3 California  Apples               -411
# 4 California  Apples                -28
# 5 California  Oranges                 0
# 6 California  Oranges               227
# 7 California  Oranges               111
# ...

Then, the rest of formula is adding the previously calculated Closing Inventory, that means you can cumsum this result.

For row 1, Previous Closing (0) + calculated part (597) = 597
For row 2, Previous Closing (597) + calculated part (140) = 737
...

df['Closing Inventory'] = df.groupby(['Site', 'Product'])['Closing Inventory'].cumsum()

# df[['Site', 'Product', 'Closing Inventory']]
#         Site  Product Closing_Inventory
# 0 California  Apples                  0
# 1 California  Apples                597
# 2 California  Apples                737
# 3 California  Apples                326
# 4 California  Apples                298
# 5 California  Oranges                 0
# 6 California  Oranges               227
# 7 California  Oranges               338
# ...

Again the New Opening Inventory is always previous Closing Inventory except when period is 1. Hence first, shift the Closing Inventory then pick New Opening Inventory when period is 1.

I used combine_first to pick value from either New Opening or Closing Inventory.

df['New Opening Inventory'] = (df['New Opening Inventory'].replace(0, np.nan)
                               .combine_first(
                                   df.groupby(['Site', 'Product'])['Closing Inventory']
                                   .shift()
                                   .fillna(0)
                               ).astype(int))

Result

          Site  Product Period  New Opening Inventory Closing Inventory
0   California  Apples       1                   1213                 0
1   California  Apples       2                      0               597
2   California  Apples       3                    597               737
3   California  Apples       4                    737               326
4   California  Apples       5                    326               298
5   California  Oranges      1                      0                 0
6   California  Oranges      2                      0               227
7   California  Oranges      3                    227               338
...

With the sample data on my laptop,

Original solution: 8.44 ms ± 280 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
This solution: 2.95 ms ± 23.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I think with list comprehension and vectorization function, this solution can perform faster.

Get value from previous column for each group in groupby

Solution 1:

Related

Recent Posts