Calculate the sum of the differences between all dates within an expanding window of dates

The output column below is what I'm trying to calculate and the diffs column is an explanation of the differences that are summed to calculate output.

+------------+--------+-------------+
|       date | output |    diffs    |
+------------+--------+-------------+
| 01/01/2000 |        |             |
| 10/01/2000 |      9 | [9]         |
| 20/01/2000 |     29 | [10, 19]    |
| 25/01/2000 |     44 | [5, 15, 24] |
+------------+--------+-------------+

I've thought about using rolling, creating a new column within each window for the diffs relative to the last record in the current window, and then summing these. However, rolling doesn't seem to be able to anchor the window at the beginning of a DataFrame. I suppose I could calculate the difference between the minimum and maximum dates and use this as the rolling period, but that seems hacky.

I've also looked at expanding but I couldn't see a way of creating new diffs as the window expanded.

Is there a non-loop, hopefully vectorisable, solution to this?

Here's the DataFrame:

import datetime as dt

import pandas as pd
import numpy as np


df = pd.DataFrame(
    {
        'date': (
            dt.datetime(2000, 1, 1), dt.datetime(2000, 1, 10),
            dt.datetime(2000, 1, 20), dt.datetime(2000, 1, 25),
        ),
        'output': (np.nan, 9, 29, 44),
    }
)

Solution 1:

If you're looking for the output column, try:

datediff = df.date.diff()/pd.Timedelta('1D')

df['output'] = (datediff * np.arange(len(df))).cumsum()

Output:

        date  output
0 2000-01-01     NaN
1 2000-01-10     9.0
2 2000-01-20    29.0
3 2000-01-25    44.0

I'll leave it to you to work out the logic behind it.
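For anyone who wants the logic spelled out, here is one way to read it (my own sketch, not the answerer's wording). Row i, counting from 0, has i earlier dates, and moving from row i-1 to row i pushes each of those i gaps out by the same step date[i] - date[i-1]. So the output grows by i * diff at each row, which is what the cumulative sum adds up. The snippet below checks this against the definition using expanding, after converting the dates to plain day counts; the names dates, days, vectorised and by_definition are only for illustration.

```python
import datetime as dt

import numpy as np
import pandas as pd

dates = pd.Series(
    [dt.datetime(2000, 1, 1), dt.datetime(2000, 1, 10),
     dt.datetime(2000, 1, 20), dt.datetime(2000, 1, 25)]
)

# Vectorised form: each row adds (number of earlier rows) * (step from previous date).
datediff = dates.diff() / pd.Timedelta('1D')
vectorised = (datediff * np.arange(len(dates))).cumsum()

# Definition-based check: within each expanding window, sum the gaps
# between the last date and every earlier date. expanding() needs numbers,
# so the dates are first converted to days since the first date.
days = (dates - dates.iloc[0]) / pd.Timedelta('1D')
by_definition = days.expanding(min_periods=2).apply(
    lambda w: (w[-1] - w[:-1]).sum(), raw=True
)

print(pd.DataFrame({'date': dates,
                    'vectorised': vectorised,
                    'by_definition': by_definition}))
```

Both columns come out as NaN, 9, 29, 44 for this data. The expanding version still calls a Python function once per window, so it is mainly useful as a cross-check; the cumsum version is the one to use on a large frame.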
