Getting the rolling.sum of row values with irregular time intervals
I am trying to get the rolling.sum of my time series. However, the rows have varying time intervals (see below my df_water_level_US1 dataframe):
DATE TIMEREAD WATERLEVEL(M) DateAndTime
0 01/01/2016 0:00:15 0.65 01/01/2016 0:00:15
1 01/01/2016 0:10:14 0.65 01/01/2016 0:10:14
2 01/01/2016 0:20:11 0.64 01/01/2016 0:20:11
3 01/01/2016 0:30:12 0.66 01/01/2016 0:30:12
4 01/01/2016 0:40:12 0.64 01/01/2016 0:40:12
and so on. I tried to use this to get the sum for each day and save it to final_1D:
final_1D = df_water_level_US1.set_index('DateAndTime').rolling('1D').sum()
but I get this error:
ValueError: window must be an integer 0 or greater
The expected output is:
DATETIMEREAD WATERLEVEL(M) DateAndTime
01/01/2016 3.24 01/01/2016
and so on (02/01/2016, 03/01/2016 etc)
Anyone who have idea how to fix this?
Solution 1:
Try:
df_water_level_US1['DateAndTime'] = pd.to_datetime(df_water_level_US1['DateAndTime'])
final_1D = df_water_level_US1.resample('D', on='DateAndTime')['WATERLEVEL(M)'].sum()
print(final_1D.reset_index())
# Output
DateAndTime WATERLEVEL(M)
0 2016-01-01 3.24
The first line is not mandatory if your column DateAndTime
is already a DatetimeIndex.
Solution 2:
For daily sum of water level try grouping by date:
df.groupby('Date').sum('WATERLEVEL(M)')