Calculate time difference between Pandas Dataframe indices
I am trying to add a deltaT column to a DataFrame, where deltaT is the time difference between successive rows (as indexed in the time series).
time value
2012-03-16 23:50:00 1
2012-03-16 23:56:00 2
2012-03-17 00:08:00 3
2012-03-17 00:10:00 4
2012-03-17 00:12:00 5
2012-03-17 00:20:00 6
2012-03-20 00:43:00 7
Desired result is something like the following (deltaT units shown in minutes):
time value deltaT
2012-03-16 23:50:00 1 0
2012-03-16 23:56:00 2 6
2012-03-17 00:08:00 3 12
2012-03-17 00:10:00 4 2
2012-03-17 00:12:00 5 2
2012-03-17 00:20:00 6 8
2012-03-20 00:43:00 7 23
Solution 1:
Note: this uses numpy >= 1.7. For numpy < 1.7, see the conversion here: http://pandas.pydata.org/pandas-docs/dev/timeseries.html#time-deltas
Your original frame, with a datetime index
In [196]: df
Out[196]:
value
2012-03-16 23:50:00 1
2012-03-16 23:56:00 2
2012-03-17 00:08:00 3
2012-03-17 00:10:00 4
2012-03-17 00:12:00 5
2012-03-17 00:20:00 6
2012-03-20 00:43:00 7
In [199]: df.index
Out[199]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-03-16 23:50:00, ..., 2012-03-20 00:43:00]
Length: 7, Freq: None, Timezone: None
Here is the difference between successive index values, stored as a timedelta64 column:
In [200]: df['tvalue'] = df.index
In [201]: df['delta'] = (df['tvalue']-df['tvalue'].shift()).fillna(0)
In [202]: df
Out[202]:
value tvalue delta
2012-03-16 23:50:00 1 2012-03-16 23:50:00 00:00:00
2012-03-16 23:56:00 2 2012-03-16 23:56:00 00:06:00
2012-03-17 00:08:00 3 2012-03-17 00:08:00 00:12:00
2012-03-17 00:10:00 4 2012-03-17 00:10:00 00:02:00
2012-03-17 00:12:00 5 2012-03-17 00:12:00 00:02:00
2012-03-17 00:20:00 6 2012-03-17 00:20:00 00:08:00
2012-03-20 00:43:00 7 2012-03-20 00:43:00 3 days, 00:23:00
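Getting the answer while disregarding the day difference (your last timestamp is on 3/20, the prior one on 3/17) is actually tricky. Convert each delta to whole minutes, then take the result modulo the number of minutes in a day: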
In [204]: df['ans'] = df['delta'].apply(lambda x: x / np.timedelta64(1,'m')).astype('int64') % (24*60)
In [205]: df
Out[205]:
value tvalue delta ans
2012-03-16 23:50:00 1 2012-03-16 23:50:00 00:00:00 0
2012-03-16 23:56:00 2 2012-03-16 23:56:00 00:06:00 6
2012-03-17 00:08:00 3 2012-03-17 00:08:00 00:12:00 12
2012-03-17 00:10:00 4 2012-03-17 00:10:00 00:02:00 2
2012-03-17 00:12:00 5 2012-03-17 00:12:00 00:02:00 2
2012-03-17 00:20:00 6 2012-03-17 00:20:00 00:08:00 8
2012-03-20 00:43:00 7 2012-03-20 00:43:00 3 days, 00:23:00 23
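For reference, here is a self-contained sketch of the same steps as a script. The frame is rebuilt from the data in the question. Two details are my assumptions rather than what the session above ran: the first row is filled with pd.Timedelta(0) instead of a bare 0, and the division is applied to the whole column instead of going through apply.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (timestamps become a DatetimeIndex).
times = pd.to_datetime([
    "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
    "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
])
df = pd.DataFrame({"value": range(1, 8)}, index=times)

# Copy the index into a column so it can be shifted by one row.
df["tvalue"] = df.index

# Row-to-row difference; the first row has no predecessor, so fill it
# with a zero-length timedelta (assumption: explicit pd.Timedelta(0)).
df["delta"] = (df["tvalue"] - df["tvalue"].shift()).fillna(pd.Timedelta(0))

# Dividing by a one-minute timedelta gives minutes as floats; the modulo
# discards whole days, leaving only the time-of-day part of the gap.
minutes = df["delta"] / np.timedelta64(1, "m")
df["ans"] = minutes.astype("int64") % (24 * 60)

print(df["ans"].tolist())
```

The modulo step is what turns the last gap of 3 days 23 minutes into 23; drop it if you want the full elapsed minutes instead.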
Solution 2:
We can use to_series to create a series whose index and values both equal the index keys, then compute the differences between successive rows, which yields a timedelta64[ns] dtype. Through the .dt accessor we can then read the seconds attribute of the time portion and divide each element by 60 to get minutes (optionally filling the first value with 0).
In [13]: df['deltaT'] = df.index.to_series().diff().dt.seconds.div(60, fill_value=0)
...: df # use .astype(int) to obtain integer values
Out[13]:
value deltaT
time
2012-03-16 23:50:00 1 0.0
2012-03-16 23:56:00 2 6.0
2012-03-17 00:08:00 3 12.0
2012-03-17 00:10:00 4 2.0
2012-03-17 00:12:00 5 2.0
2012-03-17 00:20:00 6 8.0
2012-03-20 00:43:00 7 23.0
Simplification:
When we perform diff:
In [8]: ser_diff = df.index.to_series().diff()
In [9]: ser_diff
Out[9]:
time
2012-03-16 23:50:00 NaT
2012-03-16 23:56:00 0 days 00:06:00
2012-03-17 00:08:00 0 days 00:12:00
2012-03-17 00:10:00 0 days 00:02:00
2012-03-17 00:12:00 0 days 00:02:00
2012-03-17 00:20:00 0 days 00:08:00
2012-03-20 00:43:00 3 days 00:23:00
Name: time, dtype: timedelta64[ns]
Seconds to minutes conversion:
In [10]: ser_diff.dt.seconds.div(60, fill_value=0)
Out[10]:
time
2012-03-16 23:50:00 0.0
2012-03-16 23:56:00 6.0
2012-03-17 00:08:00 12.0
2012-03-17 00:10:00 2.0
2012-03-17 00:12:00 2.0
2012-03-17 00:20:00 8.0
2012-03-20 00:43:00 23.0
Name: time, dtype: float64
If you also want to include the date portion, which was excluded previously (only the time portion was considered), dt.total_seconds gives the full elapsed duration in seconds, from which minutes can again be computed by division.
In [12]: ser_diff.dt.total_seconds().div(60, fill_value=0)
Out[12]:
time
2012-03-16 23:50:00 0.0
2012-03-16 23:56:00 6.0
2012-03-17 00:08:00 12.0
2012-03-17 00:10:00 2.0
2012-03-17 00:12:00 2.0
2012-03-17 00:20:00 8.0
2012-03-20 00:43:00 4343.0 # <-- number of minutes in 3 days 23 minutes
Name: time, dtype: float64
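To see the two accessors side by side, here is a minimal standalone sketch that rebuilds the frame from the question and computes both variants. The names within_day and elapsed are only illustrative.

```python
import pandas as pd

# Rebuild the frame from the question with a named DatetimeIndex.
index = pd.DatetimeIndex([
    "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
    "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
], name="time")
df = pd.DataFrame({"value": range(1, 8)}, index=index)

# Differences between successive index values; the first entry is NaT.
ser_diff = df.index.to_series().diff()

# .dt.seconds is only the seconds component of each timedelta, so whole
# days are ignored. fill_value=0 replaces the missing first entry.
within_day = ser_diff.dt.seconds.div(60, fill_value=0)

# .dt.total_seconds() covers the full duration, days included.
elapsed = ser_diff.dt.total_seconds().div(60, fill_value=0)

print(pd.DataFrame({"within_day": within_day, "elapsed": elapsed}))
```

The two columns agree on every row except the last, where the gap spans more than a day: within_day gives 23.0 and elapsed gives 4343.0.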