Truncate `TimeStamp` column to hour precision in pandas `DataFrame`
I have a pandas.DataFrame called df which has an automatically generated index, with a column dt:
df['dt'].dtype, df['dt'][0]
# (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45'))
What I'd like to do is create a new column truncated to hour precision. I'm currently using:
from datetime import datetime
df['dt2'] = df['dt'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour))
This works, so that's fine. However, I have an inkling there's some nice way using pandas.tseries.offsets or creating a DatetimeIndex or similar.
So, if possible, is there some pandas wizardry to do this?
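To make the thread reproducible, here is a minimal sketch of the question's setup. The three sample timestamps are an assumption, borrowed from the outputs shown in the answers; the truncation itself is the question's own apply-based approach.

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample data: the three timestamps that appear in the answers.
df = pd.DataFrame({"dt": pd.to_datetime([
    "2014-10-01 10:02:45",
    "2014-10-01 13:08:17",
    "2014-10-01 17:39:24",
])})

# The question's approach: rebuild each value from its year, month, day and hour,
# which implicitly drops minutes, seconds and any sub-second part.
df["dt2"] = df["dt"].apply(lambda ts: datetime(ts.year, ts.month, ts.day, ts.hour))

print(df)
```

This works row by row in Python, which is why a vectorised pandas alternative is attractive for large frames.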
Solution 1:
In pandas 0.18.0 and later, datetime Series have floor, ceil and round methods (through the .dt accessor) that round timestamps to a given fixed frequency. To round down to hour precision, you can use:
>>> df['dt2'] = df['dt'].dt.floor('h')
>>> df
                   dt                 dt2
0 2014-10-01 10:02:45 2014-10-01 10:00:00
1 2014-10-01 13:08:17 2014-10-01 13:00:00
2 2014-10-01 17:39:24 2014-10-01 17:00:00
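For comparison, here is a small self-contained sketch of all three methods side by side on the same (assumed) sample timestamps, showing how floor, ceil and round differ at hour precision:

```python
import pandas as pd

# Hypothetical sample data matching the timestamps in the session above.
s = pd.Series(pd.to_datetime([
    "2014-10-01 10:02:45",
    "2014-10-01 13:08:17",
    "2014-10-01 17:39:24",
]))

floored = s.dt.floor("h")  # round down to the start of the hour
ceiled = s.dt.ceil("h")    # round up to the next hour boundary
rounded = s.dt.round("h")  # round to the nearest hour

print(pd.DataFrame({"dt": s, "floor": floored, "ceil": ceiled, "round": rounded}))
```

Only floor matches the question's truncation; round sends 17:39:24 up to 18:00:00.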
Here's another way to truncate the timestamps. Unlike floor, it also supports truncating to a precision such as year or month, which are not fixed frequencies.
You can cast the values of the underlying NumPy datetime64 array from [ns] to a coarser unit such as [h]:
df['dt'].values.astype('<M8[h]')
This truncates everything to hour precision. For example:
>>> df
                   dt
0 2014-10-01 10:02:45
1 2014-10-01 13:08:17
2 2014-10-01 17:39:24
>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
                   dt                 dt2
0 2014-10-01 10:02:45 2014-10-01 10:00:00
1 2014-10-01 13:08:17 2014-10-01 13:00:00
2 2014-10-01 17:39:24 2014-10-01 17:00:00
>>> df.dtypes
dt     datetime64[ns]
dt2    datetime64[ns]
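A self-contained sketch of the same cast follows, assuming the sample timestamps used above. It uses Series.to_numpy() instead of .values (both return the NumPy array here) and converts back to nanoseconds explicitly, so the result does not depend on how a given pandas version stores non-nanosecond data:

```python
import pandas as pd

# Hypothetical sample data matching the session above.
s = pd.Series(pd.to_datetime([
    "2014-10-01 10:02:45",
    "2014-10-01 13:08:17",
    "2014-10-01 17:39:24",
]))

# Casting to hour units makes NumPy drop minutes, seconds and fractions.
hours = s.to_numpy().astype("datetime64[h]")

# Cast back to nanosecond resolution before handing the array to pandas.
truncated = pd.Series(hours.astype("datetime64[ns]"))

print(truncated)
```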
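In pandas 2.0 and later, which can store second, millisecond, microsecond and nanosecond resolutions, dt2 may come back as datetime64[s] rather than datetime64[ns].

The same approach works for other units, such as months 'M' and minutes 'm' (the unit codes are case-sensitive):
- Keep up to year: '<M8[Y]'
- Keep up to month: '<M8[M]'
- Keep up to day: '<M8[D]'
- Keep up to minute: '<M8[m]'
- Keep up to second: '<M8[s]'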
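The list above can be exercised in one loop. This is a sketch under assumptions: the sample values are made up (one has a fractional second so the '[s]' case visibly changes something), and the names `units` and `results` are hypothetical. Each result is cast back to nanoseconds before being wrapped in a Series:

```python
import pandas as pd

# Hypothetical sample data; the first value has a fractional second.
s = pd.Series([pd.Timestamp("2014-10-01 10:02:45.5"), pd.Timestamp("2014-10-01 13:08:17")])
values = s.to_numpy()

# NumPy unit codes: note 'M' is month and 'm' is minute.
units = {"year": "Y", "month": "M", "day": "D", "minute": "m", "second": "s"}

results = {}
for label, code in units.items():
    coarse = values.astype(f"datetime64[{code}]")  # truncate to this unit
    results[label] = pd.Series(coarse.astype("datetime64[ns]"))
    print(label, results[label][0])
```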
Solution 2:
A method I've used in the past to accomplish this goal was the following (quite similar to what you're already doing, but thought I'd throw it out there anyway):
# Zero every field below the hour, including sub-second parts
df['dt2'] = df['dt'].apply(lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0))
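A quick sanity check of the replace approach against dt.floor, as a minimal sketch with made-up sample values (one includes microseconds, which is why the sub-second fields need zeroing too):

```python
import pandas as pd

# Hypothetical sample data; the first value carries microseconds.
s = pd.Series([pd.Timestamp("2014-10-01 10:02:45.123456"), pd.Timestamp("2014-10-01 13:08:17")])

# Row-wise: zero every field below the hour on each pandas Timestamp.
via_replace = s.apply(lambda ts: ts.replace(minute=0, second=0, microsecond=0, nanosecond=0))

# Vectorised equivalent for comparison.
via_floor = s.dt.floor("h")

print(via_replace)
print(via_floor)
```

Both give the same hours here; dt.floor avoids the per-row Python call.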