Resample hourly TimeSeries with certain starting hour
I want to resample a TimeSeries to a daily (exactly 24-hour) frequency, starting at a certain hour.
For example:
from datetime import datetime
from pandas import Series, date_range
index = date_range(datetime(2012,1,1,17), freq='H', periods=60)
ts = Series(data=[1]*60, index=index)
ts.resample(rule='D', how='sum', closed='left', label='left')
Result I get:
2012-01-01 7
2012-01-02 24
2012-01-03 24
2012-01-04 5
Freq: D
Result I want:
2012-01-01 17:00:00 24
2012-01-02 17:00:00 24
2012-01-03 17:00:00 12
Freq: D
Some weeks ago you could pass '24H' to the freq argument and it worked totally fine. But now it converts '24H' to '1D'.
Was I relying on a bug with '24H' that has now been fixed?
And how can I get the desired result back in an efficient and pythonic (or pandas) way?
Versions:
- python 2.7.3
- pandas 0.9.0rc1 (but it doesn't work in 0.8.1 either)
- numpy 1.6.1
resample has a base argument which covers this case:
ts.resample(rule='24H', closed='left', label='left', base=17).sum()
Output:
2012-01-01 17:00:00 24
2012-01-02 17:00:00 24
2012-01-03 17:00:00 12
Freq: 24H
2021 Update: base has been deprecated since version 1.1.0. The new arguments to use instead are ‘offset’ or ‘origin’.
ts.resample('24H',
    origin=datetime(2012,1,1,17)  # <-- ADD THIS
).sum()
New in version 1.1.0
origin : {‘epoch’, ‘start’, ‘start_day’}, Timestamp or str, default ‘start_day’
The timestamp on which to adjust the grouping. The timezone of origin must match the timezone of the index. If a timestamp is not used, these values are also supported:
- ‘epoch’: origin is 1970-01-01
- ‘start’: origin is the first value of the timeseries
- ‘start_day’: origin is the first day at midnight of the timeseries
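As a rough sketch of how these options play out on the series from the question (assuming pandas 1.1 or later): the variable names by_origin, by_start and by_offset are only illustrative, and pd.Timedelta objects are used for the frequencies here merely to sidestep differences in string aliases such as 'H' across pandas versions.

```python
from datetime import datetime

import pandas as pd

# Same data as in the question: 60 hourly observations starting at 17:00.
index = pd.date_range(datetime(2012, 1, 1, 17), periods=60,
                      freq=pd.Timedelta(hours=1))
ts = pd.Series(1, index=index)

day = pd.Timedelta(hours=24)

# Option 1: anchor the 24-hour bins at an explicit timestamp.
by_origin = ts.resample(day, origin=datetime(2012, 1, 1, 17)).sum()

# Option 2: anchor the bins at the first observation of the series.
by_start = ts.resample(day, origin="start").sum()

# Option 3: keep the default origin ('start_day', i.e. midnight)
# and shift the bin edges forward by 17 hours.
by_offset = ts.resample(day, offset=pd.Timedelta(hours=17)).sum()

print(by_origin)
```

For this particular series all three variants should give the same bins, labelled at 17:00 with sums 24, 24 and 12. They differ once the data changes: origin='start' follows whatever the first timestamp happens to be, while an explicit origin or an offset keeps the bin edges fixed at 17:00.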