Resampling Minute data
I have minute based OHLCV data for the opening range/first hour (9:30-10:30 AM EST). I'm looking to resample this data so I can get one 60-minute value and then calculate the range.
When I call the dataframe.resample() function on the data I get two rows and the initial row starts at 9:00 AM. I'm looking to get only one row which starts at 9:30 AM.
Note: the initial data begins at 9:30.
Edit: Adding code:
# Extract data for regular trading hours (rth) from the 24 hour data set
rth = data.between_time(start_time = '09:30:00', end_time = '16:15:00', include_end = False)
# Extract data for extended trading hours (eth) from the 24 hour data set
eth = data.between_time(start_time = '16:30:00', end_time = '09:30:00', include_end = False)
# Extract data for initial balance (rth) from the 24 hour data set
initial_balance = data.between_time(start_time = '09:30:00', end_time = '10:30:00', include_end = False)
Got stuck tried to separate the opening range by individual date and get the Initial Balance
conversion = {'Open' : 'first', 'High' : 'max', 'Low' : 'min', 'Close' : 'last', 'Volume' : 'sum'}
sample = data.between_time(start_time = '09:30:00', end_time = '10:30:00', include_end = False)
sample = sample.ix['2007-05-07']
sample.tail()
sample.resample('60Min', how = conversion)
By default resample starts at the beggining of the hour. I would like it to start from where the data starts.
Solution 1:
You can use the base
argument of resample
:
sample.resample('60Min', how=conversion, base=30)
From the above docs-link:
base
:int
, default 0
For frequencies that evenly subdivide 1 day, the “origin” of the aggregated intervals.
For example, for ‘5min’ frequency, base could range from 0 through 4. Defaults to 0
Solution 2:
value is the column you want to aggregate, resample the dataframe dates by second and aggregate by mean, then drop the nan rows.
data=[('2014-02-24 16:16:47.204000', 1.391424)
,('2014-02-24 16:18:48.296000', 1.048143)
,('2014-02-24 16:19:52.346000', -0.823974)
,('2014-02-24 16:22:13.665000', -0.689560)
,('2014-02-24 16:24:13.760000', -0.323252)
,('2014-02-24 16:26:15.155000', -1.095331)
,('2014-02-24 16:29:58.235000', -0.185681)]
df=pd.DataFrame(data,columns=['Date','Value'])
df['Date']=pd.to_datetime(df['Date'])
minutes=df.resample('1Min',on='Date').mean().dropna()
print(minutes)
output:
Value
Date
2014-02-24 16:16:00 1.391424
2014-02-24 16:18:00 1.048143
2014-02-24 16:19:00 -0.823974
2014-02-24 16:22:00 -0.689560
2014-02-24 16:24:00 -0.323252
2014-02-24 16:26:00 -1.095331
2014-02-24 16:29:00 -0.185681