Remove non-business-day rows from pandas dataframe
I have a dataframe with time-series data for wheat in df.
df = wt["WHEAT_USD"]
2016-05-02 02:00:00+02:00 4.780
2016-05-02 02:01:00+02:00 4.777
2016-05-02 02:02:00+02:00 4.780
2016-05-02 02:03:00+02:00 4.780
2016-05-02 02:04:00+02:00 4.780
Name: closeAsk, dtype: float64
When I plot the data it has these annoying horizontal lines because of weekends. Is there a simple way of removing the non-business days from the dataframe itself?
Something like
df = df.BDays()
Solution 1:
One simple solution is to keep only the rows whose index falls on Monday to Friday (dayofweek is 0 for Monday through 6 for Sunday):
In [11]: s[s.index.dayofweek < 5]
Out[11]:
2016-05-02 00:00:00 4.780
2016-05-02 00:01:00 4.777
2016-05-02 00:02:00 4.780
2016-05-02 00:03:00 4.780
2016-05-02 00:04:00 4.780
Name: closeAsk, dtype: float64
Note: this doesn't take into account bank holidays etc.
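To see the filter act on data that actually spans a weekend, here is a minimal sketch on a synthetic series. The values, the 6-hour spacing, and the Europe/Warsaw time zone are assumptions made for illustration; they are not the wheat data from the question.

```python
import pandas as pd

# Hypothetical stand-in for the wheat series: one value every 6 hours
# from Friday 2016-05-06 through Monday 2016-05-09 (time zone is an assumption).
idx = pd.date_range(
    "2016-05-06 00:00",
    "2016-05-09 18:00",
    freq=pd.Timedelta(hours=6),
    tz="Europe/Warsaw",
)
s = pd.Series(range(len(idx)), index=idx, name="closeAsk")

# dayofweek: Monday=0 ... Sunday=6, evaluated in the index's local time zone.
weekdays_only = s[s.index.dayofweek < 5]

print(len(s), len(weekdays_only))
print(sorted(set(weekdays_only.index.dayofweek)))
```

For a time-zone-aware index the day of week is taken from the local time, so rows just after midnight local time on Saturday are dropped even if they are still Friday in UTC.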
Solution 2:
Pandas BDay just ends up using .dayofweek < 5, like the solution above, but it can be extended to account for bank holidays, etc.
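import pandas as pd
from pandas.tseries.offsets import BDay
# is_on_offset is the current name for onOffset, which is deprecated in newer pandas versions
is_business_day = BDay().is_on_offset
csv_path = 'C:\\Python27\\Lib\\site-packages\\bokeh\\sampledata\\daylight_warsaw_2013.csv'
dates_df = pd.read_csv(csv_path)
# Boolean mask: True where the date falls on a business day (Monday to Friday)
match_series = pd.to_datetime(dates_df['Date']).map(is_business_day)
dates_df[match_series]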
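One way to extend this to bank holidays is pandas' CustomBusinessDay offset, which accepts a list of holidays. The sketch below is only an illustration: the holiday date and the date range are made up, so substitute the calendar for your own market.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Hypothetical holiday list (an assumption for this example only).
holidays = ["2016-05-03"]
custom_bday = CustomBusinessDay(holidays=holidays)

# One week of calendar days: Saturday 2016-04-30 through Friday 2016-05-06.
dates = pd.Series(pd.date_range("2016-04-30", "2016-05-06", freq="D"))

# True only for weekdays that are not in the holiday list.
mask = dates.map(custom_bday.is_on_offset)
business_dates = dates[mask]

print(business_dates.dt.strftime("%Y-%m-%d").tolist())
```

The weekend and the listed holiday are dropped, leaving the four remaining weekdays. The same mask can be built from a DatetimeIndex and used to filter an intraday series.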