How to filter a dataframe of dates by a particular month/day?
So my code is as follows:
df['Dates'][df['Dates'].index.month == 11]
I was doing a test to see if I could filter the months so it only shows November dates, but this did not work. It gives me the following error: AttributeError: 'Int64Index' object has no attribute 'month'.
If I do
print type(df['Dates'][0])
then I get class 'pandas.tslib.Timestamp', which leads me to believe that the types of objects stored in the dataframe are Timestamp objects. (I'm not sure where the 'Int64Index' is coming from... for the error before)
What I want to do is this: the dataframe column contains dates from the early 2000s to the present in the following format: dd/mm/yyyy. I want to filter for dates only between November 15 and March 15, independent of the YEAR. What is the easiest way to do this?
Thanks.
Here is df['Dates'] (with indices):
0 2006-01-01
1 2006-01-02
2 2006-01-03
3 2006-01-04
4 2006-01-05
5 2006-01-06
6 2006-01-07
7 2006-01-08
8 2006-01-09
9 2006-01-10
10 2006-01-11
11 2006-01-12
12 2006-01-13
13 2006-01-14
14 2006-01-15
...
Solution 1:
Using pd.to_datetime & dt accessor
The accepted answer is not the "pandas" way to approach this problem.
To select only rows with month 11, use the dt accessor:
# df['Dates'] = pd.to_datetime(df['Dates']) -- if column is not datetime yet
df = df[df['Dates'].dt.month == 11]
The same works for days or years: substitute dt.month with dt.day or dt.year.
There are many more attributes besides these; here are a few:
dt.quarter
dt.isocalendar().week (dt.week in older versions; it was removed in pandas 2.0)
dt.weekday
dt.day_name() (a method, so it needs the parentheses)
dt.is_month_end
dt.is_month_start
dt.is_year_end
dt.is_year_start
For a complete list see the documentation
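The same accessor can also cover the year-independent range from the question (November 15 to March 15). The following is a minimal sketch, not a definitive implementation: it assumes the column is named 'Dates' as in the question, treats both endpoints as inclusive, and uses made-up sample data for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column name 'Dates' follows the question.
df = pd.DataFrame({"Dates": pd.to_datetime(
    ["2006-01-10", "2006-03-15", "2006-03-16", "2006-07-04",
     "2006-11-14", "2006-11-15", "2007-02-28", "2008-02-29"]
)})

month = df["Dates"].dt.month
day = df["Dates"].dt.day

# The range wraps around New Year, so the pieces are combined with OR:
# Nov 15-30, all of Dec/Jan/Feb, and Mar 1-15 (endpoints inclusive).
in_season = (
    ((month == 11) & (day >= 15))
    | month.isin([12, 1, 2])
    | ((month == 3) & (day <= 15))
)

winter = df[in_season]
print(winter)
```

Because the mask is built only from month and day, it applies to every year in the column, including leap days.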
Solution 2:
Map an anonymous function that extracts the month onto the series and compare the result to 11 for November. That gives you a boolean mask, which you can then use to filter your dataframe.
nov_mask = df['Dates'].map(lambda x: x.month) == 11
df[nov_mask]
I don't think there is a straightforward way to filter the way you want while ignoring the year, so try this.
#use a span that contains Feb 29 (2016 is a leap year) so leap days also match
nov_mar_series = pd.Series(pd.date_range("2015-11-15", "2016-03-15"))
#create timestamp without year
nov_mar_no_year = nov_mar_series.map(lambda x: x.strftime("%m-%d"))
#add a yearless timestamp to the dataframe
df["no_year"] = df['Dates'].map(lambda x: x.strftime("%m-%d"))
no_year_mask = df['no_year'].isin(nov_mar_no_year)
df[no_year_mask]
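A variation on the same idea avoids the helper column and the string conversion by encoding month and day as a single integer. This is only a sketch under assumptions: the column is named 'Dates', it already has a datetime dtype, both endpoints are inclusive, and the sample data is invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column name 'Dates' follows the question.
df = pd.DataFrame({"Dates": pd.to_datetime(
    ["2005-11-14", "2005-11-15", "2005-12-31",
     "2006-03-15", "2006-03-16", "2008-02-29"]
)})

# Encode month and day as one integer, e.g. Nov 15 -> 1115, Mar 15 -> 315.
mmdd = df["Dates"].dt.month * 100 + df["Dates"].dt.day

# The range crosses the year boundary, so a date qualifies if it is
# on or after Nov 15 OR on or before Mar 15.
no_year_mask = (mmdd >= 1115) | (mmdd <= 315)

result = df[no_year_mask]
print(result)
```

For a range that does not cross New Year (say April 1 to June 30), the two comparisons would be joined with & instead of |.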
Solution 3:
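There are two issues in your code. First, the column reference should come after the filtering condition. Second, .month is available directly only on a DatetimeIndex; your index is a plain integer index, which is where the 'Int64Index' in the error comes from, and on a datetime column you have to go through the .dt accessor. One of the following should work:
# if the dates are the index (a DatetimeIndex)
df[df.index.month == 11]['Dates']
# if the dates are a regular column
df[df['Dates'].dt.month == 11]['Dates']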
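To illustrate the difference between filtering on the index and filtering on a column, here is a small sketch. It assumes a column named 'Dates' as in the question; the sample dates and the variable names (indexed, nov_by_index, nov_by_column) are made up for the example.

```python
import pandas as pd

# Hypothetical sample data: Oct 30, Oct 31, Nov 1, Nov 2, Nov 3 of 2006.
df = pd.DataFrame({"Dates": pd.date_range("2006-10-30", periods=5, freq="D")})

# The default index is integer-based, so it has no .month attribute;
# this is the situation that raised the AttributeError in the question.
has_month = hasattr(df.index, "month")

# Option 1: make the dates the index, then use .month on the DatetimeIndex.
indexed = df.set_index("Dates", drop=False)
nov_by_index = indexed[indexed.index.month == 11]

# Option 2: keep the integer index and use the .dt accessor on the column.
nov_by_column = df[df["Dates"].dt.month == 11]

print(nov_by_index["Dates"].tolist() == nov_by_column["Dates"].tolist())
```

Both options select the same rows; which one is more convenient depends on whether the rest of the code benefits from a DatetimeIndex.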