Python Pandas: Group datetime column into hour and minute aggregations

This seems like it would be fairly straight forward but after nearly an entire day I have not found the solution. I've loaded my dataframe with read_csv and easily parsed, combined and indexed a date and a time column into one column but now I want to be able to just reshape and perform calculations based on hour and minute groupings similar to what you can do in excel pivot.

I know how to resample to hour or minute but it maintains the date portion associated with each hour/minute whereas I want to aggregate the data set ONLY to hour and minute similar to grouping in excel pivots and selecting "hour" and "minute" but not selecting anything else.

Any help would be greatly appreciated.


Solution 1:

Can't you do, where df is your DataFrame:

times = pd.to_datetime(df.timestamp_col)
df.groupby([times.dt.hour, times.dt.minute]).value_col.sum()

Solution 2:

Wes' code didn't work for me. But the DatetimeIndex function (docs) did:

times = pd.DatetimeIndex(data.datetime_col)
grouped = df.groupby([times.hour, times.minute])

The DatetimeIndex object is a representation of times in pandas. The first line creates a array of the datetimes. The second line uses this array to get the hour and minute data for all of the rows, allowing the data to be grouped (docs) by these values.

Solution 3:

Came across this when I was searching for this type of groupby. Wes' code above didn't work for me, not sure if it's because changes in pandas over time.

In pandas 0.16.2, what I did in the end was:

grp = data.groupby(by=[data.datetime_col.map(lambda x : (x.hour, x.minute))])
grp.count()

You'd have (hour, minute) tuples as the grouped index. If you want multi-index:

grp = data.groupby(by=[data.datetime_col.map(lambda x : x.hour),
                       data.datetime_col.map(lambda x : x.minute)])

Solution 4:

I have an alternative of Wes & Nix answers above, with just one line of code, assuming your column is already a datetime column, you don't need to get the hour and minute attributes separately:

df.groupby(df.timestamp_col.dt.time).value_col.sum()