Pandas - convert strings to time without date
I've read loads of SO answers but can't find a clear solution.
I have this data in a df called day1, which represents hours:
1 10:53
2 12:17
3 14:46
4 16:36
5 18:39
6 20:31
7 22:28
Name: time, dtype: object
I want to convert it into a time format. But when I do this:
day1.time = pd.to_datetime(day1.time, format='H%:M%')
The result includes today's date:
1 2015-09-03 10:53:00
2 2015-09-03 12:17:00
3 2015-09-03 14:46:00
4 2015-09-03 16:36:00
5 2015-09-03 18:39:00
6 2015-09-03 20:31:00
7 2015-09-03 22:28:00
Name: time, dtype: datetime64[ns]
It seems the format argument isn't working. How do I get the time as shown here without the date?
Update
The following formats the time correctly, but somehow the column is still an object type. Why doesn't it convert to datetime64?
day1['time'] = pd.to_datetime(day1['time'], format='%H:%M').dt.time
1 10:53:00
2 12:17:00
3 14:46:00
4 16:36:00
5 18:39:00
6 20:31:00
7 22:28:00
Name: time, dtype: object
After performing the conversion you can use the datetime accessor dt to access just the hour or time component:
In [51]:
df['hour'] = pd.to_datetime(df['time'], format='%H:%M').dt.hour
df
Out[51]:
time hour
index
1 10:53 10
2 12:17 12
3 14:46 14
4 16:36 16
5 18:39 18
6 20:31 20
7 22:28 22
Also, your format string H%:M% is malformed; it's likely to raise a ValueError: ':' is a bad directive in format 'H%:M%'. The % has to come before each directive letter, as in '%H:%M'.
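To make the point about the format string concrete, here is a minimal sketch on a made-up Series (not the asker's actual data). It assumes a reasonably recent pandas; the exact error text may differ between versions, so only the exception type is checked.

```python
import pandas as pd

# Sample data, made up for illustration.
s = pd.Series(['10:53', '12:17', '14:46'])

# '%' must precede the directive letter: '%H:%M', not 'H%:M%'.
try:
    pd.to_datetime(s, format='H%:M%')
    malformed_raised = False
except ValueError:
    malformed_raised = True

# With the corrected format the strings parse, and dt.hour gives integers.
parsed = pd.to_datetime(s, format='%H:%M')
hours = parsed.dt.hour.tolist()
```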
Regarding your last comment: the column's dtype is object because each element is a datetime.time, not a datetime:
In [53]:
df['time'].iloc[0]
Out[53]:
datetime.time(10, 53)
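A small sketch of what that means in practice, using made-up sample values: the column produced by dt.time is an object column whose elements are plain Python datetime.time instances, so anything done with them goes element by element rather than through pandas' datetime machinery.

```python
import datetime
import pandas as pd

# Sample data, made up for illustration.
s = pd.Series(['10:53', '12:17', '14:46'])
times = pd.to_datetime(s, format='%H:%M').dt.time

first = times.iloc[0]                   # a datetime.time instance
is_object_column = times.dtype == object

# Element-wise comparison against another datetime.time.
later_than_noon = [t > datetime.time(12, 0) for t in times]
```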
You can use to_timedelta (here df is the Series of 'HH:MM' strings; appending ':00' supplies the seconds):
pd.to_timedelta(df+':00')
Out[353]:
1 10:53:00
2 12:17:00
3 14:46:00
4 16:36:00
5 18:39:00
6 20:31:00
7 22:28:00
Name: Time, dtype: timedelta64[ns]
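As a rough illustration of why a timedelta column can be convenient, the sketch below rebuilds a similar Series from made-up values and then does arithmetic and comparisons on it. The variable names are only for illustration.

```python
import pandas as pd

# Sample data, made up for illustration.
s = pd.Series(['10:53', '12:17', '14:46'], name='Time')

# 'HH:MM' -> 'HH:MM:SS', then parse as durations since midnight.
td = pd.to_timedelta(s + ':00')

gaps = td.diff()                           # time between consecutive rows
after_noon = td > pd.Timedelta(hours=12)   # vectorised comparison
```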
I recently also struggled with this problem. My method is close to EdChum's method and the result is the same as YOBEN_S's answer.
Just as EdChum illustrated, dt.hour gives you plain integers and dt.time gives you datetime.time objects in an object column, which are probably only good for display. I found it hard to do comparisons or calculations with these objects. So if you need any further comparison or calculation on the resulting column, it's better to avoid that data format.
My method is simply to subtract the date from the to_datetime result:
c = pd.Series(['10:23', '12:17', '14:46'])
pd.to_datetime(c, format='%H:%M') - pd.to_datetime(c, format='%H:%M').dt.normalize()
The result is
0 10:23:00
1 12:17:00
2 14:46:00
dtype: timedelta64[ns]
dt.normalize() sets the time component of every value to 00:00:00 while keeping the datetime64 dtype, so only the date remains. Subtracting that from the original datetimes leaves just the time of day as a timedelta64, which makes it possible to do calculations with it.
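Here is a short sketch of the same subtraction with a calculation on top, and a check that it agrees with the to_timedelta approach. The intermediate names (parsed, time_of_day, minutes) are just for illustration.

```python
import pandas as pd

c = pd.Series(['10:23', '12:17', '14:46'])

parsed = pd.to_datetime(c, format='%H:%M')
# Subtract midnight of the same (placeholder) date to keep only the time of day.
time_of_day = parsed - parsed.dt.normalize()

# Example calculation: minutes since midnight.
minutes = time_of_day.dt.total_seconds() / 60

# Same values as parsing the strings directly as timedeltas.
matches = (time_of_day == pd.to_timedelta(c + ':00')).all()
```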
My answer is by no means better than the other two. I just want to provide a different approach and hope it helps.