Combine Date and Time columns using python pandas
It's worth mentioning that you may have been able to read this in directly e.g. if you were using read_csv
using parse_dates=[['Date', 'Time']]
.
Assuming these are just strings you could simply add them together (with a space), allowing you to use to_datetime
, which works without specifying the format=
parameter
In [11]: df['Date'] + ' ' + df['Time']
Out[11]:
0 01-06-2013 23:00:00
1 02-06-2013 01:00:00
2 02-06-2013 21:00:00
3 02-06-2013 22:00:00
4 02-06-2013 23:00:00
5 03-06-2013 01:00:00
6 03-06-2013 21:00:00
7 03-06-2013 22:00:00
8 03-06-2013 23:00:00
9 04-06-2013 01:00:00
dtype: object
In [12]: pd.to_datetime(df['Date'] + ' ' + df['Time'])
Out[12]:
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
dtype: datetime64[ns]
Alternatively, without the + ' '
, but the format=
parameter must be used. Additionally, pandas is good at inferring the format to be converted to a datetime
, however, specifying the exact format is faster.
pd.to_datetime(df['Date'] + df['Time'], format='%m-%d-%Y%H:%M:%S')
Note: surprisingly (for me), this works fine with NaNs being converted to NaT, but it is worth worrying that the conversion (perhaps using the raise
argument).
%%timeit
# sample dataframe with 10000000 rows using df from the OP
df = pd.concat([df for _ in range(1000000)]).reset_index(drop=True)
%%timeit
pd.to_datetime(df['Date'] + ' ' + df['Time'])
[result]:
1.73 s ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
pd.to_datetime(df['Date'] + df['Time'], format='%m-%d-%Y%H:%M:%S')
[result]:
1.33 s ± 9.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The accepted answer works for columns that are of datatype string
. For completeness: I come across this question when searching how to do this when the columns are of datatypes: date and time.
df.apply(lambda r : pd.datetime.combine(r['date_column_name'],r['time_column_name']),1)
Cast the columns if the types are different (datetime
and timestamp
or str
) and use to_datetime
:
df.loc[:,'Date'] = pd.to_datetime(df.Date.astype(str)+' '+df.Time.astype(str))
Result :
0 2013-01-06 23:00:00
1 2013-02-06 01:00:00
2 2013-02-06 21:00:00
3 2013-02-06 22:00:00
4 2013-02-06 23:00:00
5 2013-03-06 01:00:00
6 2013-03-06 21:00:00
7 2013-03-06 22:00:00
8 2013-03-06 23:00:00
9 2013-04-06 01:00:00
Best,