Convert Excel style date with pandas
OK, I think the easiest thing is to construct a TimedeltaIndex from the floats and add it to the scalar datetime(1900, 1, 1):
In [85]:
import datetime as dt
import pandas as pd
df = pd.DataFrame({'date':[42580.3333333333, 10023]})
df
Out[85]:
date
0 42580.333333
1 10023.000000
In [86]:
df['real_date'] = pd.TimedeltaIndex(df['date'], unit='d') + dt.datetime(1900,1,1)
df
Out[86]:
date real_date
0 42580.333333 2016-07-31 07:59:59.971200
1 10023.000000 1927-06-12 00:00:00.000000
OK, it seems that Excel is a bit weird with its dates (thanks @ayhan). The origin that matches Excel's serial numbers is 1899-12-30, not 1900-01-01:
In [89]:
df['real_date'] = pd.TimedeltaIndex(df['date'], unit='d') + dt.datetime(1899, 12, 30)
df
Out[89]:
date real_date
0 42580.333333 2016-07-29 07:59:59.971200
1 10023.000000 1927-06-10 00:00:00.000000
See related: How to convert a python datetime.datetime to excel serial date number
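For the reverse direction mentioned in the related question, a minimal sketch is to subtract the same 1899-12-30 origin and divide by one day. The name to_excel_serial is hypothetical, and the sketch assumes Excel's 1900-based date system and dates on or after 1900-03-01:

```python
import pandas as pd

# Origin that lines up with Excel's 1900-based serial numbers (see above).
EXCEL_ORIGIN = pd.Timestamp("1899-12-30")

def to_excel_serial(timestamps):
    """Return Excel serial numbers (float days) for datetime-like values.

    Hypothetical helper; only meant for dates on or after 1900-03-01.
    """
    ts = pd.to_datetime(pd.Series(timestamps))
    # Dividing a timedelta by one day yields a float number of days.
    return (ts - EXCEL_ORIGIN) / pd.Timedelta(days=1)

serials = to_excel_serial([pd.Timestamp("2016-07-29 08:00:00"),
                           pd.Timestamp("1927-06-10")])
print(serials)
```

The two timestamps are the ones produced by the conversion above, so the result should round-trip to roughly 42580.333333 and 10023.0.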
You can use the third-party xlrd library before passing to pd.to_datetime:
import xlrd
def read_date(date):
    # datemode 0 selects Excel's 1900-based date system
    return xlrd.xldate.xldate_as_datetime(date, 0)
df = pd.DataFrame({'date':[42580.3333333333, 10023]})
df['new'] = pd.to_datetime(df['date'].apply(read_date), errors='coerce')
print(df)
date new
0 42580.333333 2016-07-29 08:00:00
1 10023.000000 1927-06-10 00:00:00
You can parse directly with pd.to_datetime, using the keywords unit='D' and origin='1899-12-30':
import pandas as pd
df = pd.DataFrame({'xldate': [42580.3333333333]})
df['date'] = pd.to_datetime(df['xldate'], unit='D', origin='1899-12-30')
df['date']
Out[2]:
0 2016-07-29 07:59:59.999971200
Name: date, dtype: datetime64[ns]
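The result above lands a few nanoseconds short of 08:00:00 because the float serial cannot represent one third of a day exactly. If whole-second precision is enough, one possible workaround is to round the serial to whole seconds before converting. This is only a sketch; the helper name excel_serial_to_seconds is hypothetical:

```python
import pandas as pd

def excel_serial_to_seconds(serials):
    """Convert Excel 1900-system serials to datetimes, rounded to whole seconds.

    Hypothetical helper; assumes serials >= 61 (on or after 1900-03-01).
    """
    serials = pd.Series(serials, dtype="float64")
    # 86400 seconds per day; rounding removes the floating-point residue.
    whole_seconds = (serials * 86400).round()
    return pd.Timestamp("1899-12-30") + pd.to_timedelta(whole_seconds, unit="s")

rounded = excel_serial_to_seconds([42580.3333333333, 10023])
print(rounded)
```

Rounding discards any genuine sub-second information, so this only fits data where Excel times are known to be whole seconds.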
Further reading:
- What is story behind December 30, 1899 as base date?
- An answer from Martijn Pieters on how to handle Excel ordinal values < 60 correctly
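The ordinal < 60 case from the last link can be sketched as follows. Excel's 1900 date system counts a nonexistent 1900-02-29 as serial 60, so serials below 60 come out one day early with the 1899-12-30 origin. The helper name excel_serial_to_datetime is hypothetical, and this is a sketch under that assumption, not the linked answer's code:

```python
import pandas as pd

def excel_serial_to_datetime(serials):
    """Convert Excel 1900-system serial numbers to pandas datetimes.

    Hypothetical helper. Serial 60 is Excel's phantom 1900-02-29 and has
    no real calendar date; here it simply maps to 1900-02-28.
    """
    serials = pd.Series(serials, dtype="float64")
    # Serials below 60 precede the phantom leap day, so shift them by one day.
    days = serials.where(serials >= 60, serials + 1)
    return pd.Timestamp("1899-12-30") + pd.to_timedelta(days, unit="D")

converted = excel_serial_to_datetime([1, 59, 61, 42580])
print(converted)
```

With this shift, serial 1 maps to 1900-01-01 and serial 59 to 1900-02-28, while serials from 61 onward are unchanged.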