Keep only date part when using pandas.to_datetime
Solution 1:
Since version 0.15.0 this can be done easily using .dt to access just the date component:
df['just_date'] = df['dates'].dt.date
The above returns a column of object dtype holding datetime.date values. If you want to keep datetime64, you can instead normalize the time component to midnight, which sets all the values to 00:00:00:
df['normalised_date'] = df['dates'].dt.normalize()
This keeps the dtype as datetime64, but the display shows just the date value.
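To make the dtype difference concrete, here is a minimal sketch. It assumes a small made-up frame whose dates column is already datetime64; the sample timestamps are illustrative only.

```python
import datetime

import pandas as pd

# Hypothetical sample data; the column name 'dates' follows the answer above.
df = pd.DataFrame(
    {"dates": pd.to_datetime(["2021-03-01 08:15:00", "2021-03-02 23:59:59"])}
)

# .dt.date yields Python datetime.date objects, so the column dtype is object.
df["just_date"] = df["dates"].dt.date

# .dt.normalize() resets the time to midnight and keeps a datetime64 dtype.
df["normalised_date"] = df["dates"].dt.normalize()

print(df.dtypes)
```

Checking df.dtypes after each step is a quick way to see which of the two representations a column ended up with.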
- pandas: .dt accessor (pandas.Series.dt)
Solution 2:
Simple Solution:
df['date_only'] = df['date_time_column'].dt.date
Solution 3:
While I upvoted EdChum's answer, which is the most direct answer to the question the OP posed, it does not really solve the performance problem: it still relies on Python datetime objects, so any operation on them will not be vectorized (that is, it will be slow).
A better-performing alternative is to use df['dates'].dt.floor('d'). Strictly speaking, it does not "keep only date part", since it just sets the time to 00:00:00. But it does work as desired by the OP when, for instance:
- printing to screen
- saving to csv
- using the column to groupby
... and it is much more efficient, since the operation is vectorized.
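As a rough illustration of the groupby case, the sketch below floors a made-up column to the day and sums a hypothetical value column per day. It uses the uppercase 'D' spelling of the daily frequency alias, which is my assumption about the safest spelling across pandas versions.

```python
import pandas as pd

# Hypothetical sample data; 'value' is an illustrative column, not from the question.
df = pd.DataFrame(
    {
        "dates": pd.to_datetime(
            ["2021-03-01 08:15:00", "2021-03-01 17:40:00", "2021-03-02 09:00:00"]
        ),
        "value": [1, 2, 3],
    }
)

# Floor every timestamp to the start of its day; 'D' is the daily frequency alias.
df["day"] = df["dates"].dt.floor("D")

# The floored column is still datetime64, so it can be used as a group key directly.
daily_totals = df.groupby("day")["value"].sum()
print(daily_totals)
```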
EDIT: in fact, the answer the OP would have preferred is probably "recent versions of pandas do not write the time to csv if it is 00:00:00 for all observations".
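A quick way to check this on your own installation is to write a floored column to a string and inspect it. This is only a sketch with made-up data, and the exact output may depend on the pandas version in use.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"dates": pd.to_datetime(["2021-03-01 08:15:00", "2021-03-02 23:59:59"])}
)
df["day"] = df["dates"].dt.floor("D")

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df[["day"]].to_csv(index=False)
print(csv_text)
```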
Solution 4:
Pandas v0.13+: Use to_csv with the date_format parameter
Avoid, where possible, converting your datetime64[ns] series to an object dtype series of datetime.date objects. The latter, often constructed using pd.Series.dt.date, is stored as an array of pointers and is inefficient relative to a pure NumPy-based series.
Since your concern is format when writing to CSV, just use the date_format parameter of to_csv. For example:
df.to_csv(filename, date_format='%Y-%m-%d')
See Python's strftime directives for formatting conventions.
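A small end-to-end sketch of this approach, using made-up data and writing to a string rather than to a file so the result can be inspected:

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative.
df = pd.DataFrame(
    {
        "dates": pd.to_datetime(["2021-03-01 08:15:00", "2021-03-02 23:59:59"]),
        "value": [1, 2],
    }
)

# date_format is applied to datetime columns as they are written;
# the frame itself keeps its datetime64 column with the full timestamps.
csv_text = df.to_csv(index=False, date_format="%Y-%m-%d")
print(csv_text)
```

The formatting happens only at write time, so no extra date-only column is needed in the frame.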
Solution 5:
Pandas DatetimeIndex and Series have a method called normalize that does exactly what you want.
You can read more about it in this answer.
It can be used as ser.dt.normalize()
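For illustration, a short sketch of both call styles on made-up timestamps (the names idx and ser are placeholders):

```python
import pandas as pd

# Hypothetical sample timestamps.
idx = pd.DatetimeIndex(["2021-03-01 08:15:00", "2021-03-02 23:59:59"])
ser = pd.Series(idx)

# On a DatetimeIndex, normalize is called directly.
idx_midnight = idx.normalize()

# On a Series, it is reached through the .dt accessor.
ser_midnight = ser.dt.normalize()

print(idx_midnight)
print(ser_midnight)
```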