Check if string has date, any format
The parse
function in dateutils.parser
is capable of parsing many date string formats to a datetime
object.
If you simply want to know whether a particular string could represent or contain a valid date, you could try the following simple function:
from dateutil.parser import parse
def is_date(string, fuzzy=False):
"""
Return whether the string can be interpreted as a date.
:param string: str, string to check for date
:param fuzzy: bool, ignore unknown tokens in string if True
"""
try:
parse(string, fuzzy=fuzzy)
return True
except ValueError:
return False
Then you have:
>>> is_date("1990-12-1")
True
>>> is_date("2005/3")
True
>>> is_date("Jan 19, 1990")
True
>>> is_date("today is 2019-03-27")
False
>>> is_date("today is 2019-03-27", fuzzy=True)
True
>>> is_date("Monday at 12:01am")
True
>>> is_date("xyz_not_a_date")
False
>>> is_date("yesterday")
False
Custom parsing
parse
might recognise some strings as dates which you don't want to treat as dates. For example:
Parsing
"12"
and"1999"
will return a datetime object representing the current date with the day and year substituted for the number in the string"23, 4"
and"23 4"
will be parsed asdatetime.datetime(2023, 4, 16, 0, 0)
.-
"Friday"
will return the date of the nearest Friday in the future. - Similarly
"August"
corresponds to the current date with the month changed to August.
Also parse
is not locale aware, so does not recognise months or days of the week in languages other than English.
Both of these issues can be addressed to some extent by using a custom parserinfo
class, which defines how month and day names are recognised:
from dateutil.parser import parserinfo
class CustomParserInfo(parserinfo):
# three months in Spanish for illustration
MONTHS = [("Enero", "Enero"), ("Feb", "Febrero"), ("Marzo", "Marzo")]
An instance of this class can then be used with parse
:
>>> parse("Enero 1990")
# ValueError: Unknown string format
>>> parse("Enero 1990", parserinfo=CustomParserInfo())
datetime.datetime(1990, 1, 27, 0, 0)
If you want to parse those particular formats, you can just match against a list of formats:
txt='''\
Jan 19, 1990
January 19, 1990
Jan 19,1990
01/19/1990
01/19/90
1990
Jan 1990
January1990'''
import datetime as dt
fmts = ('%Y','%b %d, %Y','%b %d, %Y','%B %d, %Y','%B %d %Y','%m/%d/%Y','%m/%d/%y','%b %Y','%B%Y','%b %d,%Y')
parsed=[]
for e in txt.splitlines():
for fmt in fmts:
try:
t = dt.datetime.strptime(e, fmt)
parsed.append((e, fmt, t))
break
except ValueError as err:
pass
# check that all the cases are handled
success={t[0] for t in parsed}
for e in txt.splitlines():
if e not in success:
print e
for t in parsed:
print '"{:20}" => "{:20}" => {}'.format(*t)
Prints:
"Jan 19, 1990 " => "%b %d, %Y " => 1990-01-19 00:00:00
"January 19, 1990 " => "%B %d, %Y " => 1990-01-19 00:00:00
"Jan 19,1990 " => "%b %d,%Y " => 1990-01-19 00:00:00
"01/19/1990 " => "%m/%d/%Y " => 1990-01-19 00:00:00
"01/19/90 " => "%m/%d/%y " => 1990-01-19 00:00:00
"1990 " => "%Y " => 1990-01-01 00:00:00
"Jan 1990 " => "%b %Y " => 1990-01-01 00:00:00
"January1990 " => "%B%Y " => 1990-01-01 00:00:00