Parsing dates of the format "January 10th, 2010" in Java? (with ordinal indicators, st|nd|rd|th)

I need to parse dates of the format "January 10th, 2010" in Java. How can I do this?

How do I handle the ordinal indicators, i.e. the st, nd, rd or th trailing the day number?


This works:

String s = "January 10th, 2010";
DateFormat dateFormat = new SimpleDateFormat("MMM dd yyyy", Locale.ENGLISH);
System.out.println(dateFormat.parse(s.replaceAll("(?:st|nd|rd|th),", "")));

but you need to make sure you are using the right Locale to properly parse the month name.
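For reference, here is a self-contained version of the snippet above (the class and method names are just for illustration). It passes Locale.ENGLISH explicitly so that "January" is recognised whatever the JVM's default locale is:

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class StripSuffixDemo {

    // Removes the ordinal indicator together with the comma after it,
    // e.g. "January 10th, 2010" becomes "January 10 2010"
    static String stripSuffix(String s) {
        return s.replaceAll("(?:st|nd|rd|th),", "");
    }

    // An explicit English locale makes the English month name parse
    // regardless of the JVM's default locale
    static Date parse(String s) throws ParseException {
        DateFormat dateFormat = new SimpleDateFormat("MMM dd yyyy", Locale.ENGLISH);
        return dateFormat.parse(stripSuffix(s));
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(stripSuffix("January 10th, 2010")); // January 10 2010
        System.out.println(parse("January 10th, 2010"));
    }
}
```

When parsing, SimpleDateFormat accepts the full month name even though the pattern says MMM, and a one-digit day like the 1 in "January 1 2010" even though it says dd.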

I know you can include literal text inside a SimpleDateFormat pattern. In this case, however, the text depends on the day number and is irrelevant to the parsing itself.

This is the simplest solution I can think of, but I would love to be proven wrong.

You can avoid the pitfalls pointed out in one of the comments with something like this:

String s = "January 10th, 2010";
DateFormat dateFormat = new SimpleDateFormat("MMM dd yyyy", Locale.ENGLISH);
System.out.println(dateFormat.parse(s.replaceAll("(?<= \\d{1,2})(?:st|nd|rd|th),(?= \\d+$)", "")));

This way the indicator is removed only when it directly follows the day number and precedes the year, so a string such as "Jath,uary 10 2010" is left untouched.
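To compare the two regular expressions, the small program below (class name chosen for illustration) applies both to a valid date and to the mangled "Jath,uary" string. This sketch uses \d{1,2} in the look-behind, since a bounded quantifier is always accepted inside a Java look-behind and a day number never has more than two digits:

```java
public class SuffixRegexDemo {

    // The simple pattern: removes any "st,", "nd,", "rd," or "th," anywhere
    static final String LOOSE = "(?:st|nd|rd|th),";

    // The stricter pattern: the suffix must come right after " <1-2 digits>"
    // and be followed by " <digits>" at the very end of the string
    static final String STRICT = "(?<= \\d{1,2})(?:st|nd|rd|th),(?= \\d+$)";

    public static void main(String[] args) {
        String[] inputs = {"January 10th, 2010", "Jath,uary 10 2010"};
        for (String s : inputs) {
            System.out.println("loose:  " + s.replaceAll(LOOSE, ""));
            System.out.println("strict: " + s.replaceAll(STRICT, ""));
        }
    }
}
```

Both turn "January 10th, 2010" into "January 10 2010", but only the loose one rewrites "Jath,uary 10 2010" to "Jauary 10 2010".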


I should like to contribute the modern answer. Rather than the SimpleDateFormat class used in the two top-voted answers, today you should use java.time, the modern Java date and time API. It offers a couple of nice solutions.

Easy solution

We first define a formatter for parsing:

private static final DateTimeFormatter PARSING_FORMATTER = DateTimeFormatter.ofPattern(
        "MMMM d['st']['nd']['rd']['th'], uuuu", Locale.ENGLISH);

Then we use it like this:

    String dateString = "January 10th, 2010";
    LocalDate date = LocalDate.parse(dateString, PARSING_FORMATTER);
    System.out.println("Parsed date: " + date);

Output is:

Parsed date: 2010-01-10

The square brackets [] in the format pattern string enclose optional parts, and the single quotes enclose literal text. So d['st']['nd']['rd']['th'] means that there may be st, nd, rd and/or th after the day of month.
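To see the optional sections at work, this self-contained program (the class name is just for illustration) parses one date with each of the four indicators using the same pattern:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class OptionalSuffixDemo {

    // Same pattern as above: each quoted indicator is its own optional section
    static final DateTimeFormatter PARSING_FORMATTER = DateTimeFormatter.ofPattern(
            "MMMM d['st']['nd']['rd']['th'], uuuu", Locale.ENGLISH);

    public static void main(String[] args) {
        String[] inputs = {"January 1st, 2010", "February 2nd, 2010", "March 3rd, 2010", "April 4th, 2010"};
        for (String s : inputs) {
            System.out.println(s + " -> " + LocalDate.parse(s, PARSING_FORMATTER));
        }
    }
}
```

It prints 2010-01-01, 2010-02-02, 2010-03-03 and 2010-04-04 after the respective input strings.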

More solid solution

The approach above has a couple of limitations:

  1. It accepts any ordinal indicator, for example 10st and even 10stndrdth.
  2. While the formatter works for parsing, you cannot use it for formatting (it would give January 10stndrdth, 2010).
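Both limitations are easy to reproduce with a quick check like the following (the class name is illustrative). Since every indicator is optional, a date with no indicator at all, such as "January 10, 2010", is accepted as well:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class LimitationsDemo {

    static final DateTimeFormatter PARSING_FORMATTER = DateTimeFormatter.ofPattern(
            "MMMM d['st']['nd']['rd']['th'], uuuu", Locale.ENGLISH);

    public static void main(String[] args) {
        // 1. Wrong, repeated or missing indicators are all accepted when parsing
        System.out.println(LocalDate.parse("January 10st, 2010", PARSING_FORMATTER));
        System.out.println(LocalDate.parse("January 10stndrdth, 2010", PARSING_FORMATTER));
        System.out.println(LocalDate.parse("January 10, 2010", PARSING_FORMATTER));
        // 2. When formatting, every optional literal is printed
        System.out.println(LocalDate.of(2010, 1, 10).format(PARSING_FORMATTER));
    }
}
```

The first three lines all print 2010-01-10, and the last one prints January 10stndrdth, 2010.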

If you want better validation of the ordinal indicator or you want the possibility of formatting the date back into a string, you may build your formatter in this way:

private static final DateTimeFormatter FORMATTING_AND_PARSING_FORMATTER;
static {
    Map<Long, String> ordinalNumbers = new HashMap<>(42);
    ordinalNumbers.put(1L, "1st");
    ordinalNumbers.put(2L, "2nd");
    ordinalNumbers.put(3L, "3rd");
    ordinalNumbers.put(21L, "21st");
    ordinalNumbers.put(22L, "22nd");
    ordinalNumbers.put(23L, "23rd");
    ordinalNumbers.put(31L, "31st");
    for (long d = 1; d <= 31; d++) {
        ordinalNumbers.putIfAbsent(d, "" + d + "th");
    }

    FORMATTING_AND_PARSING_FORMATTER = new DateTimeFormatterBuilder()
            .appendPattern("MMMM ")
            .appendText(ChronoField.DAY_OF_MONTH, ordinalNumbers)
            .appendPattern(", uuuu")
            .toFormatter(Locale.ENGLISH);
}

This parses the date string in the same way as the formatter above. Let’s also try it for formatting:

    System.out.println("Formatted back using the same formatter: "
            + date.format(FORMATTING_AND_PARSING_FORMATTER));

Formatted back using the same formatter: January 10th, 2010
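To check the better validation, here is a self-contained sketch that builds the same formatter and reports which inputs it accepts (the class name and the accepts helper are hypothetical, added for the demonstration). Because the formatter is strict by default and only knows the 31 texts in the map, a wrong indicator such as "10st" or "22th" makes parsing throw a DateTimeParseException:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class OrdinalFormatterDemo {

    static final DateTimeFormatter FORMATTING_AND_PARSING_FORMATTER;
    static {
        // Same map as in the answer: one text per day of month, 1 to 31
        Map<Long, String> ordinalNumbers = new HashMap<>(42);
        ordinalNumbers.put(1L, "1st");
        ordinalNumbers.put(2L, "2nd");
        ordinalNumbers.put(3L, "3rd");
        ordinalNumbers.put(21L, "21st");
        ordinalNumbers.put(22L, "22nd");
        ordinalNumbers.put(23L, "23rd");
        ordinalNumbers.put(31L, "31st");
        for (long d = 1; d <= 31; d++) {
            ordinalNumbers.putIfAbsent(d, d + "th");
        }
        FORMATTING_AND_PARSING_FORMATTER = new DateTimeFormatterBuilder()
                .appendPattern("MMMM ")
                .appendText(ChronoField.DAY_OF_MONTH, ordinalNumbers)
                .appendPattern(", uuuu")
                .toFormatter(Locale.ENGLISH);
    }

    // Hypothetical helper: true if the string parses, false if it is rejected
    static boolean accepts(String s) {
        try {
            LocalDate.parse(s, FORMATTING_AND_PARSING_FORMATTER);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"January 10th, 2010", "January 10st, 2010", "January 22nd, 2010", "January 22th, 2010"};
        for (String s : inputs) {
            System.out.println(s + " accepted: " + accepts(s));
        }
    }
}
```

Only "January 10th, 2010" and "January 22nd, 2010" should be reported as accepted.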

Links

  • Oracle tutorial: Date Time explaining how to use java.time.
  • My answer to a question about formatting ordinal indicators from which I took the more solid formatter.

You can put nd etc. as literals in a SimpleDateFormat pattern. Define the four formats needed and try them one by one, starting with th, since I guess it occurs most often. If parsing fails with a ParseException, try the next one; if all of them fail, throw the ParseException. The code here is just a concept. In real life you would probably not create the formats anew on every call, and you should think about thread safety.

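import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class Main {

    public static Date hoolaHoop(final String dateText) throws ParseException {
        ParseException pe = null;
        // "th" first, since it is the most common suffix
        String[] suffixes = {"th", "nd", "rd", "st"};
        for (String special : suffixes) {
            // The suffix and the comma are quoted as literal text
            SimpleDateFormat sdf = new SimpleDateFormat("MMMM d'" + special + ",' yyyy", Locale.ENGLISH);
            try {
                return sdf.parse(dateText);
            } catch (ParseException e) {
                // remember for throwing later
                pe = e;
            }
        }
        throw pe;
    }

    public static void main(String[] args) throws Exception {
        String[] dateText = {"January 10th, 2010", "January 1st, 2010", "January 2nd, 2010", ""};
        for (String dt : dateText) {
            System.out.println(hoolaHoop(dt));
        }
    }
}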

Output:

Sun Jan 10 00:00:00 GMT 2010
Fri Jan 01 00:00:00 GMT 2010
Sat Jan 02 00:00:00 GMT 2010
Exception in thread "main" java.text.ParseException: Unparseable date: ""

The suffixes "th", "nd", "rd" and "st" are of course only suitable for English-language locales. Keep that in mind. In French it would be "re", "nd" etc., I guess.
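The concept code above creates four new SimpleDateFormat objects on every call. One possible way to address the remark about reuse and thread safety is sketched below, assuming a per-thread cache is acceptable: SimpleDateFormat is not thread-safe, so each thread gets its own array of formats through a ThreadLocal. The class name OrdinalSuffixParser and its parse method are hypothetical names for this sketch:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class OrdinalSuffixParser {

    private static final String[] SUFFIXES = {"th", "nd", "rd", "st"};

    // SimpleDateFormat is not thread-safe, so every thread builds its own
    // four formats once and then reuses them on later calls
    private static final ThreadLocal<SimpleDateFormat[]> FORMATS = ThreadLocal.withInitial(() -> {
        SimpleDateFormat[] formats = new SimpleDateFormat[SUFFIXES.length];
        for (int i = 0; i < SUFFIXES.length; i++) {
            formats[i] = new SimpleDateFormat("MMMM d'" + SUFFIXES[i] + ",' yyyy", Locale.ENGLISH);
        }
        return formats;
    });

    // Tries each suffix in turn and rethrows the last failure if none match
    public static Date parse(String dateText) throws ParseException {
        ParseException last = null;
        for (SimpleDateFormat sdf : FORMATS.get()) {
            try {
                return sdf.parse(dateText);
            } catch (ParseException e) {
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parse("January 1st, 2010"));
        System.out.println(parse("January 22nd, 2010"));
    }
}
```

Like the concept code, this still only knows the English suffixes; with java.time, the immutable and thread-safe DateTimeFormatter avoids the issue altogether.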