regex to remove ordinals

Use a look-behind assertion so that st, nd, rd or th is matched only when it is preceded by a digit, without the digit itself being included in the match. For example:

(?<=[0-9])(?:st|nd|rd|th)

I've linked to the Perl-compatible syntax, but if you're using POSIX, POSIX extended, vi or one of the many other regex syntaxes, you'll need to look up the equivalent syntax. Not every dialect supports look-behind.
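
As a minimal sketch of the look-behind in action, here it is with Python's re module, which supports fixed-width look-behind. The helper name strip_ordinal_suffix and the sample sentence are made up for illustration.

```python
import re

# Match st/nd/rd/th only when the previous character is a digit.
# The digit is checked but not consumed, so it survives the substitution.
ORDINAL_SUFFIX = re.compile(r"(?<=[0-9])(?:st|nd|rd|th)")

def strip_ordinal_suffix(text):
    """Hypothetical helper: drop ordinal suffixes, keep the numbers."""
    return ORDINAL_SUFFIX.sub("", text)

print(strip_ordinal_suffix("the 1st, 2nd, 3rd and 4th of the month"))
# the 1, 2, 3 and 4 of the month
```

The "th" in "month" is left alone because it is not preceded by a digit.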


In Perl (the g flag replaces every occurrence, not just the first):

$var =~ s{\b(\d+)(?:st|nd|rd|th)\b}{$1}g;

In PHP:

$var = preg_replace('/\\b(\d+)(?:st|nd|rd|th)\\b/', '$1', $var);

In .NET:

var = Regex.Replace(var, @"\b(\d+)(?:st|nd|rd|th)\b", "$1");
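
To show what the \b anchors buy you, here is a rough Python equivalent of the capture-group pattern, run side by side with a look-behind-only pattern. The sample text is invented, and the names BOUNDED and UNBOUNDED are just labels for this sketch.

```python
import re

# Capture the digits and require a word boundary on both sides of the ordinal.
BOUNDED = re.compile(r"\b(\d+)(?:st|nd|rd|th)\b")
# Look-behind only: nothing checks what comes before the digit or after the suffix.
UNBOUNDED = re.compile(r"(?<=[0-9])(?:st|nd|rd|th)")

text = "the 21st item, code A4th, and 5things"

print(BOUNDED.sub(r"\1", text))
# the 21 item, code A4th, and 5things
print(UNBOUNDED.sub("", text))
# the 21 item, code A4, and 5ings
```

The bounded version only touches standalone ordinals such as "21st", while the look-behind-only version also strips letters out of "A4th" and "5things". Which behaviour you want depends on your input.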

If you also want to remove the numbers along with their ordinal suffixes, you could use this one:

[0-9]+(?:st| st|nd| nd|rd| rd|th| th)

So for the text "The 3rd person is missing but the 2 nd and the 1st is here" you'll get this output (apart from the doubled spaces left where each match was removed): "The person is missing but the and the is here"
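
Here is a small Python sketch of that replacement on the same sentence. The second substitution, which collapses leftover runs of spaces, is an extra step assumed here for tidiness and is not part of the pattern above.

```python
import re

# Digits followed by an ordinal suffix, with or without a single space between them.
NUMBER_WITH_ORDINAL = re.compile(r"[0-9]+(?:st| st|nd| nd|rd| rd|th| th)")

text = "The 3rd person is missing but the 2 nd and the 1st is here"

removed = NUMBER_WITH_ORDINAL.sub("", text)
# Assumed clean-up step: squeeze runs of two or more spaces into one.
tidy = re.sub(r" {2,}", " ", removed)

print(repr(removed))
print(tidy)
# The person is missing but the and the is here
```

Note that this pattern has no word boundaries, so a number followed by a word starting with one of the suffixes (for example "5 things") would also be partly removed.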


Try a positive lookbehind:

(?<=[0-9])(?:st|nd|rd|th)

assuming the dialect of regex supports it.