How can I remove leading zeros from two-digit numbers (01, 02, etc.) in the middle of a character string using R?

For the following string vector s, I want to remove the leading zeros in each element, which is the reverse of the answer from this link:

s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')

The expected result looks like:

s <- c('week 1st', 'weeks 2nd', 'year2022week1st', 'week 4th')

I tested the following code, but it does not work because the regex is incomplete:

s <- 'week 01st'
sub('^0+(?=[1-9])', '', s, perl=TRUE)
sub('^0+([1-9])', '\\1', s)

Out:

[1] "week 01st"

How could I do that using R?

Update: the following code contributed by @dvantwisk works for year2022week01st, but not for the other elements:

s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')
gsub('(year[0-9]{4,})(week)(0{0,})([1-9]{1})([0-9a-zA-Z]{1,})', '\\1\\2\\4\\5', s)

Out:

[1] "week 01st"       "weeks 02nd"      "year2022week1st" "week 4th"

Solution 1:

gsub('(week )(0{0,})([1-9]{1})([0-9a-zA-Z]{1,})', '\\1\\3\\4', week_string)

gsub() takes three arguments as input: a pattern, a replacement, and a query character vector. Our strategy is to create a regular expression with four groups with ()s.

We first match 'week '.

We then match zero or more zeros with the expression (0{0,}). The first zero indicates the character we are trying to match and the expression {0,} indicates we are trying to match zero (hence the 0) or more (hence the comma) times.

Our third group matches a single digit from 1 to 9.

Our fourth group matches any digit from 0 to 9 or any letter, one or more times.

Our replacement is '\\1\\3\\4'. This indicates we want to keep groups one, three, and four in our result and drop group two (the leading zeros). Thus the output is:

[1] "week 1st" "week 2nd" "week 3rd" "week 4th"
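
The answer does not show how week_string is defined. The following is a minimal sketch, assuming week_string holds zero-padded values such as 'week 01st' (the vector below is an assumption chosen to be consistent with the output above). It also applies the same pattern to the vector s from the question, to show which elements the literal 'week ' prefix covers.

```r
# Assumed input: the answer itself does not define week_string.
week_string <- c('week 01st', 'week 02nd', 'week 03rd', 'week 04th')

# Group 1: 'week ', group 2: leading zeros, group 3: one digit 1-9,
# group 4: the remaining digits/letters.
pattern <- '(week )(0{0,})([1-9]{1})([0-9a-zA-Z]{1,})'

# Keep groups 1, 3 and 4; group 2 (the zeros) is dropped.
res <- gsub(pattern, '\\1\\3\\4', week_string)
print(res)

# The vector from the question: only elements containing the literal
# 'week ' (with a space right after 'week') can match this pattern.
s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')
res_s <- gsub(pattern, '\\1\\3\\4', s)
print(res_s)
```

On the question's vector, 'weeks 02nd' and 'year2022week01st' are left unchanged, because neither contains 'week' followed directly by a space.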

Solution 2:

You might use:

weeks?\h*\K0+(?=[1-9]\d*[a-zA-Z])

The pattern matches:

  • weeks? Match week with optional s
  • \h*\K Match optional horizontal whitespace and forget what has been matched so far
  • 0+ Match one or more zeros
  • (?=[1-9]\d*[a-zA-Z]) Positive lookahead, assert a char 1-9, optional digits and a char a-zA-Z to the right
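
To see which part of each string the pattern actually consumes, the match can be inspected with regexpr() and regmatches(). This is a minimal sketch, assuming the vector s from the question; because of \K and the lookahead, only the zeros themselves should be reported as the match.

```r
s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')
pattern <- "weeks?\\h*\\K0+(?=[1-9]\\d*[a-zA-Z])"

# perl = TRUE is needed for \h, \K and the lookahead.
m <- regexpr(pattern, s, perl = TRUE)

# regmatches() returns the matched text for the elements that matched;
# 'week 4th' has no leading zero, so it contributes nothing.
matched <- regmatches(s, m)
print(matched)

# regexpr() reports -1 for elements without a match.
print(m[4] == -1)
```

Since the match contains nothing but the zeros, replacing it with an empty string leaves the rest of each element untouched.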

See a Regex demo and an R demo.

In the replacement use an empty string.

For example

s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')
gsub("weeks?\\h*\\K0+(?=[1-9]\\d*[a-zA-Z])", '', s, perl = TRUE)

Output

[1] "week 1st"        "weeks 2nd"       "year2022week1st" "week 4th"     

Or with 2 capture groups:

(weeks?\h*)0+([1-9]\d*[a-zA-Z])

Example:

s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')
# perl = TRUE is needed so that \h is interpreted as horizontal whitespace
gsub("(weeks?\\h*)0+([1-9]\\d*[a-zA-Z])", '\\1\\2', s, perl = TRUE)

Output

[1] "week 1st"        "weeks 2nd"       "year2022week1st" "week 4th"
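
If the numbers are not always preceded by week or weeks, a more general pattern may be an option. The following is a minimal sketch and not part of either solution above. It assumes that a leading zero is any run of zeros that is not preceded by another digit and is followed by a digit from 1 to 9. The helper name strip_leading_zeros is hypothetical.

```r
# Hypothetical helper: remove runs of zeros that start a number
# (i.e. are not preceded by a digit) and are followed by a digit 1-9.
strip_leading_zeros <- function(x) {
  # (?<!\d)  negative lookbehind: no digit immediately before the zeros
  # 0+       one or more zeros
  # (?=[1-9]) positive lookahead: a nonzero digit must follow
  gsub("(?<!\\d)0+(?=[1-9])", "", x, perl = TRUE)
}

s <- c('week 01st', 'weeks 02nd', 'year2022week01st', 'week 4th')
result <- strip_leading_zeros(s)
print(result)
```

The lookbehind takes the place of the ^ anchor used in the question's attempt, which only matches at the very start of the string. It also leaves the 0 inside 2022 alone, because that zero is preceded by a digit. Note that this sketch would also strip zeros after a decimal point (for example in 'v1.05'), so it is only suitable when such values cannot occur.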