Apostrophes and regular expressions; Cleaning text in R

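You can use the following PCRE pattern (hence perl=TRUE) to remove single-letter words, except A, I and O, while leaving a letter that follows an apostrophe untouched: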

gsub("(?i)\\b(?<!')(?![AOI])\\p{L}\\b", "", x, perl=TRUE)

Details:

  • (?i) - turns on case-insensitive matching
  • \b - a word boundary
  • (?<!') - no ' is allowed immediately on the left
  • (?![AOI]) - the next char cannot be A, I, or O (in either case, because of (?i))
  • \p{L} - any Unicode letter
  • \b - a word boundary
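
A quick way to check the behaviour described above is to run the pattern on a small string. This is only a sketch: the sample sentence and the variable names x, cleaned and tidy are made up for illustration, and the whitespace cleanup at the end is an optional extra step, not part of the pattern itself.

```r
# Hypothetical sample input: single letters b, c, x should go;
# I, a, O and the t in "don't" should stay.
x <- "I don't need a b or c O my x"

# Remove single-letter words except A/I/O (any case) that do not follow an apostrophe
cleaned <- gsub("(?i)\\b(?<!')(?![AOI])\\p{L}\\b", "", x, perl = TRUE)
print(cleaned)

# Optional: the removed letters leave their surrounding spaces behind,
# so collapse runs of whitespace and trim the ends
tidy <- trimws(gsub("\\s+", " ", cleaned))
print(tidy)
```

Note that the pattern removes only the letters, so the spaces around each removed letter remain in cleaned; the second gsub call is one possible way to tidy them up.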