What is the easiest way to do PCRE-style regexp search/replace for MS Word?

Solution 1:

The wildcards in Microsoft Word are a bit like regular expressions; tick "Use wildcards" in the Find and Replace dialog to enable them. This article has more detail.

Standard regular expressions map to Word wildcards as follows:

  • . becomes ?
  • .* becomes *
  • + becomes @ (one or more of the previous item) - e.g. lo@t matches lot and loot
  • [] works the same in both, except that negation is written [!...] instead of [^...]
  • () works the same in both
  • \ escapes wildcards in both
  • \b becomes < (start of a word) and > (end of a word) for matching word boundaries
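
The same wildcard search can also be run from a macro instead of the dialog. The following is only a minimal sketch: the macro name, the pattern and the replacement text are made up for illustration, and it assumes you want to replace throughout the whole active document.

```vba
' Sketch only: macro name, pattern and replacement are illustrative assumptions.
Sub WildcardReplaceSketch()
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = "<(lo@t)>"              ' whole words: lot, loot, looot, ...
        .Replacement.Text = "\1_FOUND"  ' \1 refers to the first () group
        .MatchWildcards = True          ' same as ticking "Use wildcards"
        .Forward = True
        .Wrap = wdFindStop
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

With MatchWildcards switched on, the search is case-sensitive, so Lot would not be matched by this pattern.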

Solution 2:

You could probably write a VBA macro. Internet Explorer 5.5 shipped with a reasonably decent regex engine for use with VBScript. That same engine can also be used in VBA macros on any computer that has MS Office and IE 5.5+ installed, which should be pretty much any Windows machine by now.

To use the regex objects in VBA macros, you need to add a reference to the VBScript regex engine in the VBA editor. Load up the VBA macro editor, and select Tools->References from the menu. Find "Microsoft VBScript Regular Expressions 5.5" in the list of available references and tick it.

Then you can write macros which process the text directly in Word (like any other Word macro), using the RegExp object from the VBScript_RegEx_55 library to actually do the regex-based matching and replacements. It's not quite as easy as using a dialogue box directly, but it's not terribly difficult. If you know enough about programming to actually use regexes, I'm sure you'd be able to handle the VBA coding.

http://www.regular-expressions.info/vb.html has some info on how to actually use the RegEx objects provided in that library.
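
As a rough illustration of the macro approach described above, here is a minimal sketch. It assumes the "Microsoft VBScript Regular Expressions 5.5" reference has been ticked; the macro name, the date pattern and the replacement string are invented for the example and are not taken from any particular source.

```vba
' Sketch only. Requires the reference "Microsoft VBScript Regular
' Expressions 5.5" (Tools->References in the VBA editor).
' The pattern and replacement are illustrative assumptions.
Sub RegexReplaceSketch()
    Dim re As RegExp
    Dim para As Paragraph
    Dim rng As Range

    Set re = New RegExp
    re.Pattern = "\b(\d{4})-(\d{2})-(\d{2})\b"   ' e.g. 2008-09-15
    re.Global = True                             ' replace every match
    re.IgnoreCase = False

    For Each para In ActiveDocument.Paragraphs
        Set rng = para.Range
        rng.MoveEnd wdCharacter, -1   ' leave the paragraph mark alone
        If re.Test(rng.Text) Then
            ' $1, $2, $3 refer to the captured groups
            rng.Text = re.Replace(rng.Text, "$3/$2/$1")
        End If
    Next para
End Sub
```

One caveat with this simple approach: assigning to rng.Text rewrites the whole paragraph's text, so character formatting inside a changed paragraph (bold, italics and so on) may be lost. It is probably best tried on a copy of the document first.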