Use Python's string.replace vs re.sub

For Python 2.5, 2.6, should I be using string.replace or re.sub for basic text replacements?

In PHP, this was explicitly stated but I can't find a similar note for Python.


Solution 1:

As long as you can make do with str.replace(), you should use it. It avoids all the pitfalls of regular expressions (like escaping), and is generally faster.
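
To illustrate the escaping pitfall, here is a minimal sketch (the sample string is made up for illustration). A "." is just a character to str.replace(), but a wildcard to re.sub() unless it is escaped, for example with re.escape():

```python
import re

text = "version 1.0 (beta)"  # hypothetical sample input

# str.replace() treats the search string literally.
literal = text.replace(".", "-")

# In a regex, an unescaped "." matches any character except a newline,
# so every character in this one-line string gets replaced.
unescaped = re.sub(".", "-", text)

# re.escape() makes re.sub() treat the search string literally again.
escaped = re.sub(re.escape("."), "-", text)

print(literal)    # version 1-0 (beta)
print(unescaped)  # a run of dashes as long as the input
print(escaped)    # version 1-0 (beta)
```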

Solution 2:

str.replace() should be used whenever possible. It's more explicit, simpler, and faster.

In [1]: import re

In [2]: text = """For python 2.5, 2.6, should I be using string.replace or re.sub for basic text replacements.
In PHP, this was explicitly stated but I can't find a similar note for python.
"""

In [3]: timeit text.replace('e', 'X')
1000000 loops, best of 3: 735 ns per loop

In [4]: timeit re.sub('e', 'X', text)
100000 loops, best of 3: 5.52 us per loop
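
If you want to rerun the comparison outside IPython, something like the following should work with the standard timeit module. This is only a sketch, written for a current Python 3 rather than 2.5/2.6; the function names and the precompiled-pattern case are additions for illustration, and the absolute numbers will vary by machine and version.

```python
import re
import timeit

text = (
    "For python 2.5, 2.6, should I be using string.replace or re.sub "
    "for basic text replacements.\n"
    "In PHP, this was explicitly stated but I can't find a similar note for python.\n"
)

# Extra case not in the session above: a pattern compiled once up front.
pattern = re.compile("e")

def with_replace():
    return text.replace("e", "X")

def with_sub():
    return re.sub("e", "X", text)

def with_compiled():
    return pattern.sub("X", text)

# Time each variant over the same number of calls and report seconds.
for func in (with_replace, with_sub, with_compiled):
    seconds = timeit.timeit(func, number=10000)
    print("%-14s %.4f s for 10000 calls" % (func.__name__, seconds))
```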

Solution 3:

String manipulation is usually preferable to regex when you can figure out how to adapt it. Regex is incredibly powerful, but it's usually slower, and usually harder to write, debug, and maintain.

That being said, notice how many times "usually" appears in the above paragraph! It's possible (and I've seen it done) to write a zillion lines of string manipulation for something you could've done with a 20-character regex. It's also possible to waste valuable time using "efficient" string functions on tasks a good regex engine could do almost as fast. Then there's maintainability: Regex can be horribly complex, but sometimes a regex will be simpler and easier to read than a giant block of procedural code.

Regex is fantastic for its intended purpose: searching for highly-variable needles in highly-variable haystacks. Think of it as a precision torque wrench: It's the perfect tool for a specific set of jobs, but it makes a lousy hammer.
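
As a rough illustration of a "variable needle", consider masking every run of digits in a string. The function names and sample text below are hypothetical, and the two versions are only meant to agree on plain ASCII input. The regex version is one line, while the hand-written version has to track state:

```python
import re

def mask_numbers_regex(text):
    # One or more consecutive digits become a single "#".
    return re.sub(r"\d+", "#", text)

def mask_numbers_manual(text):
    # Same idea without regex: emit "#" once per run of digits.
    out = []
    in_digits = False
    for ch in text:
        if ch.isdigit():
            if not in_digits:
                out.append("#")
                in_digits = True
        else:
            out.append(ch)
            in_digits = False
    return "".join(out)

sample = "Order 66 shipped 2024 items in 3 boxes"
print(mask_numbers_regex(sample))   # Order # shipped # items in # boxes
print(mask_numbers_manual(sample))  # Order # shipped # items in # boxes
```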

Some guidelines you should follow when you aren't sure what to use:

  • Is the pattern you're looking for highly static? For example, do you want to split a string on every comma, pipe, or tab?
  • Is resource efficiency more important than developer time? What are your priorities? Remember: Hardware is cheap, programmers are expensive.
  • Are you working with HTML, XML, or other context-free grammars? Don't forget that regex has limitations: it cannot reliably match arbitrarily nested structures.
  • And my #1 rule of thumb: If you work on the problem for 5 minutes, can you rough out an idea for a non-regex approach?

If the answer to any of these questions is "yes", you probably want string manipulation. Otherwise, consider regex.
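
For the first question in the list, here is a small sketch of what a "highly static" pattern looks like in practice. The sample line is made up; both approaches give the same fields, and the plain string version needs no regex syntax at all:

```python
import re

line = "a,b|c\td"  # hypothetical record using comma, pipe, and tab

# Plain string methods: normalize the delimiters, then split once.
fields_str = line.replace("|", ",").replace("\t", ",").split(",")

# Regex equivalent: a character class listing the three delimiters.
fields_re = re.split(r"[,|\t]", line)

print(fields_str)  # ['a', 'b', 'c', 'd']
print(fields_re)   # ['a', 'b', 'c', 'd']
```

With a single fixed delimiter, line.split(",") alone is enough. Once the list of delimiters grows, the regex version may end up being the easier one to read.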