Worst Case Analysis for Regular Expressions
Are there any tools that will take a particular regular expression and report the worst case, in terms of the number of operations required, for a certain number of characters that the regular expression is matched against?
So, for example, given `(f|a)oo.*[ ]baz`, how many steps might the engine possibly go through to match 100 characters?
I would also be interested in a tool that can take a bunch of text samples and show the average number of operations for each run (a rough timing-based approximation is sketched below).
I realize this will depend a lot on the engine used and the implementation, but I don't know how common such tooling is. If it is common to many languages (making my question too vague), I would be particularly interested in Perl and Python.
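Since most engines don't expose an operation counter directly, one rough stand-in for "average operations per sample" is average wall-clock time per match attempt. A minimal sketch using Python's standard `re` and `timeit` modules (the `samples` list here is purely illustrative):

```python
import re
import timeit

# Python's engine does not expose a step counter, so average
# wall-clock time per match attempt serves as a rough proxy
# for "average operations". The samples are made up for the demo.
pattern = re.compile(r'(f|a)oo.*[ ]baz')
samples = [
    'foo quick brown baz',
    'aoo' + 'x' * 100 + ' baz',
    'no match anywhere here',
]

for s in samples:
    # Average the cost of one search over 10,000 runs.
    per_run = timeit.timeit(lambda: pattern.search(s), number=10_000) / 10_000
    print(f'{per_run * 1e6:8.2f} us  {s[:30]!r}')
```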
RegexBuddy's debugger shows how many steps the engine takes to conclude a match, or a failure to match, on a given sample. Its documentation has more information on catastrophic backtracking and on debugging regular expressions.
PS: It is not free, but they offer a 3-month money-back guarantee.
Note that it depends on the engine. While regex theory is based on automata theory, most engines are not strict implementations of it. For this reason, for instance, some engines can take exponential time on patterns that a strict NFA simulation would process in linear time.
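As a quick illustration, here is a minimal sketch of that exponential blow-up in Python, whose `re` module is a backtracking engine; the nested quantifier below is a classic pathological pattern:

```python
import re
import time

# Nested quantifiers like (a+)+ force a backtracking engine to try
# exponentially many ways of splitting the input before it can
# report a failed match.
pattern = re.compile(r'(a+)+b')

for n in (18, 20, 22, 24):
    text = 'a' * n  # no trailing 'b', so every attempt must fail
    start = time.perf_counter()
    pattern.search(text)
    print(f'n={n}: {time.perf_counter() - start:.3f}s')
```

Each additional character roughly doubles the running time, whereas an engine built on strict NFA simulation (such as RE2) rejects the same input in linear time.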
You might get something like what you're looking for by using `re.compile` with the `re.DEBUG` flag. See this excellent answer from the Python Hidden Features Community wiki for an extensive explanation.
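For the pattern from the question, that looks roughly like this; `re.DEBUG` makes `compile` print the parsed opcode tree to stdout as a side effect (the exact dump format varies by Python version):

```python
import re

# Passing re.DEBUG dumps the parsed pattern tree, showing the
# branch and repeat nodes the engine will execute.
re.compile(r'(f|a)oo.*[ ]baz', re.DEBUG)
```

The dump shows structure rather than step counts, but it reveals where backtracking can occur, here in the `(f|a)` alternation and the unbounded `.*`.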