Python re infinite execution
Solution 1:
Your regex runs into catastrophic backtracking because you have nested quantifiers (([...]+)*). Since your regex requires the string to end in / (which fails on your example), the regex engine tries every possible way of dividing up the string in the vain hope of finding a matching combination. That's where it gets stuck.
To illustrate, let's assume "A*BCD" as the input to your regex and see what happens:
- (\w+) matches A. Good.
- \* matches *. Yay.
- [\w\s]+ matches BCD. OK.
- / fails to match (no characters left to match). OK, let's back up one character.
- / fails to match D. Hum. Let's back up some more.
- [\w\s]+ matches BC, and the repeated [\w\s]+ matches D.
- / fails to match. Back up.
- / fails to match D. Back up some more.
- [\w\s]+ matches B, and the repeated [\w\s]+ matches CD.
- / fails to match. Back up again.
- / fails to match D. Back up some more, again.
- How about [\w\s]+ matches B, repeated [\w\s]+ matches C, repeated [\w\s]+ matches D? No? Let's try something else.
- [\w\s]+ matches BC. Let's stop here and see what happens.
- Darn, / still doesn't match D.
- [\w\s]+ matches B.
- Still no luck. / doesn't match C.
- Hey, the whole group is optional (...)*.
- Nope, / still doesn't match B.
- OK, I give up.
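The attempts in the walk-through can be enumerated mechanically. The following is only a rough sketch of the search space, not of how any particular regex engine is implemented; the helper names splits and attempts are made up for this illustration. It lists every way the repeated group ([\w\s]+)* could carve up the text after the *, including stopping early and repeating zero times.

```python
def splits(s):
    """Yield every way to cut s into consecutive non-empty pieces."""
    if not s:
        yield []
        return
    for i in range(1, len(s) + 1):
        for rest in splits(s[i:]):
            yield [s[:i]] + rest


def attempts(s):
    """Every prefix of s, cut every possible way, plus the zero-repetition case."""
    result = [[]]  # the group matching zero times
    for end in range(1, len(s) + 1):
        result.extend(splits(s[:end]))
    return result


if __name__ == "__main__":
    for pieces in attempts("BCD"):
        print(pieces)
    # The number of candidates doubles with every extra character.
    for n in (3, 10, 20):
        print(n, len(attempts("B" * n)))
```

For n characters this yields 2**n candidates, so a 30-character tail already gives more than a billion combinations for a backtracking engine to rule out.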
Now that was a string of just three letters. Yours had about 30, and trying every combination of those would keep your computer busy until the end of days.
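To see the growth for yourself, here is a small timing sketch. It assumes the problematic pattern is the one shown in the Perl one-liner in this thread, and it uses short inputs of the form "A*BBB..." (my own test strings, not the original data) so that it finishes quickly. Exact timings depend on your machine and Python version.

```python
import re
import time

# Pattern as it appears in this thread: note the nested quantifier ([\w\s]+)*
SLOW = re.compile(r"(\w+)\*([\w\s]+)*/$")
# Suggested replacement: no nested quantifier, no trailing "/" required
FAST = re.compile(r"(\w+)\*([\w\s]+)$")


def time_failed_match(n):
    """Time a search that must fail: n letters after '*', no trailing '/'."""
    text = "A*" + "B" * n
    start = time.perf_counter()
    result = SLOW.search(text)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Keep n small; each extra character roughly doubles the work.
    for n in range(10, 20, 2):
        result, elapsed = time_failed_match(n)
        print(f"n={n:2d}  match={result}  {elapsed:.4f}s")
    # The simplified pattern handles the same kind of input directly.
    print(FAST.search("A*" + "B" * 18).groups())
```

If the times roughly quadruple from one row to the next (two extra characters per row), that is the exponential backtracking described above.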
I suppose what you're trying to do is to get the strings before and after the *, in which case, use:
pattern = r"(\w+)\*([\w\s]+)$"
Solution 2:
Interestingly, Perl runs it very quickly:
-> perl -e 'print "Match\n" if "COPRO*HORIZON 2000 HOR" =~ m|(\w+)\*([\w\s]+)*/$|'
-> perl -e 'print "Match\n" if "COPRO*HORIZON 2000 HOR/" =~ m|(\w+)\*([\w\s]+)*/$|'
Match
Solution 3:
Try re2 or any other regular expression engine based on automata theory. The one in the current Python re module is a simple and slow backtracking engine (for now; things may change in the future). But automata-based engines have some restrictions: they won't let you use backreferences, for example. Check the re2 syntax page to find out whether it will satisfy your needs.
Solution 4:
Looks like it might be something in your pattern. I'm not sure what you are trying to do with the last '*' in your expression. The following code seems to work for me:
import re
# No nested quantifier and no trailing "/" required
pattern = r"(\w+)\*([\w\s]+)$"
re_compiled = re.compile(pattern)
results = re_compiled.search('COPRO*HORIZON 2000 HOR')
print(results.groups())  # ('COPRO', 'HORIZON 2000 HOR')