Regex match character only when NOT preceded by specific word
The goal is to have the regex match every newline character that is not preceded by a 2-decimal number. Here's some example text:
This line ends with text
this line ends with a number: 55
this line ends with a 2-decimal number: 5.00
here's 22.22, not at the end of the line
The regex should match the ends of lines 1, 2, and 4 (assuming there is a newline after the 4th line). I thought a negative lookahead was the answer, so I tried
(?!\d*\.\d\d)\n
without success as seen in this regex101 snippet: https://regex101.com/r/qbrKlt/4
Edit: I later read that Python's re module doesn't support variable-length lookarounds. That restriction actually applies only to lookbehind (lookahead can be variable length), and a fixed-length lookahead still didn't work:
(?!\.\d\d)\n
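To see why both attempts fail, this sketch runs them on the sample text with Python's re module and reports which lines' newlines they match. The helper matched_lines and the variable text are illustrative names, not part of the question. Note that the variable-length lookahead compiles without error.

```python
import re

# The four sample lines from the question, each followed by a newline.
text = (
    "This line ends with text\n"
    "this line ends with a number: 55\n"
    "this line ends with a 2-decimal number: 5.00\n"
    "here's 22.22, not at the end of the line\n"
)

def matched_lines(pattern: str, s: str) -> list[int]:
    """Return the 1-based line numbers whose trailing newline `pattern` matches."""
    # A newline at index i ends line (number of newlines before i) + 1.
    return [s.count("\n", 0, m.start()) + 1 for m in re.finditer(pattern, s)]

# A lookahead inspects the text AFTER the current position. Just before a "\n",
# that text starts with the newline itself, which can never match \d*\.\d\d,
# so the negative lookahead always succeeds and every newline matches.
print(matched_lines(r"(?!\d*\.\d\d)\n", text))  # [1, 2, 3, 4]
print(matched_lines(r"(?!\.\d\d)\n", text))     # [1, 2, 3, 4]
```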
Instead, I worked around it by running two regexes and subtracting one result from the other:
- find all indices of newline characters:
\n
- find all indices of newline characters preceded by 2-decimal numbers:
\d*\.\d\d\n
- remove indices found in step 2 from those found in step 1 for the answer
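The three steps above could look roughly like this in Python, assuming the indices come from re.finditer. Because the step 2 match starts at the number rather than at the newline, the sketch takes the newline's index as the match end minus one.

```python
import re

text = (
    "This line ends with text\n"
    "this line ends with a number: 55\n"
    "this line ends with a 2-decimal number: 5.00\n"
    "here's 22.22, not at the end of the line\n"
)

# Step 1: indices of every newline.
all_newlines = {m.start() for m in re.finditer(r"\n", text)}

# Step 2: indices of newlines preceded by a 2-decimal number.
# The newline is the last character of each match, hence end() - 1.
decimal_newlines = {m.end() - 1 for m in re.finditer(r"\d*\.\d\d\n", text)}

# Step 3: the set difference leaves only the newlines we want.
wanted = sorted(all_newlines - decimal_newlines)
print(wanted)
```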
But I'm sure there's a way to do this in one pass, and I'd be grateful to anyone who can help me find it :)
Solution 1:
You need to use a negative lookbehind instead of a negative lookahead:
(?<!\.\d\d)\n
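This matches \n only if it is not immediately preceded by a dot and two digits. A lookbehind inspects the text before the current position, which is what you need here, and \.\d\d is fixed width, as Python's re module requires for lookbehind.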
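A quick check of the lookbehind on the sample text from the question (the variable names are illustrative). It also shows why the lookbehind has to be the fixed-width \.\d\d rather than \d*\.\d\d: Python's re module rejects a variable-width lookbehind when the pattern is compiled.

```python
import re

text = (
    "This line ends with text\n"
    "this line ends with a number: 55\n"
    "this line ends with a 2-decimal number: 5.00\n"
    "here's 22.22, not at the end of the line\n"
)

# Negative lookbehind: match "\n" only if the three characters before it
# are not a dot followed by two digits.
pattern = re.compile(r"(?<!\.\d\d)\n")

positions = [m.start() for m in pattern.finditer(text)]
lines = [text.count("\n", 0, i) + 1 for i in positions]
print(lines)  # [1, 2, 4]

# The variable-width version is rejected by the re module at compile time.
try:
    re.compile(r"(?<!\d*\.\d\d)\n")
except re.error as exc:
    print("re.error:", exc)
```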
Solution 2:
Why get esoteric with regexes when you can just take the final word with string.split()[-1] and test it for the form you need? Python isn't Perl (fortunately).
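A minimal sketch of this suggestion, assuming the goal is a list of 1-based line numbers rather than newline offsets. is_two_decimal is a hypothetical helper that mirrors \d*\.\d\d using only string methods, and blank lines are guarded because split() returns an empty list for them.

```python
text = (
    "This line ends with text\n"
    "this line ends with a number: 55\n"
    "this line ends with a 2-decimal number: 5.00\n"
    "here's 22.22, not at the end of the line\n"
)

def is_two_decimal(word: str) -> bool:
    """Hypothetical helper: True for tokens like "5.00" or ".25" (optional digits, a dot, two digits)."""
    # rpartition splits on the last dot; dot is "" when the word has no dot.
    head, dot, tail = word.rpartition(".")
    return dot == "." and len(tail) == 2 and tail.isdigit() and (head == "" or head.isdigit())

wanted = []
for number, line in enumerate(text.splitlines(), start=1):
    words = line.split()
    # A blank line has no final word, so it cannot end in a 2-decimal number.
    if not words or not is_two_decimal(words[-1]):
        wanted.append(number)

print(wanted)  # [1, 2, 4]
```

Unlike the lookbehind, this ignores any trailing whitespace before the newline, since split() discards it.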