Regex pattern to match only numbers with at least 2 decimal places in Python
I am trying to use a lazy regex pattern in Python to grab just the first number after a specified word, in this case Non-GAAP. However, I only want figures that have at least 2 decimal places.
Here is my string:
s = 'Non-GAAP-2 net income of with EPS of 1.21, up 23% from the fourth quarter of 2020.'
and my pattern is:
\bNon.*GAAP\b.*?\b(\d+(?:\.\d+)?)\b
This matches the number 2 after Non-GAAP when in fact I want the number 1.21.
How do I fix this pattern, and could you explain the logic?
Thanks.
EDIT
If I want to change this so that I can put any word into the pattern, how would I do it? Using an r (raw) string literal fails, and so does a formatted string, because of the {2,}.
e.g.
s = f'\b{adjusted}\b.*?\b(\d+\.\d\{2,\})\b'
I have tried to escape these characters with backslashes, but this also fails.
Solution 1:
You'd probably need:
\bNon-GAAP\b.*?\b(\d+\.\d{2,})\b
- \bNon-GAAP\b - Your literal string 'Non-GAAP' in between word boundaries;
- .*? - 0+ (lazy) characters other than newline;
- \b(\d+\.\d{2,})\b - A capture group for 1+ digits followed by a literal dot and at least two more digits, in between word boundaries.
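To make the difference concrete, here is a small sketch that runs the question's original pattern and the corrected one against the same string. The variable names are only for illustration.

```python
import re

s = 'Non-GAAP-2 net income of with EPS of 1.21, up 23% from the fourth quarter of 2020.'

# Original pattern: the decimal part (?:\.\d+)? is optional, so a bare integer qualifies.
original = r'\bNon.*GAAP\b.*?\b(\d+(?:\.\d+)?)\b'
# Corrected pattern: the dot and at least two following digits are mandatory.
fixed = r'\bNon-GAAP\b.*?\b(\d+\.\d{2,})\b'

original_hit = re.search(original, s).group(1)
fixed_hit = re.search(fixed, s).group(1)

print(original_hit)  # 2
print(fixed_hit)     # 1.21
```

Because the lazy .*? stops at the first position where the rest of the pattern can match, the optional decimal part lets it stop at the 2 in 'Non-GAAP-2'; making the decimals mandatory pushes it on to 1.21.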
Use this with re.findall()
import re
s = 'Non-GAAP-2 net income of with EPS of 1.21, up 23% from the fourth quarter of 2020.'
print(float(re.findall(r'\bNon-GAAP\b.*?\b(\d+\.\d{2,})\b', s)[0]))
Prints:
1.21
EDIT:
Combining a variable with f-strings:
import re
s = 'Non-GAAP-2 net income of with EPS of 1.21, up 23% from the fourth quarter of 2020.'
adjusted = 'Non-GAAP'
print(float(re.findall(fr'\b{adjusted}\b.*?\b(\d+\.\d{{2,}})\b', s)[0]))
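In an f-string, single braces start a replacement field, which is why the quantifier has to be written as {{2,}}. As a rough sketch, the same idea can be wrapped in a helper; first_decimal_after is a hypothetical name, and the re.escape() call is an addition that keeps any regex metacharacters in the search word literal.

```python
import re

def first_decimal_after(word, text):
    """Return the first number with 2+ decimal places after `word`, or None."""
    # re.escape() makes characters such as '.' or '+' in `word` match literally.
    # {{2,}} in the f-string becomes the regex quantifier {2,}.
    pattern = rf'\b{re.escape(word)}\b.*?\b(\d+\.\d{{2,}})\b'
    match = re.search(pattern, text)
    return float(match.group(1)) if match else None

s = 'Non-GAAP-2 net income of with EPS of 1.21, up 23% from the fourth quarter of 2020.'
print(first_decimal_after('Non-GAAP', s))  # 1.21
print(first_decimal_after('fourth', s))    # None
```

Using re.search() and checking the result avoids the IndexError that re.findall(...)[0] raises when nothing matches. Note that the surrounding \b anchors assume the search word starts and ends with a word character.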
Solution 2:
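Your original regex is almost right; only the part that matches the decimals needs updating:
\bNon.*GAAP\b.*?\b(\d+\.\d{2})\b
- Non.*GAAP : the start of your original pattern, unchanged (it is not a capturing group)
- .*? : 0+ characters, matched lazily
- (\d+\.\d{2}) : capturing group that matches 1+ digits, a literal dot and then exactly 2 digits; write \d{2,} instead to accept two or more decimal places, as the question asks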
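The difference between {2} and {2,} only shows up when a number has more than two decimal places. A small sketch on a made-up string (the text below is an assumption, not from the question):

```python
import re

text = 'EPS of 1.215 and 0.50'  # hypothetical sample text

# Exactly two digits after the dot; the trailing \b rejects 1.215 outright.
exactly_two = re.findall(r'\b(\d+\.\d{2})\b', text)
# Two or more digits after the dot.
two_or_more = re.findall(r'\b(\d+\.\d{2,})\b', text)

print(exactly_two)  # ['0.50']
print(two_or_more)  # ['1.215', '0.50']
```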
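You can also achieve the same result with a non-capturing group:
(?:Non-GAAP.*)(\d+\.\d{2})
- (?:Non-GAAP.*) : non-capturing group; the literal string 'Non-GAAP' and the 0+ characters after it are matched but not included in the result
- (\d+\.\d{2}) : capturing group to capture 1+ digits, a literal dot and then exactly 2 digits
Note that .* here is greedy, so it consumes as much as it can: the capture comes from the last decimal number in the string, and a multi-digit integer part is cut down to its last digit. Using the lazy .*? avoids both problems.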
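One thing to watch with this variant is the greedy .* inside the non-capturing group. A quick sketch on a made-up string (the sample text is an assumption) shows how it behaves next to a lazy .*?:

```python
import re

text = 'Non-GAAP EPS of 12.34 this quarter'  # hypothetical sample text

# Greedy .* grabs as much as possible and leaves only the minimum for the capture group.
greedy = re.findall(r'(?:Non-GAAP.*)(\d+\.\d{2})', text)
# Lazy .*? stops at the first place the capture group can start.
lazy = re.findall(r'(?:Non-GAAP.*?)(\d+\.\d{2})', text)

print(greedy)  # ['2.34']
print(lazy)    # ['12.34']
```

With the question's string the two give the same answer only because 1.21 has a single digit before the dot and is the only decimal number in the text.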
UPDATE: for the updated question
To make the search string variable, you can just build the regex as you would a string:
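import re

s = 'Non-GAAP-2 net income of with EPS of 1.21, up 23% from the fourth quarter of 2020.'
search = 'Non-GAAP'
# re.escape() keeps any regex metacharacters in the search word literal.
# Both pieces are raw strings so \d and \. reach the regex engine unchanged.
# The lazy .*? captures the first qualifying number whole; {2,} allows 2+ decimals.
regex = r"(?:" + re.escape(search) + r".*?)(\d+\.\d{2,})"
print(float(re.findall(regex, s)[0]))  # 1.21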