Get regex pattern to match only numbers with at least 2 decimal places (Python)

I am trying to use a lazy regex pattern in Python to grab just the first number after a specified word, in this case Non-GAAP. However, I only want figures that have at least 2 decimal places.

Here is my string:

s = 'Non-GAAP-2 net income of  with EPS of 1.21, up 23% from the fourth quarter of 2020.'

and my pattern is:

\bNon.*GAAP\b.*?\b(\d+(?:\.\d+)?)\b

This matches the number 2 after Non-GAAP when in fact I want the number 1.21.
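Running the original pattern against the same string shows where the 2 comes from. This is a quick sketch that uses only the standard re module. The decimal part (?:\.\d+)? is optional, so the bare integer in Non-GAAP-2 already satisfies the capture group. The lazy .*? then accepts the first candidate it reaches.

```python
import re

s = 'Non-GAAP-2 net income of  with EPS of 1.21, up 23% from the fourth quarter of 2020.'

# The question's original pattern: (?:\.\d+)? makes the decimal part optional,
# so a plain integer is already a valid capture.
original = r'\bNon.*GAAP\b.*?\b(\d+(?:\.\d+)?)\b'

first = re.search(original, s).group(1)
print(first)  # 2
```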

How do I fix this pattern, and could you explain the logic?

Thanks.

EDIT

If I want to edit this so that I can insert any word into the pattern, how would I change it? Using an r (raw) string fails, and so does a formatted string, because of the {2,}.

e.g.

s = f'\b{adjusted}\b.*?\b(\d+\.\d\{2,\})\b'

I have tried escaping these characters with backslashes, but this also fails.


Solution 1:

You'd probably need:

\bNon-GAAP\b.*?\b(\d+\.\d{2,})\b



  • \bNon-GAAP\b - Your literal string 'Non-GAAP' between word boundaries;
  • .*? - Zero or more characters other than newline, matched lazily (as few as possible);
  • \b(\d+\.\d{2,})\b - A capture group for one or more digits, followed by a literal dot and at least two more digits, between word boundaries.
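You can run the pattern against a few sentences to see what the {2,} quantifier accepts and rejects. The sample sentences below are made up for illustration. The sketch uses re.search, which returns None when nothing matches.

```python
import re

# Solution 1's pattern: at least two digits after the decimal point.
PATTERN = r'\bNon-GAAP\b.*?\b(\d+\.\d{2,})\b'

# Hypothetical sample sentences and the capture each one should produce.
samples = {
    'Non-GAAP-2 EPS of 1.21': '1.21',          # the integer 2 is skipped
    'Non-GAAP EPS of 1.2 and 3.456': '3.456',  # 1.2 has only one decimal
    'Non-GAAP EPS of 12.5%': None,             # no figure with 2+ decimals
}

for text, expected in samples.items():
    m = re.search(PATTERN, text)
    found = m.group(1) if m else None
    print(repr(text), '->', found)
    assert found == expected
```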

Use this with re.findall()

import re
s = 'Non-GAAP-2 net income of  with EPS of 1.21, up 23% from the fourth quarter of 2020.'
print(float(re.findall(r'\bNon-GAAP\b.*?\b(\d+\.\d{2,})\b', s)[0]))

Prints:

1.21
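There is one caveat with re.findall(...)[0]. If the string contains no qualifying number, findall returns an empty list and [0] raises IndexError. A minimal alternative uses re.search, which stops at the first match and returns None when there is none. The helper name first_value_after_keyword is hypothetical.

```python
import re
from typing import Optional


def first_value_after_keyword(text: str) -> Optional[float]:
    # Hypothetical helper: first number with 2+ decimals after 'Non-GAAP',
    # or None if there is no such number (instead of an IndexError).
    m = re.search(r'\bNon-GAAP\b.*?\b(\d+\.\d{2,})\b', text)
    return float(m.group(1)) if m else None


s = 'Non-GAAP-2 net income of  with EPS of 1.21, up 23% from the fourth quarter of 2020.'
print(first_value_after_keyword(s))                      # 1.21
print(first_value_after_keyword('Non-GAAP EPS of 5.1'))  # None
```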

EDIT:

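Combining a variable with an f-string: use the fr prefix (a raw f-string) and double the braces, {{2,}}, so that Python emits a literal {2,} instead of treating it as a replacement field: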

import re
s = 'Non-GAAP-2 net income of  with EPS of 1.21, up 23% from the fourth quarter of 2020.'
adjusted = 'Non-GAAP'
print(float(re.findall(fr'\b{adjusted}\b.*?\b(\d+\.\d{{2,}})\b', s)[0]))
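You can check the doubled braces by comparing the f-string result with the same pattern built by plain concatenation. In an f-string, {{ and }} are Python's standard way to write literal braces. The r prefix keeps \b and \d as backslash sequences instead of string escapes.

```python
import re

adjusted = 'Non-GAAP'

# {{2,}} in an f-string becomes the literal quantifier {2,}.
with_fstring = fr'\b{adjusted}\b.*?\b(\d+\.\d{{2,}})\b'

# The same pattern assembled from raw-string pieces by concatenation.
with_concat = r'\b' + adjusted + r'\b.*?\b(\d+\.\d{2,})\b'

print(with_fstring)  # \bNon-GAAP\b.*?\b(\d+\.\d{2,})\b
assert with_fstring == with_concat
```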

Solution 2:

Your original regex is almost right; only the part that matches the decimals needs a small update:

\bNon.*GAAP\b.*?\b(\d+\.\d{2})\b
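  • \bNon.*GAAP\b: unchanged from your original pattern
  • .*?: zero or more characters, matched lazily
  • (\d+\.\d{2}): captures one or more digits, a literal dot and then exactly 2 digits (the trailing \b rejects a longer decimal such as 1.234; use \d{2,} if you need at least 2)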
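The difference between "exactly two" and "at least two" decimals only appears when a figure has more than two decimal places. This sketch compares the pattern above with a {2,} variant on a made-up sentence:

```python
import re

# Hypothetical sentence with a 3-decimal figure before a 2-decimal one.
text = 'Non-GAAP EPS of 1.234 and a margin ratio of 1.21'

exactly_two = r'\bNon.*GAAP\b.*?\b(\d+\.\d{2})\b'    # Solution 2 as written
at_least_two = r'\bNon.*GAAP\b.*?\b(\d+\.\d{2,})\b'  # {2,} instead of {2}

# 1.234 fails the trailing \b in exactly_two, so the search moves on to 1.21.
print(re.search(exactly_two, text).group(1))   # 1.21
print(re.search(at_least_two, text).group(1))  # 1.234
```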


You can also achieve the same result with a non-capturing group:

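(?:Non-GAAP.*?)(\d+\.\d{2})
  • (?:Non-GAAP.*?): non-capturing group holding the literal string 'Non-GAAP' and as few following characters as possible; it is not included in the result. A greedy .* here would run to the end of the string and backtrack, returning the last number instead of the first.
  • (\d+\.\d{2}): capturing group for one or more digits, a literal dot and then 2 digits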
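Whether the .* after the keyword is greedy or lazy matters once the sentence contains more than one candidate, or a number with a multi-digit integer part. Here is a sketch on made-up sentences:

```python
import re

# Hypothetical sentence with two 2-decimal figures.
text = 'Non-GAAP EPS of 11.21, up from 9.87'

greedy = r'(?:Non-GAAP.*)(\d+\.\d{2})'   # .* runs to the end, then backtracks
lazy = r'(?:Non-GAAP.*?)(\d+\.\d{2})'    # .*? stops at the first possible match

print(re.findall(greedy, text))                     # ['9.87']
print(re.findall(lazy, text))                       # ['11.21']
print(re.findall(greedy, 'Non-GAAP EPS of 11.21'))  # ['1.21']
```

With the greedy form, .* takes as much as it can and leaves \d+ with as little as possible. The result is the last number, and in 11.21 the capture keeps only the 1 just before the dot.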



UPDATE: for the updated question

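To make the search string variable, you can build the regex with ordinary string concatenation. Make every literal piece a raw string so that the backslashes in \d and \. reach the regex engine unchanged:

import re

s = 'Non-GAAP-2 net income of  with EPS of 1.21, up 23% from the fourth quarter of 2020.'

search = 'Non-GAAP'

# Raw strings for every literal piece; the lazy .*? stops at the first qualifying number
regex = r"(?:" + search + r".*?)(\d+\.\d{2})"

print(float(re.findall(regex, s)[0]))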
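Concatenation and f-strings both insert the search word as regex syntax. Characters such as ( ) . + in the word therefore change its meaning. This sketch uses re.escape() with a hypothetical keyword that contains parentheses:

```python
import re

# Hypothetical keyword containing the regex metacharacters ( and ).
search = 'EPS (diluted)'
s = 'Full-year EPS (diluted) of 4.56, GAAP EPS of 4.10'

# Unescaped, "(diluted)" becomes a group matching "diluted" without the
# parentheses, so nothing in s matches.
unescaped = r"(?:" + search + r".*?)(\d+\.\d{2})"
print(re.findall(unescaped, s))  # []

# re.escape() backslash-escapes the metacharacters so they match literally.
escaped = r"(?:" + re.escape(search) + r".*?)(\d+\.\d{2})"
print(re.findall(escaped, s))  # ['4.56']
```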
