Regex works alone but not when used in groups

The main problem is that when I build a regex out of several groups, I can't get the value of each group.

I have a string like this:

some text here en Value1 (VALUE2)

I want to create a regex to get three substrings:

  1. some text here: everything between the start of the string and the word en.
  2. Value1: the text between en and the first (.
  3. VALUE2: the text between the parentheses ().

I've searched for information about groups and read the following:

They are created by placing the characters to be grouped inside a set of parentheses (, ).

Anything you have in parentheses () will be a capture group.

We can specify as many groups as we wish. Each sub-pattern inside a pair of parentheses will be captured as a group.

So I assume I need a regex with three groups (i.e. three pairs of parentheses, each containing the regex for one case).

So my idea was: find a regex for each part, wrap each one in () to make it a group, and concatenate them. These are the regexes I used for the groups:

  1. First group: .+?(?= en)
  2. Second group: (?<=en\s).*?(?=\s\()
  3. Third group: (?<=\().+?(?=\))

But when I put all the regexes together:

(.+?(?= en))((?<=en\s).*?(?=\s\())((?<=\().+?(?=\)))

It doesn't return the groups. In fact, it doesn't find any match at all.

I don't know if I'm missing something about how groups work in regex, but I can't find a way to do it.

Also, my Python code is:

import re

text = "some text here en Value1 (VALUE2)"
result = re.search(r"(.+?(?= en))((?<=en\s).*?(?=\s\())((?<=\().+?(?=\)))", text)

print(result.groups())

And yes, I know I can iterate over the string and get values, but I want to do it with regex if it is possible.

Thanks in advance.


You can use

^(.*?)\s+en\s+(.*?)\s+\(([^()]*)\)

If VALUE1 is always a chunk of non-whitespace chars you can use

^(.*?)\s+en\s+(\S+)\s+\(([^()]*)\)
#              ^^^

If the pattern must match the whole string:

^(.*?)\s+en\s+(\S+)\s+\(([^()]*)\)$

See the regex demo.
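
To see how the three variants above differ in practice, here is a minimal sketch. The second and third sample strings (a two-word value, and trailing text after the closing parenthesis) are made up for illustration and do not come from the question, and groups_or_none is a hypothetical helper.

```python
import re

# The three patterns given above.
lazy = r"^(.*?)\s+en\s+(.*?)\s+\(([^()]*)\)"
nonspace = r"^(.*?)\s+en\s+(\S+)\s+\(([^()]*)\)"
anchored = r"^(.*?)\s+en\s+(\S+)\s+\(([^()]*)\)$"

# Illustrative inputs: only the first one is from the question.
samples = [
    "some text here en Value1 (VALUE2)",
    "some text here en Value One (VALUE2)",
    "some text here en Value1 (VALUE2) trailing",
]


def groups_or_none(pattern, text):
    """Return the captured groups, or None when the pattern does not match."""
    m = re.search(pattern, text)
    return m.groups() if m else None


for s in samples:
    for name, pattern in (("lazy", lazy), ("nonspace", nonspace), ("anchored", anchored)):
        print(f"{name:9} {s!r} -> {groups_or_none(pattern, s)}")
```

The lazy (.*?) version accepts a value containing spaces, the \S+ version rejects it, and the version ending in $ additionally rejects anything after the closing parenthesis.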

See the Python demo:

import re
text = "some text here en Value1 (VALUE2)"
result = re.search(r"^(.*?)\s+en\s+(.*?)\s+\(([^()]*)\)", text)
print(result.groups())
# => ('some text here', 'Value1', 'VALUE2')
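
For comparison with the attempt in the question, the sketch below shows that each lookaround-based piece matches on its own while their concatenation does not. Lookarounds are zero-width: they test the surrounding text without consuming it. In the combined pattern, group 2 therefore has to start exactly where group 1 stopped (right before " en"), and at that position the lookbehind (?<=en\s) cannot succeed. The variable names are only illustrative.

```python
import re

text = "some text here en Value1 (VALUE2)"

# The three pieces from the question.
first = r".+?(?= en)"
second = r"(?<=en\s).*?(?=\s\()"
third = r"(?<=\().+?(?=\))"

# Searched separately, each piece finds its substring.
alone = [re.search(p, text).group() for p in (first, second, third)]
print(alone)  # ['some text here', 'Value1', 'VALUE2']

# Concatenated, the pieces must match back to back, but nothing consumes
# " en ", " (" or ")", so the overall search finds no match.
combined = f"({first})({second})({third})"
combined_result = re.search(combined, text)
print(combined_result)  # None

# Consuming the separators between the groups makes the pieces line up.
consuming = r"^(.*?)\s+en\s+(.*?)\s+\(([^()]*)\)"
consuming_groups = re.search(consuming, text).groups()
print(consuming_groups)  # ('some text here', 'Value1', 'VALUE2')
```

Since re.search returns None when nothing matches, calling .groups() on that result raises AttributeError, so checking for None first is the safer habit.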

Details:

  • ^ - start of string
  • (.*?) - Group 1: any zero or more chars other than line break chars as few as possible
  • \s+en\s+ - en enclosed with one or more whitespaces
  • (.*?) - Group 2: any zero or more chars other than line break chars as few as possible
  • \s+ - one or more whitespaces
  • \( - a ( char
  • ([^()]*) - Group 3: any zero or more chars other than ( and ) (replace with .* if there can be any chars and you want to match till the rightmost occurrence of ))
  • \) - a ) char.
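
The note on Group 3 above can be checked with a small sketch. The input with a second parenthesized part is an assumption made up for illustration; the question's string has only one pair of parentheses.

```python
import re

# Hypothetical input with two parenthesized parts.
text = "some text here en Value1 (VALUE2) (extra)"

# [^()]* cannot cross a parenthesis, so Group 3 stops at the first ")".
negated = re.search(r"^(.*?)\s+en\s+(.*?)\s+\(([^()]*)\)", text).groups()
print(negated)  # ('some text here', 'Value1', 'VALUE2')

# A greedy .* runs to the end of the line and backtracks to the rightmost ")".
greedy = re.search(r"^(.*?)\s+en\s+(.*?)\s+\((.*)\)", text).groups()
print(greedy)  # ('some text here', 'Value1', 'VALUE2) (extra')
```

Which of the two is preferable depends on whether the parenthesized value itself may contain parentheses.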