Regex works alone but not when used in groups
The main problem is that when I try to create a regex composed of several groups, I can't get the value of each group.
I have a string like this:
some text here en Value1 (VALUE2)
I want to create a regex to get three substrings:
- "some text here": everything between the start of the string and the word "en".
- "Value1": the text between "en" and the first "(".
- "VALUE2": the text between "(" and ")".
I've searched for information about groups and read:
They are created by placing the characters to be grouped inside a set of parentheses (, ).
Anything you have in parentheses () will be a capture group.
We can specify as many groups as we wish. Each sub-pattern inside a pair of parentheses will be captured as a group.
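To make the quoted definitions concrete, here is a small sketch (the pattern and sample string are made up for illustration). Each pair of parentheses becomes a numbered group, and the spans of consecutive groups sit side by side in the input. Any text between them has to be matched by the part of the pattern outside the parentheses.

```python
import re

# Hypothetical example: two groups separated by a literal space in the pattern.
m = re.search(r"(\w+) (\w+)", "foo bar")

print(m.groups())              # ('foo', 'bar')
print(m.group(1), m.span(1))   # foo (0, 3)
print(m.group(2), m.span(2))   # bar (4, 7)
# Index 3 (the space) belongs to neither group: the literal space
# between the two groups in the pattern consumed it.
```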
So I assume I need a regex with three groups (i.e. three pairs of (), each containing the regex for one part).
So my idea was: find the regex for each part and join them together, wrapping each one in () to create the groups. These are the regexes I used for the groups:
- First group: .+?(?= en) (example)
- Second group: (?<=en\s).*?(?=\s\() (example)
- Third group: (?<=\().+?(?=\)) (example)
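A quick check, reusing the sample string and the three patterns from the list above, that each pattern does find its piece when searched on its own:

```python
import re

text = "some text here en Value1 (VALUE2)"

# The three per-group patterns from the question, each tried separately.
parts = [
    r".+?(?= en)",             # everything before " en"
    r"(?<=en\s).*?(?=\s\()",   # between "en " and " ("
    r"(?<=\().+?(?=\))",       # between "(" and ")"
]

found = [re.search(p, text).group(0) for p in parts]
print(found)  # ['some text here', 'Value1', 'VALUE2']
```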
But when I put all the regexes together:
(.+?(?= en))((?<=en\s).*?(?=\s\())((?<=\().+?(?=\)))
it doesn't capture the groups. In fact, it doesn't find any match at all (example).
I don't know if I'm missing something about how groups work in regex, but I can't find a way to do it.
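Also, my Python code is:
import re
text = "some text here en Value1 (VALUE2)"  # renamed from str to avoid shadowing the built-in
result = re.search(r"(.+?(?= en))((?<=en\s).*?(?=\s\())((?<=\().+?(?=\)))", text)
print(result.groups())  # raises AttributeError here, because re.search returned None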
And yes, I know I could iterate over the string to get the values, but I want to do it with a regex if possible.
Thanks in advance.
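The reason the combined pattern finds nothing is that lookarounds are zero-width: (?= en), (?=\s\() and (?<=\() check the neighbouring text without consuming it. When the groups are simply concatenated, group 2 has to start exactly where group 1 stopped, i.e. on the space before "en", and at that position the lookbehind (?<=en\s) cannot succeed because the three characters just before it are "ere", not "en ". The sketch below compares the original pattern with an illustrative variant (written for this note, not the answer's pattern) that consumes the separators as literal text between the groups:

```python
import re

text = "some text here en Value1 (VALUE2)"

# The question's pattern: lookarounds never consume " en " or " (",
# so the concatenated groups can never line up.
lookaround = r"(.+?(?= en))((?<=en\s).*?(?=\s\())((?<=\().+?(?=\)))"

# Illustrative variant: the separators are matched *between* the groups.
consuming = r"(.+?) en (.*?) \((.+?)\)"

print(re.search(lookaround, text))          # None
print(re.search(consuming, text).groups())  # ('some text here', 'Value1', 'VALUE2')
```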
You can use
^(.*?)\s+en\s+(.*?)\s+\(([^()]*)\)
If Value1 is always a chunk of non-whitespace chars, you can use
^(.*?)\s+en\s+(\S+)\s+\(([^()]*)\)
(the only change is (\S+) in place of (.*?) for Group 2).
If the pattern must match the whole string:
^(.*?)\s+en\s+(\S+)\s+\(([^()]*)\)$
See the regex demo.
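The sketch below runs the three variants above against a few made-up inputs to show where they differ: the \S+ version rejects a multi-word Value1, and the $-anchored version rejects any text after the closing parenthesis. The extract helper and the sample strings are hypothetical, for illustration only.

```python
import re

# The three variants from the answer above.
PATTERNS = {
    "lazy": r"^(.*?)\s+en\s+(.*?)\s+\(([^()]*)\)",
    "non-whitespace": r"^(.*?)\s+en\s+(\S+)\s+\(([^()]*)\)",
    "whole string": r"^(.*?)\s+en\s+(\S+)\s+\(([^()]*)\)$",
}

# Hypothetical sample inputs chosen to show where the variants differ.
SAMPLES = [
    "some text here en Value1 (VALUE2)",
    "some text here en two words (VALUE2)",
    "some text here en Value1 (VALUE2) trailing",
]


def extract(pattern, text):
    """Return the three captured groups, or None if the pattern does not match."""
    m = re.search(pattern, text)
    return m.groups() if m else None


for name, pattern in PATTERNS.items():
    for sample in SAMPLES:
        print(f"{name:15} {sample!r}: {extract(pattern, sample)}")
```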
See the Python demo:
import re
text = "some text here en Value1 (VALUE2)"
result = re.search(r"^(.*?)\s+en\s+(.*?)\s+\(([^()]*)\)", text)
print(result.groups())
# => ('some text here', 'Value1', 'VALUE2')
Details:
- ^ - start of string
- (.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
- \s+en\s+ - en enclosed with one or more whitespaces
- (.*?) - Group 2: any zero or more chars other than line break chars, as few as possible
- \s+ - one or more whitespaces
- \( - a ( char
- ([^()]*) - Group 3: any zero or more chars other than ( and ) (replace with .* if there can be any chars and you want to match till the rightmost occurrence of ))
- \) - a ) char.
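To illustrate the last note, here is a sketch with a hypothetical input whose parenthesised part itself contains parentheses. With [^()]* the engine backtracks into Group 2 and ends up capturing only the innermost pair; with .* Group 3 runs up to the last ).

```python
import re

# Hypothetical input with nested parentheses in the last part.
text = "some text here en Value1 (x (y) z)"

no_parens = r"^(.*?)\s+en\s+(.*?)\s+\(([^()]*)\)"
greedy = r"^(.*?)\s+en\s+(.*?)\s+\((.*)\)"

print(re.search(no_parens, text).groups())  # ('some text here', 'Value1 (x', 'y')
print(re.search(greedy, text).groups())     # ('some text here', 'Value1', 'x (y) z')
```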