RegEx BackReference to Match Different Values

I have a regex that I use to match expressions of the form (val1 operator val2).

The regex looks like this:

(\(\s*([a-zA-Z]+[0-9]*|[0-9]+|\'.*\'|\[.*\])\s*(ni|in|\*|\/|\+|\-|==|!=|>|>=|<|<=)\s*([a-zA-Z]+[0-9]*|[0-9]+|\'.*\'|\[.*\])\s*\))

This works and matches what I want, as you can see in this demo.

BUT :D (here comes the butter)

I want to optimise the regex itself by making it more readable and compact. I searched for how to do that and found something called a backreference, which lets you name your capturing groups and reference them later, like this:

(\(\s*(?P<Val>[a-zA-Z]+[0-9]*|[0-9]+|\'.*\'|\[.*\])\s*(ni|in|\*|\/|\+|\-|==|!=|>|>=|<|<=)\s*(\g{Val})\s*\))

Here I named the group that captures the left side of the expression Val and later referenced it as (\g{Val}). The problem is that this expression, as you can see here, only matches cases where the left side of the expression is exactly the same as the right side, e.g. (a==a) or (1==1). It does not match expressions such as (a==b)!

Now the question is: is there a way to reference the pattern instead of the matched value?!


Note that \g{N} is equivalent to \N (e.g. \g{1} is the same as \1): it is a backreference that matches the same value that the corresponding capturing group matched, not its pattern. The \g syntax is a bit more flexible, though, since you can refer to a capture group relative to the current position by putting - before the number (e.g. \g{-2}: (\p{L})(\d)\g{-2} will match a1a).
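
To see the value-versus-pattern distinction concretely, here is a minimal sketch using Python's built-in `re` module, chosen only for illustration. Python spells a named backreference `(?P=Val)` rather than `\g{Val}`, and the shortened `Val` pattern below is an assumption made for brevity, not the pattern from the question.

```python
import re

# Shortened value pattern (an assumption for illustration): identifier or integer.
# (?P=Val) is Python's spelling of a named backreference. It re-matches the
# exact text captured by the group Val, not the group's pattern.
pattern = re.compile(
    r"\(\s*(?P<Val>[a-zA-Z]+[0-9]*|[0-9]+)\s*==\s*(?P=Val)\s*\)"
)

same = pattern.fullmatch("(a == a)")       # right side repeats the captured text
different = pattern.fullmatch("(a == b)")  # right side is a different value

print(same is not None)
print(different is not None)
```

The second input fails even though `b` would satisfy the `Val` pattern on its own, which is exactly the behaviour described in the question.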

The PCRE engine supports subroutine calls, which re-run a group's pattern rather than re-matching its captured text. To reuse the pattern of Group 1, use (?1); to reuse the pattern of the named group Val, use (?&Val).

Also, you may use character classes to match single characters, and consider using the ? quantifier to make parts of the regex optional (for example, [><]=? matches >, >=, < and <=):

(\(\s*(?P<Val>[a-zA-Z]+[0-9]*|[0-9]+|\'.*\'|\[.*\])\s*(ni|in|[*\/+-]|[=!><]=|[><])\s*((?&Val))\s*\))

See the regex demo

Note that \'.*\' and \[.*\] can match too much because .* is greedy; consider replacing them with \'[^\']*\' and \[[^][]*\].
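
A small sketch of the over-matching issue, again using Python's `re` module purely for illustration (the sample text is made up): the greedy `.*` runs from the first quote to the last one on the line, while the negated class stops at the nearest closing quote.

```python
import re

# Hypothetical input containing two parenthesised expressions on one line.
text = "('a' == 'b') and ('c' == 'd')"

# Greedy version: .* extends to the last quote in the text.
greedy = re.findall(r"'.*'", text)

# Bounded version: [^']* cannot cross a quote, so each string matches separately.
bounded = re.findall(r"'[^']*'", text)

print(greedy)
print(bounded)
```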


What language/application are you using this regular expression in? If you have the option you can specify the different parts as named variables and then build the final regular expression by combining them.

val = "([a-zA-Z]+[0-9]*|[0-9]+|\'.*\'|\[.*\])"
op = "(ni|in|\*|\/|\+|\-|==|!=|>|>=|<|<=)"
exp = "(\(\s*" .. val .. "\s*" .. op .. "\s*" .. val .. "\s*\))"
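
As one possible rendering of this idea, here is a sketch in Python (an assumption, since the question does not say which language is in use). The names `VAL`, `OP` and `EXPR` are illustrative, and the operator list is reordered so that `>=` and `<=` are tried before `>` and `<`.

```python
import re

# Value: identifier, integer, quoted string, or bracketed list
# (same alternatives as in the question, kept in a non-capturing group).
VAL = r"(?:[a-zA-Z]+[0-9]*|[0-9]+|'.*'|\[.*\])"

# Operator: two-character comparisons are listed before their one-character prefixes.
OP = r"(?:ni|in|\*|/|\+|-|==|!=|>=|<=|>|<)"

# Final expression: the value pattern is written once and reused on both sides.
EXPR = re.compile(rf"\(\s*({VAL})\s*({OP})\s*({VAL})\s*\)")

for sample in ["(a == b)", "(x1 >= 10)", "('x' in [1, 2])", "(a == )"]:
    match = EXPR.fullmatch(sample)
    print(sample, match.groups() if match else None)
```

Because the value pattern is inserted as text on both sides, the two operands are matched independently, so `(a == b)` is accepted without any backreference.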