Regex: Exclude matches containing specific strings
Solution 1:
You can match field( and everything up to the digits in square brackets before the closing parenthesis using a negated character class, which starts with [^
The same negated character class approach can also be used to assert that TableRelation does not occur between the curly braces.
Note that you can write (?:C|c) as [Cc], using a character class instead of the alternation |
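As a quick check of that equivalence, the sketch below compares the two spellings on a few sample strings. The sample values are made up for illustration and do not come from the question.

```javascript
// Two spellings of "C or c, followed by ode"; both should behave identically.
const withAlternation = /(?:C|c)ode/;
const withCharClass = /[Cc]ode/;

// Hypothetical sample inputs, chosen only for illustration.
const samples = ["Code[20]", "code[20]", "CODE[20]", "Date"];

// true when both patterns agree on every sample
const sameResult = samples.every(
  (s) => withAlternation.test(s) === withCharClass.test(s)
);
console.log(sameResult);
```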
Assuming the curly brace after field is closed by a single, non-nested closing curly brace:
field\([^()]+; ?[Cc]ode\[\d+\]\)\s*{(?![^{}]*[Tt]able[Rr]elation)[^{}]*}
The pattern matches:
- field\([^()]+ : match field( and 1+ characters other than ( and ) (which can also match a newline)
- ; ?[Cc]ode : match ; followed by an optional space and Code or code
- \[\d+\]\) : match [ followed by 1+ digits and ])
- \s*{ : match optional whitespace characters (which can also match a newline) and {
- (?![^{}]*[Tt]able[Rr]elation) : negative lookahead, assert that TableRelation does not occur after the opening curly brace
- [^{}]* : match optional repetitions of any character except { and }
- } : match the closing }
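To see the pattern in action, here is a minimal JavaScript sketch. The g flag, the variable names, and the shortened sample input are assumptions added for illustration; only the pattern itself comes from the answer.

```javascript
// Solution 1 pattern, with the g flag added to collect every match.
const fieldWithoutRelation =
  /field\([^()]+; ?[Cc]ode\[\d+\]\)\s*{(?![^{}]*[Tt]able[Rr]elation)[^{}]*}/g;

// Shortened, hypothetical sample in the same shape as the question's input.
const sample = `
field(2000; "Created by"; Code[50])
{
    Caption = 'Created by';
    TableRelation = User."User Name";
}
field(4010; "Holding Name"; Code[100])
{
    Caption = 'Holding Name';
}
`;

// match() returns null when nothing matches, hence the fallback.
const blocks = sample.match(fieldWithoutRelation) || [];
// Keep only the first line (the field(...) header) of each matched block.
const headers = blocks.map((block) => block.split("\n")[0]);
console.log(headers);
```

Only the second block should be reported, because the lookahead rejects the first one as soon as it finds TableRelation before the closing brace.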
Solution 2:
With a case-insensitive search and a regex engine that supports atomic groups and possessive quantifiers, you can write:
\bfield\((?>[^);]*;\s*)*code\b[^)]*\)\s*{(?>[^\w}]*+(?!tablerelation\s*=)\w+)*[^\w}]*}
This pattern relies on negated character classes to stop the greedy quantifiers, as in The fourth bird's answer.
Atomic groups (?>...) and possessive quantifiers *+ are used to reduce backtracking. In particular, the negative lookahead only tests for tablerelation after a run of non-word characters, that is, at the start of each word.
Note that the code part can be anywhere between the parentheses after field.
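JavaScript's built-in RegExp supports neither atomic groups nor possessive quantifiers, so the pattern cannot be used there as written. As a rough sketch only, the version below emulates each atomic group with a common workaround: a capturing lookahead followed by a backreference, (?=(...))\1. This rewrite is not the author's pattern, and the variable names and sample input are hypothetical.

```javascript
// Emulation of the Solution 2 pattern for an engine without atomic groups.
// A lookahead is never re-entered once it has succeeded, so (?=(X))\1
// consumes X without allowing backtracking into it, much like (?>X).
// The possessive *+ inside the second group became a plain *, which is
// already protected by the surrounding lookahead.
const emulatedPattern =
  /\bfield\((?:(?=([^);]*;\s*))\1)*code\b[^)]*\)\s*{(?:(?=([^\w}]*(?!tablerelation\s*=)\w+))\2)*[^\w}]*}/gi;

// Shortened, hypothetical sample in the same shape as the question's input.
const sampleFields = `
field(2000; "Created by"; Code[50])
{
    Caption = 'Created by';
    TableRelation = User."User Name";
}
field(4010; "Holding Name"; Code[100])
{
    Caption = 'Holding Name';
}
`;

// Report the header line of every block that the pattern accepts.
const kept = (sampleFields.match(emulatedPattern) || []).map(
  (block) => block.split("\n")[0]
);
console.log(kept);
```

In an engine with native support (PCRE, for example), the original pattern is shorter and should be preferred.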
Solution 3:
The following regex captures the Code[...] value of field blocks that do not contain 'TableRelation'.
/field\([^)]+; Code\[(\d+)\]\)\n\s+{((?!TableRelation).)+?}\n/gs
It uses the g (global) and s (dotall) flags.
A notable part of this regexp is the ((?!TableRelation).)+? expression.
- (?!TableRelation) : negative lookahead (the text must not appear at this position)
- ((?!TableRelation).)+? : match characters one at a time, as few as possible, as long as 'TableRelation' does not start at any of them
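The effect of that expression can be isolated in a small sketch. The two input strings and the variable names below are made up for illustration.

```javascript
// {((?!TableRelation).)+?} : each character between the braces is consumed
// only if "TableRelation" does not start at that position.
const bodyWithoutRelation = /{((?!TableRelation).)+?}/s;

// Hypothetical block bodies, one without and one with TableRelation.
const plainBody = bodyWithoutRelation.test("{ Caption = 'A'; }");
const relationBody = bodyWithoutRelation.test(
  "{ Caption = 'A'; TableRelation = User; }"
);

console.log(plainBody, relationBody);
```

The second test fails because the match cannot step over the T of TableRelation, so it never reaches the closing brace.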
I created a simple JS snippet. It extracts the values in two steps.
const regexp = /field\([^)]+; Code\[(\d+)\]\)\n\s+{((?!TableRelation).)+?}\n/gs;
const target = `
table 123 "MyTable"
{
    fields
    {
        field(1000; "Created on"; Date)
        {
            Caption = 'Created on';
            DataClassification = CustomerContent;
            Editable = false;
        }
        field(2000; "Created by"; Code[50])
        {
            Caption = 'Created by';
            TableRelation = User."User Name";
            DataClassification = CustomerContent;
            Editable = false;
        }
        field(3000; Resigned; Boolean)
        {
            Caption = 'Resigned';
            DataClassification = CustomerContent;
        }
        field(4000; "Holding No."; Code[20])
        {
            Caption = 'Holding No.';
            TableRelation = Contact."No." where(Type = const(Company));
            DataClassification = CustomerContent;
            trigger OnValidate()
            var
                [...]
            begin
                [...]
            end;
        }
        field(4010; "Holding Name"; Code[100])
        {
            Caption = 'Holding Name';
            DataClassification = CustomerContent;
        }
        field(4050; "Holding Name"; Code[80])
        {
            Caption = 'Holding Name 2';
            DataClassification = CustomerContent;
        }
        field(5000; "Geocoding Entry No."; Integer)
        {
            Caption = 'Geocoding Entry No.';
            DataClassification = CustomerContent;
        }
    }
    keys
    {
        key(AppliesToContact; "Holding No.", "Holding Name", "Company Level") { }
    }
}
`;
// step 1: extract field(...){...} chunks that do not contain "TableRelation"
// (match() returns null when nothing matches, so fall back to an empty array)
const matchedBlocks = target.match(regexp) || [];
// step 2: extract the Code[...] length from each chunk
const codes = matchedBlocks.map(m => m.match(/; Code\[(\d+)\]/)[1]);
console.log(codes);
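Since the pattern already captures the digits in group 1, the two steps could also be collapsed into a single pass with String.prototype.matchAll. The following is a sketch of that variation on a shortened, hypothetical input; it is not part of the original answer.

```javascript
// Same pattern as in the snippet; group 1 holds the digits inside Code[...].
const fieldRegexp =
  /field\([^)]+; Code\[(\d+)\]\)\n\s+{((?!TableRelation).)+?}\n/gs;

// Shortened, hypothetical input. The indentation matters because the pattern
// expects whitespace between the newline and the opening brace.
const source = `
    field(2000; "Created by"; Code[50])
    {
        Caption = 'Created by';
        TableRelation = User."User Name";
    }
    field(4010; "Holding Name"; Code[100])
    {
        Caption = 'Holding Name';
    }
    field(4050; "Holding Name"; Code[80])
    {
        Caption = 'Holding Name 2';
    }
`;

// matchAll yields one match object per block; index 1 is the first capture group.
const lengths = [...source.matchAll(fieldRegexp)].map((m) => m[1]);
console.log(lengths);
```

matchAll requires the g flag on the regex, which the pattern already has.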