Regex: Exclude matches containing specific strings

Solution 1:

You can match from field( up to the digits between the square brackets, just before the closing parenthesis, using a negated character class (a class starting with [^).

The same negated character class approach can be used to assert that TableRelation does not occur between the curly braces.

Note that you can write (?:C|c) as [Cc], using a character class instead of the alternation |.

Assuming the opening curly brace after field has a single, non-nested closing curly brace:

field\([^()]+; ?[Cc]ode\[\d+\]\)\s*{(?![^{}]*[Tt]able[Rr]elation)[^{}]*}

The pattern matches:

  • field\([^()]+ Match field( and 1+ chars other than ( ) (which can also match a newline)
  • ; ?[Cc]ode Match ; followed by an optional space and Code or code
  • \[\d+\]\) Match [ 1+ digits ])
  • \s*{ Match optional whitespace chars (which can also match a newline) and {
  • (?![^{}]*[Tt]able[Rr]elation) Negative lookahead, assert that TableRelation does not occur after the opening curly brace and before the next { or }
  • [^{}]* Match optional repetitions of any character except { }
  • } Match closing }
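
As a rough illustration, the pattern can be tried in JavaScript as sketched below. The sample text and the names sampleSource, codeFieldWithoutRelation and findUnrelatedCodeFields are made up for this example; the literal braces are escaped only as a precaution, and the g flag is added to collect all matches.

```javascript
// Hypothetical sample input with three field blocks.
const sampleSource = `
field(2000; "Created by"; Code[50])
{
    Caption = 'Created by';
    TableRelation = User."User Name";
}
field(4010; "Holding Name"; Code[100])
{
    Caption = 'Holding Name';
}
field(5000; "Geocoding Entry No."; Integer)
{
    Caption = 'Geocoding Entry No.';
}
`;

// Same pattern as above, with the literal braces escaped and the g flag added.
const codeFieldWithoutRelation =
  /field\([^()]+; ?[Cc]ode\[\d+\]\)\s*\{(?![^{}]*[Tt]able[Rr]elation)[^{}]*\}/g;

// Hypothetical helper: returns every matching field block as a string,
// or an empty array when nothing matches.
function findUnrelatedCodeFields(source) {
  return source.match(codeFieldWithoutRelation) ?? [];
}

const found = findUnrelatedCodeFields(sampleSource);
console.log(found.length);
console.log(found[0]);
```

Only the Code[100] block should be reported here: the Code[50] block is rejected by the lookahead, and the Integer field never matches the Code part.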


Solution 2:

With a case-insensitive search and a regex engine that supports atomic groups and possessive quantifiers, you can write:

\bfield\((?>[^);]*;\s*)*code\b[^)]*\)\s*{(?>[^\w}]*+(?!tablerelation\s*=)\w+)*[^\w}]*}


This pattern relies on negated character classes to stop the greedy quantifiers, as in the answer by The four birds. Atomic groups (?>...) and possessive quantifiers *+ are used to reduce backtracking. In particular, the negative lookahead tests for tablerelation only once after each run of non-word characters, that is, at the start of each word. Note that the code part can be anywhere between the parentheses after field.
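
JavaScript's regex engine supports neither atomic groups nor possessive quantifiers, so the pattern cannot be used there as written. One possible workaround, sketched below, emulates each atomic group with a lookahead plus a backreference, (?=(...))\N, relying on the fact that the engine does not backtrack into a lookahead once it has succeeded. This rewrite is an adaptation rather than the pattern from the answer itself, and the sample text and the names fieldSample, emulatedAtomicPattern and caselessMatches are assumptions made for the illustration.

```javascript
// Hypothetical sample input with three field blocks.
const fieldSample = `
field(2000; "Created by"; Code[50])
{
    Caption = 'Created by';
    TableRelation = User."User Name";
}
field(4010; "Holding Name"; Code[100])
{
    Caption = 'Holding Name';
}
field(5000; "Geocoding Entry No."; Integer)
{
    Caption = 'Geocoding Entry No.';
}
`;

// Adaptation of the pattern for engines without atomic groups:
//   (?>X)  is emulated as  (?=(X))\N
// The possessive *+ inside the second group becomes a plain *, because the
// surrounding lookahead already prevents backtracking into it.
// Literal braces are escaped; the i flag provides the caseless matching.
const emulatedAtomicPattern =
  /\bfield\((?:(?=([^);]*;\s*))\1)*code\b[^)]*\)\s*\{(?:(?=([^\w}]*(?!tablerelation\s*=)\w+))\2)*[^\w}]*\}/gi;

// Collect the full text of every matching block.
const caselessMatches = [...fieldSample.matchAll(emulatedAtomicPattern)].map(m => m[0]);

console.log(caselessMatches.length);
console.log(caselessMatches[0]);
```

Simply dropping the atomic groups instead would leave a \w+ inside a repeated group, which can backtrack heavily on blocks that do contain tablerelation, so the emulation keeps the intent of the original pattern.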

Solution 3:

The following regex captures the length given in Code[...] for each field block that does not contain 'TableRelation'.

/field\([^)]+; Code\[(\d+)\]\)\n\s+{((?!TableRelation).)+?}\n/gs

It uses the g (global) and s (dotAll) flags.

A notable part of this regex is the ((?!TableRelation).)+? expression.

  • (?!TableRelation) : negative lookahead ('TableRelation' must not start at this position)
  • ((?!TableRelation).)+? : match one or more characters, as few as possible, without ever reaching the start of 'TableRelation'
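
To see this expression in isolation, here is a minimal sketch that tests a single brace block at a time. The anchors ^ and $ and the name blockWithoutRelation are additions for this illustration only and are not part of the regex above.

```javascript
// Anchored variant of the expression, used here only to test one block at a time.
const blockWithoutRelation = /^\{((?!TableRelation).)+?\}$/s;

// No 'TableRelation' inside the braces: every character can be consumed.
const acceptsPlainBlock = blockWithoutRelation.test("{ Caption = 'Created by'; }");

// The lookahead fails at the 'T' of 'TableRelation', so the closing brace is never reached.
const acceptsRelationBlock = blockWithoutRelation.test("{ TableRelation = User; Caption = 'x'; }");

console.log(acceptsPlainBlock);    // true
console.log(acceptsRelationBlock); // false
```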

I created a simple JS snippet that extracts the values in two steps.

const regexp = /field\([^)]+; Code\[(\d+)\]\)\n\s+{((?!TableRelation).)+?}\n/gs;

const target = `
table 123 "MyTable"
{
    fields
    {
        field(1000; "Created on"; Date)
        {
            Caption = 'Created on';
            DataClassification = CustomerContent;
            Editable = false;
        }
        field(2000; "Created by"; Code[50])
        {
            Caption = 'Created by';
            TableRelation = User."User Name";
            DataClassification = CustomerContent;
            Editable = false;
        }
        field(3000; Resigned; Boolean)
        {
            Caption = 'Resigned';
            DataClassification = CustomerContent;
        }
        field(4000; "Holding No."; Code[20])
        {
            Caption = 'Holding No.';
            TableRelation = Contact."No." where(Type = const(Company));
            DataClassification = CustomerContent;

            trigger OnValidate()
            var
               [...]
            begin
               [...]
            end;

        }
        field(4010; "Holding Name"; Code[100])
        {
            Caption = 'Holding Name';
            DataClassification = CustomerContent;
        }
        field(4050; "Holding Name"; Code[80])
        {
            Caption = 'Holding Name 2';
            DataClassification = CustomerContent;
        }
        field(5000; "Geocoding Entry No."; Integer)
        {
            Caption = 'Geocoding Entry No.';
            DataClassification = CustomerContent;
        }
    }
    keys
    {
        key(AppliesToContact; "Holding No.", "Holding Name", "Company Level") { }
    }
}
`;

// step 1: extract field(...){...} chunks that do not contain "TableRelation"
// match() returns null when nothing matches, so fall back to an empty array
const matchedBlocks = target.match(regexp) ?? [];

// step 2: extract code values
const codes = matchedBlocks.map(m => m.match(/; Code\[(\d+)\]/)[1] );
console.log(codes);
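
Since the pattern already captures the digits in group 1, the two steps can probably be merged into one pass with String.prototype.matchAll. A minimal sketch on a shortened input; shortTarget, fieldPattern and codeLengths are names chosen for this example:

```javascript
// Same regex as above; group 1 holds the digits inside Code[...].
const fieldPattern = /field\([^)]+; Code\[(\d+)\]\)\n\s+{((?!TableRelation).)+?}\n/gs;

// Shortened, hypothetical input with one excluded and two included fields.
const shortTarget = `
        field(2000; "Created by"; Code[50])
        {
            Caption = 'Created by';
            TableRelation = User."User Name";
        }
        field(4010; "Holding Name"; Code[100])
        {
            Caption = 'Holding Name';
        }
        field(4050; "Holding Name"; Code[80])
        {
            Caption = 'Holding Name 2';
        }
`;

// matchAll yields one match object per block; m[1] is the captured length.
const codeLengths = [...shortTarget.matchAll(fieldPattern)].map(m => m[1]);
console.log(codeLengths); // [ '100', '80' ]
```

matchAll returns an empty iterator when nothing matches, so no null check is needed in this variant.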