Find JSON strings in a string

Solution 1:

Extracting the JSON string from given text

Since you're looking for a simple solution, you can use the following regular expression, which uses recursion to match balanced pairs of braces. It matches everything between a { and its matching }, including nested objects.

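Note, however, that this isn't guaranteed to work in all possible cases: the pattern only counts braces and knows nothing about JSON syntax, so, for example, a { or } inside a string value will throw it off. It only serves as a quick JSON-string extraction method.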

$pattern = '
/
\{              # { character
    (?:         # non-capturing group
        [^{}]   # anything that is not a { or }
        |       # OR
        (?R)    # recurses the entire pattern
    )*          # previous group zero or more times
\}              # } character
/x
';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]);

Output:

Array
(
    [0] => {"action":"product","options":{...}}
    [1] => {"action":"review","options":{...}}
)
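
The caveat above is easy to reproduce. The following is a minimal sketch, assuming a made-up input ($sample is hypothetical, not taken from the question) in which a string value contains a closing brace. The pattern counts braces without knowing about JSON strings, so the match is cut short:

```php
<?php
// Compact form of the recursive pattern shown above.
$pattern = '/\{(?:[^{}]|(?R))*\}/';

// Hypothetical input: the value of "a" is the one-character string "}".
$sample = 'note {"a":"}"} end';

preg_match_all($pattern, $sample, $found);

// The brace inside the string value is taken as the end of the object.
print_r($found[0]);
```

Here the only match is {"a":"}, which is not valid JSON, so the extracted strings still need to be validated.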



Validating the JSON strings

In PHP versions before 8.3, the way to find out whether a JSON string is valid is to pass it to json_decode() (PHP 8.3 added json_validate() for this purpose). If the parser accepts the string as standards-compliant JSON, json_decode() returns an object / array representation of it; otherwise it returns null and json_last_error() reports the error.

If you'd like to filter out those that aren't valid JSON, then you can use array_filter() with a callback function:

function isValidJSON($string) {
    // Decode only for its side effect of setting the last JSON error.
    json_decode($string);
    return json_last_error() === JSON_ERROR_NONE;
}

$valid_jsons_arr = array_filter($matches[0], 'isValidJSON');
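
If the decoded values are needed anyway, validation and decoding can be combined into one pass. A minimal sketch, assuming PHP 7.3 or later for the JSON_THROW_ON_ERROR flag; decodeValidJson() is a hypothetical helper name, not part of PHP, and the input strings are made up for illustration:

```php
<?php
/**
 * Decode each candidate string and keep only those that parse.
 * Hypothetical helper; returns the decoded values keyed like the input.
 */
function decodeValidJson(array $candidates): array
{
    $decoded = [];
    foreach ($candidates as $key => $candidate) {
        try {
            // With JSON_THROW_ON_ERROR, invalid input raises JsonException
            // instead of silently returning null.
            $decoded[$key] = json_decode($candidate, true, 512, JSON_THROW_ON_ERROR);
        } catch (JsonException $e) {
            // Not valid JSON: skip this candidate.
        }
    }
    return $decoded;
}

// Example input (made up): one valid object, one truncated match.
$result = decodeValidJson(['{"action":"review"}', '{"a":"}']);
print_r($result);
```

This avoids decoding each string twice, once to validate and once to use it.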


Solution 2:

For JavaScript folks looking for a similar regex: the recursive pattern (?R) is not supported by JavaScript, Python's built-in re module, and some other regex engines.

Note: This is not a one-to-one replacement. The expanded pattern only handles nesting up to a fixed depth.

 \{(?:[^{}]|(?R))*\} # PCRE Supported Regex

Steps:

  1. Copy the whole regex and replace ?R with the copied string. For example:
  • level 1 json => \{(?:[^{}]|(?R))*\} => \{(?:[^{}]|())*\}
  • level 2 json => \{(?:[^{}]|(\{(?:[^{}]|(?R))*\}))*\} => \{(?:[^{}]|(\{(?:[^{}]|())*\}))*\}
  • level n json => \{(?:[^{}]|(?<n times>))*\}
  2. When you decide to stop at some level, replace ?R with an empty string.

Done.
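
The manual steps above can also be automated. A minimal sketch in PHP, where buildNestedPattern() is a hypothetical helper and its argument is the number of brace levels to support. The resulting string contains no (?R), so it should be usable in engines without recursion support:

```php
<?php
/**
 * Expand the recursive pattern to a fixed nesting depth.
 * Hypothetical helper that follows the copy-and-replace steps above.
 */
function buildNestedPattern(int $depth): string
{
    $base = '\{(?:[^{}]|(?R))*\}';
    $pattern = $base;
    // Each pass nests one more copy of the base pattern where (?R) was.
    for ($level = 1; $level < $depth; $level++) {
        $pattern = str_replace('(?R)', '(' . $base . ')', $pattern);
    }
    // Stop expanding: replace the remaining ?R with an empty string.
    return str_replace('?R', '', $pattern);
}

$twoLevels = buildNestedPattern(2);
echo $twoLevels, PHP_EOL;

// Try the expanded pattern on a made-up two-level object.
preg_match_all('/' . $twoLevels . '/', 'x {"a":{"b":1}} y', $found);
print_r($found[0]);
```

An object nested deeper than the chosen depth is not matched as a whole, so pick the depth from the data you expect.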

Solution 3:

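I would add a * to the character class, so that a whole run of non-brace characters is matched in one step while (?R) still handles the nested objects: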

{(?:[^{}]*|(?R))*}
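
A quick way to compare this variant with the pattern from Solution 1 is to run both on the same input. A minimal sketch, assuming a made-up sample string ($sample and the variable names are illustrative only):

```php
<?php
// Pattern from Solution 1 (compact form) and the starred variant.
$original = '/\{(?:[^{}]|(?R))*\}/';
$starred  = '/{(?:[^{}]*|(?R))*}/';

// Hypothetical input with one nested and one flat object.
$sample = 'a {"x":{"y":1}} b {"z":2}';

preg_match_all($original, $sample, $fromOriginal);
preg_match_all($starred, $sample, $fromStarred);

// Compare the full-match lists of the two patterns.
var_dump($fromOriginal[0] === $fromStarred[0]);
print_r($fromStarred[0]);
```

On this input both patterns find the same two objects; they differ only in how many characters each step of the loop consumes.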
