Find JSON strings in a string
Solution 1:
Extracting the JSON string from given text
Since you're looking for a simple solution, you can use the following regular expression, which uses recursion to match balanced pairs of braces. It matches everything between { and }, including nested pairs, recursively.
Note that this isn't guaranteed to work in all cases: the pattern knows nothing about JSON string syntax, so a { or } inside a string value (for example {"a":"}"}) will throw off the matching. It only serves as a quick JSON-string extraction method.
$pattern = '
/
\{              # { character
    (?:         # non-capturing group
        [^{}]   # anything that is not a { or }
        |       # OR
        (?R)    # recurses the entire pattern
    )*          # previous group zero or more times
\}              # } character
/x
';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
Output:
Array
(
    [0] => {"action":"product","options":{...}}
    [1] => {"action":"review","options":{...}}
)
Validating the JSON strings
Before PHP 8.3, the usual way to check whether a JSON string is valid is to run it through json_decode() and then inspect json_last_error(). If the parser understands the string and it conforms to the JSON standard, json_decode() will create an object / array representation of it. (PHP 8.3 added json_validate(), which performs the check without building the decoded value.)
If you'd like to filter out those that aren't valid JSON, then you can use array_filter() with a callback function:
function isValidJSON($string) {
    json_decode($string);
    return (json_last_error() === JSON_ERROR_NONE);
}

$valid_jsons_arr = array_filter($matches[0], 'isValidJSON');
Solution 2:
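For JavaScript folks looking for a similar regex: the recursive pattern (?R) is not supported by JavaScript's RegExp, Python's built-in re module, or several other engines, so the recursion has to be expanded by hand.
Note: this is not a one-to-one replacement. The expanded regex only matches nesting up to a fixed depth.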
\{(?:[^{}]|(?R))*\} # PCRE Supported Regex
Steps:
- Copy the whole regex and replace ?R with the copied string, once per extra level of nesting. For example:
- level 1 json:
  \{(?:[^{}]|(?R))*\}
  => \{(?:[^{}]|())*\}
- level 2 json:
  \{(?:[^{}]|(\{(?:[^{}]|(?R))*\}))*\}
  => \{(?:[^{}]|(\{(?:[^{}]|())*\}))*\}
- level n json:
  \{(?:[^{}]|(<regex nested n times>))*\}
- When you decide to stop at some level, replace the remaining ?R with an empty string.
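The substitution steps above can be sketched in JavaScript roughly as follows. This is only an illustration, not part of the original answer: buildBraceRegex, keepValidJson and the sample string are hypothetical names and data chosen for the example, and keepValidJson simply mirrors the json_decode() check from Solution 1 using JSON.parse. The same caveat about braces inside string values still applies.

```javascript
// Hypothetical helper (not from the original answer): builds a depth-limited
// stand-in for the PCRE pattern \{(?:[^{}]|(?R))*\}.
// maxDepth = 1 matches flat objects only; each extra level allows one more
// layer of nested braces.
function buildBraceRegex(maxDepth) {
  const template = String.raw`\{(?:[^{}]|(?R))*\}`;
  let source = template;
  // Each pass replaces the single remaining ?R with a copy of the template.
  for (let level = 1; level < maxDepth; level++) {
    source = source.replace('?R', () => template);
  }
  // Stop expanding: drop the last ?R, which leaves an empty group ().
  source = source.replace('?R', '');
  return new RegExp(source, 'g');
}

// Hypothetical helper: keeps only the candidates that JSON.parse accepts.
function keepValidJson(candidates) {
  return candidates.filter((candidate) => {
    try {
      JSON.parse(candidate);
      return true;
    } catch (err) {
      return false;
    }
  });
}

// Made-up sample input for illustration.
const sample = 'a {"x":1} b {"y":{"z":2}} c {oops}';
const candidates = sample.match(buildBraceRegex(2)) || [];
console.log(candidates);                // [ '{"x":1}', '{"y":{"z":2}}', '{oops}' ]
console.log(keepValidJson(candidates)); // [ '{"x":1}', '{"y":{"z":2}}' ]
```

With maxDepth set to 1, the same sample yields {"x":1}, {"z":2} and {oops}: the outer {"y":...} object is skipped and only its inner object is found, which is why the depth has to be chosen to cover the deepest nesting you expect.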
Solution 3:
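I would add a * after the character class, so that each iteration consumes a whole run of non-brace characters instead of a single one (nested objects are still handled by (?R)):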
{(?:[^{}]*|(?R))*}