Regex for an attribute value that contains the same quote character as its enclosing quotes
You can convert each " inside the attribute value to &quot;, and then it is easier to use a DOM parser to get the alt values:
$text = 'advcd<img loading="lazy" class="abcd pqr" alt="chi-phi-sinh-o-benh-v&quot;ien-dai-hoc-y-duoc-co-so-2" attr="val"><img loading="lazy" class="abcd pqr" alt="abcd-sinh-o-benh-&quot;ien-dai-hoc-y-duoc-co-so-3">sdfs';
$dom = new DOMDocument();
// Parse the fragment without adding implied <html>/<body> wrappers or a doctype
$dom->loadHTML($text, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
// Select the alt attribute of every <img>; nodeValue decodes &quot; back to "
foreach ($xpath->evaluate("//img/@alt") as $attr) {
    echo $attr->nodeValue . PHP_EOL;
}
Output
chi-phi-sinh-o-benh-v"ien-dai-hoc-y-duoc-co-so-2
abcd-sinh-o-benh-"ien-dai-hoc-y-duoc-co-so-3
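The conversion step itself is not shown above. Below is a minimal sketch of it, assuming each alt value ends at a " that is followed by optional whitespace and then either another name=" attribute or > (the same heuristic the regex further down relies on). It uses preg_replace_callback to encode the inner quotes and then runs the DOM query; the helper names escapeInnerAltQuotes and altValues are hypothetical.

```php
<?php
// Hypothetical helper: encode quotes that appear inside alt="..." values.
// Assumption: a value ends at the first " that is followed by optional
// whitespace and then either another name=" attribute or >.
function escapeInnerAltQuotes(string $html): string
{
    return preg_replace_callback(
        '/(alt)=(".*?"(?=\s*(?:[^\s=]+="|>)))/i',
        function (array $m): string {
            // $m[1] is the attribute name, $m[2] the value with its outer quotes.
            $inner = substr($m[2], 1, -1);
            return $m[1] . '="' . str_replace('"', '&quot;', $inner) . '"';
        },
        $html
    );
}

// Hypothetical helper: parse the repaired HTML and collect every img alt value.
function altValues(string $html): array
{
    libxml_use_internal_errors(true); // keep libxml parse warnings out of the output
    $dom = new DOMDocument();
    $dom->loadHTML(escapeInnerAltQuotes($html), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $xpath = new DOMXPath($dom);
    $values = [];
    foreach ($xpath->evaluate('//img/@alt') as $attr) {
        $values[] = $attr->nodeValue; // &quot; is decoded back to "
    }
    return $values;
}

$text = 'advcd<img loading="lazy" class="abcd pqr" alt="chi-phi-sinh-o-benh-v"ien-dai-hoc-y-duoc-co-so-2" attr="val"><img loading="lazy" class="abcd pqr" alt="abcd-sinh-o-benh-"ien-dai-hoc-y-duoc-co-so-3">sdfs';
print_r(altValues($text));
```

Since the end of each value is still located by a regex, this only moves the guesswork to the encoding step; once the inner quotes are encoded, the DOM parser handles the rest.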
Using a regex for your example strings:
- (alt)= Capture group 1, match alt followed by =
- ( Capture group 2
  - ".*?" Match from " and then the least amount of characters till the next "
  - (?= Positive lookahead
    - \s* Match optional whitespace chars
    - (?:[^\s=]+="|>) Match either non-whitespace chars except = until you match =", OR match >
  - ) Close lookahead
- ) Close group 2
$text = 'advcd<img loading="lazy" class="abcd pqr" alt="chi-phi-sinh-o-benh-v"ien-dai-hoc-y-duoc-co-so-2" attr="val"><img loading="lazy" class="abcd pqr" alt="abcd-sinh-o-benh-"ien-dai-hoc-y-duoc-co-so-3">sdfs';
// preg_match_all returns the number of matches, so this only prints when alt was found
if (preg_match_all('/(alt)=(".*?"(?=\s*(?:[^\s=]+="|>)))/i', $text, $matches)) {
    print_r($matches);
}
Output
Array
(
    [0] => Array
        (
            [0] => alt="chi-phi-sinh-o-benh-v"ien-dai-hoc-y-duoc-co-so-2"
            [1] => alt="abcd-sinh-o-benh-"ien-dai-hoc-y-duoc-co-so-3"
        )

    [1] => Array
        (
            [0] => alt
            [1] => alt
        )

    [2] => Array
        (
            [0] => "chi-phi-sinh-o-benh-v"ien-dai-hoc-y-duoc-co-so-2"
            [1] => "abcd-sinh-o-benh-"ien-dai-hoc-y-duoc-co-so-3"
        )

)
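To see why the lookahead is needed, the sketch below runs the lazy pattern with and without it on the same string. Without the lookahead, .*? stops at the first inner quote; with it, a closing quote is only accepted when it is followed by optional whitespace and then another name=" or >. The helper name matchAlt is hypothetical.

```php
<?php
// Hypothetical helper: return every group-2 capture of $pattern in $html.
function matchAlt(string $pattern, string $html): array
{
    preg_match_all($pattern, $html, $matches);
    return $matches[2];
}

$text = 'advcd<img loading="lazy" class="abcd pqr" alt="chi-phi-sinh-o-benh-v"ien-dai-hoc-y-duoc-co-so-2" attr="val"><img loading="lazy" class="abcd pqr" alt="abcd-sinh-o-benh-"ien-dai-hoc-y-duoc-co-so-3">sdfs';

// Lazy match alone: stops at the first quote after the opening one.
$withoutLookahead = matchAlt('/(alt)=(".*?")/i', $text);

// Lazy match plus lookahead: the closing quote must be followed by
// optional whitespace and then either name=" or >.
$withLookahead = matchAlt('/(alt)=(".*?"(?=\s*(?:[^\s=]+="|>)))/i', $text);

print_r($withoutLookahead);
print_r($withLookahead);
```

The lookahead is still a heuristic: a value that itself contains a quote followed by a space and something like x=" would be cut short at that point.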
It seems the markup is malformed: a \ should be added before the inner ". Even so, the following regex leads to a solution:
(alt)=((["\']).*?[^\\]\3)(?:\s|>)
\3
: backreference to capture group 3. It is used because the value should end with the same quote character it started with (" or ').
[^\\]\3
: the character before the closing quote must not be \, so an escaped quote does not end the value.
(?:\s|>)
: after the closing " or ', a whitespace character or > is required.
https://www.phpliveregex.com/p/DmU
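The backreference version can be checked the same way. The sketch below applies the pattern from this answer to three made-up sample strings: the original unescaped value, a single-quoted value containing a ", and a double-quoted value whose inner quotes are escaped as \", the form this answer suggests the markup should use.

```php
<?php
// Pattern from the answer as a PHP single-quoted string. '\\\\' yields the
// regex text \\ (a literal backslash), so [^\\] means "not a backslash".
$pattern = '/(alt)=((["\']).*?[^\\\\]\3)(?:\s|>)/i';

// Sample inputs, made up for illustration.
$samples = [
    'unescaped' => '<img alt="chi-phi-sinh-o-benh-v"ien-dai-hoc-y-duoc-co-so-2" attr="val">',
    'single'    => '<img alt=\'it"s fine\' src=\'a.png\'>',
    'escaped'   => '<img alt="say \"hi\" now" src="b.png">',
];

$values = [];
foreach ($samples as $name => $html) {
    // Group 2 holds the value including its enclosing quotes;
    // group 3 is the opening quote that \3 must repeat.
    if (preg_match($pattern, $html, $m)) {
        $values[$name] = $m[2];
    }
}

print_r($values);
```

Note that [^\\] has to consume one character before the closing quote, so an empty value such as alt="" is not recognised as empty; the lazy match extends past it to a later closing quote if one exists.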