Regex: Cannot find a line that contains a string in one place, but finds the line if the string is in another place on the same line
I have this regex. It finds only the meta HTML tags that contain at least 3 of these words:
<meta name="description" content=.*(( the | that | of ).*){3,}.*>
The problem:
I have these 2 similar lines. Both contain the same words, except that in the second line one "the" is in a different place. So why does my regex find only the second line and not the first? How can I change the regex so that it finds both lines?
<meta name="description" content="the mystery of the art that seeks its meaning.">
<meta name="description" content="the mystery of art that seeks the its meaning.">
Solution 1:
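Your regex misses the first line because each alternative consumes the spaces around the word: in "of the", the patterns " of " and " the " would have to share a single space, and the first "the" has no space before it, so only two matches are possible. For this kind of search, you have to use positive lookaheads, which check for a word without consuming any text: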
- Ctrl+F
- Find what:
<meta name="description" content="(?=[^">]*?\bthe\b)(?=[^">]*?\bthat\b)(?=[^">]*?\bof\b)[^">]*">
- CHECK Wrap around
- CHECK Regular expression
- Find All in Current Document
Explanation:
<meta name="description" content=" # literal text
(?= # positive lookahead, make sure that what follows contains:
[^">]*? # 0 or more characters that are not " or >, as few as possible
\b # word boundary
the # the word "the"
\b # word boundary
) # end of lookahead
(?=[^">]*?\bthat\b) # same for the word "that"
(?=[^">]*?\bof\b) # same for the word "of"
[^">]* # 0 or more characters that are not " or >
"> # literal text
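To check the behaviour outside Notepad++, here is a minimal Python sketch that runs both patterns against the two sample lines from the question. It assumes Python's re module treats these particular patterns the same way as Notepad++'s regex engine; the variable names are only illustrative.

```python
import re

# The two sample lines from the question.
lines = [
    '<meta name="description" content="the mystery of the art that seeks its meaning.">',
    '<meta name="description" content="the mystery of art that seeks the its meaning.">',
]

# Original pattern: every alternative consumes the spaces around the word,
# so two adjacent words ("of the") cannot both be counted.
original = re.compile(
    r'<meta name="description" content=.*(( the | that | of ).*){3,}.*>'
)

# Lookahead pattern: each lookahead only checks that a word is present and
# consumes nothing. Written without a space after \bof\b, so "of" may also
# be followed by punctuation or the closing quote.
lookahead = re.compile(
    r'<meta name="description" content="'
    r'(?=[^">]*?\bthe\b)(?=[^">]*?\bthat\b)(?=[^">]*?\bof\b)'
    r'[^">]*">'
)

original_hits = [bool(original.search(line)) for line in lines]
lookahead_hits = [bool(lookahead.search(line)) for line in lines]

print(original_hits)   # [False, True]
print(lookahead_hits)  # [True, True]
```

Note that the lookahead version checks that each of the three words occurs at least once in the content attribute. It does not count three occurrences of any of them, so it is slightly stricter than "at least 3 of these words".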