Regex: Cannot find a line that contains a string in one place, but finds the line if the string is elsewhere on the same line

I have this regex, which finds only the HTML meta tags that contain at least 3 of these words.

<meta name="description" content=.*(( the | that | of ).*){3,}.*>

The problem:

I have these 2 similar lines. Both contain the same words, except that in the second line one "the" is in a different place. So why does my regex find only the second line and not the first? How can I change the regex so that it finds both lines?

<meta name="description" content="the mystery of the art that seeks its meaning.">

<meta name="description" content="the mystery of art that seeks the its meaning.">


Solution 1:

For this kind of search, you have to use positive lookaheads:

  • Ctrl+F
  • Find what: <meta name="description" content="(?=[^">]*?\bthe\b)(?=[^">]*?\bthat\b)(?=[^">]*?\bof\b)[^">]*">
  • CHECK Wrap around
  • CHECK Regular expression
  • Find All in Current Document
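
If you want to run the same search outside Notepad++, here is a minimal sketch in Python. It assumes Python's re module and uses the lookahead pattern from this answer, written with \bof\b and no trailing space. The name find_matching_lines and the third sample line (which lacks "of") are made up for illustration.

```python
import re

# Lookahead pattern from this answer, split over several raw strings for readability.
PATTERN = re.compile(
    r'<meta name="description" content="'
    r'(?=[^">]*?\bthe\b)'    # "the" must appear somewhere in the attribute value
    r'(?=[^">]*?\bthat\b)'   # "that" must appear too
    r'(?=[^">]*?\bof\b)'     # and so must "of"
    r'[^">]*">'              # then consume the value and the closing ">
)


def find_matching_lines(text):
    """Return (line_number, line) pairs for every line that contains a match."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]


SAMPLE = "\n".join([
    '<meta name="description" content="the mystery of the art that seeks its meaning.">',
    '<meta name="description" content="the mystery of art that seeks the its meaning.">',
    # Hypothetical extra line without the word "of"; it should not be reported.
    '<meta name="description" content="the art that seeks its meaning.">',
])

if __name__ == "__main__":
    for number, line in find_matching_lines(SAMPLE):
        print(number, line)
```

Both lines from the question are reported, and the line without "of" is skipped.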

Explanation:

<meta name="description" content="      # literally
(?=                                     # positive lookahead, make sure that what follows contains:
    [^">]*?                                 # 0 or more characters that are not " or >, as few as possible
    \b                                      # word boundary
    the                                     # the word "the"
    \b                                      # word boundary
)                                       # end lookahead
(?=[^">]*?\bthat\b)                     # same for the word "that"
(?=[^">]*?\bof\b)                       # same for the word "of"
[^">]*                                  # 0 or more characters that are not " or >
">                                      # literally

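As for why the original regex misses the first line: the alternatives " the ", " that " and " of " each consume the space on both sides. In "of the art", the space after "of" is already consumed by " of ", so " the " cannot start there, and the first "the" comes right after the quote with no leading space. That leaves only two non-overlapping hits in the first line, one short of {3,}. The sketch below checks this with Python's re module, which is an assumption for illustration; regex flavours differ in details, but these constructs should behave the same way here. The helper names spaced_hits and boundary_hits are made up.

```python
import re

LINE_1 = '<meta name="description" content="the mystery of the art that seeks its meaning.">'
LINE_2 = '<meta name="description" content="the mystery of art that seeks the its meaning.">'

# The regex from the question, unchanged.
ORIGINAL = re.compile(r'<meta name="description" content=.*(( the | that | of ).*){3,}.*>')


def spaced_hits(line):
    """Non-overlapping hits when each word must have a literal space on both sides."""
    return re.findall(r' the | that | of ', line)


def boundary_hits(line):
    """Hits when word boundaries are used instead; \\b consumes no characters."""
    return re.findall(r'\b(?:the|that|of)\b', line)


if __name__ == "__main__":
    for line in (LINE_1, LINE_2):
        print(bool(ORIGINAL.search(line)), spaced_hits(line), boundary_hits(line))
```

With literal spaces the first line yields only " of " and " that ", while the second yields three hits. With \b both lines yield four hits. Note that the lookahead pattern in this answer requires each of the three words at least once, which is not quite the same as "any three occurrences".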