Challenging RegEx not working for fellow noob

I want to capture all URLs in a document, except those from google, bscscan, github, etc.

So far I have this regex, which only partly works:

(www|http:|https:)+[\W]+(?!bscscan|google|binance|t\.me)[\w]+

When applied to this paragraph

https://bscscan.com   testing123
website: https://www.yahoo.com
another one www.bing.com is great
www.binance.org
http://bob.bscscan.com
https://twitter.google.com
https://google.twitter.com
https://t.me/rawr omg

It matches only

1) https://www
2) www.bing
3) http://bob
4) https://twitter

But I want it to match

https://yahoo.com
www.bing.com

Fixes desired

#1) Include entire URL link.

#2) Omit the URLs that have ANY mention of the negative lookahead words within the link.


Solution 1:

Use

\b(?:www\.|https?:)(?!\S*\b(?:bscscan|google|binance|t\.me)\b)\S+

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    www                      'www'
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    http                     'http'
--------------------------------------------------------------------------------
    s?                       's' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    :                        ':'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \S*                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      bscscan                  'bscscan'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      google                   'google'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      binance                  'binance'
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      t                        't'
--------------------------------------------------------------------------------
      \.                       '.'
--------------------------------------------------------------------------------
      me                       'me'
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  \S+                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (1 or more times (matching the
                           most amount possible))
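
To check the pattern against the sample paragraph from the question, here is a minimal Python sketch. The names SAMPLE, URL_RE and LOOSE_RE are made up for illustration, and LOOSE_RE is an assumed variant rather than part of the pattern above: it drops the inner \b anchors, which may be closer to "ANY mention" in fix #2, because with the anchors a host such as googleapis.com is not excluded.

```python
import re

# Sample paragraph from the question.
SAMPLE = """\
https://bscscan.com   testing123
website: https://www.yahoo.com
another one www.bing.com is great
www.binance.org
http://bob.bscscan.com
https://twitter.google.com
https://google.twitter.com
https://t.me/rawr omg
"""

# Pattern from this answer: the blocked names must appear as whole words.
URL_RE = re.compile(
    r"\b(?:www\.|https?:)(?!\S*\b(?:bscscan|google|binance|t\.me)\b)\S+"
)

# Assumed variant: no \b around the blocked names, so any substring hit
# (for example "googleapis") rejects the URL.
LOOSE_RE = re.compile(
    r"\b(?:www\.|https?:)(?!\S*(?:bscscan|google|binance|t\.me))\S+"
)

# No capture groups, so findall returns the full matches.
matches = URL_RE.findall(SAMPLE)
print(matches)  # ['https://www.yahoo.com', 'www.bing.com']

print(URL_RE.findall("https://googleapis.com"))    # ['https://googleapis.com']
print(LOOSE_RE.findall("https://googleapis.com"))  # []
```

The lookahead runs once, right after the www. or http(s): prefix, and scans the rest of the non-whitespace run, which is why a blocked name anywhere in the URL rejects the whole thing.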

Solution 2:

Try this one. It has enough alternatives in it that you can adjust them to suit your implementation:

/(|www\.|http\:\/\/|https\:\/\/)(?!(bscscan|google|binance|t\.me|twitter|bob))(yahoo\.com|bing\.com)/g

This will match any of the following variations:

https://yahoo.com   <- your required one
www.bing.com        <- your required one
www.yahoo.com
https://bing.com
http://bing.com
bing.com            <- remove the "|" before "www" if you don't want this one
yahoo.com           <- remove the "|" before "www" if you don't want this one

If you add (https\:\/\/www\.)|(http\:\/\/www\.) as further alternatives in the first group, it will also match URLs that start with https://www. and http://www.
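
For reference, here is a rough Python sketch of how this allow-list pattern behaves on the sample paragraph from the question. The names SAMPLE, ALLOW_RE, ALLOW_WITH_WWW_RE and full_matches are hypothetical, and the pattern is written without the JavaScript /.../g wrapper and without the escapes Python does not need. Placing the two extra prefixes inside the first group is an assumption about where they are meant to go.

```python
import re

# Sample paragraph from the question.
SAMPLE = """\
https://bscscan.com   testing123
website: https://www.yahoo.com
another one www.bing.com is great
www.binance.org
http://bob.bscscan.com
https://twitter.google.com
https://google.twitter.com
https://t.me/rawr omg
"""

# The pattern from this answer, translated to Python syntax.
ALLOW_RE = re.compile(
    r"(|www\.|http://|https://)"
    r"(?!(bscscan|google|binance|t\.me|twitter|bob))"
    r"(yahoo\.com|bing\.com)"
)

# Same pattern with the https://www. and http://www. prefixes added
# (assumed placement: inside the first group).
ALLOW_WITH_WWW_RE = re.compile(
    r"(|https://www\.|http://www\.|www\.|http://|https://)"
    r"(?!(bscscan|google|binance|t\.me|twitter|bob))"
    r"(yahoo\.com|bing\.com)"
)


def full_matches(pattern, text):
    # The pattern has capture groups, so findall would return tuples of
    # groups; finditer with group(0) gives the whole matched text.
    return [m.group(0) for m in pattern.finditer(text)]


print(full_matches(ALLOW_RE, SAMPLE))
# ['www.yahoo.com', 'www.bing.com']
print(full_matches(ALLOW_WITH_WWW_RE, SAMPLE))
# ['https://www.yahoo.com', 'www.bing.com']
```

Without the extra prefixes, https://www.yahoo.com is only matched from www. onward, so the scheme is lost. Also note that this approach lists the allowed domains explicitly, so any URL outside yahoo.com and bing.com is skipped whether or not it is on the blocked list.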