Get the URL from an href with BeautifulSoup, without the redirect link
Solution 1:
You can use the urllib.parse module. The URL you are looking for is one of the query parameters of the /biz_redir link, so you first need to extract the 'url' parameter from it.
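As an aside, the same extraction step can be wrapped in a small reusable function. This is a minimal sketch, assuming the redirect always carries the real destination in a query parameter named `url` (as the /biz_redir link does); the helper name `unwrap_redirect` is hypothetical. `parse_qs` maps every key to a list, because a parameter may appear more than once, which is why the first element is taken. It also percent-decodes the value, so `https%3A%2F%2F` comes back as `https://`.

```python
from urllib.parse import urlparse, parse_qs


def unwrap_redirect(href: str, param: str = "url") -> str:
    """Return the target URL hidden in a redirect link's query string.

    Hypothetical helper: assumes the destination is stored in the `param`
    query parameter. Links without that parameter are returned unchanged,
    so direct links pass straight through.
    """
    query = parse_qs(urlparse(href).query)
    # parse_qs gives {key: [value, ...]}; use the first value if present.
    values = query.get(param)
    return values[0] if values else href


print(unwrap_redirect("/biz_redir?url=https%3A%2F%2Faceplumbingandrooter.com&cachebuster=1642876680"))
# https://aceplumbingandrooter.com
print(unwrap_redirect("https://example.com/contact"))
# https://example.com/contact
```

Returning the href unchanged (instead of raising KeyError like `['url'][0]` would) lets you run it over every link on a page, redirect or not.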
from urllib.parse import urlparse, parse_qs
url = '/biz_redir?url=https%3A%2F%2Faceplumbingandrooter.com&' \
'cachebuster=1642876680&website_link_type=website&' \
'src_bizid=hqjCHBGnEj4nECnLJBvjQw&s=2caa69aa7350cca9ad00' \
'f1fd1d5a6346f341dd43e1ede874aa2eaa94d6a3458f'
parsed_url = urlparse(url)
print(parse_qs(parsed_url.query)['url'][0])
This gives you the full URL https://aceplumbingandrooter.com. You can then parse that URL again and read its netloc (the domain part). Here is the complete code:
from urllib.parse import urlparse, parse_qs
url = '/biz_redir?url=https%3A%2F%2Faceplumbingandrooter.com&' \
'cachebuster=1642876680&website_link_type=website&' \
'src_bizid=hqjCHBGnEj4nECnLJBvjQw&s=2caa69aa7350cca9ad00' \
'f1fd1d5a6346f341dd43e1ede874aa2eaa94d6a3458f'
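parsed_url = urlparse(url)
# parse_qs returns {key: [values]}; take the first (already percent-decoded) 'url' value
new = parse_qs(parsed_url.query)['url'][0]
# parse the target URL itself and keep only its network location (the domain)
new = urlparse(new)
print(new.netloc)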
Output:
aceplumbingandrooter.com
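To tie this back to scraping, here is a sketch that collects every href on a page and reduces each one to its domain, unwrapping redirect links first. So that it runs without third-party packages, it swaps BeautifulSoup for the standard-library html.parser; with BeautifulSoup you would get the same strings from `a['href']` for each tag in `soup.find_all('a', href=True)`. The sample HTML and the names `HrefCollector` and `link_domains` are made up for illustration, and it assumes redirects use the `url` parameter.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (stand-in for BeautifulSoup)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # value is None for a bare attribute like <a href>; skip it
                if name == "href" and value:
                    self.hrefs.append(value)


def link_domains(html: str) -> list[str]:
    """Return the domain of each link, unwrapping ?url=... redirects first."""
    collector = HrefCollector()
    collector.feed(html)
    domains = []
    for href in collector.hrefs:
        # Assumption: redirect links keep the real target in the 'url' parameter;
        # links without it are used as-is.
        target = parse_qs(urlparse(href).query).get("url", [href])[0]
        domains.append(urlparse(target).netloc)
    return domains


sample = (
    '<a href="/biz_redir?url=https%3A%2F%2Faceplumbingandrooter.com&amp;cachebuster=1642876680">Website</a>'
    '<a href="https://example.com/about">About</a>'
)
print(link_domains(sample))
# ['aceplumbingandrooter.com', 'example.com']
```

A relative link that is not a redirect (for example `/about`) has an empty netloc; if you need its domain, resolve it against the page URL with `urllib.parse.urljoin` first.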