Extract domain from URL in python
I have a URL like: http://abc.hostname.com/somethings/anything/
I want to get: hostname.com
What module can I use to accomplish this?
I would like to use the same module and method in Python 2.
Solution 1:
To extract the host part of a URL in Python 3, use urllib.parse.urlparse and read its netloc attribute:
from urllib.parse import urlparse
domain = urlparse('http://www.example.test/foo/bar').netloc
print(domain) # --> www.example.test
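Since the question also asks about Python 2: there the same function lives in the urlparse module rather than urllib.parse. Here is a minimal sketch that imports it under either version. It also shows how netloc differs from hostname: netloc keeps any port and user:password@ part, while hostname drops them and lowercases the result. The URL with credentials and a port is just a made-up example.

```python
# Import urlparse under both Python 3 and Python 2.
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

url = 'http://user:secret@WWW.Example.test:8080/foo/bar'
parts = urlparse(url)

# netloc is the raw network-location part, credentials and port included.
print(parts.netloc)    # user:secret@WWW.Example.test:8080
# hostname strips credentials and port, and is lowercased.
print(parts.hostname)  # www.example.test
# port is parsed as an int (None when the URL has no port).
print(parts.port)      # 8080
```

If you only want the host name, parts.hostname is usually the safer attribute to read.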
However, to reliably get the registered domain without its subdomains (example.test in this example), you need a specialized library that knows the list of public suffixes, such as tldextract.
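To see why a suffix list is needed, here is a toy sketch of the idea such libraries implement: find the longest known public suffix at the end of the host name, then keep one more label. The tiny KNOWN_SUFFIXES set and the registered_domain helper are hypothetical, for illustration only. The real Public Suffix List has thousands of entries plus wildcard and exception rules, which this sketch ignores (Python 3 only).

```python
from urllib.parse import urlparse

# Hypothetical, drastically reduced suffix list (the real Public Suffix List
# is much larger and also has wildcard and exception rules).
KNOWN_SUFFIXES = {'com', 'test', 'uk', 'co.uk'}


def registered_domain(url):
    """Return the known suffix plus one label (e.g. 'hostname.com'), or None."""
    host = urlparse(url).hostname or ''
    labels = host.split('.')
    # Try candidate suffixes from longest to shortest; stop at the first known one.
    for i in range(1, len(labels)):
        suffix = '.'.join(labels[i:])
        if suffix in KNOWN_SUFFIXES:
            return '.'.join(labels[i - 1:])
    return None


print(registered_domain('http://abc.hostname.com/somethings/anything/'))  # hostname.com
print(registered_domain('http://www.bbc.co.uk/news'))  # bbc.co.uk
```

Checking the longest suffix first is what makes www.bbc.co.uk resolve to bbc.co.uk instead of co.uk.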
Solution 2:
Instead of regular expressions or a hand-written parser, you can use Python's urlparse:
from urllib.parse import urlparse
print(urlparse('http://abc.hostname.com/somethings/anything/'))
# ParseResult(scheme='http', netloc='abc.hostname.com', path='/somethings/anything/', params='', query='', fragment='')
print(urlparse('http://abc.hostname.com/somethings/anything/').netloc)
# abc.hostname.com
To get the domain without the subdomain, keep only the last two dot-separated labels:
t = urlparse('http://abc.hostname.com/somethings/anything/').netloc
print('.'.join(t.split('.')[-2:]))
# hostname.com
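The last-two-labels trick works for names like abc.hostname.com, but it assumes the suffix is a single label and that netloc holds nothing but the host name. A small sketch (the strip_subdomain helper name is only illustrative) that reads hostname instead, so a port no longer leaks into the result, and that shows where the trick still breaks:

```python
from urllib.parse import urlparse


def strip_subdomain(url):
    """Naively keep the last two labels of the host name.

    Assumes the URL includes a scheme such as http://, otherwise
    urlparse leaves hostname as None.
    """
    # hostname (unlike netloc) excludes any port and user:password@ part.
    host = urlparse(url).hostname
    return '.'.join(host.split('.')[-2:])


print(strip_subdomain('http://abc.hostname.com/somethings/anything/'))  # hostname.com
print(strip_subdomain('http://abc.hostname.com:8080/'))  # hostname.com
# Wrong for multi-label suffixes: this returns the suffix, not the domain.
print(strip_subdomain('http://www.bbc.co.uk/'))  # co.uk
```

With netloc, the second URL would have produced hostname.com:8080. For suffixes like co.uk you need a Public Suffix List based library such as tldextract.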
Solution 3:
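You can use the third-party tldextract library (pip install tldextract), which splits a host name into subdomain, domain, and suffix using the Public Suffix List.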
Example code:
from tldextract import extract
ext = extract("http://abc.hostname.com/somethings/anything/")
# ext.subdomain == 'abc', ext.domain == 'hostname', ext.suffix == 'com'
domain = ext.domain + '.' + ext.suffix
print(domain)  # hostname.com