Changing hostname in a url
You can use the urllib.parse.urlparse function and the ParseResult._replace method (Python 3):
>>> import urllib.parse
>>> parsed = urllib.parse.urlparse("https://www.google.dk:80/barbaz")
>>> replaced = parsed._replace(netloc="www.foo.dk:80")
>>> print(replaced)
ParseResult(scheme='https', netloc='www.foo.dk:80', path='/barbaz', params='', query='', fragment='')
If you're using Python 2, replace urllib.parse with urlparse.
ParseResult is a subclass of namedtuple, and _replace is a namedtuple method that:
returns a new instance of the named tuple replacing specified fields with new values
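To get the modified URL back as a string rather than a ParseResult, geturl() can be called on the new instance. The following is a small sketch of that, reusing the example URL from above; it also shows that the original object is left unchanged.

```python
from urllib.parse import urlparse

parsed = urlparse("https://www.google.dk:80/barbaz")
replaced = parsed._replace(netloc="www.foo.dk:80")

# _replace returns a new ParseResult; the original keeps its old netloc.
original_netloc = parsed.netloc

# geturl() reassembles the six fields into a URL string.
new_url = replaced.geturl()

print(original_netloc)
print(new_url)
```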
UPDATE:
As @2rs2ts said in the comments, the netloc attribute includes the port number.
Good news: ParseResult has hostname and port attributes.
Bad news: hostname and port are not fields of the namedtuple; they are read-only properties computed from netloc, so you can't do parsed._replace(hostname="www.foo.dk"). It raises an exception.
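The difference between real tuple fields and computed properties can be checked directly. This is only an illustrative sketch; it catches both ValueError and TypeError on the assumption that the exact exception type depends on the Python version.

```python
from urllib.parse import urlparse

parsed = urlparse("https://www.google.dk:80/barbaz")

# Only these six names are real namedtuple fields that _replace accepts.
print(parsed._fields)

# hostname and port are derived from netloc on access.
print(parsed.hostname, parsed.port)

try:
    parsed._replace(hostname="www.foo.dk")
    replace_failed = False
except (ValueError, TypeError) as exc:
    # Assumption: which of the two is raised varies between Python versions.
    replace_failed = True
    print("cannot replace hostname:", exc)
```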
If you don't want to split on : and your URL always has a port number and no username and password (that is, it is not a URL like "https://username:password@www.google.dk:80/barbaz"), you can do:
parsed._replace(netloc="{}:{}".format("www.foo.dk", parsed.port))
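For URLs that may or may not carry a port or credentials, one option is to rebuild netloc from its parsed pieces. The sketch below is one way to do that, not a definitive implementation: replace_hostname is a hypothetical helper name (it is not part of urllib), and it assumes a plain host name rather than a bracketed IPv6 literal.

```python
from urllib.parse import urlsplit, urlunsplit


def replace_hostname(url, new_hostname):
    """Return url with its host swapped, keeping any userinfo and port.

    Hypothetical helper for illustration; assumes new_hostname is a plain
    host name (no IPv6 brackets, no port).
    """
    parts = urlsplit(url)
    netloc = new_hostname

    # Re-attach the port only if the original URL had one.
    if parts.port is not None:
        netloc = "{}:{}".format(netloc, parts.port)

    # Re-attach "user" or "user:password" if present.
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo = "{}:{}".format(userinfo, parts.password)
        netloc = "{}@{}".format(userinfo, netloc)

    return urlunsplit(parts._replace(netloc=netloc))


print(replace_hostname("https://www.google.dk:80/barbaz", "www.foo.dk"))
print(replace_hostname("https://username:password@www.google.dk:80/barbaz", "www.foo.dk"))
```

Because the port and credentials are read from the parsed result instead of being cut out of the string, the same call handles URLs with or without them.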
You can take advantage of urlsplit and urlunsplit from Python's urlparse module (urllib.parse in Python 3):
>>> from urlparse import urlsplit, urlunsplit
>>> url = list(urlsplit('https://www.google.dk:80/barbaz'))
>>> url
['https', 'www.google.dk:80', '/barbaz', '', '']
>>> url[1] = 'www.foo.dk:80'
>>> new_url = urlunsplit(url)
>>> new_url
'https://www.foo.dk:80/barbaz'
As the docs state, the argument passed to urlunsplit() "can be any five-item iterable", so the above code works as expected.
Using the urlparse and urlunparse functions of the urlparse module (urllib.parse in Python 3):
import urlparse
old_url = 'https://www.google.dk:80/barbaz'
url_lst = list(urlparse.urlparse(old_url))
# Now url_lst is ['https', 'www.google.dk:80', '/barbaz', '', '', '']
url_lst[1] = 'www.foo.dk:80'
# Now url_lst is ['https', 'www.foo.dk:80', '/barbaz', '', '', '']
new_url = urlparse.urlunparse(url_lst)
print(old_url)
print(new_url)
Output:
https://www.google.dk:80/barbaz
https://www.foo.dk:80/barbaz
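The list-based approach works the same way with either pair of functions; the only difference is that urlparse yields six items (it splits out params) while urlsplit yields five. A small sketch comparing the two follows. The URL with ;type=a, ?q=1 and #top is made up for illustration, and the try/except import is just one way to run the same code on Python 2 and 3.

```python
try:
    from urllib.parse import urlparse, urlsplit, urlunparse, urlunsplit  # Python 3
except ImportError:
    from urlparse import urlparse, urlsplit, urlunparse, urlunsplit  # Python 2

# Made-up URL with params, query and fragment to exercise every field.
url = "https://www.google.dk:80/barbaz;type=a?q=1#top"

six = list(urlparse(url))   # scheme, netloc, path, params, query, fragment
five = list(urlsplit(url))  # scheme, netloc, path, query, fragment

# In both lists, netloc is the item at index 1.
six[1] = "www.foo.dk:80"
five[1] = "www.foo.dk:80"

from_parse = urlunparse(six)
from_split = urlunsplit(five)

print(len(six), len(five))
print(from_parse)
print(from_split)
```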
A simple string replace of the host in the netloc also works in most cases:
>>> p = urlparse.urlparse('https://www.google.dk:80/barbaz')
>>> p._replace(netloc=p.netloc.replace(p.hostname, 'www.foo.dk')).geturl()
'https://www.foo.dk:80/barbaz'
This will not work if the user name or password happens to contain the hostname, because str.replace replaces every occurrence and cannot be limited to the last one. Instead, we can use rsplit and join:
>>> p = urlparse.urlparse('https://www.google.dk:www.google.dk@www.google.dk:80/barbaz')
>>> new_netloc = 'www.foo.dk'.join(p.netloc.rsplit(p.hostname, 1))
>>> p._replace(netloc=new_netloc).geturl()
'https://www.google.dk:www.google.dk@www.foo.dk:80/barbaz'
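To see the two string-based variants side by side, here is a short sketch in Python 3 syntax. The URL is made up for illustration: its user name is deliberately identical to the hostname, and "secret" is a placeholder password.

```python
from urllib.parse import urlparse

# Made-up URL whose user name is identical to the hostname.
url = "https://www.google.dk:secret@www.google.dk:80/barbaz"
p = urlparse(url)

# str.replace rewrites every occurrence, so the user name changes too.
naive = p._replace(netloc=p.netloc.replace(p.hostname, "www.foo.dk")).geturl()

# rsplit(..., 1) splits only at the last occurrence, which is the host.
safe_netloc = "www.foo.dk".join(p.netloc.rsplit(p.hostname, 1))
safe = p._replace(netloc=safe_netloc).geturl()

print(naive)
print(safe)
```

One caveat with both variants: p.hostname is always returned in lower case, so a netloc written with upper-case letters would not match and the URL would come back unchanged.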