Can't open Unicode URL with Python

This works for me:

#!/usr/bin/env python
# define source file encoding, see: http://www.python.org/dev/peps/pep-0263/
# -*- coding: utf-8 -*-

import urllib
url = u'http://example.com/índice.html'
content = urllib.urlopen(url.encode("UTF-8")).read()

Per the applicable standard, RFC 1378, URLs can only contain ASCII characters. Good explanation here, and I quote:

"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."

As the URLs I've given explain, this probably means you'll have to replace that "lowercase i with acute accent" with `%ED'.


Encoding the URL as utf-8, should have worked. I wonder if your source file is properly encoded, and whether the interpreter knows it. If your python source file is saved as UTF-8, for example, then you should have

# coding=UTF-8

as the first or second line.

import urllib
url = u'http://mydomain.es/índice.html'
content = urllib.urlopen(url.encode('utf-8')).read()

works for me.

Edit: also, be aware that Unicode text in an interactive Python session (whether through IDLE, or a console) is fraught with encoding-related difficulty. In those cases, you should use Unicode literals (like \u00ED in your case).