Unescape Python Strings From HTTP
I've got a string from an HTTP header, but it's been escaped. What function can I use to unescape it?
myemail%40gmail.com -> myemail@gmail.com
Would urllib.unquote() be the way to go?
Solution 1:
I am pretty sure that urllib's unquote is the common way of doing this.
>>> import urllib
>>> urllib.unquote("myemail%40gmail.com")
'myemail@gmail.com'
There's also unquote_plus:
"Like unquote(), but also replaces plus signs by spaces, as required for unquoting HTML form values."
Solution 2:
Yes, it appears that urllib.unquote() accomplishes that task. (I tested it against your example on codepad.)
Solution 3:
In Python 3, these functions are urllib.parse.unquote and urllib.parse.unquote_plus.
The latter is used, for example, for query strings in HTTP URLs, where the space character is traditionally encoded as a plus character (+), and a literal + is percent-encoded as %2B.
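To make the difference concrete, here is a small Python 3 sketch. The sample strings and variable names are made up for illustration:

```python
from urllib.parse import unquote, unquote_plus

# Sample input, invented for this example.
encoded = "a+b%2Bc%20d"

# unquote decodes percent-escapes but leaves '+' untouched.
plain = unquote(encoded)        # 'a+b+c d'

# unquote_plus first turns '+' into a space, then decodes percent-escapes.
form = unquote_plus(encoded)    # 'a b+c d'

# The example from the question.
email = unquote("myemail%40gmail.com")  # 'myemail@gmail.com'

print(plain, form, email, sep="\n")
```

For a plain header value like the one in the question, unquote is enough; unquote_plus only matters when + was used to stand for a space.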
In addition to these, there is unquote_to_bytes, which converts the given encoded string to bytes and can be used when the encoding is not known or the encoded data is binary. However, there is no unquote_plus_to_bytes; if you need it, you can write:
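from urllib.parse import unquote_to_bytes

def unquote_plus_to_bytes(s):
    # Replace '+' with a space first, using the matching type (bytes or str).
    if isinstance(s, bytes):
        s = s.replace(b'+', b' ')
    else:
        s = s.replace('+', ' ')
    # Then percent-decode to bytes.
    return unquote_to_bytes(s)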
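A quick usage sketch of that helper, with the function repeated so the snippet runs on its own. The input values are invented for illustration:

```python
from urllib.parse import unquote_to_bytes

def unquote_plus_to_bytes(s):
    # Same helper as above: turn '+' into a space, then percent-decode to bytes.
    if isinstance(s, bytes):
        s = s.replace(b'+', b' ')
    else:
        s = s.replace('+', ' ')
    return unquote_to_bytes(s)

# Works for both str and bytes input.
from_str = unquote_plus_to_bytes("a+b%2Bc")      # b'a b+c'
from_bytes = unquote_plus_to_bytes(b"a+b%2Bc")   # b'a b+c'

# Arbitrary byte values are preserved, even if they are not valid UTF-8.
raw = unquote_plus_to_bytes("%FF%FE+x")          # b'\xff\xfe x'

print(from_str, from_bytes, raw)
```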
More information on whether to use unquote or unquote_plus is available at "URL encoding the space character: + or %20".