URLEncoder not able to translate space character
I am expecting
System.out.println(java.net.URLEncoder.encode("Hello World", "UTF-8"));
to output:
Hello%20World
(20 is the ASCII hex code for a space)
However, what I get is:
Hello+World
Am I using the wrong method? What is the correct method I should be using?
Solution 1:
This behaves as expected. The URLEncoder implements the HTML Specifications for how to encode URLs in HTML forms.
From the javadocs:
This class contains static methods for converting a String to the application/x-www-form-urlencoded MIME format.
and from the HTML Specification:
application/x-www-form-urlencoded
Forms submitted with this content type must be encoded as follows:
- Control names and values are escaped. Space characters are replaced by `+'
You will have to replace it, e.g.:
System.out.println(java.net.URLEncoder.encode("Hello World", "UTF-8").replace("+", "%20"));
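A minimal sketch of the same replacement wrapped in a helper; the class name PercentEncodeSketch and the method encodeComponent are hypothetical names used only for illustration. It also shows why the replacement should be safe: URLEncoder turns a literal + in the input into %2B first, so any + left in the output stands for a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PercentEncodeSketch {

    // Hypothetical helper: form-encode, then turn the form-style '+' into "%20".
    static String encodeComponent(String value) {
        try {
            // A literal '+' in the input has already become "%2B" here,
            // so every remaining '+' represents a space.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeComponent("Hello World")); // Hello%20World
        System.out.println(encodeComponent("1+1 = 2"));     // 1%2B1%20%3D%202
    }
}
```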
Solution 2:
A space is encoded as %20 in URLs, and as + in submitted form data (content type application/x-www-form-urlencoded). You need the former.
Using Guava:
dependencies {
    // "compile" was removed in Gradle 7; "implementation" is its replacement
    implementation 'com.google.guava:guava:23.0'
    // or, for Android:
    implementation 'com.google.guava:guava:23.0-android'
}
You can use UrlEscapers:
String encodedString = UrlEscapers.urlFragmentEscaper().escape(inputString);
Don't use String.replace on its own; it would only encode the space. Use a library instead.
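If adding Guava is not an option, one JDK-only alternative (a different technique from the UrlEscapers call above, not part of Guava) is to let the multi-argument java.net.URI constructor quote the path for you. The sketch below assumes you are building a whole URL from unencoded parts; the class name UriPathSketch, the helper buildUrl, and the host example.com are placeholders for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriPathSketch {

    // Hypothetical helper: build a URL from unencoded parts and let java.net.URI quote the path.
    static String buildUrl(String scheme, String host, String path) {
        try {
            // The multi-argument constructor percent-encodes characters that are
            // illegal in a path, including the space; toASCIIString() also
            // encodes non-ASCII characters.
            return new URI(scheme, host, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("https", "example.com", "/docs/Hello World"));
        // https://example.com/docs/Hello%20World
    }
}
```

Note that this quotes a path as a whole and leaves / untouched, so it is not a drop-in replacement for encoding a single query parameter value.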
Solution 3:
This class performs application/x-www-form-urlencoded-type encoding rather than percent encoding, so replacing a space with + is correct behaviour.
From the javadoc:
When encoding a String, the following rules apply:
- The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
- The special characters ".", "-", "*", and "_" remain the same.
- The space character " " is converted into a plus sign "+".
- All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
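The rules quoted above can be checked with a short program. This is only an illustrative sketch; the class name FormEncodingRules and the helper formEncode are hypothetical. The non-ASCII character is written as a Unicode escape so the result does not depend on the source file's encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FormEncodingRules {

    // Hypothetical helper: URLEncoder with UTF-8 and no checked exception.
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("aZ09"));   // aZ09    (alphanumerics unchanged)
        System.out.println(formEncode(".-*_"));   // .-*_    (listed special characters unchanged)
        System.out.println(formEncode(" "));      // +       (space becomes a plus sign)
        System.out.println(formEncode("\u00e9")); // %C3%A9  (two UTF-8 bytes, each as %xy)
        System.out.println(formEncode("~"));      // %7E     (not in the safe list, so escaped)
    }
}
```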
Solution 4:
To encode query parameters, use org.apache.commons.httpclient.util.URIUtil:
URIUtil.encodeQuery(input);
Or, if you want to escape characters within a URI path yourself:
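public static String escapeURIPathParam(String input) {
    StringBuilder resultStr = new StringBuilder();
    // Work on UTF-8 bytes so that non-ASCII characters become valid %XX sequences.
    for (byte b : input.getBytes(java.nio.charset.StandardCharsets.UTF_8)) {
        int ch = b & 0xFF; // unsigned byte value, 0..255
        if (isUnsafe(ch)) {
            resultStr.append('%');
            resultStr.append(toHex(ch / 16));
            resultStr.append(toHex(ch % 16));
        } else {
            resultStr.append((char) ch);
        }
    }
    return resultStr.toString();
}

// Converts a value in the range 0..15 to an uppercase hex digit.
private static char toHex(int ch) {
    return (char) (ch < 10 ? '0' + ch : 'A' + ch - 10);
}

private static boolean isUnsafe(int ch) {
    // Every byte outside the ASCII range must be escaped.
    if (ch > 127)
        return true;
    return " %$&+,/:;=?@<>#".indexOf(ch) >= 0;
}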