Sending non-ASCII text in HTTP POST header
I am sending a file to a server as an octet-stream, and I need to specify the filename in the header:
String filename = "«úü¡»¿.doc";
URL url = new URL("http://www.myurl.com");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.addRequestProperty("Accept", "application/json; charset=UTF-8");
conn.addRequestProperty("Content-Type", "application/octet-stream; charset=UTF-8");
conn.addRequestProperty("Filename", filename);
// do more stuff here
The problem is, some of the files I need to send have filenames that contain non-ASCII characters. I have read that you cannot send non-ASCII text in an HTTP header.
My questions are:
- Can you send non-ASCII text in an HTTP header?
- If you can, how do you do this? The code above does not work when filename contains non-ASCII text. The server responds with "Bad Request 400".
- If you cannot, what is the typical way to get around this limitation?
You cannot use non-ASCII characters in HTTP headers; see RFC 2616. URIs are themselves standardized by RFC 2396 and don't permit non-ASCII characters either. The RFC says:
The URI syntax was designed with global transcribability as one of its main concerns. A URI is a sequence of characters from a very limited set, i.e. the letters of the basic Latin alphabet, digits, and a few special characters.
In order to use non-ASCII characters in a URI, you need to escape them using the %hexcode syntax (see section 2 of RFC 2396).
In Java you can do this using the java.net.URLEncoder class.
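As a minimal sketch of that approach, the filename from the question can be percent-encoded as UTF-8 before it goes into the header. The class and method names here (HeaderFilenameEncoder, encodeForHeader, decodeFromHeader) are made up for illustration, and the sketch assumes Java 10 or later (for the Charset overloads) and a server that percent-decodes the custom Filename header on its side.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class HeaderFilenameEncoder {

    // Percent-encode the filename as UTF-8 so the header value is pure ASCII.
    // (Hypothetical helper, not part of any library.)
    static String encodeForHeader(String filename) {
        // URLEncoder implements form encoding, which turns a space into '+';
        // replace it with "%20" so the value is unambiguous for the receiver.
        return URLEncoder.encode(filename, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // What the receiving side would need to do to recover the original name.
    static String decodeFromHeader(String headerValue) {
        return URLDecoder.decode(headerValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String filename = "«úü¡»¿.doc";
        String encoded = encodeForHeader(filename);
        System.out.println(encoded);                                    // ASCII-only value
        System.out.println(decodeFromHeader(encoded).equals(filename)); // round trip check
    }
}
```

With the code from the question, this would be used as conn.addRequestProperty("Filename", HeaderFilenameEncoder.encodeForHeader(filename)). This only helps if the server agrees to decode the value, so both sides need to follow the same convention.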
2020 edit: RFC 2616 has been superseded by RFC 7230, and the relevant section on header syntax is now at https://www.rfc-editor.org/rfc/rfc7230#section-3.2
header-field = field-name ":" OWS field-value OWS
field-name = token
field-value = *( field-content / obs-fold )
field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
field-vchar = VCHAR / obs-text
obs-fold = CRLF 1*( SP / HTAB )
; obsolete line folding
; see Section 3.2.4
VCHAR is defined in https://www.rfc-editor.org/rfc/rfc7230#section-1.2 as "any visible [USASCII] character", where the [USASCII] reference is:
[USASCII] American National Standards Institute, "Coded Character
Set -- 7-bit American Standard Code for Information
Interchange", ANSI X3.4, 1986.
The standards are still very clear: HTTP headers are still US-ASCII only.
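To catch the problem before the request is sent, a value can be checked against the VCHAR rule quoted above. The following is a rough sketch, not a full implementation of the grammar: the class and method names are hypothetical, obs-text and obs-fold are deliberately rejected, and the placement of spaces and tabs is not validated.

```java
public class HeaderValueCheck {

    // Returns true if every character is a visible US-ASCII character
    // (VCHAR, 0x21 to 0x7E), a space, or a horizontal tab.
    // Simplification: obs-text and obs-fold are treated as invalid, and
    // leading or trailing whitespace is not checked.
    static boolean isAsciiFieldValue(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean vchar = c >= 0x21 && c <= 0x7E;
            if (!vchar && c != ' ' && c != '\t') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAsciiFieldValue("report.doc"));  // plain ASCII name
        System.out.println(isAsciiFieldValue("«úü¡»¿.doc"));  // filename from the question
    }
}
```

A filename that fails this check is a candidate for percent-encoding before being placed in a header.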