Accept and Accept-Charset - Which is superior?

In HTTP you can specify in a request that your client can accept specific content in responses using the accept header, with values such as application/xml. The content type specification allows you to include parameters in the content type, such as charset=utf-8, indicating that you can accept content with a specified character set.

There is also the accept-charset header, which specifies the character encodings which are accepted by the client.

If both headers are specified and the accept header contains content types with the charset parameter, which should be considered the superior header by the server?

e.g.:

Accept: application/xml; q=1,
        text/plain; charset=ISO-8859-1; q=0.8
Accept-Charset: UTF-8

I've sent a few example requests to various servers using Fiddler to test how they respond:

Examples

W3

Request

GET http://www.w3.org/ HTTP/1.1
Host: www.w3.org
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1

Response

Content-Type: text/html; charset=utf-8

Google

Request

GET http://www.google.co.uk/ HTTP/1.1
Host: www.google.co.uk
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1

Response

Content-Type: text/html; charset=ISO-8859-1

StackOverflow

Request

GET http://stackoverflow.com/ HTTP/1.1
Host: stackoverflow.com
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1

Response

Content-Type: text/html; charset=utf-8

Microsoft

Request

GET http://www.microsoft.com/ HTTP/1.1
Host: www.microsoft.com
Accept: text/html;charset=UTF-8
Accept-Charset: ISO-8859-1

Response

Content-Type: text/html

There doesn't seem to be any consensus around what the expected behaviour is. I am trying to look surprised.


Altough you can set media type in Accept header, the charset parameter definition for that media type is not defined anywhere in RFC 2616 (but it is not forbidden, though).

Therefore if you are going to implement a HTTP 1.1 compliant server, you shall first look for Accept-charset header, and then search for your own parameters at Accept header.


Read RFC 2616 Section 14.1 and 14.2. The Accept header does not allow you to specify a charset. You have to use the Accept-Charset header instead.


Firstly, Accept headers can accept parameters, see RFC 7231 section 5.3.2

All text/* mime-types can accept a charset parameter.

The Accept-Charset header allows a user-agent to specify the charsets it supports.

If the Accept-Charset header did not exist, a user-agent would have to specify each charset parameter for each text/* media type it accepted, e.g.

Accept: text/html;charset=US-ASCII, text/html;charset=UTF-8, text/plain;charset=US-ASCII, text/plain;charset=UTF-8

RFC 7231 section 5.3.2 (Accept) clearly states:

Each media-range might be followed by zero or more applicable media type parameters (e.g., charset)

So a charset parameter for each content-type is allowed. In theory a client could accept, for example, text/html only in UTF-8 and text/plain only in US-ASCII.

But it would usually make more sense to state possible charsets in the Accept-Charset header as that applies to all types mentioned in the Accept header.

If those headers’ charsets don’t overlap, the server could send status 406 Not Acceptable.

However, I wouldn’t expect fancy cross-matching from a server for various reasons. It would make the server code more complicated (and therefore more error-prone) while in practice a client would rarely send such requests. Also nowadays I would expect everything server-side is using UTF-8 and sent as-is so there’s nothing to negotiate.