mod_proxy_html garbles non-ASCII characters
I've set up a reverse proxy with mod_proxy, mod_proxy_html (3.1.3) and mod_xml2enc on a CentOS 6.4 box.
The proxy serves up the target server just fine, but it garbles non-ASCII characters (in my case 'äöüéàè').
I've searched extensively for a solution, but to no avail.
The encoding is correctly specified in the response header and matches that of the target server (utf-8). I've also tried explicitly setting the encoding used by xml2enc via:
xml2EncDefault utf-8
but to no effect.
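For reference, one common form of this kind of garbling is UTF-8 bytes being read as ISO-8859-1 somewhere along the chain. Whether that is what happens here is an assumption, but the pattern is easy to recognise. The sketch below only illustrates it; the helper name `misread_as_latin1` is made up for this example.

```python
# Illustrative only: shows what UTF-8 text looks like when its bytes are
# misread as ISO-8859-1. That this is the failure mode behind the proxy
# is an assumption, not something confirmed in the post.
original = "äöüéàè"


def misread_as_latin1(text: str) -> str:
    """Encode as UTF-8, then decode the same bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


garbled = misread_as_latin1(original)
print(garbled)  # each accented letter turns into two characters

# The damage is reversible as long as no bytes were altered along the way.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired == original)
```

If the garbled page shows two characters for every accented letter, with the first usually being 'Ã', a mismatch of this sort is a likely candidate.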
I'm running the proxy off a vhost with the proxy configuration set as follows:
ProxyRequests off
ProxyHTMLLinks a href
ProxyHTMLLinks area href
ProxyHTMLLinks link href
ProxyHTMLLinks img src longdesc usemap
ProxyHTMLLinks object classid codebase data usemap
ProxyHTMLLinks q cite
ProxyHTMLLinks blockquote cite
ProxyHTMLLinks ins cite
ProxyHTMLLinks del cite
ProxyHTMLLinks form action
ProxyHTMLLinks input src usemap
ProxyHTMLLinks head profile
ProxyHTMLLinks base href
ProxyHTMLLinks script src for
ProxyHTMLLinks iframe src
ProxyPass /foo/ http://someserver.com/
ProxyPassReverse /foo/ http://www.someserver.com/
<Location /foo/>
SetOutputFilter INFLATE;proxy-html;DEFLATE
ProxyPassReverse /
ProxyPassReverseCookiePath / /foo
ProxyHTMLURLMap http://www.someserver.com /foo
ProxyHTMLURLMap http://someserver.com /foo
RequestHeader unset Accept-Encoding
</Location>
Solution 1:
Turns out that 'mod_proxy_html' was innocent in all this.
Declaring the encoding via:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
made the issue go away.
This is a bit odd, as the 'Content-Type' was properly set in the response header.
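Since the fix came down to where the charset is declared, it can help to check both places for a given page. The following is a minimal sketch, assuming you already have the 'Content-Type' header value and the raw response body at hand. The function names `header_charset` and `meta_charset` are hypothetical, and the regular expressions are a rough heuristic, not a full HTML parser.

```python
import re
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any."""
    m = re.search(r'charset\s*=\s*"?([A-Za-z0-9._:-]+)', content_type, re.I)
    return m.group(1).lower() if m else None


def meta_charset(body: bytes) -> Optional[str]:
    """Return the charset declared in a <meta> tag near the top of the body."""
    # Only the first few KB are inspected; a declaration should sit in <head>.
    head = body[:4096].decode("ascii", errors="replace")
    m = re.search(
        r'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', head, re.I
    )
    return m.group(1).lower() if m else None


# Example values modelled on the post; the body is a made-up fragment.
header = "text/html; charset=utf-8"
body = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=UTF-8"></head><body></body></html>'
)

print(header_charset(header), meta_charset(body))
```

A page where `meta_charset` returns `None` while the header carries a charset matches the situation described above, where the header was set but the document itself declared nothing.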