How to encode special characters using mod_rewrite & Apache?
I would like to have pretty URLs for my tagging system along with all the special characters: +, &, #, %, and =. Is there a way to do this with mod_rewrite without having to double encode the links?
I notice that delicious.com and stackoverflow seem to be able to handle singly encoded special characters. What's the magic formula?
Here's an example of what I want to happen:
http://www.example.com/tag/c%2b%2b
Would trigger the following RewriteRule:
RewriteRule ^tag/(.*) script.php?tag=$1
and the value of tag would be "c++"
The normal operation of apache/mod_rewrite doesn't work like this, as it seems to turn the plus signs into spaces. If I double encode the plus sign to '%252B' then I get the desired result. However, it makes for messy URLs and seems pretty hacky to me.
Solution 1:
"The normal operation of apache/mod_rewrite doesn't work like this, as it seems to turn the plus signs into spaces."
I don't think that's quite what's happening. Apache is decoding the %2Bs to +s in the path part since + is a valid character there. It does this before letting mod_rewrite look at the request.
So then mod_rewrite changes your request '/tag/c++' to 'script.php?tag=c++'. But in a query string component in the application/x-www-form-urlencoded format, the escaping rules are very slightly different to those that apply in path parts. In particular, '+' is a shorthand for space (which could just as well be encoded as '%20', but this is an old behaviour we'll never be able to change now).
So PHP's form-reading code receives the 'c++' and dumps it in your $_GET as c-space-space.
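To make the two decoding steps concrete, here is a small sketch in Python rather than PHP. It only models the behaviour described above using the standard library's urllib.parse; it is not Apache's or PHP's actual code, and the variable names are made up for illustration.

```python
from urllib.parse import unquote, parse_qs

# Step 1: the path part is percent-decoded before mod_rewrite sees it.
# In a path, '+' is an ordinary character, so it survives as-is.
path_segment = "c%2b%2b"
decoded = unquote(path_segment)

# Step 2: the rule pastes the decoded text into a query string unescaped.
query = "tag=" + decoded

# Step 3: form-style query parsing treats '+' as shorthand for a space.
tag = parse_qs(query)["tag"][0]

print(repr(decoded))  # the path value after step 1
print(repr(tag))      # what the script ends up seeing
```

The first value still contains the plus signs; the second has them replaced by spaces, which is the symptom described in the question.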
Looks like the way around this is to use the rewrite flag 'B', which re-escapes the backreference before it is substituted. See http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriteflags - curiously it uses more or less the same example!
RewriteRule ^tag/(.*)$ /script.php?tag=$1 [B]
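As a rough model of why re-escaping the backreference helps, the following Python sketch percent-encodes the captured text again before building the query string. The function names are hypothetical, and quote(..., safe="") is only an approximation of the set of characters Apache escapes; treat it as an illustration of the idea, not a reimplementation of mod_rewrite.

```python
from urllib.parse import quote, unquote, parse_qs

def rewrite_with_b(path):
    """Approximate 'RewriteRule ^tag/(.*)$ /script.php?tag=$1 [B]'."""
    prefix = "/tag/"
    # The path arrives already percent-decoded, so the capture is 'c++'.
    captured = unquote(path[len(prefix):])
    # Assumed model of the B flag: escape the backreference again
    # before substituting it into the target.
    escaped = quote(captured, safe="")
    return "/script.php?tag=" + escaped

def tag_seen_by_script(path):
    """Parse the rewritten query string the way a form decoder would."""
    query = rewrite_with_b(path).split("?", 1)[1]
    return parse_qs(query)["tag"][0]

print(rewrite_with_b("/tag/c%2b%2b"))
print(tag_seen_by_script("/tag/c%2b%2b"))
```

Because the plus signs reach the query string as %2B, the form decoder turns them back into literal plus signs instead of spaces.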
Solution 2:
I'm not sure I understand what you're asking, but the NE (noescape) flag to Apache's RewriteRule directive might be of some interest to you. Basically, it prevents mod_rewrite from automatically escaping special characters in the substitution pattern you provide. The example given in the Apache 2.2 documentation is
RewriteRule /foo/(.*) /bar/arg=P1\%3d$1 [R,NE]
which will turn, for example, /foo/zed into a redirect to /bar/arg=P1%3dzed, so that the script /bar will then see a query parameter named arg with a value P1=zed, if it looks in its PATH_INFO (okay, that's not a real query parameter, so sue me ;-P).
At least, I think that's how it works . . . I've never used that particular flag myself.