How to encode special characters using mod_rewrite & Apache?

I would like to have pretty URLs for my tagging system along with all the special characters: +, &, #, %, and =. Is there a way to do this with mod_rewrite without having to double encode the links?

I notice that delicious.com and stackoverflow seem to be able to handle singly encoded special characters. What's the magic formula?

Here's an example of what I want to happen:

http://www.example.com/tag/c%2b%2b

Would trigger the following RewriteRule:

RewriteRule ^tag/(.*)   script.php?tag=$1

and the value of tag would be "c++"

The normal operation of Apache/mod_rewrite doesn't work like this, as it seems to turn the plus signs into spaces. If I double encode the plus sign to '%252B' then I get the desired result - however it makes for messy URLs and seems pretty hacky to me.


Solution 1:

"The normal operation of apache/mod_rewrite doesn't work like this, as it seems to turn the plus signs into spaces."

I don't think that's quite what's happening. Apache is decoding the %2Bs to +s in the path part since + is a valid character there. It does this before letting mod_rewrite look at the request.

So then mod_rewrite changes your request '/tag/c++' to 'script.php?tag=c++'. But in a query string component in the application/x-www-form-urlencoded format, the escaping rules differ slightly from those that apply in path parts. In particular, '+' is shorthand for a space (which could just as well be encoded as '%20', but this is old behaviour we'll never be able to change now).

So PHP's form-reading code receives the 'c++' and dumps it in your $_GET as c-space-space.
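
The three steps above can be sketched outside Apache. This is only an illustration, assuming Python's urllib.parse as a stand-in for Apache's path decoding and PHP's form parsing; it is not the code either of them actually runs, and the variable names are made up for the example.

```python
from urllib.parse import parse_qs, unquote

# Step 1: the server percent-decodes the path. Under path rules '+' is an
# ordinary character, so %2b simply becomes '+'.
path = unquote("/tag/c%2b%2b")

# Step 2: the rewrite copies the captured text into the query string as-is
# (this mimics RewriteRule ^tag/(.*) script.php?tag=$1 without any flag).
captured = path[len("/tag/"):]
query = "tag=" + captured

# Step 3: form decoding (application/x-www-form-urlencoded) treats '+'
# as a space, so the two plus signs are lost.
tag = parse_qs(query)["tag"][0]

print(repr(path))   # the decoded path
print(repr(query))  # the rewritten query string
print(repr(tag))    # what the script ends up seeing
```

The plus signs survive the path decoding and are only lost in the last step, when the query string is read under form-encoding rules.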

Looks like the way around this is to use the rewriteflag 'B'. See http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriteflags - curiously it uses more or less the same example!

RewriteRule ^tag/(.*)$ /script.php?tag=$1 [B]
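
Roughly, the B flag re-escapes the captured text before it is placed in the substitution. A minimal sketch of that idea, again assuming Python's urllib.parse as a stand-in (rewrite_with_b is a hypothetical helper, and the exact set of characters Apache escapes may differ from what quote() escapes):

```python
from urllib.parse import parse_qs, quote

def rewrite_with_b(captured: str) -> str:
    """Hypothetical helper: build the rewritten URL, re-escaping the
    backreference first, which is the effect the B flag is meant to have."""
    # safe="" makes quote() escape '+', '&', '#', '%', '=' and '/' as well.
    return "/script.php?tag=" + quote(captured, safe="")

# The path has already been decoded by the server, so the capture is 'c++'.
rewritten = rewrite_with_b("c++")

# Form decoding now turns %2B back into a literal '+', not a space.
query = rewritten.split("?", 1)[1]
tag = parse_qs(query)["tag"][0]

print(rewritten)
print(tag)
```

Because the plus signs reach the query string as %2B, the form decoder restores them as literal '+' characters and the visible URL needs no double encoding.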

Solution 2:

I'm not sure I understand what you're asking, but the NE (noescape) flag to Apache's RewriteRule directive might be of some interest to you. Basically, it prevents mod_rewrite from automatically escaping special characters in the substitution pattern you provide. The example given in the Apache 2.2 documentation is

RewriteRule /foo/(.*) /bar/arg=P1\%3d$1 [R,NE]

which will turn, for example, /foo/zed into a redirect to /bar/arg=P1%3dzed, so that the script /bar will then see a query parameter named arg with a value P1=zed, if it looks in its PATH_INFO (okay, that's not a real query parameter, so sue me ;-P).

At least, I think that's how it works . . . I've never used that particular flag myself.