Apache mod_rewrite/mod_redirect: convert a URL with query parameters into an SEO-friendly URL, ignoring some query parameters

As the title says.

This is probably best explained with an example:

  • original URL (the order of the params is NOT consistent)

    • www.myhost.com/my-page?p1=v1&p2=v2&p3=v3
    • OR www.myhost.com/my-page?p2=v2&p1=v1&p3=v3
    • OR www.myhost.com/my-page?p3=v3&p1=v1&p2=v2
  • If I visit any of the 3 sample URLs above, I want to get redirected to www.myhost.com/my-page/v1/v2

    • p3 was purposely ignored because it is not needed anymore.
    • In my real-world URL, I also have parameters p4 to p6, which are ignored as well. I only wrote up to p3 for simplicity.

I've searched the net but found nothing that comes close to what I'm trying to do. Thank you.


Solution 1:

You can do this with mod_rewrite as follows:

RewriteEngine On

RewriteCond %{QUERY_STRING} (?:^|&)p1=([^&]+)
RewriteCond %1@%{QUERY_STRING} ^([^@]+)@(?:|.*&)p2=([^&]*)
RewriteRule ^/?(my-page)$ /$1/%1/%2 [QSD,R=302,L]

The first condition captures the value of p1 and passes this to the second condition that also captures the value of p2. These are then used in the substitution string as %1 and %2 backreferences respectively.

The URL parameters can occur in any order.

The @ in the second condition is just an arbitrary character that is assumed not to occur in v1, so it can be used as a delimiter between v1 and the query string when searching for p2.
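To make the delimiter trick concrete, here is a hand-worked trace of the two conditions for one of the sample requests from the question. The request is only an illustration, and the comments describe what each part of the regex is expected to match:

```apache
# Request: /my-page?p3=v3&p1=v1&p2=v2

# TestString: p3=v3&p1=v1&p2=v2
#   (?:^|&)p1=([^&]+)  matches "&p1=v1"       -> %1 = v1
RewriteCond %{QUERY_STRING} (?:^|&)p1=([^&]+)

# TestString: v1@p3=v3&p1=v1&p2=v2
#   ^([^@]+)    matches "v1"                  -> %1 = v1 (re-captured)
#   @           matches the delimiter
#   (?:|.*&)    matches "p3=v3&p1=v1&"
#   p2=([^&]*)  matches "p2=v2"               -> %2 = v2
RewriteCond %1@%{QUERY_STRING} ^([^@]+)@(?:|.*&)p2=([^&]*)

# $1 = my-page, so the redirect target is /my-page/v1/v2
RewriteRule ^/?(my-page)$ /$1/%1/%2 [QSD,R=302,L]
```

v1 has to be re-captured in the second condition because %N backreferences only refer to the last matched condition.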

All other URL parameters (ie. p3..pN) are ignored (and discarded).

p1 must exist with a non-empty value (otherwise the resulting redirect would be ambiguous). p2 must also exist, but its value can be empty. The regex also allows an arguably malformed (but still valid) query string in which the URL parameter delimiter (&) occurs before p2 at the very start of the query string. For example, /my-page?&p2=&p1=v1 would still be redirected to /my-page/v1/.

The QSD flag discards the original query string from the request.
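Note that the QSD flag is only available on Apache 2.4 and later. On Apache 2.2, a minimal sketch of the equivalent (assuming the conditions stay unchanged) is to end the substitution string with a bare ?, which also drops the original query string:

```apache
# Apache 2.2 alternative: a trailing "?" on the substitution clears the query string
RewriteRule ^/?(my-page)$ /$1/%1/%2? [R=302,L]
```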

Depending on where these directives are being used (eg. directory or server context) and how the URL is ultimately being routed, you may need to add an initial condition to ensure that only direct requests (as opposed to rewritten requests) are processed.

For example, add the following as the first condition if required:

RewriteCond %{ENV:REDIRECT_STATUS} ^$
# (the remaining conditions and the RewriteRule follow, as above)
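Combined with the rule from the start of this answer, the full set might look like the following. This is a sketch only; whether the extra condition is needed depends on your setup. It relies on REDIRECT_STATUS being empty on the initial request and set once the request has been internally rewritten:

```apache
RewriteEngine On

# Only process direct requests, not internally rewritten ones
RewriteCond %{ENV:REDIRECT_STATUS} ^$
# Capture v1, then v2, regardless of parameter order
RewriteCond %{QUERY_STRING} (?:^|&)p1=([^&]+)
RewriteCond %1@%{QUERY_STRING} ^([^@]+)@(?:|.*&)p2=([^&]*)
RewriteRule ^/?(my-page)$ /$1/%1/%2 [QSD,R=302,L]
```

The new condition has no capturing groups of its own, and it comes first, so the %1 and %2 backreferences used later are unaffected.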

UPDATE:

For the case where p1 and p2 are mandatory and p3 is optional, you could change the above as follows:

RewriteCond %{QUERY_STRING} (?:^|&)p1=([^&]+)
RewriteCond %1@%{QUERY_STRING} ^([^@]+)@(?:|.*&)p2=([^&]+)
# Either p3 is present and has a value
RewriteCond %1/%2/@%{QUERY_STRING} ^([^@]+)@(?:|.*&)p3=([^&]+) [OR]
# OR p3 is omitted or does not have a value
RewriteCond %1/%2 (.+)
RewriteRule ^/?(my-page)$ /$1/%1%2 [QSD,R,L]

The 2nd condition now ensures there is a value for the p2 parameter (ie. v2) - it cannot be empty.

Then either the 3rd condition checks for a non-empty p3 parameter OR the 4th condition simply captures the values when the p3 parameter is either empty or omitted altogether.

The resulting redirect omits the trailing slash when p3 is not present. For example, the resulting redirect is either /my-page/v1/v2 or /my-page/v1/v2/v3. (The trailing slash that might otherwise occur on /my-page/v1/v2/ is avoided.)

In the final substitution string, the values of the %1 and %2 backreferences are a little different from before. They no longer hold the individual values v1 and v2. Instead, when p3 has a value, %1 contains v1/v2/ (including the trailing slash) and %2 contains v3. When p3 is empty or omitted, %1 contains v1/v2 and %2 is empty.
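As a rough check, the backreferences can be traced by hand for two sample requests against the conditions as written above. The requests are illustrative, and the conditions are numbered in the order they appear:

```apache
# Request A: /my-page?p1=v1&p2=v2&p3=v3
#   Condition 2 matches:   %1 = v1, %2 = v2
#   Condition 3 tests      v1/v2/@p1=v1&p2=v2&p3=v3
#     and matches:         %1 = v1/v2/, %2 = v3
#   Condition 4 is skipped because of the [OR] on condition 3
#   Redirect target:       /my-page/v1/v2/v3
#
# Request B: /my-page?p1=v1&p2=v2
#   Condition 2 matches:   %1 = v1, %2 = v2
#   Condition 3 fails (no p3); the backreferences are left unchanged
#   Condition 4 tests      v1/v2
#     and matches:         %1 = v1/v2, %2 is empty
#   Redirect target:       /my-page/v1/v2
RewriteRule ^/?(my-page)$ /$1/%1%2 [QSD,R,L]
```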