.htaccess: Remove everything after '/index.php'
I've spent hours looking for the ultimate way to optimize my URLs so that they cannot break the pages and their structure.
I first needed a way to remove the trailing slash at the end of the URL - resolved! website.com/index.php/ used to be a problem, but now the / at the end of the URL is gone, no matter what!
Then I found out that directories themselves can also cause problems, so I had to find a way to turn something like website.com/page///// into website.com/page/. No extra slashes. Resolved!
Then I moved on to the next problem. Putting something after website.com/index.php will cause problems. For instance, website.com/index.php/index.php or website.com/index.php/abc will not throw you a 404 error; instead, you remain on the current page with all the URLs (for stylesheets and scripts) broken. Which... is not good.
I actually found a few similar problems out there but none of them really helped me resolve this case. Any help would be appreciated!
.htaccess:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [L,R=302]
RewriteCond %{THE_REQUEST} //
RewriteRule ^.*$ $0 [R=302,L,NE]
"Putting something after example.com/index.php will cause problems. For instance: example.com/index.php/index.php or example.com/index.php/abc will not throw you a 404 error"
The /<something> that appears after a file, such as /index.php (duplicated) and /abc in the above examples, is called additional pathname information (path-info). Whether this is permitted (i.e. served normally rather than rejected with a 404) is, by default, dependent on the file handler. PHP permits path-info.
However, you can override this and disable path-info on all file types:
AcceptPathInfo Off
Now, any path-info appearing on the URL-path will trigger a 404. Incidentally, this would also handle requests of the form example.com/index.php/, which you mentioned previously and handled with mod_rewrite instead.
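For example, the directive could go at the top of the question's .htaccess, ahead of the existing rewrite rules. This is a sketch that assumes the server's AllowOverride setting permits FileInfo directives in .htaccess (AcceptPathInfo requires that override); if it doesn't, Apache will respond with a 500 Internal Server Error instead.

```apache
# Reject path-info after a file with a 404,
# e.g. /index.php/abc, /index.php/index.php and /index.php/
# (assumes AllowOverride includes FileInfo for this directory)
AcceptPathInfo Off

RewriteEngine On

# Remove the trailing slash from URLs that don't map to a directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [L,R=302]

# ...the question's multiple-slash rule would follow here
```

The trailing-slash rule is left in place, since it still applies to URLs such as /page/ that are not path-info on an existing file.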
"...with all the URLs (for stylesheets and scripts) broken."
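That's because you are using relative URL-paths to your client-side resources. The browser resolves a relative URL against the URL of the current page, so on example.com/index.php/abc a (hypothetical) reference to css/style.css is requested as example.com/index.php/css/style.css, which is handled by index.php again instead of returning the stylesheet. Relative URLs are therefore likely to be a problem whenever you rewrite the URL. More info about that here: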
https://webmasters.stackexchange.com/questions/86450/htaccess-rewrite-url-leads-to-missing-css
RewriteCond %{THE_REQUEST} //
RewriteRule ^.*$ $0 [R=302,L,NE]
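This isn't strictly correct, as I could (maliciously) add multiple slashes to the query string and send the request into a redirect loop: THE_REQUEST includes the query string, and the redirect passes the query string through unchanged, so the // is still there on the next request.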
To avoid matching multiple slashes in the query string, you could change the CondPattern from the simple // to \s[^?]*//+, which only matches slashes that appear before any "?" (i.e. in the URL-path). (Although I feel there must be a more efficient regex.)
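Dropping that CondPattern into the question's rule would look something like the following. This is a sketch that assumes the .htaccess file sits in the document root, so the relative $0 substitution resolves correctly without a RewriteBase; the RewriteRule and its flags are unchanged from the question.

```apache
RewriteEngine On

# Collapse multiple slashes in the URL-path only.
# THE_REQUEST looks like: GET /page//foo?bar=a//b HTTP/1.1
#   \s     the space after the request method
#   [^?]*  a run of characters containing no "?", so the match
#          cannot reach into the query string
#   //+    two or more consecutive slashes
RewriteCond %{THE_REQUEST} \s[^?]*//+
RewriteRule ^.*$ $0 [R=302,L,NE]
```

A request for /page//foo?bar=a//b still redirects (the URL-path contains //), while /page?bar=a//b no longer matches, so the query string alone can't trigger the loop.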