Using Apache rewrite rules in .htaccess to remove .html causing a 500 error
tl;dr A request for /contact/ (or /contact/blah) results in a rewrite loop (500 Internal Server Error response) because REQUEST_FILENAME contains the mapped filesystem path, not the URL-path you are expecting.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
The "problem" is the use of REQUEST_FILENAME in the 2nd condition. The REQUEST_FILENAME server variable contains the absolute filesystem path after the URL has been mapped to the filesystem. This is not necessarily the same as the URL-path, but this condition assumes that it is. When the URL-path contains whole path segments that do not map to the filesystem (as in /contact/blah or /contact123/blah), REQUEST_FILENAME is essentially "reduced" to the last path segment that maps to a directory, plus the "filename" (ie. .../contact and .../contact123 respectively; the document root, ie. /, is the last matched directory in this example).
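You can check what REQUEST_FILENAME resolves to with a temporary rule that redirects and exposes the value in the query string. This is only a debugging sketch, not part of the fix. The debug-rf query string and the rf parameter are arbitrary, hypothetical names. The rule assumes RewriteEngine On is already present and that the rule is placed before the other rewrite rules. A request for /contact/blah?debug-rf should then redirect to /contact/blah?rf= followed by the mapped filesystem path (/path/to/document-root/contact in this example).

```apache
# TEMPORARY debugging aid; remove after use. "debug-rf" and "rf" are
# hypothetical names. Assumes RewriteEngine On appears earlier in the file
# and that this rule comes before the other rewrite rules.
# Only a request whose query string is exactly "debug-rf" is affected. The
# redirect target's query string ("rf=...") does not match, so it does not repeat.
RewriteCond %{QUERY_STRING} =debug-rf
RewriteRule ^ %{REQUEST_URI}?rf=%{REQUEST_FILENAME} [R=302,L]
```

The redirected request still hits the looping rule and returns a 500, but the browser's address bar shows the rf value.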
Request /contact
When you request /contact, the URL-path is /contact and REQUEST_FILENAME is /path/to/document-root/contact, so REQUEST_FILENAME maps directly to the URL-path. The test condition (/path/to/document-root/contact.html exists) is successful and the request is rewritten to contact.html. All is good.
Request /contact/ or /contact/blah
However, when you request /contact/, the URL-path is /contact/, but REQUEST_FILENAME is again /path/to/document-root/contact (no trailing slash). The test condition is again successful (as above), but the request is rewritten to contact/.html, since .html is appended to the captured URL-path (ie. $1.html). Processing loops, REQUEST_FILENAME evaluates to the same as before (so the condition is again successful) and the request is rewritten a 2nd time to contact/.html.html. And so on. The result is a rewrite loop that eventually reaches an internal limit (default 10), at which point it "breaks" and the server responds with a 500 Internal Server Error.
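If you have access to the server config, you can observe the loop directly. This sketch assumes Apache 2.4, where per-module log levels such as rewrite:trace3 exist, and that these lines go in the main server config or the relevant <VirtualHost>; they are not for .htaccess. mod_rewrite then writes each pass to the error log. There you should see the repeated contact/.html, contact/.html.html rewrites, followed by a "Request exceeded the limit of 10 internal redirects" error. The internal limit is set by LimitInternalRecursion. Raising it would not help here, since this loop never terminates on its own.

```apache
# Server config or <VirtualHost> only (Apache 2.4).
# Trace levels are verbose, so enable only while debugging.
LogLevel alert rewrite:trace3

# The "internal limit" the loop runs into. 10 is already the default,
# so this line only serves to name the directive.
LimitInternalRecursion 10
```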
Request /contact123/blah
/contact123/blah, on the other hand, results in a 404 because the REQUEST_FILENAME server variable becomes /path/to/document-root/contact123 and /path/to/document-root/contact123.html does not exist, so no rewrite occurs in the first place.
Solution
To "fix" this behaviour, we need to make sure we are testing the same file/URL-path that we are ultimately rewriting to.
We can do this by constructing the absolute filename to test from the DOCUMENT_ROOT server variable followed by the root-relative URL-path. That URL-path is available from either the REQUEST_URI server variable or the $1 backreference. (Note that REQUEST_URI includes the slash prefix, whereas the $1 backreference does not.)
For example:
# Rewrite request to append ".html" extension to URL
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule (.+) $1.html [L]
Now, the test condition is testing the same filesystem path that the request will be rewritten to (if successful).
There is no need to also check that the request does not map to a directory before checking that it maps to a file (with the .html extension appended). The exception is if you also have directories with the same name as the file basename (eg. basename.html and basename/). But if that is the case, one or the other is going to be inaccessible anyway, so the situation is best avoided.
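If such a pair does exist and you want requests that map to the directory to be left alone, you can keep the directory check in front of the corrected condition. This sketch assumes the .htaccess file is in the document root. REQUEST_FILENAME is appropriate for the -d test, because that test is about the mapped filesystem path, not the rewrite target.

```apache
# Only needed when e.g. "basename.html" and "basename/" both exist.
# Leave the request alone if it maps to an existing directory...
RewriteCond %{REQUEST_FILENAME} !-d
# ...otherwise append ".html" only if that exact file exists.
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule (.+) $1.html [L]
```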
A request for /contact/, /contact/blah or /contact123/blah now results in a 404, as expected.
Note that there's no need to backslash-escape the literal dot in the RewriteCond TestString, since this is not a regex.
Minor points... the ^ and $ anchors on ^(.*)$ (and ^(.+)$) are unnecessary. An unanchored (.*) (or (.+)) still matches from the start of the URL-path and, since the * (and +) quantifier is greedy by default, captures everything to the end (although some users do still seem to like the anchors for readability). You should also include the L (last) flag on the RewriteRule. Whilst this is not necessary if this is the only (or last) rule in the .htaccess file, it probably will be if you add more rules later (and having to remember to modify existing rules in this way is prone to error).
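Putting this together, here is a minimal sketch of the complete .htaccess file in the document root, annotated to show where a regex is and is not involved. It assumes mod_rewrite is enabled and that the directory allows these overrides. RewriteEngine On was not shown in the original snippet, so it may already be present elsewhere in your file.

```apache
RewriteEngine On

# The TestString "%{DOCUMENT_ROOT}/$1.html" is a plain string after variable
# expansion (not a regex), so its dot needs no backslash. "-f" tests that the
# resulting path is an existing regular file.
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
# The RewriteRule pattern IS a regex. Unanchored, "(.+)" still matches from the
# start of the URL-path and, being greedy, captures all of it, so "^(.+)$"
# would capture the same $1. L ("last") ends the current processing pass.
RewriteRule (.+) $1.html [L]

# Rules added later go here. After a successful rewrite above they are not
# processed in the same pass; the rewritten request is processed again from the top.
```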
Using the $1 backreference in the RewriteCond directive assumes that the .htaccess file is in the document root. Otherwise the filesystem check as written will be incorrect, since in a subdirectory's .htaccess file $1 is relative to that subdirectory. If the .htaccess file is located in a subdirectory, change the RewriteCond directive to use the REQUEST_URI server variable instead. For example:
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule (.+) $1.html [L]
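As a concrete case, suppose the file is /subdir/.htaccess (a hypothetical directory name) and /subdir/contact is requested. In that context $1 is only contact, so %{DOCUMENT_ROOT}/$1.html would test the wrong file, /path/to/document-root/contact.html. REQUEST_URI, by contrast, is the full /subdir/contact. This assumes /subdir is a physical subdirectory of the document root, with no Alias involved.

```apache
# /subdir/.htaccess  ("subdir" is a hypothetical directory name)
RewriteEngine On

# Request /subdir/contact:
#   $1              = "contact"          (relative to this directory)
#   %{REQUEST_URI}  = "/subdir/contact"  (root-relative, leading slash)
# So this tests /path/to/document-root/subdir/contact.html, the file the
# relative substitution $1.html actually rewrites to.
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule (.+) $1.html [L]
```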
Optimisation
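You could avoid unnecessarily checking every request that already has a file extension (ie. all your static resources). To do this, restrict the regex to URL-paths that do not end in what looks like a file extension (a dot followed by 2 to 4 word characters). A negated (!) pattern cannot capture anything, so $1 is not available. The REQUEST_URI server variable is therefore used in both the condition and the substitution. For example: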
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]
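Working through a few hypothetical requests by hand shows how the optimised rule behaves:
/contact has no extension, so /path/to/document-root/contact.html is tested and the request is rewritten if it exists.
/css/style.css ends in .css, so the rule is skipped without any filesystem check.
/contact.html ends in .html (4 word characters), so it is skipped too. This is also why the rewritten request is not checked again on the next pass.
/fonts/icons.woff2 ends in a 5-character extension, which the regex does not treat as an extension, so icons.woff2.html is still tested. This is harmless, since such a file normally does not exist.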