Using Apache rewrite rules in .htaccess to remove .html causing a 500 error

tl;dr: A request for /contact/ (or /contact/blah) results in a rewrite loop (and a 500 Internal Server Error response) because REQUEST_FILENAME contains the mapped filesystem path, not the URL-path you are expecting.


RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html

The "problem" is the use of REQUEST_FILENAME in the 2nd condition. The REQUEST_FILENAME server variable contains the absolute filesystem path after the URL has been mapped to the filesystem. This is not necessarily the same as the URL-path, but this condition assumes that it is. When the URL-path contains whole path segments that do not map to the filesystem (as in /contact/blah or /contact123/blah), REQUEST_FILENAME is essentially "reduced" to the last path segment that maps to a directory, plus the "filename" that follows it. In this example that is .../contact and .../contact123 respectively, since the document root (ie. /) is the last matched directory.

Request /contact

When you request /contact, the URL-path is /contact and REQUEST_FILENAME is /path/to/document-root/contact, so REQUEST_FILENAME corresponds directly to the URL-path. The condition tests /path/to/document-root/contact.html, which exists, so the request is rewritten to contact.html. All is good.

Request /contact/ or /contact/blah

However, when you request /contact/, the URL-path is /contact/ but REQUEST_FILENAME is again /path/to/document-root/contact (no trailing slash). The test condition is again successful (as above), but the request is rewritten to contact/.html (since .html is appended to the captured URL-path, ie. $1.html). Processing then starts over: REQUEST_FILENAME evaluates to the same path as before, the condition is again successful, and the request is rewritten a 2nd time to contact/.html.html. This repeats until the rewrite loop reaches an internal limit (LimitInternalRecursion, default 10), at which point processing "breaks" and the server responds with a 500 Internal Server Error.
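
The passes described above can be traced against the original rules. This is only an annotated sketch: it assumes contact.html exists in the document root and that there is no contact directory, and the comments restate the behaviour described above rather than reproduce actual server log output.

```apache
# Request: /contact/  (contact.html exists, there is no "contact" directory)
#
# Pass 1: $1 = "contact/"        REQUEST_FILENAME = .../contact
#         .../contact.html exists  -> rewritten to "contact/.html"
# Pass 2: $1 = "contact/.html"   REQUEST_FILENAME = .../contact (unchanged)
#         .../contact.html exists  -> rewritten to "contact/.html.html"
# Pass 3: ... and so on, until the internal limit is reached -> 500
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html
```

The path being tested (REQUEST_FILENAME plus .html) never changes between passes, while the path being rewritten ($1 plus .html) grows each time, which is why the condition never fails.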

Request /contact123/blah

/contact123/blah, on the other hand, results in a 404 because the REQUEST_FILENAME server variable becomes /path/to/document-root/contact123 and /path/to/document-root/contact123.html does not exist, so no rewrite occurs in the first place.

Solution

To "fix" this behaviour we need to make sure we are testing the same file/URL-path that we are ultimately rewriting to.

We can do this by constructing the absolute filename (to test) by concatenating the DOCUMENT_ROOT server variable with either the REQUEST_URI server variable, which contains the root-relative URL-path, or the $1 backreference. (Note that REQUEST_URI includes the slash prefix, whereas the $1 backreference does not.)

For example:

# Rewrite request to append ".html" extension to URL
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule (.+) $1.html [L]

Now, the test condition is testing the same filesystem path that the request will be rewritten to (if successful).

There is no need to check that the request does not map to a directory and that it does map to a file (when appending the .html extension), unless you also have directories with the same name as the file basename (eg. basename.html and basename/). But if that is the case then one or other is going to be inaccessible anyway, so the situation is best avoided.
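
If you do have a directory and a file that share a basename and you want the directory to take priority, the directory check can be kept alongside the corrected file check. The following is a minimal sketch, assuming the .htaccess file is in the document root:

```apache
# Skip the rewrite when the request maps to an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
# Test the same filesystem path that the request would be rewritten to
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule (.+) $1.html [L]
```

With this variant the extensionless URL is left to the directory, so basename.html would only be reachable by requesting it with its extension. That is the trade-off described above.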

With the corrected rule, requests for /contact/, /contact/blah and /contact123/blah now all result in a 404, as expected.

Note that there's no need to backslash escape the literal dot in the RewriteCond TestString since this is not a regex.

Minor points: the ^ and $ anchors on ^(.*)$ (and ^(.+)$) are unnecessary since the * (and +) quantifier is greedy by default (although some users do still seem to like them for readability). You should also include the L (last) flag on the RewriteRule. Whilst this is not necessary if this is the only (or last) rule in the .htaccess file, it probably will be if you add more rules later (and having to remember to modify existing rules in this way is prone to error).

The use of the $1 backreference in the RewriteCond directive does assume that the .htaccess file is in the document root. Otherwise, the filesystem check as written will be incorrect. If the .htaccess file is located in a subdirectory then change the RewriteCond directive to use the REQUEST_URI server variable instead. For example:

RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule (.+) $1.html [L]

Optimisation

You could avoid unnecessarily checking all requests that already contain a file extension (ie. all your static resources) by restricting the regex to URLs that do not contain what looks like a file extension. For example:

RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]
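
To make the effect of the negated pattern concrete, the sketch below annotates the same rule with a few example URL-paths. These paths are hypothetical and only illustrate what the regex \.\w{2,4}$ does and does not match.

```apache
# contact           -> no extension-like suffix, so the file check runs
# contact.html      -> ends in ".html" (4 word chars), skipped
# css/style.css     -> ends in ".css", skipped
# js/app.js         -> ends in ".js", skipped
# fonts/text.woff2  -> ".woff2" is 5 chars, NOT skipped (the file check still runs)
# report.2024       -> \w also matches digits, so this is skipped too
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule !\.\w{2,4}$ %{REQUEST_URI}.html [L]
```

Since the pattern is negated there is no $1 backreference to use, which is why the substitution uses REQUEST_URI. If you have longer extensions, or extensionless URLs that end in something like .2024, you would need to adjust the {2,4} range or the character class to suit.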