Can't serve static files with ampersand in name

I guess that the web server is misinterpreting the & character as a parameter separator, or something.

It might be worth re-downloading the site with wget --restrict-file-names=windows, which makes wget use @ instead of ? to separate the query string in the file names (as far as I can tell it leaves & itself alone). Or maybe just bulk rename the files?

find . -name '*&*' | while IFS= read -r name ; do
  # Replace every & with @; quoting keeps names containing spaces intact
  newname=$(printf '%s\n' "$name" | sed -e 's:&:@:g')
  mv "$name" "$newname"
done

When I try to read this file in a web browser, I get "File not found":

https://xxxx/components/com_flexicontent/librairies/phpthumb/phpThumb.php?src=/images/piekny-wschod/festiwal-globtroterski-lublin2020-karuzela.jpg&w=290&h=177&aoe=1&q=95

If the actual filename on disk is phpThumb.php?src=%2Fimages%2Fpiekny-wschod%2Ffestiwal-globtroterski-lublin2020-karuzela.jpg&w=290&h=177&aoe=1&q=95 then it's not just the & (ampersand) that is causing a problem, but also the ? and the encoded slash %2F, i.e. the entire query string. In the request, %2F is URL-decoded to /, but the filename contains the literal characters %2F, so this needs to be doubly encoded in the request.

If you request that URL then Apache looks for the file phpThumb.php, which presumably does not exist.

Ideally, the URL would be correctly URL (percent) encoded in the initial request; otherwise we are going to have to URL-encode these characters manually. For example, correctly URL-encoded, this URL would be:

.../phpThumb.php%3Fsrc=%252Fimages%252Fpiekny-wschod%252Ffestiwal-globtroterski-lublin2020-karuzela.jpg%26w=290%26h=177%26aoe=1%26q=95

Note that since the actual filename contains %2F (an encoded /), these characters need to be doubly URL-encoded in the requested URL so that they decode only to %2F and not to /. A complication here is that there could seemingly be any number of encoded slashes in the src parameter.
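
To sanity-check the encoding by hand, here is a rough shell sketch that derives the request path from an on-disk filename. The function name encode_mirrored_name is made up for illustration, and it assumes that %, ? and & are the only characters that need encoding, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: turn an on-disk filename into the path to request.
# Assumes only "%", "?" and "&" need encoding (true for the example above).
encode_mirrored_name() {
  # Encode "%" first so the "%3F" and "%26" added afterwards are not re-encoded.
  printf '%s\n' "$1" | sed -e 's/%/%25/g' -e 's/?/%3F/g' -e 's/&/%26/g'
}

encode_mirrored_name 'phpThumb.php?src=%2Fimages%2Fpiekny-wschod%2Ffestiwal-globtroterski-lublin2020-karuzela.jpg&w=290&h=177&aoe=1&q=95'
```

For the example filename this should print the same string as the encoded URL shown above, starting with phpThumb.php%3Fsrc=%252Fimages.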

There appears to be a fixed number of URL parameters (i.e. a fixed number of & separators), so these are relatively trivial to replace with a single condition.

There is also the added complication that since these files don't have the .jpg file extension, Apache won't send the correct Content-Type header (which it determines from the file extension). This will need to be set manually.

Try the following:

# Manually encode all "%2F" in the query string as "%252F", ie. recursively search and replace
#  - This is not confined just to the "src" URL parameter value
#  - Backslash escape literal "%" in RewriteRule substitution string
RewriteCond %{QUERY_STRING} (.*)%2F(.*)
RewriteRule ^(.+/phpThumb\.php)$ $1?%1\%252F%2 [N]

# Manually encode "?" and "&" in the query string (occur at fixed points)
#  - Backslash escape literal "%" in RewriteRule substitution string
RewriteCond %{QUERY_STRING} ^(src=[^&]+)&(w=[^&]+)&(h=[^&]+)&(aoe=[^&]+)&(q=[^&]+)$
RewriteRule ^(.+/phpThumb\.php)$ $1\%3F%1\%26%2\%26%3\%26%4\%26%5 [T=image/jpeg,L]

As noted in the code comments, each literal % in the RewriteRule substitution string needs to be backslash-escaped to avoid being interpreted as a backreference of the form %n (a backreference to the last matched CondPattern).

Note that this assumes that all images are JPEGs (MIME type image/jpeg).
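
If you're not sure that assumption holds, a quick check along these lines might help. This is only a sketch: the helper name is_jpeg is made up, it assumes the mirrored files are named phpThumb.php?... as above, and it only looks at the first three bytes (JPEG files start with FF D8 FF).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the file starts with the JPEG magic bytes FF D8 FF.
is_jpeg() {
  magic=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "ffd8ff" ]
}

# List mirrored phpThumb files that are NOT JPEGs (run from the mirror root).
# "[?]" matches a literal "?"; a bare "?" would match any single character.
find . -type f -name 'phpThumb.php[?]*' | while IFS= read -r f; do
  is_jpeg "$f" || printf 'not a JPEG: %s\n' "$f"
done
```

Any file it lists would be sent with the wrong Content-Type by the rule above and would need its own rule.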


An alternative approach would be to "prettify" the URLs in the source application (i.e. rewrite the URLs there, before mirroring).

For example, if the image source URLs were of the form:

.../phpthumb/290/177/1/95/images/piekny-wschod/festiwal-globtroterski-lublin2020-karuzela.jpg

You could then use a spot of URL rewriting in the source application:

RewriteRule ^(.+/phpthumb)/(\d+)/(\d+)/(\d+)/(\d+)/(.+\.jpg)$ $1/phpThumb.php?src=$6&w=$2&h=$3&aoe=$4&q=$5 [L]

You then wouldn't need to do anything special in your "mirrored" website to serve these images and you'll have reasonably sensible filenames.
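
To see what that rule captures without touching the server, the same pattern can be tried out with sed. This is just a sketch: the function name pretty_to_phpthumb is made up, and \d is written as [0-9] because sed -E doesn't understand \d.

```shell
#!/bin/sh
# Hypothetical helper: apply the same pattern as the RewriteRule above to a URL-path.
# "&" is escaped in the replacement because a bare "&" means "the whole match" in sed.
pretty_to_phpthumb() {
  printf '%s\n' "$1" | sed -E 's#^(.+/phpthumb)/([0-9]+)/([0-9]+)/([0-9]+)/([0-9]+)/(.+\.jpg)$#\1/phpThumb.php?src=\6\&w=\2\&h=\3\&aoe=\4\&q=\5#'
}

pretty_to_phpthumb 'components/com_flexicontent/librairies/phpthumb/290/177/1/95/images/piekny-wschod/festiwal-globtroterski-lublin2020-karuzela.jpg'
```

Note that the captured src value comes out as images/piekny-wschod/... without the leading slash that the original URL had, so the substitution may need to be src=/$6 if phpThumb expects a root-relative path.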