How to serve a wget --mirror'ed directory of files with question marks in their names

I'm trying to create a static mirror of a php application (an old php Gallery installation, specifically). The app produces URLs such as:

view_album.php?set_albumName=MyAlbum

wget downloads these directly to files named the same, complete with question marks. In order to not break inbound links, I'd like to keep those names. But how do I serve them? I'm running into two problems:

  1. Webservers (correctly) attempt to find "view_album.php" and pass it the query args, rather than finding a file with a question mark in its name. How do I tell a webserver to look for files with a question mark in them? Renaming the files isn't desirable, as it would break inbound links, and I can't tell the inbound linkers to %-encode their URLs.

  2. The file names don't end in .html, so most webservers won't send an HTML content-type header. What configuration parameters should I look for to force a 'text/html' content-type for all files in a directory, or for all files matching a certain pattern?

I'm using lighttpd ultimately, but if you know what sort of configuration might get the desired results with apache/nginx I'd love to hear that too.


wget downloads these directly to files named the same, complete with question marks.

You can disable that behavior with --restrict-file-names=ascii,windows. In windows mode wget writes '@' instead of '?' in local file names, so this resolves the issue in wget itself, without needing fancy server configs.
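
As a rough sketch of how that might look in practice: the mirror function below wraps the wget call, and local_name previews only the '?' to '@' part of the renaming (windows mode also escapes other characters, which this helper ignores). Both function names and the example URL are made up for illustration.

```shell
# Mirror a site while keeping '?' out of local file names.
# "windows" makes wget write '@' where the URL has '?';
# "ascii" makes it %-escape bytes outside the ASCII range.
mirror() {
  wget --mirror --restrict-file-names=ascii,windows "$1"
}

# Preview the local name for a URL path. This only models the
# '?' -> '@' substitution, not the rest of wget's escaping.
local_name() {
  printf '%s\n' "$1" | tr '?' '@'
}

# Example (hypothetical URL):
#   mirror 'http://example.com/gallery/'
#   local_name 'view_album.php?set_albumName=MyAlbum'
```

Note that inbound links still arrive with a '?', so the server would still need to map the query string onto the '@' name.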


I think you can also fix this by changing the way wget downloads the php files:

wget -r --adjust-extension --convert-links 'http://example.com/index.php?foo=bar'

Option --adjust-extension makes wget save the PHP files with a .html extension, e.g. index.php?foo=bar.html

Option --convert-links makes wget convert the links in the downloaded files to the newly created .html files. Note that this conversion takes place after all files have been downloaded.
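
With --adjust-extension alone the saved names still contain the question mark (as in index.php?foo=bar.html), so it can be handy to check what a finished mirror actually contains. A minimal sketch, assuming the mirror lives in a directory passed as the first argument; the function name is invented for this example:

```shell
# List mirrored files whose names still contain a literal '?'.
# "$1" is the top-level directory of the mirror (defaults to ".").
files_with_question_marks() {
  # [?] is a bracket expression, so it matches a literal '?'
  # instead of acting as the single-character wildcard.
  find "${1:-.}" -type f -name '*[?]*'
}

# Example:
#   files_with_question_marks ./example.com
```

An empty result means every file can be requested by name without the server treating part of the name as a query string.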

See also: http://fvue.nl/wiki/Wget_storing_files_with_question_marks