How to serve a wget --mirror'ed directory of files with question marks in their names
I'm trying to create a static mirror of a PHP application (specifically, an old PHP Gallery installation). The app produces URLs such as:
view_album.php?set_albumName=MyAlbum
wget downloads these directly to files with the same names, question marks included. To avoid breaking inbound links, I'd like to keep those names. But how do I serve them? I'm running into two problems:
Web servers (correctly) look for "view_album.php" and pass it the query arguments, rather than looking for a file with a question mark in its name. How do I tell a web server to look for files with a question mark in their names? Renaming the files isn't desirable, because it would break inbound links, and I can't ask the people linking in to %-encode their URLs.
The file names don't end in .html, so most web servers won't send an HTML Content-Type header. Which configuration parameters should I look for to force a 'text/html' Content-Type for all files in a directory, or for files matching a certain pattern?
I'm ultimately using lighttpd, but if you know a configuration that would give the desired result with Apache or nginx, I'd love to hear that too.
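> wget downloads these directly to files named the same, complete with question marks.
You can disable that behavior with --restrict-file-names=ascii,windows. In windows mode, wget uses '@' instead of '?' to separate the query portion of the local file name (so view_album.php?set_albumName=MyAlbum is saved as view_album.php@set_albumName=MyAlbum). This resolves the issue in wget itself, without needing special server configuration.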
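As a rough illustration of the renaming, the sketch below mimics only the query-separator substitution that wget's windows mode performs ('?' becomes '@'). The function name windows_query_name is hypothetical, and the sketch deliberately ignores the other characters windows mode escapes (such as \, |, :, ", *, < and >), so treat it as an approximation of the resulting file name, not as wget's actual logic.

```sh
#!/bin/sh
# Hypothetical helper: approximates the local file name that
# --restrict-file-names=windows produces for a URL path with a query string.
# Simplification: only the first '?' (the query separator) is replaced;
# other characters that windows mode escapes are left untouched here.
windows_query_name() {
    # Without the g flag, sed replaces only the first '?' on the line.
    printf '%s\n' "$1" | sed 's/?/@/'
}

windows_query_name 'view_album.php?set_albumName=MyAlbum'   # prints view_album.php@set_albumName=MyAlbum
```

A name without a query string, such as index.html, passes through unchanged.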
I think you can also fix this by changing the way wget downloads the PHP files:
wget -r --adjust-extension --convert-links 'http://example.com/index.php?foo=bar'
Option --adjust-extension makes wget save the PHP files with a .html extension, e.g. index.php?foo=bar.html.
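The naming rule behind --adjust-extension can be sketched as a small shell function. According to the wget manual, the suffix .html is appended to text/html (or XHTML) downloads whose URL does not already end in .htm or .html (case-insensitive). The function adjusted_name is hypothetical and only reproduces that suffix rule; it does not inspect the content type the way wget does.

```sh
#!/bin/sh
# Hypothetical helper that mimics the suffix rule --adjust-extension applies
# to HTML downloads: append ".html" unless the name already ends in
# .htm or .html (any letter case). It assumes the file is HTML.
adjusted_name() {
    case "$1" in
        *.[Hh][Tt][Mm][Ll] | *.[Hh][Tt][Mm])
            # Already has an HTML extension: keep the name as-is.
            printf '%s\n' "$1" ;;
        *)
            # Otherwise append .html, even after a query string.
            printf '%s.html\n' "$1" ;;
    esac
}

adjusted_name 'index.php?foo=bar'   # prints index.php?foo=bar.html
adjusted_name 'about.HTM'           # prints about.HTM
```

Because the resulting names end in .html, a web server's default extension-based MIME mapping should send text/html for them, which addresses the second problem in the question.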
Option --convert-links makes wget rewrite the links in the downloaded files so they point to the newly created .html files. Note that this conversion takes place only after all files have been downloaded.
See also: http://fvue.nl/wiki/Wget_storing_files_with_question_marks