Escaping query strings with wget --mirror

I'm using wget --mirror --html-extension --convert-links to mirror a site, but I end up with lots of filenames in the format post.php?id=#.html. When I try to view these in a browser it fails, because the browser ignores the query string when loading the file. Is there any way to replace the ? character in the filenames with something else?


The --restrict-file-names=windows answer worked. Combined with --convert-links and --adjust-extension (-E; formerly named --html-extension, which still works but is deprecated), it produces a mirror that behaves as expected:

wget --mirror --adjust-extension --convert-links --restrict-file-names=windows http://www.example

Solution 1:

See the --restrict-file-names option. While not exactly intended for this particular purpose, --restrict-file-names=windows will probably help you along:

--restrict-file-names=modes

Change which characters found in remote URLs must be escaped during generation of local filenames. [...]

When "windows" is given, Wget escapes the characters \, |, /, :, ?, ", *, <, >, and the control characters in the ranges 0-31 and 128-159. In addition to this, Wget in Windows mode uses + instead of : to separate host and port in local file names, and uses @ instead of ? to separate the query portion of the file name from the rest. Therefore, a URL that would be saved as www.xemacs.org:4300/search.pl?input=blah in Unix mode would be saved as www.xemacs.org+4300/search.pl@input=blah in Windows mode.
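To see what that mapping does to a URL, here is a rough Python approximation built only from the rules quoted above: the listed characters and control codes become %HH, + replaces : before the port, and @ replaces the ? before the query. It is a hypothetical sketch (escape_windows and windows_local_name are made-up names), not wget's actual code, and it ignores details such as default index file names and the .html suffix that -E appends.

```python
from urllib.parse import urlsplit

# Characters the wget manual lists as escaped in "windows" mode.
WINDOWS_UNSAFE = set('\\|/:?"*<>')


def escape_windows(segment: str) -> str:
    """Percent-escape (%HH) the listed characters and control codes 0-31, 128-159."""
    out = []
    for ch in segment:
        code = ord(ch)
        if ch in WINDOWS_UNSAFE or code <= 31 or 128 <= code <= 159:
            out.append(f"%{code:02X}")
        else:
            out.append(ch)
    return "".join(out)


def windows_local_name(url: str) -> str:
    """Approximate the local name wget would use in windows mode (hypothetical)."""
    parts = urlsplit(url)
    name = parts.hostname or ""
    if parts.port is not None:
        name += "+" + str(parts.port)  # '+' instead of ':' between host and port
    # Escape each path segment, keeping '/' as the directory separator.
    name += "/".join(escape_windows(s) for s in parts.path.split("/"))
    if parts.query:
        name += "@" + escape_windows(parts.query)  # '@' instead of '?'
    return name


print(windows_local_name("http://www.xemacs.org:4300/search.pl?input=blah"))
# www.xemacs.org+4300/search.pl@input=blah
```

For the URL in the question, http://www.example.com/post.php?id=1 comes out as www.example.com/post.php@id=1, which -E would then save as post.php@id=1.html, a name a browser can open without any special escaping.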

Solution 2:

Your browser will display it fine if you use a URL like

file:///tmp/example.com/post.php%3Fid=1.html

instead of

file:///tmp/example.com/post.php?id=1.html
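The difference is just that the ? in the file name is percent-encoded as %3F, so the browser treats it as part of the path rather than the start of a query string. A minimal Python sketch of building such a URL, assuming an absolute POSIX path; local_file_url is a hypothetical helper, and the safe set "/=" is chosen only so the output matches the example above.

```python
from urllib.parse import quote


def local_file_url(path: str) -> str:
    """Build a file:// URL for an absolute POSIX path, escaping '?' as %3F."""
    # Keep '/' (directory separator) and '=' (for readability) unescaped;
    # '?', '#', '%' and other reserved characters are percent-encoded.
    return "file://" + quote(path, safe="/=")


print(local_file_url("/tmp/example.com/post.php?id=1.html"))
# file:///tmp/example.com/post.php%3Fid=1.html
```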

Note: if internal links between the downloaded files are broken, it is most likely because wget was terminated before it finished downloading. When --convert-links and --html-extension are both given (and only then), wget rewrites the links to use %3F instead of ?. However, it does this only at the end, after all downloads have finished, so if it is interrupted none of the links get converted and you're left in this predicament. Of course, you can always write a script to go through and fix the links, but...
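A rough sketch of such a script in Python, under several assumptions: the mirror was made with --adjust-extension (so pages were saved as post.php?id=1.html), only double-quoted href/src attributes are examined, and only relative links whose target file actually exists are rewritten. Absolute http:// links that wget never converted, root-relative links, and HTML entities such as &amp; in links are left alone. fix_tree and fix_link are hypothetical names, and the regex is a simplification rather than a real HTML parser.

```python
import re
from pathlib import Path
from urllib.parse import urlsplit

# Double-quoted href/src attributes only (single-quoted or unquoted
# attribute values are not handled).
ATTR_RE = re.compile(r'\b(href|src)="([^"]*)"', re.IGNORECASE)


def fix_link(value: str, page_dir: Path) -> str:
    """Point a relative link with a query string at its local file, '?' as %3F."""
    parts = urlsplit(value)
    if parts.scheme or parts.netloc or not parts.query:
        return value  # absolute URL or no query string: leave untouched
    base, hash_sign, fragment = value.partition("#")
    # With --adjust-extension the page may have been saved with ".html" appended.
    for candidate in (base, base + ".html"):
        if (page_dir / candidate).is_file():
            return candidate.replace("?", "%3F") + hash_sign + fragment
    return value  # target was never downloaded: leave the link as-is


def fix_tree(root: Path) -> int:
    """Rewrite links in every .html file under root; return how many files changed."""
    changed = 0
    for page in sorted(root.rglob("*.html")):
        text = page.read_text(encoding="utf-8", errors="surrogateescape")
        new_text = ATTR_RE.sub(
            lambda m, d=page.parent: f'{m.group(1)}="{fix_link(m.group(2), d)}"',
            text,
        )
        if new_text != text:
            page.write_text(new_text, encoding="utf-8", errors="surrogateescape")
            changed += 1
    return changed
```

Running fix_tree(Path("/tmp/example.com")) would turn href="post.php?id=1" into href="post.php%3Fid=1.html" wherever post.php?id=1.html exists next to the page. Files are rewritten in place, so try it on a copy of the mirror first.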