How to download with wget without following links with parameters

I'm trying to download two sites for inclusion on a CD:

http://boinc.berkeley.edu/trac/wiki
http://www.boinc-wiki.info

The problem I'm having is that these are both wikis. So when downloading with e.g.:

wget -r -k -np -nv -R jpg,jpeg,gif,png,tif http://www.boinc-wiki.info/

I get a lot of files because it also follows links like ...?action=edit and ...?action=diff&version=...

Does somebody know a way to get around this?

I just want the current pages, without images, and without diffs etc.

P.S.:

wget -r -k -np -nv -l 1 -R jpg,jpeg,png,gif,tif,pdf,ppt http://boinc.berkeley.edu/trac/wiki/TitleIndex

This worked for berkeley but boinc-wiki.info is still giving me trouble :/

P.P.S:

I got what appear to be the most relevant pages with:

wget -r -k -nv  -l 2 -R jpg,jpeg,png,gif,tif,pdf,ppt http://www.boinc-wiki.info

wget --reject-regex '(.*)\?(.*)' http://example.com

(--regex-type is posix by default.) This works only with recent versions of wget (>= 1.14), according to other comments.

Beware that it seems you can use --reject-regex only once per wget call. That is, you have to join the alternatives with | in a single regex if you want to reject on several patterns:

wget --reject-regex 'expr1|expr2|…' http://example.com
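As a rough sketch, a combined pattern can be tried against a few sample URLs with grep -E before handing it to wget. This assumes wget's posix regex type behaves like extended regular expressions. The pattern, the would_reject helper, and the sample URLs are illustrative, built from the ?action=edit and ?action=diff links mentioned in the question:

```shell
#!/bin/sh
# Hypothetical combined pattern: skip wiki edit and diff links.
reject='action=edit|action=diff'

# Returns 0 if the URL matches the pattern, i.e. wget would be expected to skip it.
# Assumption: grep -E (POSIX ERE) approximates wget's default posix regex type.
would_reject() {
    printf '%s\n' "$1" | grep -Eq -- "$reject"
}

# Sample URLs are made up for illustration.
for url in \
    'http://www.boinc-wiki.info/index.php?action=edit' \
    'http://www.boinc-wiki.info/Main_Page'
do
    if would_reject "$url"; then
        echo "skip $url"
    else
        echo "keep $url"
    fi
done

# The download itself would then look something like this (not run here):
# wget -r -k -np -nv --reject-regex "$reject" http://www.boinc-wiki.info/
```

Checking the pattern offline first avoids starting a long recursive download only to find that the regex matched too much or too little.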

The documentation for wget says:

Note, too, that query strings (strings at the end of a URL beginning with a question mark (‘?’)) are not included as part of the filename for accept/reject rules, even though these will actually contribute to the name chosen for the local file. It is expected that a future version of Wget will provide an option to allow matching against query strings.

It looks like this functionality has been on the table for a while and nothing has been done with it.

I haven't used it, but httrack looks like it has a more robust filtering feature set than wget and may be a better fit for what you're looking for (read about filters here http://www.httrack.com/html/fcguide.html).


The new version of wget (v.1.14) solves all these problems.

You have to use the new option --reject-regex=.... to handle query strings.

Note that I couldn't find an updated manual that covers these new options, so you have to rely on the built-in help instead: wget --help > help.txt
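A quick way to tell whether the installed wget knows the option is to search its help text for it. This is only a sketch: the has_reject_regex helper is a made-up name, and it assumes the help output lists the option literally as --reject-regex:

```shell
#!/bin/sh
# Hypothetical helper: reads help text on stdin and succeeds
# if it mentions the --reject-regex option.
has_reject_regex() {
    grep -q -e '--reject-regex'
}

# If wget is missing, the pipeline simply falls through to the else branch.
if wget --help 2>/dev/null | has_reject_regex; then
    echo "this wget supports --reject-regex"
else
    echo "no --reject-regex here; wget is missing or probably older than 1.14"
fi
```

Testing for the option itself rather than parsing the version number also covers builds where the feature was backported or left out.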