Multiple domains into wget --accept-regex?

Solution 1:

By default, --accept-regex uses the POSIX Extended Regular Expression (ERE) syntax, where a single | separates alternative branches. (The same applies if you tell wget to use PCRE syntax with --regex-type pcre; for this purpose PCRE behaves like a superset of POSIX ERE.)

Note that the POSIX Extended regexp syntax (used by egrep or sed -E) is different from the POSIX Basic regexp syntax (used by grep or sed). For example, BRE treats | as a literal pipe symbol and, in GNU tools, uses \| for alternative branches (strict POSIX BRE has no alternation at all), whereas ERE does the exact opposite. The same goes for parentheses and several other special characters, which have to be backslash-prefixed in BRE but not in ERE.
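The difference is easy to see with grep, which uses BRE by default and ERE with -E. This is only a minimal sketch, assuming GNU grep (the \| form is a GNU extension); the sample input and variable names are made up for illustration:

```shell
# Sample input: two hostnames and one line containing a literal pipe.
sample='de.wikipedia.org
upload.wikimedia.org
a|b'

# ERE (-E): a bare | separates alternatives.
ere_alt=$(printf '%s\n' "$sample" | grep -E 'de|upload')

# BRE (default): a bare | is just a literal pipe character.
bre_literal=$(printf '%s\n' "$sample" | grep 'a|b')

# BRE with the GNU extension: \| separates alternatives.
bre_alt=$(printf '%s\n' "$sample" | grep 'de\|upload')

printf 'ERE  de|upload   matched:\n%s\n\n' "$ere_alt"
printf 'BRE  a|b         matched:\n%s\n\n' "$bre_literal"
printf 'BRE  de\\|upload  matched:\n%s\n' "$bre_alt"
```

With GNU grep, the first and third commands should select the two hostname lines, and the second should select only the line with the literal pipe.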

In any case the regexp would look like this:

  • Simple form:

    de.wikipedia.org|upload.wikimedia.org

    (de.wikipedia|upload.wikimedia).org

  • More correct (an unescaped dot matches any character, so the literal dots should be escaped):

    de\.wikipedia\.org|upload\.wikimedia\.org

    (de\.wikipedia|upload\.wikimedia)\.org
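Before handing a pattern to wget, it can be tried against a few URLs with grep -E, which also uses ERE. The following is a rough sketch rather than an exact model of wget's matching; the sample URLs are hypothetical and chosen only to show what the unescaped dots let through:

```shell
# Hypothetical sample URLs; the last one is a deliberate near-miss.
urls='https://de.wikipedia.org/wiki/Beispiel
https://upload.wikimedia.org/example.png
https://en.wikipedia.org/wiki/Example
https://example.net/de-wikipedia-org/'

# Unescaped dots match any character, so the near-miss is accepted too.
loose=$(printf '%s\n' "$urls" | grep -E 'de.wikipedia.org|upload.wikimedia.org')

# Escaped dots match only a literal ".".
strict=$(printf '%s\n' "$urls" | grep -E 'de\.wikipedia\.org|upload\.wikimedia\.org')

printf 'Unescaped pattern accepts:\n%s\n\n' "$loose"
printf 'Escaped pattern accepts:\n%s\n' "$strict"
```

The single quotes keep the shell from interpreting | or the backslashes, so grep receives the pattern exactly as written.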

Note that the | character is special in the shell (it is the pipe operator), as are parentheses, so any argument containing them needs to be quoted:

wget --accept-regex "(de\.wikipedia|upload\.wikimedia)\.org"