wget does not recurse when piping the output to stdout [closed]
It does not seem possible to achieve my goal with current versions of `wget`.
After studying the source code for `wget` version 1.18, I came to these conclusions:
- `wget` cannot recurse if it does not store the downloaded files, at least temporarily, as it does for `--spider`.
- When passed `-O filename`, it keeps appending to `filename` and reparses the whole file after each download, loading it completely into memory (or mapping it). This is very cumbersome and inefficient.
- When passed `-O-`, it pipes the downloaded file to `stdout` and attempts to reload `-` to look for more URLs to fetch, which causes `stdin` to be read for this purpose. This is a side effect of the implementation.
I wrote a patch to add a more sensible piping option, relying on `--spider` to download HTML and CSS files for recursive operation and piping only these files before they are removed. I will publish the patch when it is reasonably tested and documented.
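
To illustrate the general idea of "store temporarily, pipe, parse for links, then remove", here is a minimal Python sketch. It is not the patch and does not reflect how `wget` is implemented internally. The names `LinkParser` and `crawl_and_pipe`, the `fetch` callable and the `max_depth` parameter are all hypothetical, and only HTML links are followed (CSS is ignored for brevity).

```python
import os
import tempfile
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect href/src attribute values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl_and_pipe(start_url, fetch, out, max_depth=2):
    """Breadth-first crawl that pipes every fetched page to `out`.

    `fetch` is a hypothetical callable mapping a URL to bytes (or None on
    failure); `out` is any binary writable stream. Returns the visited URLs.
    """
    seen = {start_url}
    queue = [(start_url, 0)]
    visited = []
    while queue:
        url, depth = queue.pop(0)
        body = fetch(url)
        if body is None:
            continue
        visited.append(url)
        # Keep the page on disk only as long as it takes to pipe and parse it.
        fd, path = tempfile.mkstemp(suffix=".html")
        try:
            with os.fdopen(fd, "wb") as tmp:
                tmp.write(body)
            with open(path, "rb") as tmp:
                data = tmp.read()
            out.write(data)  # pipe the stored file
            if depth < max_depth:
                parser = LinkParser()
                parser.feed(data.decode("utf-8", errors="replace"))
                for link in parser.links:
                    target = urljoin(url, link)
                    if target not in seen:
                        seen.add(target)
                        queue.append((target, depth + 1))
        finally:
            os.remove(path)  # the temporary copy is removed after piping
    return visited
```

Passing `sys.stdout.buffer` as `out` would send the pages to standard output. The point of the sketch is that link discovery reads the temporary copy, never the output stream, so nothing has to be re-read from `stdin`.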