Make wget not download files larger than X size
The only size-limiting option I know of that wget supports is the -Q switch for quota. This is not what you want, though: it stops only after the combined size of everything you've downloaded exceeds the limit, not per file. Piping each link to wget separately with the -Q switch won't work either, because, as the man page explains, the quota never affects the download of a single file.
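For reference, a minimal sketch of how the quota behaves (urls.txt is a placeholder file containing the links to fetch; see --quota in the man page for the exact rules):

# wget finishes the file it is currently fetching and stops starting new
# downloads once the combined total exceeds 2 MB; a single file larger
# than 2 MB is therefore still downloaded in full.
wget -Q 2m -i urls.txt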
I don't know what environment you're using, but there is a web crawler that supports file size limits through its max-length-bytes setting and runs on the Java platform.
From its user manual:
- max-length-bytes
Maximum number of bytes to download per document. Will truncate file once this limit is reached.
By default this value is set to an extremely large value (in the exabyte range) that will never be reached in practice.
If it's about "downloading at most 2 MB" rather than "only download files up to 2 MB", you can simply limit how much output gets saved to disk.
wget -O - "$url" | head -c 1024
(optionally followed by > "$SaveAsFile")
This saves the first kilobyte and discards the rest.
(Enough to see an "OK:$Message" without filling my /tmp with tons of error messages from the remote side. ;-))
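Along the same lines, a sketch closer to the 2 MB case from the question ($url and $SaveAsFile are placeholder names; the M suffix needs GNU head, otherwise write 2097152):

# Cap what gets written to disk at 2 MB; once head has read its 2 MB and
# exits, the pipe closes and wget is terminated by SIGPIPE.
wget -q -O - "$url" | head -c 2M > "$SaveAsFile"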
This is possible with the help of third-party patches: http://yurichev.com/wget.html