How to force wget to ignore certain redirects
I am trying to wget a list of URLs (images). Some of them no longer exist, and the host redirects to a generic "this image doesn't exist" page, whose URL I know. I would like wget to fetch each file unless it 302s to that page. Is this possible?
I can stop wget from following any redirect with the --max-redirect=0 flag, but that may also stop it from fetching real images if I hit a mirror.
Solution 1:
The only (really hacky) way I can imagine to accomplish this is to put an HTTP proxy in front of wget that replaces the "image not found" response with an error code, so that wget does not download it.
Any configurable proxy should be able to provide this kind of behavior. For example, with Apache you could do something like:
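# Act as a forward proxy (requires mod_proxy and mod_proxy_http)
ProxyRequests On
# Refuse to proxy the placeholder image, so the client gets an error instead
# (Apache 2.2 syntax; on Apache 2.4 the equivalent is "Require all denied")
<Proxy http://example.com/path/to/image-not-found.jpg>
    Order allow,deny
    Deny from all
</Proxy>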
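If running a proxy is not an option, the same idea (follow every redirect except the one to the placeholder page) can be sketched in a small script that does the download itself. This is only a rough illustration, not a wget feature: it uses Python's standard urllib instead of wget, NOT_FOUND_URL is the placeholder URL from the Apache example above and must be replaced with the real one, and the names is_blocked, SkipPlaceholderRedirects, and fetch are made up for this example.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit

# Assumption: the known "image not found" URL (placeholder from the Apache example).
NOT_FOUND_URL = "http://example.com/path/to/image-not-found.jpg"


def is_blocked(target, blocked=NOT_FOUND_URL):
    """Return True if a redirect target is the known placeholder page."""
    a, b = urlsplit(target), urlsplit(blocked)
    # Compare scheme, host (case-insensitive), and path; ignore query and fragment.
    return (a.scheme, a.netloc.lower(), a.path) == (b.scheme, b.netloc.lower(), b.path)


class SkipPlaceholderRedirects(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, except those that lead to the placeholder."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if is_blocked(newurl):
            # Turn the unwanted redirect into an error instead of following it.
            raise urllib.error.HTTPError(
                req.full_url, code, "redirect to placeholder", headers, fp
            )
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def fetch(url, dest):
    """Download url to dest; return False on any HTTP error, including a blocked redirect."""
    opener = urllib.request.build_opener(SkipPlaceholderRedirects)
    try:
        # dest is only created once the request has succeeded.
        with opener.open(url, timeout=30) as resp, open(dest, "wb") as out:
            out.write(resp.read())
        return True
    except urllib.error.HTTPError:
        return False


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python fetch_images.py URL [URL ...]
    for url in sys.argv[1:]:
        name = urlsplit(url).path.rsplit("/", 1)[-1] or "index"
        print(url, "saved" if fetch(url, name) else "skipped")
```

Redirects to mirrors are still followed, because only the exact placeholder URL is rejected. If the host redirects to several different "not found" pages, is_blocked would need to compare against a set of URLs or just the host name.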