Sites not accepting wget user agent header
When I run this command:
wget --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" http://yahoo.com
...I get this result (with nothing else in the file):
<!-- hw147.fp.gq1.yahoo.com uncompressed/chunked Wed Jun 19 03:42:44 UTC 2013 -->
But when I run wget http://yahoo.com
with no --user-agent
option, I get the full page.
The user agent is the same header that my current browser sends. Why does this happen? Is there a way to make sure the user agent doesn't get blocked when using wget?
Solution 1:
It seems Yahoo server does some heuristic based on User-Agent
in a case Accept
header is set to */*
.
Accept: text/html
did the trick for me.
e.g.
wget --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" http://yahoo.com
Note: if you don't declare Accept
header then wget
automatically adds Accept:*/*
which means give me anything you have.
Solution 2:
I created a ~/.wgetrc
file with the following content (obtained from askapache.com but with a newer user agent, because otherwise it didn’t work always):
header = Accept-Language: en-us,en;q=0.5
header = Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
header = Connection: keep-alive
user_agent = Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0
referer = /
robots = off
Now I’m able to download from most (all?) file-sharing (streaming video) sites.
Solution 3:
You need to set both the user-agent and the referer:
wget --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" --referer connect.wso2.com http://dist.wso2.org/products/carbon/4.2.0/wso2carbon-4.2.0.zip