How can I make wget download only pages, not CSS, images, etc.?

You've explicitly told wget to only accept files which have .html as a suffix.

Assuming the PHP pages end in .php, you can do this:

wget -bqre robots=off -A.html,.php example.com --user-agent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"

Note that this will download the rendered HTML, not the PHP source. If the page is sufficiently dynamic, you might not get the rendered result you expect.
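The bundled short options in that command are easier to read when spelled out. Here is a rough equivalent using long options, assuming GNU wget. The helper name fetch_pages and the shortened user-agent string are placeholders for illustration, not part of the original answer.

```shell
# Long-option spelling of the command above (GNU wget assumed).
# fetch_pages is a hypothetical helper name; the user-agent is shortened here.
#   --background = -b   run detached after startup
#   --quiet      = -q   suppress wget's output
#   --recursive  = -r   follow links from the start page
#   --execute    = -e   run a .wgetrc-style command (here: ignore robots.txt)
#   --accept     = -A   comma-separated list of suffixes to keep
fetch_pages() {
  wget --background --quiet --recursive \
       --execute robots=off \
       --accept '.html,.php' \
       --user-agent='Mozilla/5.0' \
       "$1"
}

# Usage (not run here): fetch_pages example.com
```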

However, I'd suggest that another tool such as httrack may do a better job - it depends on exactly what you need to do.

-A takes a comma-separated list, so -A.html,.php should fit the bill. You should also look into -R, which takes a reject list in the same format.
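As a sketch of the reject-list route: instead of naming the page suffixes to keep, name the asset suffixes to skip. The suffix list and the helper name fetch_without_assets are assumptions for illustration, so adjust them to the site you are crawling.

```shell
# Hypothetical helper: crawl recursively but skip common asset types.
# The suffix list is an assumption and is not exhaustive.
REJECT_LIST='.css,.js,.png,.jpg,.jpeg,.gif,.svg,.ico'

fetch_without_assets() {
  wget --recursive --reject "$REJECT_LIST" "$1"
}

# Usage (not run here): fetch_without_assets example.com
```

A reject list also keeps pages whose URLs have no suffix at all (such as example.com/about), which an accept list of .html,.php may skip.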

Yes, there is, and it's pretty simple. Take a look at this Super User answer: https://superuser.com/questions/709702/how-to-crawl-using-wget-to-download-only-html-files-ignore-images-css-js

tl;dr: use --follow-tags=a, which makes wget follow only <a> tags.
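A minimal sketch of that approach, assuming GNU wget. With --follow-tags=a, link extraction is limited to <a> elements, so resources referenced only through tags like <img>, <link>, or <script> should not be queued. The helper name fetch_linked_pages and the depth of 2 are assumptions for illustration.

```shell
# Hypothetical helper: recursive crawl that only follows <a> links.
# --level=2 is an arbitrary depth chosen for this example.
fetch_linked_pages() {
  wget --recursive --level=2 --follow-tags=a "$1"
}

# Usage (not run here): fetch_linked_pages example.com
```

An <a> tag that points straight at an image or stylesheet would still be followed, so combining this with -A or -R may be worthwhile.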
