How to save all the webpages linked from one

I would like to save this webpage and all the pages it links to, and I hope to keep the same links between the saved pages.

Is there a way to do this without opening and saving each linked page individually?


Solution 1:

You can do what you'd like with the wget command line utility. If you provide it with the -r option, it will recursively download web pages. For example:

wget -r http://mat.gsia.cmu.edu/orclass/integer/integer.html

This will download that webpage and follow its links recursively. By default wget recurses up to five levels deep and stays on the original host. To make it recurse only a certain number of levels, add the -l option with a number (the number cannot be given to -r directly). Like so:

wget -r -l 2 http://mat.gsia.cmu.edu/orclass/integer/integer.html

Solution 2:

This thread is old now, but others might still look at it. Thank you, Wuffers, for pointing me in the right direction. To expand on Wuffers's answer: a modern version of wget has a number of useful options for following links recursively and rewriting them as local relative links, so that you can navigate a local copy of a web site. Use the -r option to recurse, the -k option to convert links so they point at the local copies, the -H option to follow links into hosts other than the original one, the -D option to limit which domains are followed, the -l option to limit the depth of recursion, and the -p option to fetch page requisites (images, stylesheets and the like) so that the leaves of your traversal display correctly. For example, the following will download a page and everything it immediately links to, and make the result locally browsable. The -p option ensures that if the linked-to pages contain images, those are downloaded too:

wget -r -l 1 -p -k -H -D domain.com,relateddomain.com http://domain.com/page/in/domain
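If the short flags are hard to remember, the same command can be spelled with wget's long option names. The following is only a sketch: the function name mirror_page and the DRY_RUN variable are my own inventions for illustration (DRY_RUN is not a wget feature), and the domains are the placeholder ones from the example.

```shell
#!/bin/sh
# Long-option spelling of: wget -r -l 1 -p -k -H -D <domains> <url>
# mirror_page and DRY_RUN are hypothetical helper names, not part of wget.
mirror_page() {
    url=$1       # page to start from
    domains=$2   # comma-separated list of domains wget may follow links into

    # If DRY_RUN is set to a non-empty value, print the command instead of running it.
    ${DRY_RUN:+echo} wget \
        --recursive --level=1 \
        --page-requisites \
        --convert-links \
        --span-hosts --domains="$domains" \
        "$url"
}

# Example (prints the command without touching the network):
#   DRY_RUN=1 mirror_page http://domain.com/page/in/domain domain.com,relateddomain.com
```

Printing the command first is a cheap way to check the options before starting a download that may take a while.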

Using a command similar to the one above, I was able to download a chunk of a wiki page, with external links, onto my local disk without downloading megabytes of extraneous data. Now, when I open the root page in my browser, I can navigate the tree without an Internet connection. The only irritant was that the root page was buried in subdirectories and I had to create a top-level redirect page in order to make it convenient to display. It may take some trial-and-error to get it right. Read the wget man page and experiment.
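For the top-level redirect page, something along these lines should work. It is a minimal sketch, not the exact page I used: make_redirect is a made-up helper name, and the path in the usage comment is hypothetical. By default wget -r saves files under a directory named after the host, so the real root page usually ends up a few levels down, and the path to it has to be adjusted to match your own download.

```shell
#!/bin/sh
# Write a small HTML page that redirects the browser to the real root page
# of the downloaded copy.
# Usage: make_redirect <relative-path-to-root-page> [output-file]
# make_redirect is a hypothetical helper name used only for this example.
make_redirect() {
    target=$1              # relative path from the redirect page to the saved root page
    out=${2:-index.html}   # where to write the redirect page (default: ./index.html)

    cat > "$out" <<EOF
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="refresh" content="0; url=$target">
<title>Redirect</title>
</head>
<body>
<p><a href="$target">Open the saved page</a></p>
</body>
</html>
EOF
}

# Example, run from the directory where wget was started (path is hypothetical):
#   make_redirect "domain.com/page/in/domain"
```

The plain link in the body is a fallback in case the browser does not follow the meta refresh.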