Problem using wget to download an entire website
Solution 1:
If you want to use wget, you can use its mirror mode to make an offline copy of a website, although some sites may prevent it with robots.txt settings that stop automated spidering. I have always had a few problems with wget (see my other suggestion below), but the following command does work for many sites. However, be aware that adding the -H switch allows it to follow links to other sites and save those as well; this switch can obviously be removed if it is not required.
wget --wait 1 -x -H -mk http://site.to.mirror/
The --wait option puts a gap between wget's requests so that the site is not overwhelmed, and the -x switch specifies that the site's directory structure should be mirrored exactly in a folder under your home directory. The -m switch stands for mirror mode, which lets wget download recursively through the site, and the -k switch rewrites links after the download so that the files referenced are those in your local mirror directory and not those back at the site itself.
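For readability, here is the same command with each switch written out in long form; the long options map one-to-one onto the short switches above (-x is --force-directories, -H is --span-hosts, -m is --mirror, -k is --convert-links):
wget --wait=1 --force-directories --span-hosts --mirror --convert-links http://site.to.mirror/
Note that wget honours robots.txt by default, which is why some sites can block this kind of mirroring; if you have permission to mirror such a site anyway, -e robots=off tells wget to ignore that file:
wget -e robots=off --wait 1 -x -H -mk http://site.to.mirror/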
After man wget, perhaps the best listing and detailed explanation of wget's options is here.
If wget is unsuccessful and you can't grab as much as you want, I would try the command-line program httrack or its web interface, webhttrack, both of which are available in the repositories. The program has a large number of options, but it is better at downloading whole websites, or parts of them, than wget. Webhttrack gives you a wizard to follow for downloading a site (it opens in your browser).
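If you would rather drive httrack from the command line instead of the webhttrack wizard, a minimal invocation looks something like this (the output folder ~/site-mirror and the filter pattern are only illustrative; see man httrack for the full option list):
httrack "http://site.to.mirror/" -O ~/site-mirror "+*.site.to.mirror/*"
Here -O sets the directory the mirror is written to, and the "+..." filter keeps the download restricted to that domain.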
Solution 2:
It's been a while since I used wget for this purpose, but I believe I had success with the -m flag.
wget -mk http://site.com/directory
This probably won't get everything, but it will get you close.
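If pages come down without their images or stylesheets, -p (--page-requisites) tells wget to fetch everything needed to display each page, and --no-parent stops it from climbing above the directory you asked for; a variant of the command above along those lines:
wget -mkp --no-parent http://site.com/directory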
(Reference: This page)