Problem using wget to download an entire website
Solution 1:
If you want to use wget, you can use its mirror mode to make an offline copy of a website, although some sites may prevent it with robots.txt settings that stop automated spidering. I have always had a few problems with wget (see my other suggestion below), but the following command does work for many sites. However, be aware that adding the -H switch allows it to follow links to other sites and save those as well; this switch can obviously be removed if it is not required.
wget --wait 1 -x -H -mk http://site.to.mirror/
The --wait option puts a gap between wget's requests so that the site is not overwhelmed, and the -x switch specifies that the site's directory structure should be mirrored exactly in a folder under your home directory. The -m switch stands for mirror mode, which lets wget download recursively through the site, and the -k switch rewrites links after the download so that the files referenced are those in your local mirror directory and not those back at the site itself.
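For readability, here is the same command with each switch written out in long form; the long options map one-to-one onto the short switches above (-x is --force-directories, -H is --span-hosts, -m is --mirror, -k is --convert-links):
wget --wait=1 --force-directories --span-hosts --mirror --convert-links http://site.to.mirror/
Note that wget honours robots.txt by default, which is why some sites can block this kind of mirroring; if you have permission to mirror such a site anyway, -e robots=off tells wget to ignore that file:
wget -e robots=off --wait 1 -x -H -mk http://site.to.mirror/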
After man wget, perhaps the best listing and detailed explanation of wget's options is here.
If wget is unsuccessful and you can't grab as much as you want, I would try the command-line program httrack or its web interface, webhttrack, both of which are available in the repositories. The program has a large number of options, but it is better at downloading whole websites, or parts of them, than wget. Webhttrack gives you a wizard to follow for downloading a site (it opens in your browser).
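If you would rather drive httrack from the command line instead of the webhttrack wizard, a minimal invocation looks something like this (the output folder ~/site-mirror and the filter pattern are only illustrative; see man httrack for the full option list):
httrack "http://site.to.mirror/" -O ~/site-mirror "+*.site.to.mirror/*"
Here -O sets the directory the mirror is written to, and the "+..." filter keeps the download restricted to that domain.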
Solution 2:
It's been a while since I used wget for this purpose, but I believe I had success with the -m flag.
wget -mk http://site.com/directory
This probably won't get everything, but it will get you close.
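If pages come down without their images or stylesheets, -p (--page-requisites) tells wget to fetch everything needed to display each page, and --no-parent stops it from climbing above the directory you asked for; a variant of the command above along those lines:
wget -mkp --no-parent http://site.com/directory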
(Reference: This page)