How to download HTTP directory with all files and sub-directories as they appear on the online files/folders list?
Solution 1:
Solution:
wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
Explanation:
- It will download all files and subfolders in ddd directory
-
-r
: recursively -
-np
: not going to upper directories, like ccc/… -
-nH
: not saving files to hostname folder -
--cut-dirs=3
: but saving it to ddd by omitting first 3 folders aaa, bbb, ccc -
-R index.html
: excluding index.html files
Reference: http://bmwieczorek.wordpress.com/2008/10/01/wget-recursively-download-all-files-from-certain-directory-listed-by-apache/
Solution 2:
I was able to get this to work thanks to this post utilizing VisualWGet. It worked great for me. The important part seems to be to check the -recursive
flag (see image).
Also found that the -no-parent
flag is important, othewise it will try to download everything.
Solution 3:
you can use lftp, the swish army knife of downloading if you have bigger files you can add --use-pget-n=10
to command
lftp -c 'mirror --parallel=100 https://example.com/files/ ;exit'
Solution 4:
wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
From man wget
‘-r’ ‘--recursive’ Turn on recursive retrieving. See Recursive Download, for more details. The default maximum depth is 5.
‘-np’ ‘--no-parent’ Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded. See Directory-Based Limits, for more details.
‘-nH’ ‘--no-host-directories’ Disable generation of host-prefixed directories. By default, invoking Wget with ‘-r http://fly.srk.fer.hr/’ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.
‘--cut-dirs=number’ Ignore number directory components. This is useful for getting a fine-grained control over the directory where recursive retrieval will be saved.
Take, for example, the directory at ‘ftp://ftp.xemacs.org/pub/xemacs/’. If you retrieve it with ‘-r’, it will be saved locally under ftp.xemacs.org/pub/xemacs/. While the ‘-nH’ option can remove the ftp.xemacs.org/ part, you are still stuck with pub/xemacs. This is where ‘--cut-dirs’ comes in handy; it makes Wget not “see” number remote directory components. Here are several examples of how ‘--cut-dirs’ option works.
No options -> ftp.xemacs.org/pub/xemacs/ -nH -> pub/xemacs/ -nH --cut-dirs=1 -> xemacs/ -nH --cut-dirs=2 -> .
--cut-dirs=1 -> ftp.xemacs.org/xemacs/ ... If you just want to get rid of the directory structure, this option is similar to a combination of ‘-nd’ and ‘-P’. However, unlike ‘-nd’, ‘--cut-dirs’ does not lose with subdirectories—for instance, with ‘-nH --cut-dirs=1’, a beta/ subdirectory will be placed to xemacs/beta, as one would expect.