How to exclude certain directories while using wget?
I'd like to download a directory from an FTP server that contains some source code. Initially, I did this:
wget -r ftp://path/to/src
Unfortunately, the directory is the result of an SVN checkout, so it contains lots of .svn directories, and crawling them makes the download take much longer. Is it possible to exclude those .svn directories?
wget -X directory_to_exclude[,other_directory_to_exclude] -r ftp://URL_ftp_server
SERVER
|-logs
|-etc
|-cache
|-public_html
  |-images
  |-videos (want to exclude)
  |-files
  |-audio (want to exclude)
wget -X /public_html/videos,/public_html/audio ftp://SERVER/public_html/*
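If the list of directories to skip gets long, it may be easier to build the comma-separated -X argument in a small script. The following is only a sketch: join_excludes is a made-up helper, SERVER is the same placeholder as above, and the final line prints the command instead of running it so you can check it first.

```shell
#!/bin/sh
# Hypothetical helper: join its arguments with commas, which is the
# list format that -X / --exclude-directories takes.
join_excludes() (
    # Runs in a subshell, so changing IFS here does not leak out.
    IFS=,
    printf '%s\n' "$*"
)

excludes=$(join_excludes /public_html/videos /public_html/audio)

# Dry run: show the command rather than executing it.
echo "wget -X $excludes -r ftp://SERVER/public_html/"
```

Once the printed command looks right, drop the echo and the quotes around it to actually start the download.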
wget --exclude-directories=.svn -r ftp://path/to/src
I'd like to answer this a bit more broadly, because people also land on this question via search engines:
--exclude-directories=list expects absolute paths [1]. This means that for host.org/fu/bar/ you have to write --exclude-directories=/fu/bar.
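To make the "absolute path" part concrete, here is a rough sketch of turning a URL into the value the option wants. url_to_exclude_path is a hypothetical helper, not part of wget, and it assumes the URL has at least one path component after the host.

```shell
#!/bin/sh
# Hypothetical helper: derive the absolute directory path for
# --exclude-directories from a URL such as host.org/fu/bar/.
# Assumes the URL contains a path after the host name.
url_to_exclude_path() {
    path=${1#*://}      # drop a leading scheme (ftp://, http://) if present
    path=/${path#*/}    # drop the host, keep the path with a leading slash
    path=${path%/}      # drop one trailing slash
    printf '%s\n' "$path"
}

url_to_exclude_path host.org/fu/bar/    # prints /fu/bar
```

The result can then be passed along as --exclude-directories="$(url_to_exclude_path host.org/fu/bar/)".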
This can be a problem if you always want to exclude a folder with a specific name, no matter where it is located (for example, a 'thumbs' folder).
For this we can use --reject-regex [2], like this: --reject-regex="/thumbs/". Since this is a regex rather than a comma-separated list, we can exclude multiple folders with regex1|regex2|regex3: --reject-regex="/thumbs/|/css/". Keep in mind that certain characters, such as ., have a special meaning in regex and must be escaped to match literally in a folder name: "/\.svn/".
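One way to sanity-check a pattern before starting a long crawl is to run it over a few sample URLs with grep -E. This is only an approximation: it assumes wget's default POSIX regex matching behaves like grep's extended regex for patterns this simple, and both the filter_urls helper and the URLs are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: preview which URLs would survive a reject pattern.
# $1    : the pattern you intend to pass to --reject-regex
# stdin : one URL per line
# stdout: the URLs that do NOT match, i.e. would not be rejected
filter_urls() {
    grep -Ev -- "$1"
}

# Made-up sample URLs, one per line.
urls='ftp://host.org/src/main.c
ftp://host.org/src/.svn/entries
ftp://host.org/src/lib/.svn/format
ftp://host.org/src/lib/util.c
ftp://host.org/src/xsvn/notes.txt'

# Escaped dot: only real .svn folders are dropped; xsvn/notes.txt is kept.
printf '%s\n' "$urls" | filter_urls '/\.svn/'
```

Running the same list through the unescaped '/.svn/' also drops xsvn/notes.txt, because the bare . matches any character. That is exactly the pitfall the escaping avoids.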