Getting all files from a web page using curl
Use `wget` instead.
Install it with Homebrew (`brew install wget`) or MacPorts (`sudo port install wget`).
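If you're not sure the install worked, a quick sanity check (assuming `wget` ended up on your `PATH`):

```sh
# Print the installed version; "command not found" means the install
# or your PATH needs attention.
wget --version | head -n 1
```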
For downloading files from a directory listing, use `-r` (recursive), `-np` (don't follow links to parent directories), and `-k` to make links in downloaded HTML or CSS point to local files (credit @xaccrocheur):

    wget -r -np -k http://www.ime.usp.br/~coelho/mac0122-2013/ep2/esqueleto/
Other useful options:

- `-nd` (no directories): download all files to the current directory
- `-e robots=off`: ignore `robots.txt` files, don't download `robots.txt` files
- `-A png,jpg`: accept only files with the extensions `png` or `jpg`
- `-m` (mirror): shorthand for `-r --timestamping --level inf --no-remove-listing`
- `-nc`, `--no-clobber`: skip download if files exist
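As a sketch of how these can combine for the listing above (the `png,jpg` filter is purely illustrative, not something the question asks for):

```sh
# Recursively fetch only PNG and JPG files from the listing, ignoring
# robots.txt, flattening the directory structure, and skipping files
# that already exist locally.
wget -r -np -nd -nc -e robots=off -A png,jpg \
    http://www.ime.usp.br/~coelho/mac0122-2013/ep2/esqueleto/
```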
`curl` can only read single web page files; the bunch of lines you got is actually the directory index (which you also see in your browser if you go to that URL). To use `curl` and some Unix tools magic to get the files, you could use something like
    for file in $(curl -s http://www.ime.usp.br/~coelho/mac0122-2013/ep2/esqueleto/ |
                  grep href |
                  sed 's/.*href="//' |
                  sed 's/".*//' |
                  grep '^[a-zA-Z].*'); do
        curl -s -O http://www.ime.usp.br/~coelho/mac0122-2013/ep2/esqueleto/$file
    done
which will get all the files into the current directory.
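If the `grep`/`sed` chain feels fragile, one variation (my own sketch, not part of the original answer) pulls the `href` values out in a single `grep -o` pass before filtering:

```sh
# Hypothetical variant: extract the href="..." attributes, strip the wrapper,
# keep only entries that look like plain file names, then fetch each one.
base=http://www.ime.usp.br/~coelho/mac0122-2013/ep2/esqueleto/
for file in $(curl -s "$base" | grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//' | grep '^[a-zA-Z]'); do
    curl -s -O "$base$file"
done
```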
For more elaborate needs (including getting a bunch of files from a site with folders/directories), `wget` (as proposed in another answer already) is the better option.
Ref: http://blog.incognitech.in/download-files-from-apache-server-listing-directory/
You can use the following command:

    wget --execute="robots = off" --mirror --convert-links --no-parent --wait=5 <website-url>
Explanation of each option:

- `wget`: the command that makes the HTTP request and downloads remote files to our local machine.
- `--execute="robots = off"`: ignore the `robots.txt` file while crawling through pages. It is helpful if you're not getting all of the files.
- `--mirror`: mirror the directory structure for the given URL. It's a shortcut for `-N -r -l inf --no-remove-listing`, which means:
    - `-N`: don't re-retrieve files unless newer than local
    - `-r`: specify recursive download
    - `-l inf`: maximum recursion depth (`inf` or `0` for infinite)
    - `--no-remove-listing`: don't remove `.listing` files
- `--convert-links`: make links in downloaded HTML or CSS point to local files
- `--no-parent`: don't ascend to the parent directory
- `--wait=5`: wait 5 seconds between retrievals, so that we don't thrash the server
- `<website-url>`: the URL of the website to download the files from
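For example, applied to the directory listing from the question (same options, with the placeholder URL filled in):

```sh
# Mirror the listing politely, keep links usable offline, and never
# climb above the esqueleto/ directory.
wget --execute="robots = off" --mirror --convert-links --no-parent --wait=5 \
    http://www.ime.usp.br/~coelho/mac0122-2013/ep2/esqueleto/
```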
Happy Downloading :smiley:
You can use httrack, which is available for Windows/MacOS and installable via Homebrew.
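A minimal command-line sketch, assuming you want to pull the same directory listing into a local folder (the output directory name is arbitrary):

```sh
# Install the command-line tool, then mirror the URL into ./esqueleto-mirror.
brew install httrack
httrack "http://www.ime.usp.br/~coelho/mac0122-2013/ep2/esqueleto/" -O ./esqueleto-mirror
```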
For those of us who would rather use an application with a GUI, there is the inexpensive shareware program DeepVacuum for Mac OS X, which wraps `wget` in a user-friendly manner, with a list of presets that can handle commonly needed tasks. You can also save your own custom configurations as presets.