Getting all files from a web page using curl
Use `wget` instead.
Install it with Homebrew (`brew install wget`) or MacPorts (`sudo port install wget`).
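If you're not sure the install worked, a quick sanity check (assuming `wget` ended up on your `PATH`):

```sh
# Print the installed version; "command not found" means the install
# or your PATH needs attention.
wget --version | head -n 1
```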
For downloading files from a directory listing, use `-r` (recursive), `-np` (don't follow links to parent directories), and `-k` to make links in downloaded HTML or CSS point to local files (credit @xaccrocheur):

    wget -r -np -k http://www.ime.usp.br/~coelho/mac0122-2013/ep2/esqueleto/
Other useful options:

- `-nd` (no directories): download all files to the current directory
- `-e robots=off`: ignore `robots.txt` files, don't download `robots.txt` files
- `-A png,jpg`: accept only files with the extensions `png` or `jpg`
- `-m` (mirror): shorthand for `-r --timestamping --level inf --no-remove-listing`
- `-nc`, `--no-clobber`: skip download if files exist
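As a sketch of how these can combine for the listing above (the `png,jpg` filter is purely illustrative, not something the question asks for):

```sh
# Recursively fetch only PNG and JPG files from the listing, ignoring
# robots.txt, flattening the directory structure, and skipping files
# that already exist locally.
wget -r -np -nd -nc -e robots=off -A png,jpg \
    http://www.ime.usp.br/~coelho/mac0122-2013/ep2/esqueleto/
```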
`curl` can only read single web page files; the bunch of lines you got is actually the directory index (which you also see in your browser if you go to that URL). To use `curl` and some Unix tools magic to get the files, you could use something like
    for file in $(curl -s http://www.ime.usp.br/~coelho/mac0122-2013/ep2/esqueleto/ |
                  grep href |
                  sed 's/.*href="//' |
                  sed 's/".*//' |
                  grep '^[a-zA-Z].*'); do
        curl -s -O http://www.ime.usp.br/~coelho/mac0122-2013/ep2/esqueleto/$file
    done
which will get all the files into the current directory.
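If the `grep`/`sed` chain feels fragile, one variation (my own sketch, not part of the original answer) pulls the `href` values out in a single `grep -o` pass before filtering:

```sh
# Hypothetical variant: extract the href="..." attributes, strip the wrapper,
# keep only entries that look like plain file names, then fetch each one.
base=http://www.ime.usp.br/~coelho/mac0122-2013/ep2/esqueleto/
for file in $(curl -s "$base" | grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//' | grep '^[a-zA-Z]'); do
    curl -s -O "$base$file"
done
```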
For more elaborate needs (including getting a bunch of files from a site with folders/directories), `wget` (as proposed in another answer already) is the better option.
Ref: http://blog.incognitech.in/download-files-from-apache-server-listing-directory/
You can use the following command:

    wget --execute="robots = off" --mirror --convert-links --no-parent --wait=5 <website-url>
Explanation of each option:

- `wget`: the command that makes the HTTP request and downloads remote files to our local machine.
- `--execute="robots = off"`: ignore the `robots.txt` file while crawling through pages. It is helpful if you're not getting all of the files.
- `--mirror`: mirror the directory structure for the given URL. It's a shortcut for `-N -r -l inf --no-remove-listing`, which means:
    - `-N`: don't re-retrieve files unless newer than local
    - `-r`: specify recursive download
    - `-l inf`: maximum recursion depth (`inf` or `0` for infinite)
    - `--no-remove-listing`: don't remove `.listing` files
- `--convert-links`: make links in downloaded HTML or CSS point to local files
- `--no-parent`: don't ascend to the parent directory
- `--wait=5`: wait 5 seconds between retrievals, so that we don't thrash the server
- `<website-url>`: the URL of the website to download the files from
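For example, applied to the directory listing from the question (same options, with the placeholder URL filled in):

```sh
# Mirror the listing politely, keep links usable offline, and never
# climb above the esqueleto/ directory.
wget --execute="robots = off" --mirror --convert-links --no-parent --wait=5 \
    http://www.ime.usp.br/~coelho/mac0122-2013/ep2/esqueleto/
```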
Happy Downloading :smiley:
You can use httrack, which is available for Windows/MacOS and installable via Homebrew.
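A minimal command-line sketch, assuming you want to pull the same directory listing into a local folder (the output directory name is arbitrary):

```sh
# Install the command-line tool, then mirror the URL into ./esqueleto-mirror.
brew install httrack
httrack "http://www.ime.usp.br/~coelho/mac0122-2013/ep2/esqueleto/" -O ./esqueleto-mirror
```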
For those of us who would rather use an application with a GUI, there is the inexpensive shareware program DeepVacuum for Mac OS X, which wraps `wget` in a user-friendly manner, with a list of presets that can handle commonly needed tasks. You can also save your own custom configurations as presets.