How to download a list of files from a file server?

Solution 1:

You can specify what file extensions wget will download when crawling pages:

wget -r -A zip,rpm,tar.gz www.site.com/startpage.html

This performs a recursive crawl and downloads only files whose names end in .zip, .rpm, or .tar.gz.

Solution 2:

Supposing you really just want a list of the files on the server without fetching them (yet):

%> wget -r -np --spider http://www.apache.org/dist/httpd/binaries/ 2>&1 | awk -f filter.awk | uniq

where 'filter.awk' looks like this:

# remember the URL from each "--date time--  URL" request line
/^--.*--  http:\/\/.*[^\/]$/ { u=$3; }
# a numeric Length: line means the URL resolved to a real file; print it
/^Length: [[:digit:]]+/ { print u; }
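To see how the filter behaves without hitting the network, you can feed it a couple of hand-written sample lines; this sketch assumes the newer wget log format ("--YYYY-MM-DD HH:MM:SS--  URL"), where the URL is the third whitespace-separated field:

```shell
# Sample --spider output: one real file, one directory listing.
printf '%s\n' \
  '--2012-01-01 12:34:56--  http://www.apache.org/dist/httpd/binaries/httpd-2.4.1.tar.gz' \
  'Length: 12345 (12K) [application/x-gzip]' \
  '--2012-01-01 12:34:57--  http://www.apache.org/dist/httpd/binaries/' \
  'Length: unspecified [text/html]' |
awk '/^--.*--  http:\/\/.*[^\/]$/ { u=$3 }
     /^Length: [[:digit:]]+/       { print u }'
# -> http://www.apache.org/dist/httpd/binaries/httpd-2.4.1.tar.gz
```

The directory URL is dropped twice over: its request line ends in a slash, and its Length is "unspecified" rather than numeric. Note that older wget versions put the URL in a different field, so you may need to adjust $3.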

Then you may still have to filter out directory-index sort links such as

"http://www.apache.org/dist/httpd/binaries/?C=N;O=D"