Multithreaded downloading with shell script

Let's say I have a file with lots of URLs, and I want to download them in parallel using an arbitrary number of processes. How can I do that with bash?


Solution 1:

Have a look at man xargs:

-P max-procs, --max-procs=max-procs

         Run up to max-procs processes at a time; the default is 1.
         If max-procs is 0, xargs will run as many processes as
         possible at a time.

Solution:

xargs -P 20 -n 1 wget -nv <urls.txt
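
As a hedged variation on the same command (untested here, and assuming the URL file is urls.txt with one URL per line), you can hand each wget process a small batch of URLs instead of just one, which cuts down on process startup overhead:

# each of up to 20 parallel wget invocations handles up to 5 URLs
xargs -P 20 -n 5 wget -nv <urls.txt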

Solution 2:

If you just want to grab each URL (regardless of how many there are), then the answer is easy:

#!/bin/bash
URL_LIST="http://url1/ http://url2/"

for url in $URL_LIST ; do
    wget "${url}" >/dev/null 2>&1 &    # discard output, then background the job
done
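
If the URLs actually live in a file, as in the question, a minimal sketch along the same lines (assuming one URL per line in urls.txt) could be:

#!/bin/bash
# start one background wget per line of urls.txt
while read -r url ; do
    wget "$url" >/dev/null 2>&1 &
done < urls.txt
wait    # block until all background downloads have finished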

If you only want to run a limited number of pulls at a time, say 10, then you would do something like this:

#!/bin/bash
URL_LIST="http://url1/ http://url2/"

function download() {
    # $1 = slot number, $2 = URL to fetch
    wget "${2}" >/dev/null 2>&1
    rm -f /tmp/dl-${1}.lck              # free the slot when the download finishes
}

for url in $URL_LIST ; do
    while true ; do
        iter=0
        while [ $iter -lt 10 ] ; do
            if [ ! -f /tmp/dl-${iter}.lck ] ; then
                touch /tmp/dl-${iter}.lck   # claim the slot before forking
                download $iter "$url" &
                break 2                     # move on to the next URL
            fi
            iter=$((iter + 1))
        done
        sleep 10s                           # all 10 slots are busy; wait and retry
    done
done

Do note I haven't actually tested this; I just banged it out in 15 minutes, but you should get the general idea.
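
An untested alternative sketch skips the lock files entirely and limits concurrency by counting running background jobs (same hard-coded URL_LIST as above; the limit of 10 is arbitrary):

#!/bin/bash
URL_LIST="http://url1/ http://url2/"

for url in $URL_LIST ; do
    # wait while 10 or more background downloads are still running
    while [ "$(jobs -r | wc -l)" -ge 10 ] ; do
        sleep 1
    done
    wget "$url" >/dev/null 2>&1 &
done
wait    # let the last batch of downloads finish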

Solution 3:

You could use something like puf, which is designed for that sort of thing, or you could use wget/curl/lynx in combination with GNU parallel.
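
For instance, a minimal GNU parallel invocation, again assuming the URLs sit one per line in urls.txt, could look like:

# run at most 10 wget processes at a time, one URL per invocation
parallel -j 10 wget -nv {} <urls.txt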