Run several jobs in parallel, efficiently

OS: CentOS

I have some 30,000 jobs (or scripts) to run. Each job takes 3-5 minutes, and I have 48 CPUs (nproc = 48). I can use 40 of the CPUs to run 40 jobs in parallel. Please suggest a script or tool that can work through all 30,000 jobs, running 40 at a time.

What I had done:

  • I created 40 different folders and executed the jobs in parallel by creating a shell script for each directory.

  • I want to know a better way to handle this kind of job next time.
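For next time, a single driver script can cap concurrency without 40 separate folders. Here is a minimal pure-bash sketch (my own illustration, not from the question; it assumes bash 4.3+ for `wait -n` and that the jobs are executable files in one directory):

```shell
#!/bin/bash
# run_capped DIR MAX: run every executable in DIR, keeping at most MAX running at once.
run_capped() {
    local dir=$1 max=$2 job
    shopt -s nullglob                       # empty directory => loop body never runs
    for job in "$dir"/*; do
        # If MAX jobs are already running, block until any one of them exits
        while (( $(jobs -rp | wc -l) >= max )); do
            wait -n
        done
        "$job" &                            # launch the next job in the background
    done
    wait                                    # wait for the remaining jobs to finish
}

# Example: run everything in scripts/ with 40 jobs in flight
run_capped scripts 40
```

The drawback versus the tools suggested below is that there is no logging or resuming, but it needs nothing beyond bash itself.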


Solution 1:

As Mark Setchell says: GNU Parallel.

find scripts/ -type f | parallel

If you insist on keeping 8 CPUs free:

find scripts/ -type f | parallel -j-8

But usually it is more efficient simply to use nice as that will give you all 48 cores when no one else needs them:

find scripts/ -type f | nice -n 15 parallel

To learn more:

  • Watch the intro video for a quick introduction: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
  • Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.

Solution 2:

I have used REDIS to do this sort of thing - it is very simple to install and the CLI is easy to use.

I mainly used LPUSH to push all the jobs onto a "queue" in REDIS and BRPOP to do a blocking remove of a job from the other end of the queue. So you would LPUSH 30,000 jobs (or script names, or parameters) at the start, then start 40 processes in the background (1 per CPU), and each process would sit in a loop doing BRPOP to get a job, run it, and fetch the next.

You can add layers of sophistication to log completed jobs in another "queue".

Here is a little demonstration of what to do...

First, start a Redis server on any machine in your network:

./redis-server &    # start REDIS server in background

Or, you could put this in your system startup if you use it all the time.

Now push 3 jobs onto a queue called jobs:

./redis-cli         # start REDIS command line interface
redis 127.0.0.1:6379> lpush jobs "job1"
(integer) 1
redis 127.0.0.1:6379> lpush jobs "job2"
(integer) 2
redis 127.0.0.1:6379> lpush jobs "job3"
(integer) 3

See how many jobs there are in the queue:

redis 127.0.0.1:6379> llen jobs
(integer) 3

Wait with an infinite timeout for a job:

redis 127.0.0.1:6379> brpop jobs 0
1) "jobs"
2) "job1"
redis 127.0.0.1:6379> brpop jobs 0
1) "jobs"
2) "job2"
redis 127.0.0.1:6379> brpop jobs 0
1) "jobs"
2) "job3"

This last one will wait a LONG time as there are no jobs in queue:

redis 127.0.0.1:6379> brpop jobs 0

Of course, this is readily scriptable:

Put 30,000 jobs in the queue:

for ((i=0;i<30000;i++)) ; do
    echo "lpush jobs job$i" | redis-cli
done
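That loop forks one redis-cli per job, which is slow for 30,000 items. redis-cli also has a --pipe mode (mass insertion) that sends everything over a single connection; this sketch assumes a local server on the default port:

```shell
# Generate all 30,000 LPUSH commands and feed them to one redis-cli process
for ((i=0;i<30000;i++)) ; do
    echo "lpush jobs job$i"
done | redis-cli --pipe
```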

If your Redis server is on a remote host, just use:

redis-cli -h <HOSTNAME>

Here's how to check progress:

echo "llen jobs" | redis-cli
(integer) 30000

Or, more simply maybe:

redis-cli llen jobs
(integer) 30000

And you could start 40 jobs like this:

#!/bin/bash
for ((i=0;i<40;i++)) ; do
    ./Keep1ProcessorBusy  $i &
done

And then Keep1ProcessorBusy would be something like this:

#!/bin/bash

# Endless loop picking up jobs and processing them
while :
do
    # BRPOP returns two lines - the queue name, then the job - so keep just the job
    job=$(redis-cli brpop jobs 0 | tail -1)
    # Set processor affinity here too if you want to force it, using the $1 parameter we were called with
    $job
done

Of course, the actual script or job you want run could also be stored in Redis.


As a totally different option, you could look at GNU Parallel. And also remember that you can run the output of find through xargs with the -P option to parallelise stuff.
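As a sketch of that xargs route (again assuming the jobs are files under scripts/): -P 40 runs up to 40 at once, -n 1 passes one file name per invocation, and -print0/-0 keep unusual file names safe. Running them via bash means they don't even need the execute bit:

```shell
# Run every script under scripts/, 40 at a time, one script per bash process
find scripts/ -type f -print0 | xargs -0 -n 1 -P 40 bash
```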