Parallel processing from a command queue on Linux (bash, python, ruby... whatever)

I have a list/queue of 200 commands that I need to run in a shell on a Linux server.

I only want to have a maximum of 10 processes running (from the queue) at once. Some processes will take a few seconds to complete, other processes will take much longer.

When a process finishes I want the next command to be "popped" from the queue and executed.

Does anyone have code to solve this problem?

Further elaboration:

There's 200 pieces of work that need to be done, in a queue of some sort. I want to have at most 10 pieces of work going on at once. When a thread finishes a piece of work it should ask the queue for the next piece of work. If there's no more work in the queue, the thread should die. When all the threads have died it means all the work has been done.

The actual problem I'm trying to solve is using imapsync to synchronize 200 mailboxes from an old mail server to a new mail server. Some users have large mailboxes and take a long time tto sync, others have very small mailboxes and sync quickly.


Solution 1:

On the shell, xargs can be used to queue parallel command processing. For example, for having always 3 sleeps in parallel, sleeping for 1 second each, and executing 10 sleeps in total do

echo {1..10} | xargs -d ' ' -n1 -P3 sh -c 'sleep 1s' _

And it would sleep for 4 seconds in total. If you have a list of names, and want to pass the names to commands executed, again executing 3 commands in parallel, do

cat names | xargs -n1 -P3 process_name

Would execute the command process_name alice, process_name bob and so on.

Solution 2:

I would imagine you could do this using make and the make -j xx command.

Perhaps a makefile like this

all : usera userb userc....

usera:
       imapsync usera
userb:
       imapsync userb
....

make -j 10 -f makefile

Solution 3:

Parallel is made exatcly for this purpose.

cat userlist | parallel imapsync

One of the beauties of Parallel compared to other solutions is that it makes sure output is not mixed. Doing traceroute in Parallel works fine for example:

(echo foss.org.my; echo www.debian.org; echo www.freenetproject.org) | parallel traceroute

Solution 4:

For this kind of job PPSS is written: Parallel processing shell script. Google for this name and you will find it, I won't linkspam.