shell script: run a batch of N commands in parallel, wait for all to finish, run next N
Task: run blocks consisting of 3-5 commands (in parallel/background). Example block:
dd if=/dev/urandom of=/mnt/1/x bs=1024 count=1024000000 &
dd if=/dev/urandom of=/mnt/2/x bs=1024 count=1024000000 &
dd if=/dev/urandom of=/mnt/3/x bs=1024 count=1024000000 &
When that block is done, the next block should run. I suppose this can be done via lock files:
task1.sh:
real_task1 real_param1 ; rm /var/lock/myscript/task1.lock
task2.sh:
real_task2 real_param1 ; rm /var/lock/myscript/task2.lock
...
taskgen.sh:
# loop
# while directory isn't empty - wait...
gen_tasks.pl # build task files from some queue
for i in 1 2 3; do touch /var/lock/myscript/task$i.lock ; done
./task1.sh &
./task2.sh &
./task3.sh &
# if task1.sh doesn't exist then exit, else the loop waits for the lock files to be deleted
A number of methods to check whether the directory is empty can be found here; I'm not sure which one to use.
Question: any better way to implement this?
P.S. Possible status reporting method:
command && report_good_state.sh taskid ; report_state_done.sh taskid; rm /var/lock/myscript/taskN.lock
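Putting the pieces above together, a minimal sketch of what taskgen.sh's loop might look like under this scheme (gen_tasks.pl, the task script names and the lock directory are the ones from the outline above; the 1-second poll interval is an arbitrary choice):
#!/bin/bash
lockdir=/var/lock/myscript

while true
do
    # wait until the lock directory is empty, i.e. every task removed its lock file
    while [ -n "$(ls -A "$lockdir")" ]
    do
        sleep 1
    done

    gen_tasks.pl                    # build task files from some queue
    [ -e ./task1.sh ] || exit 0     # no task1.sh generated -> queue is empty, stop

    for i in 1 2 3
    do
        touch "$lockdir/task$i.lock"
        ./task$i.sh &
    done
done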
Solution 1:
This is exactly what GNU Parallel is designed for, so I strongly recommend using it. In particular, look at running it as a semaphore:
for i in {1..4}
do
    echo running $i
    sem -j3 dd if=/dev/urandom of=/mnt/$i/x bs=1024 count=1024000000 ";" echo done
done
# sem --wait waits until all jobs are done.
sem --wait
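If the whole block is known in advance, plain parallel from the same package works as well and only returns once every job in the block has finished; a sketch reusing the mount points from the question (not part of the original answer):
# {} is replaced by each argument after :::; at most 3 jobs run at once
parallel -j3 dd if=/dev/urandom of=/mnt/{}/x bs=1024 count=1024000000 ::: 1 2 3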
Solution 2:
Perhaps some variation on this?
while true
do
    ./task1.sh &
    pid1=$!
    ./task2.sh &
    pid2=$!
    ./task3.sh &
    pid3=$!
    wait $pid1
    wait $pid2
    wait $pid3
done
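Since wait accepts several PIDs at once (and, with no arguments, waits for all children of the shell), the same pattern can be generalized to N tasks per block; a sketch assuming the task scripts are named task1.sh, task2.sh, ... as in the question:
while true
do
    pids=()
    for i in 1 2 3
    do
        ./task$i.sh &       # start the whole block in the background
        pids+=("$!")
    done
    wait "${pids[@]}"       # block until every task in the batch has exited
done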
Solution 3:
Do you have any particular reason not to use something like GNU parallel? If you must use bash, then consider methods like those described in this blog post (wait and named pipes are helpful here).
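As an illustration of the named-pipe idea (a generic sketch of the technique, not the contents of the linked post; the three-slot limit and the some_command placeholder are assumptions), a FIFO can hold "job slot" tokens so that at most N jobs run at a time:
#!/bin/bash
fifo=$(mktemp -u)
mkfifo "$fifo"
exec 3<>"$fifo"              # keep the FIFO open read/write on fd 3
rm "$fifo"

printf '.\n.\n.\n' >&3       # three tokens = three concurrent job slots

for i in {1..9}
do
    read -r -u 3 token       # blocks while all three slots are taken
    {
        some_command "$i"    # placeholder for the real command
        printf '.\n' >&3     # return the token when the job is done
    } &
done
wait                         # wait for the last jobs to finish
exec 3>&-
Note that this throttles to three concurrent jobs rather than running strict batches of three.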
Solution 4:
"wait" waits for all background jobs to complete. Sample:
sleep 30 & sleep 40 & sleep 120 & wait
It waits until all commands have completed, i.e. at least 120 seconds in this example.
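Applied to the example block from the question, each batch would simply end with a bare wait before the next one starts:
dd if=/dev/urandom of=/mnt/1/x bs=1024 count=1024000000 &
dd if=/dev/urandom of=/mnt/2/x bs=1024 count=1024000000 &
dd if=/dev/urandom of=/mnt/3/x bs=1024 count=1024000000 &
wait    # returns once all three dd processes have exited
# ...launch the next block here...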
Hope this helps.