Probe seems to consume the CPU
Solution 1:
Yes; for the sake of performance, most MPI implementations busy-wait on blocking operations. The assumption is that the MPI job is the only thing we care about on the processor, and if a task is blocked waiting for communication, the best thing to do is to poll continually for that communication to reduce latency, so that there's virtually no delay between when the message arrives and when it's handed off to the MPI task. This typically means the CPU is pegged at 100% even when nothing "real" is being done.
That's probably the best default behaviour for most MPI users, but it isn't always what you want. Most MPI implementations let you disable it; with OpenMPI, you can turn this behaviour off with an MCA parameter:
mpirun -np N --mca mpi_yield_when_idle 1 ./a.out
Solution 2:
It sounds like there are three ways to wait for an MPI message:
- Aggressive busy wait. This gets the message into your receiving code as fast as possible: one processor does nothing but check for the incoming message. If you put all of your processors in this state, the rest of your system is going to be very slow. MPI uses aggressive mode by default.
- Degraded busy wait. This will yield to other processes while doing its busy wait. If the number of processes you ask for is more than the number of processors you have, MPI switches to degraded mode. You can also force aggressive or degraded mode with an MCA parameter.
- Polling. Even the degraded busy wait is still a busy wait, and it keeps one processor pegged at 100% for each process that is waiting. If you have other tasks on your system that you don't want to compete with, you can call MPI_Iprobe() in a loop with a sleep call before calling a blocking receive, as in the sketch after this list. I find a 100ms sleep is responsive enough for my tasks, and it still keeps CPU usage minimal when a worker is idle.
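Here's a minimal sketch of that polling pattern in C. It assumes a single coordinator on rank 0 sending an integer payload; the 100ms interval, the tag, and the message type are illustrative choices on my part, not anything MPI prescribes:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>  /* usleep */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank != 0) {
        /* Worker: poll for a pending message, sleeping between checks
         * so an idle rank stays near 0% CPU instead of busy-waiting. */
        int flag = 0;
        MPI_Status status;
        while (1) {
            MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
                       &flag, &status);
            if (flag)
                break;
            usleep(100 * 1000);  /* 100ms; tune to your latency needs */
        }

        /* A message is now pending, so this blocking receive returns
         * immediately rather than spinning. */
        int payload;
        MPI_Recv(&payload, 1, MPI_INT, status.MPI_SOURCE, status.MPI_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d received %d\n", rank, payload);
    } else {
        /* Coordinator: send one message to each worker. */
        int size;
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        for (int dest = 1; dest < size; dest++) {
            int payload = 42;
            MPI_Send(&payload, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}

The trade-off is latency: a message can sit in the queue for up to a full sleep interval before the receive fires, so pick an interval that matches how responsive your workers need to be.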
I did some searching and found that a busy wait is what you want if you are not sharing your processors with other tasks.