thread safety of MPI send using threads created with std::async
Solution 1:
Thread-safety in MPI doesn't work out of the box. First, you have to ensure that your implementation actually supports multiple threads making MPI calls at once. With some MPI implementations, for example Open MPI, this requires the library to be configured with special options at build time. Then you have to tell MPI to initialise at the appropriate thread support level. Currently the MPI standard defines four levels of thread support:
- MPI_THREAD_SINGLE means that the user code is single-threaded. This is the default level at which MPI is initialised if MPI_Init() is used;
- MPI_THREAD_FUNNELED means that the user code is multithreaded, but only the main thread makes MPI calls. The main thread is the one which initialises the MPI library;
- MPI_THREAD_SERIALIZED means that the user code is multithreaded, but calls to the MPI library are serialised, i.e. no two threads are inside the library at the same time;
- MPI_THREAD_MULTIPLE means that the user code is multithreaded and all threads can make MPI calls at any time with no synchronisation whatsoever.
In order to initialise MPI with thread support, one has to use MPI_Init_thread() instead of MPI_Init():
int provided;
MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
if (provided < MPI_THREAD_MULTIPLE)
{
printf("ERROR: The MPI library does not have full thread support\n");
MPI_Abort(MPI_COMM_WORLD, 1);
}
Equivalent code with the obsolete C++ bindings (removed in MPI-3):
int provided = MPI::Init_thread(argc, argv, MPI::THREAD_MULTIPLE);
if (provided < MPI::THREAD_MULTIPLE)
{
printf("ERROR: The MPI library does not have full thread support\n");
MPI::COMM_WORLD.Abort(1);
}
Thread support levels are ordered: MPI_THREAD_SINGLE < MPI_THREAD_FUNNELED < MPI_THREAD_SERIALIZED < MPI_THREAD_MULTIPLE. Any provided level other than MPI_THREAD_MULTIPLE therefore has a lower numerical value, which is why the if (...) check above is written the way it is.
MPI_Init(&argc, &argv) is equivalent to MPI_Init_thread(&argc, &argv, MPI_THREAD_SINGLE, &provided). Implementations are not required to initialise at exactly the requested level; they may initialise at any other level (higher or lower), which is returned in the provided output argument.
For more information, see §12.4 of the MPI standard, which is freely available on the MPI Forum website.
With most MPI implementations, the thread support at level MPI_THREAD_SINGLE is actually equivalent to that provided at level MPI_THREAD_SERIALIZED, which is exactly what you observe in your case.
Since you have not specified which MPI implementation you use, here is a handy list.
I've already said that Open MPI has to be compiled with the proper flags enabled in order to support MPI_THREAD_MULTIPLE. But there is another catch: its InfiniBand component is not thread-safe, and hence Open MPI will not use native InfiniBand communication when initialised at the full thread support level.
Intel MPI comes in two different flavours, one with and one without support for full multithreading. Multithreaded support is enabled by passing the -mt_mpi option to the MPI compiler wrapper, which links against the MT version of the library. This option is also implied if OpenMP support or the autoparalleliser is enabled. I am not aware of how the InfiniBand driver in Intel MPI behaves when full thread support is enabled.
MPICH(2) does not support InfiniBand, hence it is thread-safe, and probably most recent versions provide MPI_THREAD_MULTIPLE support out of the box.
MVAPICH is the basis on which Intel MPI is built and it supports InfiniBand. I have no idea how it behaves at full thread support level when used on a machine with InfiniBand.
The note about multithreaded InfiniBand support is important, since a lot of compute clusters nowadays use InfiniBand fabrics. With the IB component (the openib BTL in Open MPI) disabled, most MPI implementations fall back to another protocol, for example TCP/IP (the tcp BTL in Open MPI), which results in much slower communication with higher latency.
Solution 2:
There are four levels of MPI thread safety, not all of them supported by every implementation: MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED and MPI_THREAD_MULTIPLE. The last one, which allows a process to have multiple threads that may simultaneously call MPI functions, is probably the one you are interested in. So, first of all, you need to make sure your implementation supports MPI_THREAD_MULTIPLE.
The required level of thread safety must be specified by a call to MPI_Init_thread. After you have called MPI_Init_thread, you should be able to safely call MPI functions from threads you create on your own, e.g. with std::async or Boost (POSIX) threads.