In my code I have loop in which I construct and over determined linear system and try to solve it:

#pragma omp parallel for
for (int i = 0; i < n[0]+1; i++) {
    for (int j = 0; j < n[1]+1; j++) {
        for (int k = 0; k < n[2]+1; k++) {
            arma::mat A(max_points, 2);
            arma::mat y(max_points, 1);
            // initialize A and y

            arma::vec solution = solve(A,y);
        }
    }
}

Sometimes, quite randomly the program hangs or the results in the solution vector are NaN. And if I put do this:

arma::vec solution;
#pragma omp critical 
{
    solution = solve(weights*A,weights*y);
}

then these problem don't seem to happen anymore.

When it hangs, it does so because some threads are waiting at the OpenMP barrier:

Thread 2 (Thread 0x7fe4325a5700 (LWP 39839)):
#0  0x00007fe44d3c2084 in gomp_team_barrier_wait_end () from /usr/lib64/gcc-4.9.2/lib64/gcc/x86_64-redhat-linux-gnu/4.9.2/libgomp.so.1
#1  0x00007fe44d3bf8c2 in gomp_thread_start () at ../.././libgomp/team.c:118
#2  0x0000003f64607851 in start_thread () from /lib64/libpthread.so.0
#3  0x0000003f642e890d in clone () from /lib64/libc.so.6

And the other threads are stuck inside Armadillo:

Thread 1 (Thread 0x7fe44afe2e60 (LWP 39800)):
#0  0x0000003ee541f748 in dscal_ () from /usr/lib64/libblas.so.3
#1  0x00007fe44c0d3666 in dlarfp_ () from /usr/lib64/atlas/liblapack.so.3
#2  0x00007fe44c058736 in dgelq2_ () from /usr/lib64/atlas/liblapack.so.3
#3  0x00007fe44c058ad9 in dgelqf_ () from /usr/lib64/atlas/liblapack.so.3
#4  0x00007fe44c059a32 in dgels_ () from /usr/lib64/atlas/liblapack.so.3
#5  0x00007fe44f09fb3d in bool arma::auxlib::solve_ud<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> >(arma::Mat<double>&, arma::Mat<double>&, arma::Base<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> > const&) () at /usr/include/armadillo_bits/lapack_wrapper.hpp:677
#6  0x00007fe44f0a0f87 in arma::Col<double>::Col<arma::Glue<arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::glue_solve> >(arma::Base<double, arma::Glue<arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times>, arma::glue_solve> > const&) ()
at /usr/include/armadillo_bits/glue_solve_meat.hpp:39

As you can see from the stacktrace my version of Armadillo uses atlas. And according to this documentation atlas seems to be thread safe: ftp://lsec.cc.ac.cn/netlib/atlas/faq.html#tsafe

Update 9/11/2015

I finally got some time to run more tests, based on the suggestions of Vladimir F.

When I compile armadillo with ATLAS's BLAS, I'm still able to reproduce then hangs and the NaNs. When it hangs, the only thing that changes in the stacktrace is the call to BLAS:

#0  0x0000003fa8054718 in ATL_dscal_xp1yp0aXbX@plt () from /usr/lib64/atlas/libatlas.so.3
#1  0x0000003fb05e7666 in dlarfp_ () from /usr/lib64/atlas/liblapack.so.3
#2  0x0000003fb0576a61 in dgeqr2_ () from /usr/lib64/atlas/liblapack.so.3
#3  0x0000003fb0576e06 in dgeqrf_ () from /usr/lib64/atlas/liblapack.so.3
#4  0x0000003fb056d7d1 in dgels_ () from /usr/lib64/atlas/liblapack.so.3
#5  0x00007ff8f3de4c34 in void arma::lapack::gels<double>(char*, int*, int*, int*, double*, int*, double*, int*, double*, int*, int*) () at /usr/include/armadillo_bits/lapack_wrapper.hpp:677
#6  0x00007ff8f3de1787 in bool arma::auxlib::solve_od<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> >(arma::Mat<double>&, arma::Mat<double>&, arma::Base<double, arma::Glue<arma::Mat<double>, arma::Mat<double>, arma::glue_times> > const&) () at /usr/include/armadillo_bits/auxlib_meat.hpp:3434

Compiling without ATLAS, only with netlib BLAS and LAPACK, I was able to reproduce the NaNs but not the hangs.

In both cases, surrounding solve() with #pragma omp critical I have no problems at all


Solution 1:

Are you sure your systems are over determined? solve_ud in your stack trace says otherwise. Though you have solve_od too, and probably that's nothing to do with the issue. But it doesn't hurt to find why that's happening and fix it if you think the systems should be od.

Is armadillo solve() thread safe?

That I think depends on your lapack version, also see this. Looking at the code of solve_od all the variables accessed seem to be local. Note the warning in the code:

NOTE: the dgels() function in the lapack library supplied by ATLAS 3.6 seems to have problems

Thus it seems only lapack::gels can cause trouble for you. If fixing lapack is not possible, a workaround is to stack your systems and solve a single large system. That probably would be even more efficient if your individual systems are small.

Solution 2:

The thread safety of Armadillo's solve() function depends (only) on the BLAS library that you use. The LAPACK implementations are thread safe when BLAS is. The Armadillo solve() function is not thread safe when linking to the reference BLAS library. However, it is thread safe when using OpenBLAS. Additionally, ATLAS provides a BLAS implementation that also mentions it is thread safe, and the Intel MKL is thread safe as well, but I have no experience with Armadillo linked to those libraries.

Of course, this only applies when you run solve() from multiple threads with different data.