How to make these nested loops parallel using OpenMP?

I am trying to parallelize this code using OpenMP, but the serial version is still faster.

#define NUM_THREADS 4
        
void Calculate(double *N, double *M, double *K, const long length)
{
    #pragma omp parallel num_threads(NUM_THREADS)
      
    #pragma omp for collapse(2)
    for (long i = 0; i < length; i++)
        for (long j = 0; j < length; j++) {
            K[i * length + j] = 0.0;
            for (long k = 0; k < length; k++)
                K[i * length + j] += N[i * length + k] * M[k * length + j];
        }
}

The function argument length has the value 5.


Solution 1:

There is very little point in creating 3 additional threads to perform only 125 multiplications in total (5 × 5 × 5 for length = 5).

The overhead of creating these threads is significantly higher than the cost of performing the calculations serially. Even if the threads already exist, it is still not worth it, because thread synchronization is expensive too.

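At the start of every #pragma omp for construct the iterations must be distributed among the threads, and at its end the threads wait for each other at an implicit barrier (unless nowait is specified). For a problem this small, that coordination is likely also more expensive than performing the calculations serially.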
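
If the same function also has to handle large matrices, one option is OpenMP's if clause, which makes the parallel region run with a single thread whenever its condition is false. The following is only a sketch: PARALLEL_THRESHOLD and its value of 64 are assumptions made for illustration, not measured figures, and the right cutoff has to be found by timing on the target machine. The local accumulator sum and the const qualifiers are also changes of mine rather than part of the original code.

```c
#include <assert.h>

/* Hypothetical cutoff: below this dimension the loops run on one thread.
   The value 64 is an assumption; measure to find a sensible one. */
#define PARALLEL_THRESHOLD 64

void Calculate(const double *N, const double *M, double *K, const long length)
{
    assert(length >= 0);

    /* When the if condition is false, the region is executed by a team of
       one thread, so small inputs avoid most of the threading overhead. */
    #pragma omp parallel for collapse(2) if(length >= PARALLEL_THRESHOLD)
    for (long i = 0; i < length; i++)
        for (long j = 0; j < length; j++) {
            double sum = 0.0;  /* accumulate locally, store to K once */
            for (long k = 0; k < length; k++)
                sum += N[i * length + k] * M[k * length + j];
            K[i * length + j] = sum;
        }
}
```

With length equal to 5 the condition is false and the loops run serially, so the call should cost about the same as the plain serial version. If the code is compiled without OpenMP support, the pragma is ignored and the function is still correct.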