Portable loop unrolling with template parameter in C++ with GCC/ICC
I am working on a high-performance parallel computational fluid dynamics code that involves a lot of lightweight loops and therefore gains approximately 30% in performance if all important loops are fully unrolled.
With a fixed unroll factor this is easy to do with compiler directives: #pragma GCC unroll (16)
is recognized by both compilers I am targeting, the Intel C++ compiler ICC and GCC, while #pragma unroll (16)
is sadly ignored by GCC. With ICC I can also use a template parameter or a preprocessor macro as the unroll factor (similar to what nvcc allows), for instance
template <int N>
// ...
#pragma unroll (N)
for (int i = 0; i < N; ++i) {
    // ...
}
or
#define N 16
#pragma unroll (N)
for (int i = 0; i < N; ++i) {
    // ...
}
Both variants compile without any error or warning under ICC with -Wall -w2 -w3
whereas the corresponding GCC syntax #pragma GCC unroll (N)
compiled with GCC (-Wall -pedantic
) produces an error in GCC 9.2.1 20191102 on Ubuntu 18.04:
error: ‘#pragma GCC unroll’ requires an assignment-expression that evaluates to a non-negative integral constant less than 65535
#pragma GCC unroll (N)
Is anybody aware of a way to make loop unrolling based on a template parameter work with compiler directives in a portable way (at least with GCC and ICC)? I actually only need full unrolling of the entire loop, so something like #pragma GCC unroll (all)
would already help me a lot.
I am aware that there are more or less complex strategies for unrolling loops with template meta-programming, but since the loops in my application may be nested and can contain more complicated loop bodies, I feel that such a strategy would over-complicate my code and reduce readability.
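For reference, one such template-based strategy can be sketched as follows. This is only a minimal illustration, assuming C++17 (fold expressions); the names unrolled_for, unrolled_for_impl and sum_first are hypothetical and not taken from any library or from the code in question.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper: expands to f(0), f(1), ..., f(N-1) through a fold
// expression over the comma operator, so no runtime loop is left.
template <typename F, std::size_t... Is>
constexpr void unrolled_for_impl(F&& f, std::index_sequence<Is...>) {
    (f(Is), ...);
}

// Hypothetical entry point: N is the compile-time trip count.
template <std::size_t N, typename F>
constexpr void unrolled_for(F&& f) {
    unrolled_for_impl(f, std::make_index_sequence<N>{});
}

// Hypothetical example kernel: sums the first N elements of an array.
template <std::size_t N>
double sum_first(const double* a) {
    double s = 0.0;
    unrolled_for<N>([&](std::size_t i) { s += a[i]; });
    return s;
}
```

Every loop body has to be wrapped in a callable here, and nested loops turn into nested lambdas, which is the readability cost mentioned above.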
Solution 1:
Unfortunately, there does not currently seem to be a consistent way to do this.
I ended up using a preprocessor macro in combination with _Pragma(string-literal)
and a stringification macro (similar to this). With the Intel C++ Compiler ICC or Clang the macro unrolls by the template parameter, with GCC it falls back to a constant unroll factor, and for any other compiler it expands to nothing. Here is a small example:
/// Helper macros for stringification
#define TO_STRING_HELPER(X) #X
#define TO_STRING(X) TO_STRING_HELPER(X)
// Define loop unrolling depending on the compiler
#if defined(__ICC) || defined(__ICL)
#define UNROLL_LOOP(n) _Pragma(TO_STRING(unroll (n)))
#elif defined(__clang__)
#define UNROLL_LOOP(n) _Pragma(TO_STRING(unroll (n)))
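#elif defined(__GNUC__) && !defined(__clang__)
// GCC rejects a template parameter here (see the error in the question), so n is ignored and a constant factor is used
#define UNROLL_LOOP(n) _Pragma(TO_STRING(GCC unroll (16)))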
#elif defined(_MSC_BUILD)
#pragma message ("Microsoft Visual C++ (MSVC) detected: Loop unrolling not supported!")
#define UNROLL_LOOP(n)
#else
#warning "Unknown compiler: Loop unrolling not supported!"
#define UNROLL_LOOP(n)
#endif
/// Example usage
template <int N>
void exampleContainingLoop() {
    UNROLL_LOOP(N)
    for (int i = 0; i < N; ++i) {
        // ...
    }
}
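To make the usage more concrete, here is a self-contained sketch that applies the same macro to a small kernel. The compiler dispatch is condensed from the version above (the MSVC and unknown-compiler messages are left out), and the function dot is a hypothetical example, not part of the original code.

```cpp
// Stringification helpers, as in the answer above
#define TO_STRING_HELPER(X) #X
#define TO_STRING(X) TO_STRING_HELPER(X)

// Condensed compiler dispatch: the template parameter is used for ICC and
// Clang, a constant factor for GCC, and nothing for any other compiler
#if defined(__ICC) || defined(__ICL) || defined(__clang__)
#define UNROLL_LOOP(n) _Pragma(TO_STRING(unroll (n)))
#elif defined(__GNUC__)
#define UNROLL_LOOP(n) _Pragma(TO_STRING(GCC unroll (16)))
#else
#define UNROLL_LOOP(n)
#endif

// Hypothetical kernel: dot product of two vectors of compile-time length N
template <int N>
double dot(const double* a, const double* b) {
    double s = 0.0;
    UNROLL_LOOP(N)
    for (int i = 0; i < N; ++i) {
        s += a[i] * b[i];
    }
    return s;
}
```

The pragma has to sit directly in front of the for statement. Whether the loop was actually unrolled is best checked in the generated assembly, since the pragma is only a request to the optimizer.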