How to constexpr initialize intrinsic SSE/AVX register?

Not that it would change anything semantically or performance-wise

Correct, just use const __m128i like most code does.
I don't see any benefit to constexpr for this use-case, just pain for no gain.

Maybe if there were a way, it would let you initialize vectors in static storage (global or function-level static) without the usual mess you get from _mm_set, where the compiler reserves space in .bss and runs a constructor at run-time to copy from an anonymous constant in .rodata.

(Yes, it's really that bad with gcc/clang/MSVC; godbolt. Don't use static const __m128i inside functions, or const __m128i at global scope. Do const __m128i foo = _mm_set_epi32(...) or whatever inside functions; compilers + linkers will eliminate duplicate constants, like with string literals. Or use plain arrays with alignas(16) and _mm_load_si128 from them inside functions if that works better.)
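
A minimal sketch of both of those options inside a function (the function and variable names are just illustrative):

#include <immintrin.h>
#include <cstdint>

__m128i use_constants(__m128i x) {
    // Local const vector: the compiler materializes one .rodata constant
    // and loads it directly; no run-time constructor needed.
    const __m128i ones = _mm_set1_epi32(1);

    // Alternative: a plain aligned array, loaded explicitly.
    alignas(16) static const int32_t table[4] = {1, 2, 3, 4};
    const __m128i tbl = _mm_load_si128(reinterpret_cast<const __m128i*>(table));

    return _mm_add_epi32(_mm_add_epi32(x, ones), tbl);
}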


just curious why in the year 2022 I can't declare constexpr __m128i

You can declare constexpr __m128i, you just can't portably initialize it [1], because Intel intrinsics like _mm_set_* were defined before the year 2000 (for MMX and then SSE1), and aren't constexpr. (Later intrinsics still follow the same pattern established for SSE1.) Remember, in C / C++ terms they're actual functions that just happen to inline. (Or macros around __builtin functions, to get a compile-time constant for an operand that becomes an immediate.)

Footnote 1: In C++20, GCC lets you use constexpr auto y = std::bit_cast<__m128i>(x);, as shown in https://godbolt.org/z/YGMGM69qs. Other compilers accept bit_cast<float> or whatever, but not bit_cast to __m128, so this may be an implementation detail of GCC. In any case, it doesn't save typing, and wouldn't be useful for much even if it were portable to clang and MSVC.
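
For reference, a sketch of that GCC-specific trick (assuming C++20; clang and MSVC may reject the bit_cast to a vector type in a constant expression):

#include <immintrin.h>
#include <array>
#include <bit>
#include <cstdint>

// std::array<int32_t, 4> is 16 bytes, the same size as __m128i, so the
// bit_cast is well-formed; GCC also accepts it as a constant expression.
constexpr __m128i vec = std::bit_cast<__m128i>(std::array<int32_t, 4>{1, 2, 3, 4});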

There's little point to it because intrinsic functions like _mm_add_epi32 are also not constexpr, and you can't portably do v1 += v2; either. (In GNU C/C++ that does compile, to a paddq, but it's not portable to MSVC.)

Example with non-portable braced initializers; don't do this:

#include <immintrin.h>

__m128i foo() {
    // different meaning in GCC/clang vs. MSVC
    constexpr __m128i v = {1, 2};
    return v;
}

GCC 11.2 -O3 asm output (Godbolt) - two long long halves, as per the way GCC/clang define __m128i: typedef long long __m128i __attribute__((vector_size(16), may_alias))

foo():
        movdqa  xmm0, XMMWORD PTR .LC0[rip]
        ret
.LC0:
        .quad   1
        .quad   2

MSVC 19.30 - the {1, 2} sets the first two bytes of 16x int8_t: MSVC defines __m128i as a union of arrays of various element widths, apparently with the char[16] member first.

__xmm@00000000000000000000000000000201 DB 01H, 02H, 00H, 00H, 00H, 00H, 00H
        DB      00H, 00H, 00H, 00H, 00H, 00H, 00H, 00H, 00H

__m128i foo(void) PROC                   ; foo, COMDAT
        movdqa  xmm0, XMMWORD PTR __xmm@00000000000000000000000000000201
        ret     0
__m128i foo(void) ENDP                   ; foo

So you could initialize a vector to {0} and get the same result on gcc/clang as on MSVC, or I guess any single value in the 0..255 range. But that's still taking advantage of implementation details of each specific compiler, not strictly using Intel's documented intrinsics API.
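
For example, both of these happen to produce an all-zero vector on gcc/clang and on MSVC, but only the intrinsic is part of the documented API:

#include <immintrin.h>

__m128i zeros() {
    __m128i a = {0};                  // relies on each compiler's definition of __m128i
    __m128i b = _mm_setzero_si128();  // the documented, portable way
    return _mm_and_si128(a, b);       // uses both; the result is all-zero either way
}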

And MS says you should never directly access those fields of the union (the way MSVC defines __m128i).

GCC does define semantics for GNU C native vectors; GCC / clang implement the Intel intrinsics API (including __m128i) on top of their portable vector extensions which work like a struct or class with operators like + - & | * / [] and so on.
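
For example, this compiles with GCC/clang because __m128i is a native vector type there, but it's not portable to MSVC:

#include <immintrin.h>

__m128i add_gnu(__m128i a, __m128i b) {
    return a + b;   // element-wise 64-bit add (paddq), same as _mm_add_epi64(a, b)
}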

See also Is `reinterpret_cast`ing between hardware SIMD vector pointer and the corresponding type an undefined behavior? re: what a __m128i object is and how it works.


Terminology: __m128i isn't a register.

It's a C++ object like an int that can fit in a register, and normally compilers will keep the variable's value in a register across statements, if you enable optimization.

But you can still take its address, memcpy into / out of (parts of) it, and otherwise mess with its object representation, all of which works according to the rules of the C++ abstract machine (including the vector extensions). (The resulting asm might not be very efficient vs. using shuffle intrinsics, though!)
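
For instance, reading part of its object representation with memcpy is well-defined (a sketch; _mm_cvtsi128_si32 would be the more direct intrinsic for this particular case):

#include <immintrin.h>
#include <cstdint>
#include <cstring>

int32_t low_element(__m128i v) {
    int32_t out;
    std::memcpy(&out, &v, sizeof(out));  // copy the low 4 bytes of the object representation
    return out;
}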

You can make an array or even std::vector<__m128i> (with C++17 for aligned allocation), and obviously those __m128i objects can't all be in registers.
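
For example, assuming C++17 so that operator new honours alignof(__m128i):

#include <immintrin.h>
#include <vector>

std::vector<__m128i> make_buffers() {
    return std::vector<__m128i>(8, _mm_setzero_si128());  // 8 zeroed 16-byte vectors on the heap
}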

Better terminology: "initialize an AVX intrinsic vector". These types represent a SIMD vector of data, which can be loaded into a vector register, just like an int represents a fixed-width integer that can be loaded into an integer register. It's common to write code using __m128i in ways where all such objects are locals that actually can live in registers, hopefully not even getting spilled/reloaded, but that's a consequence of how it's used, not what it is.

When you talk about initializing an int object, you talk about the object, not the register. (Especially for constexpr; there are no registers in the C++ abstract machine.)


Registers don't exist at compile-time. Whatever these AVX instructions are doing, the compile-time result is going to have to be loaded into a register at runtime. So you should just compute that compile-time value using normal C++ code (perhaps using if (std::is_constant_evaluated()) to fence off such blocks so you can put both paths in the same function) and then load that constexpr value into an AVX object.
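
A minimal sketch of that pattern, assuming C++20; the function and variable names are just illustrative:

#include <immintrin.h>
#include <array>
#include <cstdint>
#include <type_traits>

// Plain C++ when constant-evaluated, intrinsics at run time.
constexpr std::array<int32_t, 4> add4(std::array<int32_t, 4> a,
                                      std::array<int32_t, 4> b) {
    if (std::is_constant_evaluated()) {
        for (int i = 0; i < 4; ++i) a[i] += b[i];   // scalar path, constexpr-friendly
    } else {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data()));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data()));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(a.data()),
                         _mm_add_epi32(va, vb));
    }
    return a;
}

// Computed entirely at compile time via the scalar branch:
constexpr auto precomputed = add4({1, 2, 3, 4}, {10, 20, 30, 40});

// Loaded into a vector register at run time:
__m128i load_precomputed() {
    return _mm_loadu_si128(reinterpret_cast<const __m128i*>(precomputed.data()));
}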