New posts in avx

How to constexpr-initialize an intrinsic SSE/AVX register?

AVX scalar operations are much faster

Which versions of Windows support/require which CPU multimedia extensions? (How to check if SSE or AVX are fully usable?)

Get sum of values stored in __m256d with SSE/AVX

Is it useful to use VZEROUPPER if your program+libraries contain no SSE instructions?

Is vxorps-zeroing on AMD Jaguar/Bulldozer/Zen faster with xmm registers than ymm?

How to choose AVX compare predicate variants

AVX/SSE version of xorshift128+

Find largest element in matrix and its column and row indexes using SSE and AVX

CPU dispatcher for Visual Studio for AVX and SSE

Transpose an 8x8 float using AVX/AVX2

Fastest way to unpack 32 bits to a 32-byte SIMD vector

Optimizations for pow() with const non-integer exponent?

Disable AVX-optimized functions in glibc (LD_HWCAP_MASK, /etc/ld.so.nohwcap) for valgrind & gdb record

Fastest method to calculate sum of all packed 32-bit integers using AVX512 or AVX2

How to check if a CPU supports the SSE3 instruction set?

Loop unrolling to achieve maximum throughput with Ivy Bridge and Haswell

How to detect SSE/SSE2/AVX/AVX2/AVX-512/AVX-128-FMA/KCVI availability at compile-time?

How to sum __m256 horizontally?

L1 memory bandwidth: 50% drop in efficiency using addresses which differ by 4096+64 bytes