New posts in avx2

Get sum of values stored in __m256d with SSE/AVX

c++ optimization sse avx avx2

Can I use the AVX FMA units to do bit-exact 52 bit integer multiplications?

floating-point x86 simd avx2 fma

Load address calculation when using AVX2 gather instructions

x86 sse simd avx2

AVX2: Computing dot product of 512 float arrays

c++ simd avx2 dot-product fma

Find largest element in matrix and its column and row indexes using SSE and AVX

c++ matrix sse avx avx2

Transpose an 8x8 float matrix using AVX/AVX2

Fastest way to unpack 32 bits to a 32 byte SIMD vector

x86 simd avx bitmask avx2

Fastest method to calculate sum of all packed 32-bit integers using AVX512 or AVX2

c intrinsics avx avx2 avx512

Loading 8 chars from memory into an __m256 variable as packed single precision floats

c++ sse simd avx avx2

How to perform the inverse of _mm256_movemask_epi8 (VPMOVMSKB)?

c x86 simd avx avx2

Efficient implementation of log2(__m256d) in AVX2

c++ algorithm floating-point logarithm avx2

Is there an inverse instruction to the movemask instruction in Intel AVX2?

x86 intrinsics avx avx2 icc

AVX2: What is the most efficient way to left-pack based on a mask?

c++ vectorization sse simd avx2