Newbetuts
New posts in avx2
Get sum of values stored in __m256d with SSE/AVX
Tags: c++, optimization, sse, avx, avx2
Can I use the AVX FMA units to do bit-exact 52 bit integer multiplications?
Tags: floating-point, x86, simd, avx2, fma
Load address calculation when using AVX2 gather instructions
Tags: x86, sse, simd, avx2
AVX2: Computing dot product of 512 float arrays
Tags: c++, simd, avx2, dot-product, fma
Find largest element in matrix and its column and row indexes using SSE and AVX
Tags: c++, matrix, sse, avx, avx2
Transpose an 8x8 float using AVX/AVX2
Tags: simd, avx, avx2
Fastest way to unpack 32 bits to a 32 byte SIMD vector
Tags: x86, simd, avx, bitmask, avx2
Fastest method to calculate sum of all packed 32-bit integers using AVX512 or AVX2
Tags: c, intrinsics, avx, avx2, avx512
Loading 8 chars from memory into an __m256 variable as packed single precision floats
Tags: c++, sse, simd, avx, avx2
How to perform the inverse of _mm256_movemask_epi8 (VPMOVMSKB)?
Tags: c, x86, simd, avx, avx2
Efficient implementation of log2(__m256d) in AVX2
Tags: c++, algorithm, floating-point, logarithm, avx2
is there an inverse instruction to the movemask instruction in intel avx2?
Tags: x86, intrinsics, avx, avx2, icc
AVX2 what is the most efficient way to pack left based on a mask?
Tags: c++, vectorization, sse, simd, avx2