Newbetuts
New posts in avx
How to constexpr-initialize an intrinsic SSE/AVX register?
Tags: c++, sse, constexpr, intrinsics, avx
AVX scalar operations are much faster
Tags: c, memory, x86, sse, avx
Which versions of Windows support/require which CPU multimedia extensions? (How to check if SSE or AVX are fully usable?)
Tags: windows, assembly, sse, avx, avx512
Get the sum of values stored in __m256d with SSE/AVX
Tags: c++, optimization, sse, avx, avx2
Is it useful to use VZEROUPPER if your program+libraries contain no SSE instructions?
Tags: performance, assembly, x86, avx, micro-optimization
Is vxorps-zeroing on AMD Jaguar/Bulldozer/Zen faster with xmm registers than ymm?
Tags: assembly, x86, avx, micro-optimization, amd-processor
How to choose AVX compare predicate variants
Tags: simd, avx
AVX/SSE version of xorshift128+
Tags: c, performance, sse, avx
Find the largest element in a matrix and its column and row indexes using SSE and AVX
Tags: c++, matrix, sse, avx, avx2
CPU dispatcher for Visual Studio for AVX and SSE
Tags: c++, visual-studio, sse, avx
Transpose an 8x8 float using AVX/AVX2
Tags: simd, avx, avx2
Fastest way to unpack 32 bits to a 32 byte SIMD vector
Tags: x86, simd, avx, bitmask, avx2
Optimizations for pow() with const non-integer exponent?
Tags: c++, math, optimization, avx, exponent
Disable AVX-optimized functions in glibc (LD_HWCAP_MASK, /etc/ld.so.nohwcap) for valgrind & gdb record
Tags: linux, linker, gdb, glibc, avx
Fastest method to calculate sum of all packed 32-bit integers using AVX512 or AVX2
Tags: c, intrinsics, avx, avx2, avx512
How to check if a CPU supports the SSE3 instruction set?
Tags: c++, sse, instruction-set, avx, cpuid
Loop unrolling to achieve maximum throughput with Ivy Bridge and Haswell
Tags: c++, x86, intel, sse, avx
How to detect SSE/SSE2/AVX/AVX2/AVX-512/AVX-128-FMA/KCVI availability at compile-time?
Tags: gcc, clang, sse, avx, avx512
How to sum __m256 horizontally?
Tags: sse, vectorization, intrinsics, avx
L1 memory bandwidth: 50% drop in efficiency using addresses which differ by 4096+64 bytes
Tags: c, caching, memory, x86, avx