Newbetuts
New posts in simd
What's the difference between logical SSE intrinsics?
c
sse
simd
intrinsics
sse2
Why is vectorization, in general, faster than loops?
performance
language-agnostic
vectorization
simd
low-level
Fastest way to compute absolute value using SSE
x86
vectorization
sse
simd
absolute-value
SIMD matmul program gives different numerical results
c
floating-point
vectorization
simd
avx
Fast vectorized rsqrt and reciprocal with SSE/AVX depending on precision
performance
sse
simd
avx
Loading 8 chars from memory into an __m256 variable as packed single precision floats
c++
sse
simd
avx
avx2
practical BigNum AVX/SSE possible?
sse
biginteger
simd
avx
extended-precision
How to perform the inverse of _mm256_movemask_epi8 (VPMOVMSKB)?
c
x86
simd
avx
avx2
How to count character occurrences using SIMD
c
parallel-processing
character
intel
simd
SIMD prefix sum on Intel CPU
c++
sse
simd
prefix-sum
Header files for x86 SIMD intrinsics
x86
header-files
sse
simd
intrinsics
print a __m128i variable
c
assembly
sse
simd
intrinsics
How to implement atoi using SIMD?
c++
x86
sse
simd
atoi
What are the best instruction sequences to generate vector constants on the fly?
assembly
x86
sse
simd
avx
_mm_load_ps caused segmentation fault
c++
x86
sse
simd
memory-alignment
What is "vectorization"?
vectorization
simd
auto-vectorization
AVX2 what is the most efficient way to pack left based on a mask?
c++
vectorization
sse
simd
avx2
How to compile Tensorflow with SSE4.2 and AVX instructions?
tensorflow
x86
compiler-optimization
simd
compiler-options
Fastest way to do horizontal SSE vector sum (or other reduction)
assembly
optimization
floating-point
sse
simd
Is it possible to vectorize non-trivial loop in C with SIMD? (multiple length 5 double-precision dot products reusing one input)
arrays
c
performance
vectorization
simd