Newbetuts
New posts in simd
ARM Cortex-A8: What's the difference between VFP and NEON
arm
simd
neon
cortex-a8
Compare 16 byte strings with SSE
c
gcc
x86
sse
simd
Can I use the AVX FMA units to do bit-exact 52 bit integer multiplications?
floating-point
x86
simd
avx2
fma
Load address calculation when using AVX2 gather instructions
x86
sse
simd
avx2
Getting started with Intel x86 SSE SIMD instructions
c
gcc
x86
sse
simd
AVX2: Computing dot product of 512 float arrays
c++
simd
avx2
dot-product
fma
SSE intrinsic functions reference
c++
c
gcc
sse
simd
How to choose AVX compare predicate variants
simd
avx
SSE, SSE2 and SSE3 for GNU C++
c++
optimization
simd
sse
sse2
Count each bit-position separately over many 64-bit bitmasks, with AVX but not AVX2
c
optimization
x86
x86-64
simd
Do I get a performance penalty when mixing SSE integer/float SIMD instructions
c
assembly
sse
simd
intrinsics
Transpose an 8x8 float using AVX/AVX2
simd
avx
avx2
Fastest way to unpack 32 bits to a 32 byte SIMD vector
x86
simd
avx
bitmask
avx2
How to determine if memory is aligned?
c
optimization
memory
sse
simd
Micro Optimization of a 4-bucket histogram of a large array or list
c#
optimization
histogram
simd
micro-optimization
Sum reduction of unsigned bytes without overflow, using SSE2 on Intel
x86
sse
simd
sse2
sse3
Parallel for vs omp simd: when to use each?
c++
c
performance
openmp
simd
C++ error: ‘_mm_sin_ps’ was not declared in this scope
c++
optimization
sse
simd
intrinsics
Fastest Implementation of the Natural Exponential Function Using SSE
c
optimization
vectorization
sse
simd
Subtracting packed 8-bit integers in a 64-bit integer by 1 in parallel, SWAR without hardware SIMD
c++
c
bit-manipulation
simd
swar