New posts in simd

ARM Cortex-A8: What's the difference between VFP and NEON

arm simd neon cortex-a8

Compare 16 byte strings with SSE

c gcc x86 sse simd

Can I use the AVX FMA units to do bit-exact 52 bit integer multiplications?

floating-point x86 simd avx2 fma

Load address calculation when using AVX2 gather instructions

x86 sse simd avx2

Getting started with Intel x86 SSE SIMD instructions

c gcc x86 sse simd

AVX2: Computing dot product of 512 float arrays

c++ simd avx2 dot-product fma

SSE intrinsic functions reference

c++ c gcc sse simd

How to choose AVX compare predicate variants

SSE SSE2 and SSE3 for GNU C++

c++ optimization simd sse sse2

Count each bit-position separately over many 64-bit bitmasks, with AVX but not AVX2

c optimization x86 x86-64 simd

Do I get a performance penalty when mixing SSE integer/float SIMD instructions?

c assembly sse simd intrinsics

Transpose an 8x8 float using AVX/AVX2

Fastest way to unpack 32 bits to a 32 byte SIMD vector

x86 simd avx bitmask avx2

How to determine if memory is aligned?

c optimization memory sse simd

Micro Optimization of a 4-bucket histogram of a large array or list

c# optimization histogram simd micro-optimization

Sum reduction of unsigned bytes without overflow, using SSE2 on Intel

x86 sse simd sse2 sse3

Parallel for vs omp simd: when to use each?

c++ c performance openmp simd

C++ error: ‘_mm_sin_ps’ was not declared in this scope

c++ optimization sse simd intrinsics

Fastest Implementation of the Natural Exponential Function Using SSE

c optimization vectorization sse simd

Subtracting packed 8-bit integers in a 64-bit integer by 1 in parallel, SWAR without hardware SIMD

c++ c bit-manipulation simd swar