New posts in simd

ARM Cortex-A8: What's the difference between VFP and NEON

Compare 16 byte strings with SSE

Can I use the AVX FMA units to do bit-exact 52 bit integer multiplications?

Load address calculation when using AVX2 gather instructions

Getting started with Intel x86 SSE SIMD instructions

AVX2: Computing dot product of 512 float arrays

SSE intrinsic functions reference

How to choose AVX compare predicate variants

SSE SSE2 and SSE3 for GNU C++

Count each bit-position separately over many 64-bit bitmasks, with AVX but not AVX2

Do I get a performance penalty when mixing SSE integer/float SIMD instructions

Transpose an 8x8 float using AVX/AVX2

Fastest way to unpack 32 bits to a 32 byte SIMD vector

How to determine if memory is aligned?

Micro Optimization of a 4-bucket histogram of a large array or list

Sum reduction of unsigned bytes without overflow, using SSE2 on Intel

Parallel for vs omp simd: when to use each?

C++ error: ‘_mm_sin_ps’ was not declared in this scope

Fastest Implementation of the Natural Exponential Function Using SSE

Subtracting packed 8-bit integers in a 64-bit integer by 1 in parallel, SWAR without hardware SIMD