New posts in sse

Do I get a performance penalty when mixing SSE integer/float SIMD instructions?

c assembly sse simd intrinsics

How to determine if memory is aligned?

c optimization memory sse simd

Sum reduction of unsigned bytes without overflow, using SSE2 on Intel

x86 sse simd sse2 sse3

Is SSE floating-point arithmetic reproducible?

.net floating-point sse ieee-754 x87

Fast method to copy memory with translation - ARGB to BGR

c x86 rgb sse micro-optimization

How to check if a CPU supports the SSE3 instruction set?

c++ sse instruction-set avx cpuid

C++ error: ‘_mm_sin_ps’ was not declared in this scope

c++ optimization sse simd intrinsics

Fastest Implementation of the Natural Exponential Function Using SSE

c optimization vectorization sse simd

Loop unrolling to achieve maximum throughput with Ivy Bridge and Haswell

c++ x86 intel sse avx

SSE, intrinsics, and alignment

c++ alignment sse intrinsics

How to detect SSE/SSE2/AVX/AVX2/AVX-512/AVX-128-FMA/KCVI availability at compile-time?

gcc clang sse avx avx512

How to sum __m256 horizontally?

sse vectorization intrinsics avx

How is a vector's data aligned?

c++ vector sse memory-alignment allocator

What's the difference between logical SSE intrinsics?

c sse simd intrinsics sse2

Fastest way to compute absolute value using SSE

x86 vectorization sse simd absolute-value

How to use Fused Multiply-Add (FMA) instructions with SSE/AVX

c sse cpu-architecture avx fma

Websocket transport reliability (Socket.io data loss during reconnection)

node.js websocket socket.io sse eventsource

Using AVX CPU instructions: Poor performance without "/arch:AVX"

c++ performance visual-studio-2010 sse avx

How do I enable SSE for my freestanding bootable code?

x86 sse instruction-set

Fast vectorized rsqrt and reciprocal with SSE/AVX depending on precision

performance sse simd avx